# Ingesting a Repo

Navegador builds the graph from four sources: code, manual knowledge curation, GitHub wikis, and Planopticon knowledge graph output.

---

## Code ingestion

```bash
navegador ingest ./repo
```

### Supported languages

Navegador walks all source files in the repo and uses tree-sitter to extract structure. Supported languages:

| Extension(s) | Language | Extra |
|---|---|---|
| `.py` | Python | none |
| `.ts`, `.tsx` | TypeScript | none |
| `.js`, `.jsx` | JavaScript | none |
| `.go` | Go | none |
| `.rs` | Rust | none |
| `.java` | Java | none |
| `.kt`, `.kts` | Kotlin | `[languages]` |
| `.cs` | C# | `[languages]` |
| `.php` | PHP | `[languages]` |
| `.rb` | Ruby | `[languages]` |
| `.swift` | Swift | `[languages]` |
| `.c`, `.h` | C | `[languages]` |
| `.cpp`, `.cc`, `.cxx`, `.hpp` | C++ | `[languages]` |
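
The table above amounts to a lookup from file extension to language. A minimal sketch of that lookup follows; the names `EXTENSION_LANGUAGES` and `language_for` are hypothetical and are not taken from navegador's source.

```python
from pathlib import Path
from typing import Optional

# Extension-to-language pairs copied from the table above.
# The dict and helper names are illustrative, not navegador internals.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".js": "JavaScript", ".jsx": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
    ".kt": "Kotlin", ".kts": "Kotlin",
    ".cs": "C#",
    ".php": "PHP",
    ".rb": "Ruby",
    ".swift": "Swift",
    ".c": "C", ".h": "C",
    ".cpp": "C++", ".cc": "C++", ".cxx": "C++", ".hpp": "C++",
}


def language_for(path: str) -> Optional[str]:
    """Return the language for a path, or None when the extension is not listed."""
    return EXTENSION_LANGUAGES.get(Path(path).suffix.lower())
```

For example, `language_for("src/app.tsx")` yields `"TypeScript"`, while a path such as `README.md` yields `None` and would presumably be skipped.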

**Infrastructure-as-Code:**

| Extension(s) | Language | Extra |
|---|---|---|
| `.tf`, `.hcl` | HCL / Terraform | `[iac]` |
| `.pp` | Puppet | `[iac]` |
| `.sh`, `.bash`, `.zsh` | Bash / Shell | `[iac]` |
| `.yml`, `.yaml` | Ansible | `[iac]` (heuristic detection) |
| `.rb` (in Chef cookbooks) | Chef | `[iac]` (enricher on Ruby parser) |

Ansible files are not matched by extension alone: navegador detects them by directory structure (`roles/`, `playbooks/`, `group_vars/`, `host_vars/`) or by content (`hosts:` + `tasks:` keys). Chef uses the existing Ruby parser; the Chef enricher promotes nodes with Chef-specific semantic types.
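
The Ansible heuristic described above can be sketched as follows. This is an illustration under stated assumptions, not navegador's detection code: the function name `looks_like_ansible`, the regex-based key check (rather than a real YAML parse), and the use of POSIX-style paths are all choices made for this example.

```python
import re
from pathlib import PurePosixPath

# Directory names and keys come from the paragraph above.
ANSIBLE_DIRS = {"roles", "playbooks", "group_vars", "host_vars"}
YAML_SUFFIXES = {".yml", ".yaml"}

# Assumption: a key counts if it starts a line, optionally after a list dash.
HOSTS_KEY = re.compile(r"^\s*(?:-\s+)?hosts\s*:", re.MULTILINE)
TASKS_KEY = re.compile(r"^\s*(?:-\s+)?tasks\s*:", re.MULTILINE)


def looks_like_ansible(path: str, text: str) -> bool:
    """Guess whether a YAML file (POSIX-style path) is Ansible content."""
    p = PurePosixPath(path)
    if p.suffix.lower() not in YAML_SUFFIXES:
        return False
    # Directory rule: any parent directory is a well-known Ansible directory.
    if ANSIBLE_DIRS.intersection(p.parts[:-1]):
        return True
    # Content rule: both `hosts:` and `tasks:` must be present.
    return bool(HOSTS_KEY.search(text) and TASKS_KEY.search(text))
```

A file such as `docker-compose.yml` has the right extension but satisfies neither rule, so this sketch would leave it alone.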

Install language and IaC support:

```bash
pip install "navegador[languages]"
pip install "navegador[iac]"
```

The following directories are always skipped: `.git`, `.venv`, `venv`, `node_modules`, `__pycache__`, `dist`, `build`, `.next`, `target` (Rust/Java builds), `vendor` (Go modules), `.gradle`.
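
One common way to implement a skip list like this is to prune directories during the walk so they are never entered. The sketch below uses the standard library's `os.walk`; the names `SKIP_DIRS` and `iter_source_files` are hypothetical, and navegador's actual traversal may differ.

```python
import os

# Directory names copied from the list above.
SKIP_DIRS = {
    ".git", ".venv", "venv", "node_modules", "__pycache__",
    "dist", "build", ".next", "target", "vendor", ".gradle",
}


def iter_source_files(root: str):
    """Yield file paths under root, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to visit those subtrees.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning in place matters for speed: a large `node_modules` tree is skipped entirely instead of being walked and filtered afterwards.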

### What gets extracted

| What | Graph nodes / edges created |
|---|---|
| Files | `File` node; `CONTAINS` edge from `Repository` |
| Classes, structs, interfaces | `Class` node with `name`, `file`, `line`, `docstring` |
| Functions and methods | `Function` / `Method` nodes with `name`, `docstring`, `line` |
| Imports / use declarations | `Import` node; `IMPORTS` edge from the importing file |
| Call relationships | `CALLS` edges between functions based on static call analysis |
| Inheritance | `INHERITS` edges from subclass to parent |

Doc comment formats supported per language: Python docstrings, JSDoc (`/** */`), Rust `///`, Java Javadoc.

### IaC extraction

IaC parsers map infrastructure constructs to the standard node labels with a `semantic_type` property for specificity:

| Language | Construct | Node label | `semantic_type` |
|---|---|---|---|
| Terraform | `resource` | `Class` | `terraform_resource` |
| Terraform | `variable` / `output` / `locals` | `Variable` | `terraform_variable` / `terraform_output` / `terraform_local` |
| Terraform | `module` | `Module` | `terraform_module` |
| Terraform | `data` / `provider` | `Class` | `terraform_data` / `terraform_provider` |
| Puppet | `class` / `define` / `node` | `Class` | `puppet_class` / `puppet_defined_type` / `puppet_node` |
| Puppet | resource declaration | `Function` | `puppet_resource` |
| Puppet | `include` | `Import` | `puppet_include` |
| Ansible | playbook file | `Module` | `ansible_playbook` |
| Ansible | play | `Class` | `ansible_play` |
| Ansible | task / handler | `Function` | `ansible_task` / `ansible_handler` |
| Ansible | role | `Import` | `ansible_role` |
| Bash | function | `Function` | `shell_function` |
| Bash | variable | `Variable` | `shell_variable` |
| Bash | `source` / `.` | `Import` | `shell_source` |

Cross-references are extracted where possible: Terraform `var.x`, `module.x`, and resource-to-resource dependencies become `REFERENCES` / `DEPENDS_ON` edges. Ansible `notify:` keys create `CALLS` edges to handlers. Puppet `include` creates `IMPORTS` edges.
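
To make the Terraform case concrete, here is a rough sketch of pulling `var.x` and `module.x` references out of an expression and turning them into edge tuples. It uses a regular expression instead of a real HCL parser, and it assumes these references map to `REFERENCES` edges; the text above does not say which of the two edge types applies to which construct. The function name and tuple shape are hypothetical.

```python
import re

# Matches `var.<name>` and `module.<name>`; a simplification of HCL syntax.
REFERENCE = re.compile(r"\b(var|module)\.([A-Za-z_][A-Za-z0-9_-]*)")


def extract_references(source_node: str, expression: str) -> list:
    """Return (source, "REFERENCES", target) tuples, one per unique reference."""
    edges = []
    seen = set()
    for kind, name in REFERENCE.findall(expression):
        target = f"{kind}.{name}"
        if target not in seen:
            seen.add(target)
            # Assumption: var/module references become REFERENCES edges.
            edges.append((source_node, "REFERENCES", target))
    return edges
```

Note that `module.network.subnet_id` resolves to the target `module.network`: the sketch links to the module itself and ignores the attribute path after it.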

### Options

| Flag | Effect |
|---|---|
| `--clear` | Wipe the graph before ingesting (full rebuild) |
| `--incremental` | Only reprocess files whose content hash has changed |
| `--watch` | Keep running and re-ingest on file changes |
| `--redact` | Strip secrets (tokens, passwords, keys) from string literals |
| `--monorepo` | Traverse workspace sub-packages (Turborepo, Nx, Yarn, pnpm, Cargo, Go) |
| `--json` | Output a JSON summary of nodes and edges created |
| `--db <path>` | Use a specific database file |

### Re-ingesting

Re-run `navegador ingest` at any time to pick up changes. Nodes are upserted by identity (file path + name), so repeated ingestion is idempotent for unchanged nodes. Use `--incremental` for large repos to skip unchanged files. Use `--clear` when you need a clean slate (e.g., after a large rename refactor).
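
The upsert-by-identity behaviour can be illustrated with a plain dictionary keyed by `(file path, name)`. This is a toy model rather than navegador's storage layer; `upsert_node` and the in-memory `graph` dict are hypothetical.

```python
def upsert_node(graph: dict, file_path: str, name: str, **props) -> bool:
    """Insert or update the node keyed by (file_path, name).

    Returns True when the graph changed, False when the node was already
    identical, which is what makes repeated ingestion idempotent.
    """
    key = (file_path, name)
    existing = graph.get(key)
    merged = {**(existing or {}), **props}
    if merged == existing:
        return False
    graph[key] = merged
    return True
```

In this model, ingesting `process_payment` from `pay.py` twice leaves a single node. If the function later moves to `billing.py`, its identity changes and a second node appears while the old one remains, which suggests why a full rebuild with `--clear` is recommended after large renames.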

### Incremental ingestion

`--incremental` uses SHA-256 content hashing to skip files that haven't changed since the last ingest. The hash is stored on each `File` node. On large repos this can reduce ingest time by 90%+ after the initial run.

```bash
navegador ingest ./repo --incremental
```
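
The hash comparison behind `--incremental` can be sketched with the standard library's `hashlib`. Here a plain dict stands in for the hash stored on each `File` node; the helper names `file_sha256` and `needs_reingest` are hypothetical.

```python
import hashlib


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_reingest(path: str, stored_hashes: dict) -> bool:
    """Return True and record the new hash when the file's content changed."""
    digest = file_sha256(path)
    if stored_hashes.get(path) == digest:
        return False  # same content as last ingest: skip this file
    stored_hashes[path] = digest
    return True
```

Because the check is on content rather than modification time, touching a file without editing it does not trigger reprocessing in this sketch.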

### Watch mode

`--watch` starts a file-system watcher and automatically re-ingests any file that changes:

```bash
navegador ingest ./repo --watch
```

Press `Ctrl-C` to stop. Watch mode uses `--incremental` automatically.

### Sensitive content redaction

`--redact` scans string literals for patterns that look like API keys, tokens, and passwords, and replaces their values with `[REDACTED]` in the graph. Source files are never modified.

```bash
navegador ingest ./repo --redact
```
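
Pattern-based redaction of this kind might look roughly like the sketch below. The three patterns are examples chosen for illustration (the AWS access key ID shape, GitHub token prefixes, and `key=value` style assignments); navegador's actual pattern set is not documented here, and `redact_literal` is a hypothetical name.

```python
import re

# Illustrative patterns only; not navegador's real rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),            # AWS access key ID shape
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),  # GitHub token prefixes
    re.compile(r"(?i)(password|passwd|secret|token|api[_-]?key)\s*[=:]\s*\S+"),
]


def redact_literal(value: str) -> str:
    """Return "[REDACTED]" if the string literal looks like a secret."""
    for pattern in SECRET_PATTERNS:
        if pattern.search(value):
            return "[REDACTED]"
    return value
```

The function only transforms the value that would be stored in the graph, which matches the guarantee above that source files are never modified.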

### Monorepo support

`--monorepo` detects the workspace type and traverses all sub-packages:

```bash
navegador ingest ./monorepo --monorepo
```

Supported workspace formats: Turborepo, Nx, Yarn workspaces, pnpm workspaces, Cargo workspaces, Go workspaces.
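
Workspace detection is typically done by looking for each tool's marker file at the repository root. The sketch below uses the conventional markers (`turbo.json`, `nx.json`, `pnpm-workspace.yaml`, `go.work`, a `[workspace]` table in `Cargo.toml`, and a `workspaces` field in `package.json`). The check order, the return labels, and the function name `detect_workspace` are assumptions, not navegador's documented behaviour.

```python
import json
import os
from typing import Optional


def detect_workspace(root: str) -> Optional[str]:
    """Guess the workspace type from marker files at the repo root."""

    def has(name: str) -> bool:
        return os.path.isfile(os.path.join(root, name))

    # Assumption: build orchestrators are checked before package managers,
    # since a Turborepo or Nx repo usually also has a package.json.
    if has("turbo.json"):
        return "turborepo"
    if has("nx.json"):
        return "nx"
    if has("pnpm-workspace.yaml"):
        return "pnpm"
    if has("go.work"):
        return "go"
    if has("Cargo.toml"):
        with open(os.path.join(root, "Cargo.toml"), encoding="utf-8") as f:
            if any(line.strip() == "[workspace]" for line in f):
                return "cargo"
    if has("package.json"):
        with open(os.path.join(root, "package.json"), encoding="utf-8") as f:
            try:
                manifest = json.load(f)
            except json.JSONDecodeError:
                manifest = {}
        # npm uses the same field, so "yarn" here is a simplification.
        if isinstance(manifest, dict) and "workspaces" in manifest:
            return "yarn"
    return None
```

A repository with none of these markers returns `None`, in which case a tool could fall back to treating the root as a single package.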

---

## Knowledge curation

Manual knowledge is added with `navegador add` commands and linked to code with `navegador annotate`.

### Concepts

A concept is a named idea or design pattern relevant to the codebase.

```bash
navegador add concept "Idempotency" \
  --desc "Operations safe to retry without side effects" \
  --domain Payments
```

### Rules

A rule is an enforceable constraint on code behaviour.

```bash
navegador add rule "RequireIdempotencyKey" \
  --desc "All write endpoints must accept an idempotency key header" \
  --domain Payments \
  --severity critical \
  --rationale "Prevents double-processing on client retries"
```

Severity values: `info`, `warning`, `critical`.

### Decisions

An architectural decision record (ADR) stored in the graph.

```bash
navegador add decision "UsePostgresForTransactions" \
  --desc "PostgreSQL is the primary datastore for transactional data" \
  --domain Infrastructure \
  --rationale "ACID guarantees required for financial data" \
  --alternatives "MySQL, CockroachDB" \
  --date 2025-03-01 \
  --status accepted
```

Status values: `proposed`, `accepted`, `deprecated`, `superseded`.

### People

```bash
navegador add person "Alice Chen" \
  --email [email protected] \
  --role "Lead Engineer" \
  --team Payments
```

### Domains

Domains are top-level groupings for concepts, rules, and decisions.

```bash
navegador add domain "Payments" \
  --desc "Everything related to payment processing and billing"
```

### Annotating code

Link a code node to a concept or rule:

```bash
navegador annotate process_payment \
  --type Function \
  --concept Idempotency \
  --rule RequireIdempotencyKey
```

`--type` accepts: `Function`, `Class`, `Method`, `File`, `Module`.

This creates `ANNOTATES` edges between the knowledge nodes and the code node. The code node then appears in results for `navegador concept Idempotency` and `navegador explain process_payment`.

---

## Wiki ingestion

Pull a GitHub wiki into the graph as `WikiPage` nodes.

```bash
# ingest from GitHub API
navegador wiki ingest --repo myorg/myrepo --token "$GITHUB_TOKEN"

# ingest from a locally cloned wiki directory
navegador wiki ingest --dir ./myrepo.wiki

# force API mode (bypass auto-detection)
navegador wiki ingest --repo myorg/myrepo --api
```

Each wiki page becomes a `WikiPage` node with `title`, `content`, `url`, and `updated_at` properties. Pages are linked to relevant `Concept`, `Domain`, or `Function` nodes with `DOCUMENTS` edges where names match.
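
The text does not specify how names are matched, so the following is only one plausible reading: a page documents a node when the node's name appears as a whole word, case-insensitively, in the page title or content. The function `link_pages` and its input shapes are hypothetical.

```python
import re


def link_pages(pages: dict, node_names: list) -> list:
    """Return (page_title, "DOCUMENTS", node_name) tuples for name matches.

    pages maps a page title to its content. Assumption: a match is a
    case-insensitive whole-word occurrence in the title or content.
    """
    edges = []
    for title, content in pages.items():
        text = f"{title}\n{content}"
        for name in node_names:
            pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                edges.append((title, "DOCUMENTS", name))
    return edges
```

Whole-word matching avoids spurious links such as a node named `Pay` matching every mention of "Payments", at the cost of missing plural or inflected forms.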

Set `GITHUB_TOKEN` in your environment to avoid rate limits and to access private wikis.

---

## Planopticon ingestion

[Planopticon](planopticon.md) is a video/meeting knowledge extraction tool. It produces structured knowledge graph output that navegador can ingest directly.

```bash
navegador planopticon ingest ./meeting-output/ --type auto
```

See the [Planopticon guide](planopticon.md) for the full input format reference and entity mapping details.