# Ingesting a Repo

Navegador builds the graph from four sources: code, manual knowledge curation, GitHub wikis, and Planopticon knowledge graph output.

---

## Code ingestion

```bash
navegador ingest ./repo
```

### Supported languages

Navegador walks all source files in the repo and uses tree-sitter to extract structure. Supported languages:

| Extension(s) | Language | Extra |
|---|---|---|
| `.py` | Python | none |
| `.ts`, `.tsx` | TypeScript | none |
| `.js`, `.jsx` | JavaScript | none |
| `.go` | Go | none |
| `.rs` | Rust | none |
| `.java` | Java | none |
| `.kt`, `.kts` | Kotlin | `[languages]` |
| `.cs` | C# | `[languages]` |
| `.php` | PHP | `[languages]` |
| `.rb` | Ruby | `[languages]` |
| `.swift` | Swift | `[languages]` |
| `.c`, `.h` | C | `[languages]` |
| `.cpp`, `.cc`, `.cxx`, `.hpp` | C++ | `[languages]` |

**Infrastructure-as-Code:**

| Extension(s) | Language | Extra |
|---|---|---|
| `.tf`, `.hcl` | HCL / Terraform | `[iac]` |
| `.pp` | Puppet | `[iac]` |
| `.sh`, `.bash`, `.zsh` | Bash / Shell | `[iac]` |
| `.yml`, `.yaml` | Ansible | `[iac]` (heuristic detection) |
| `.rb` (in Chef cookbooks) | Chef | `[iac]` (enricher on Ruby parser) |

Ansible files are not matched by extension; instead, Navegador detects them by directory structure (`roles/`, `playbooks/`, `group_vars/`, `host_vars/`) or content (`hosts:` + `tasks:` keys). Chef uses the existing Ruby parser; the Chef enricher promotes nodes with Chef-specific semantic types.
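
The Ansible heuristic can be sketched in Python. This is a minimal illustration under stated assumptions, not Navegador's actual implementation: the helper name `looks_like_ansible` is hypothetical, and the content check uses plain line matching where a real parser would likely load the YAML.

```python
from pathlib import PurePosixPath

# Directory names that mark Ansible content (taken from the text above).
ANSIBLE_DIRS = {"roles", "playbooks", "group_vars", "host_vars"}
YAML_SUFFIXES = {".yml", ".yaml"}


def looks_like_ansible(path: str, text: str) -> bool:
    """Hypothetical helper: guess whether a YAML file is Ansible content."""
    p = PurePosixPath(path)
    if p.suffix not in YAML_SUFFIXES:
        return False
    # Structural signal: a parent directory has an Ansible-specific name.
    if ANSIBLE_DIRS.intersection(p.parts[:-1]):
        return True
    # Content signal: both `hosts` and `tasks` appear as keys in the file.
    keys = {
        line.strip().lstrip("- ").split(":", 1)[0]
        for line in text.splitlines()
        if ":" in line
    }
    return {"hosts", "tasks"} <= keys
```

With this sketch, `roles/web/tasks/main.yml` qualifies on its path alone, while a `docker-compose.yml` at the repo root does not, because it has neither signal.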

Install language and IaC support:

```bash
pip install "navegador[languages]"
pip install "navegador[iac]"
```

The following directories are always skipped: `.git`, `.venv`, `venv`, `node_modules`, `__pycache__`, `dist`, `build`, `.next`, `target` (Rust/Java builds), `vendor` (Go modules), `.gradle`.
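
As a rough illustration of how such a skip list is typically applied during a directory walk, the sketch below prunes the listed names while traversing. The function name `iter_source_files` is hypothetical, and Navegador's own walker may work differently.

```python
import os

# Directory names from the skip list above.
SKIP_DIRS = {
    ".git", ".venv", "venv", "node_modules", "__pycache__",
    "dist", "build", ".next", "target", "vendor", ".gradle",
}


def iter_source_files(root: str):
    """Yield file paths under root, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering those directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning at the directory level means the contents of a large `node_modules` tree are never listed at all, which is cheaper than filtering file paths afterwards.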

### What gets extracted

| What | Graph nodes / edges created |
|---|---|
| Files | `File` node; `CONTAINS` edge from `Repository` |
| Classes, structs, interfaces | `Class` node with `name`, `file`, `line`, `docstring` |
| Functions and methods | `Function` / `Method` nodes with `name`, `docstring`, `line` |
| Imports / use declarations | `Import` node; `IMPORTS` edge from the importing file |
| Call relationships | `CALLS` edges between functions based on static call analysis |
| Inheritance | `INHERITS` edges from subclass to parent |

Doc comment formats supported per language: Python docstrings, JSDoc (`/** */`), Rust `///`, Java Javadoc.

### IaC extraction

IaC parsers map infrastructure constructs to the standard node labels with a `semantic_type` property for specificity:

| Language | Construct | Node label | `semantic_type` |
|---|---|---|---|
| Terraform | `resource` | `Class` | `terraform_resource` |
| Terraform | `variable` / `output` / `locals` | `Variable` | `terraform_variable` / `terraform_output` / `terraform_local` |
| Terraform | `module` | `Module` | `terraform_module` |
| Terraform | `data` / `provider` | `Class` | `terraform_data` / `terraform_provider` |
| Puppet | `class` / `define` / `node` | `Class` | `puppet_class` / `puppet_defined_type` / `puppet_node` |
| Puppet | resource declaration | `Function` | `puppet_resource` |
| Puppet | `include` | `Import` | `puppet_include` |
| Ansible | playbook file | `Module` | `ansible_playbook` |
| Ansible | play | `Class` | `ansible_play` |
| Ansible | task / handler | `Function` | `ansible_task` / `ansible_handler` |
| Ansible | role | `Import` | `ansible_role` |
| Bash | function | `Function` | `shell_function` |
| Bash | variable | `Variable` | `shell_variable` |
| Bash | `source` / `.` | `Import` | `shell_source` |

Cross-references are extracted where possible: Terraform `var.x`, `module.x`, and resource-to-resource dependencies become `REFERENCES` / `DEPENDS_ON` edges. Ansible `notify:` keys create `CALLS` edges to handlers. Puppet `include` creates `IMPORTS` edges.

### Options

| Flag | Effect |
|---|---|
| `--clear` | Wipe the graph before ingesting (full rebuild) |
| `--incremental` | Only reprocess files whose content hash has changed |
| `--watch` | Keep running and re-ingest on file changes |
| `--redact` | Strip secrets (tokens, passwords, keys) from string literals |
| `--monorepo` | Traverse workspace sub-packages (Turborepo, Nx, Yarn, pnpm, Cargo, Go) |
| `--json` | Output a JSON summary of nodes and edges created |
| `--db <path>` | Use a specific database file |

### Re-ingesting

Re-run `navegador ingest` anytime to pick up changes. Nodes are upserted by identity (file path + name), so repeated ingestion is idempotent for unchanged nodes. Use `--incremental` for large repos to skip unchanged files. Use `--clear` when you need a clean slate (e.g., after a large rename refactor).
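
To make the upsert-by-identity behaviour concrete, here is a minimal sketch that uses an in-memory dictionary as a stand-in for the graph store. The `upsert_node` helper and the dictionary layout are assumptions for illustration only; they are not Navegador's storage API.

```python
from typing import Dict, Tuple

# In-memory stand-in for the graph store, keyed by node identity.
NodeKey = Tuple[str, str]  # (file path, name)


def upsert_node(store: Dict[NodeKey, dict], file: str, name: str, **props) -> bool:
    """Insert or update a node; return True only if the store changed."""
    key = (file, name)
    new = {"file": file, "name": name, **props}
    if store.get(key) == new:
        return False  # unchanged node: re-ingesting is a no-op
    store[key] = new
    return True


store: Dict[NodeKey, dict] = {}
upsert_node(store, "pay.py", "process_payment", line=10)  # first ingest creates the node
upsert_node(store, "pay.py", "process_payment", line=10)  # second ingest changes nothing
upsert_node(store, "pay.py", "process_payment", line=12)  # edited file updates in place
```

In this sketch a renamed function gets a new key while the old entry lingers, which is one plausible reason a clean rebuild helps after a large rename.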

### Incremental ingestion

`--incremental` uses SHA-256 content hashing to skip files that haven't changed since the last ingest. The hash is stored on each `File` node. On large repos this can reduce ingest time by 90%+ after the initial run.

```bash
navegador ingest ./repo --incremental
```
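
The change-detection idea behind incremental ingestion can be sketched as follows. The function names and the `stored` dictionary (standing in for the hashes kept on `File` nodes) are assumptions for illustration, not Navegador internals.

```python
import hashlib
from typing import Dict, Iterable, List


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths: Iterable[str], stored: Dict[str, str]) -> List[str]:
    """Return paths whose current hash differs from the stored one.

    `stored` maps path -> hash from the previous ingest; it is updated in place.
    """
    changed = []
    for path in paths:
        current = sha256_of(path)
        if stored.get(path) != current:
            changed.append(path)
            stored[path] = current
    return changed
```

Hashing content instead of comparing modification times means a file that was touched but not edited is still skipped.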

### Watch mode

`--watch` starts a file-system watcher and automatically re-ingests any file that changes:

```bash
navegador ingest ./repo --watch
```

Press `Ctrl-C` to stop. Watch mode uses `--incremental` automatically.

### Sensitive content redaction

`--redact` scans string literals for patterns that look like API keys, tokens, and passwords, and replaces their values with `[REDACTED]` in the graph. Source files are never modified.

```bash
navegador ingest ./repo --redact
```
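
Pattern-based redaction of this kind might look like the sketch below. The patterns are hypothetical examples chosen for illustration; the rules Navegador actually applies are not listed in this guide.

```python
import re

# Hypothetical patterns; the real rule set is an assumption here.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID format
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub personal access token format
    re.compile(r"(?i)(?:password|passwd|secret|token)\s*[=:]\s*\S+"),
]


def redact(literal: str) -> str:
    """Return "[REDACTED]" if the string literal looks like a secret."""
    if any(p.search(literal) for p in SECRET_PATTERNS):
        return "[REDACTED]"
    return literal
```

The function returns a new value for storage in the graph and never writes back to the source, matching the guarantee that source files are not modified.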

### Monorepo support

`--monorepo` detects the workspace type and traverses all sub-packages:

```bash
navegador ingest ./monorepo --monorepo
```

Supported workspace formats: Turborepo, Nx, Yarn workspaces, pnpm workspaces, Cargo workspaces, Go workspaces.
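
Each of these workspace formats has a well-known marker file at the repository root, so detection can be sketched as a series of file checks. The function name `detect_workspace`, the returned labels, and the order of the checks are assumptions for illustration; Navegador's own detection logic may differ.

```python
import json
import os
from typing import Optional


def detect_workspace(root: str) -> Optional[str]:
    """Guess the workspace type from well-known marker files in root."""
    def has(name: str) -> bool:
        return os.path.isfile(os.path.join(root, name))

    if has("turbo.json"):
        return "turborepo"
    if has("nx.json"):
        return "nx"
    if has("pnpm-workspace.yaml"):
        return "pnpm"
    if has("go.work"):
        return "go"
    if has("Cargo.toml"):
        with open(os.path.join(root, "Cargo.toml"), encoding="utf-8") as fh:
            # A [workspace] table marks a Cargo workspace root.
            if any(line.strip() == "[workspace]" for line in fh):
                return "cargo"
    if has("package.json"):
        with open(os.path.join(root, "package.json"), encoding="utf-8") as fh:
            # Yarn workspaces are declared in a "workspaces" field.
            if "workspaces" in json.load(fh):
                return "yarn"
    return None
```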

---

## Knowledge curation

Manual knowledge is added with `navegador add` commands and linked to code with `navegador annotate`.

### Concepts

A concept is a named idea or design pattern relevant to the codebase.

```bash
navegador add concept "Idempotency" \
  --desc "Operations safe to retry without side effects" \
  --domain Payments
```

### Rules

A rule is an enforceable constraint on code behaviour.

```bash
navegador add rule "RequireIdempotencyKey" \
  --desc "All write endpoints must accept an idempotency key header" \
  --domain Payments \
  --severity critical \
  --rationale "Prevents double-processing on client retries"
```

Severity values: `info`, `warning`, `critical`.

### Decisions

A decision is an architectural decision record (ADR) stored in the graph.

```bash
navegador add decision "UsePostgresForTransactions" \
  --desc "PostgreSQL is the primary datastore for transactional data" \
  --domain Infrastructure \
  --rationale "ACID guarantees required for financial data" \
  --alternatives "MySQL, CockroachDB" \
  --date 2025-03-01 \
  --status accepted
```

Status values: `proposed`, `accepted`, `deprecated`, `superseded`.

### People

```bash
navegador add person "Alice Chen" \
  --email [email protected] \
  --role "Lead Engineer" \
  --team Payments
```

### Domains

Domains are top-level groupings for concepts, rules, and decisions.

```bash
navegador add domain "Payments" \
  --desc "Everything related to payment processing and billing"
```

### Annotating code

Link a code node to a concept or rule:

```bash
navegador annotate process_payment \
  --type Function \
  --concept Idempotency \
  --rule RequireIdempotencyKey
```

`--type` accepts: `Function`, `Class`, `Method`, `File`, `Module`.

This creates `ANNOTATES` edges between the knowledge nodes and the code node. The code node then appears in results for `navegador concept Idempotency` and `navegador explain process_payment`.

---

## Wiki ingestion

Pull a GitHub wiki into the graph as `WikiPage` nodes.

```bash
# ingest from GitHub API
navegador wiki ingest --repo myorg/myrepo --token "$GITHUB_TOKEN"

# ingest from a locally cloned wiki directory
navegador wiki ingest --dir ./myrepo.wiki

# force API mode (bypass auto-detection)
navegador wiki ingest --repo myorg/myrepo --api
```

Each wiki page becomes a `WikiPage` node with `title`, `content`, `url`, and `updated_at` properties. Pages are linked to relevant `Concept`, `Domain`, or `Function` nodes with `DOCUMENTS` edges where names match.
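
The guide does not spell out how names are matched. One plausible approach, sketched here purely as an assumption, is a case-sensitive whole-word search for each known node name in the page title and content. The `documents_edges` helper and its tuple format are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def documents_edges(page: Dict[str, str], node_names: List[str]) -> List[Tuple[str, str, str]]:
    """Return (page title, "DOCUMENTS", node name) for each name found in the page."""
    text = f"{page['title']}\n{page['content']}"
    edges = []
    for name in node_names:
        # Whole-word match so `pay` does not match `process_payment`.
        if re.search(rf"(?<!\w){re.escape(name)}(?!\w)", text):
            edges.append((page["title"], "DOCUMENTS", name))
    return edges
```

Under this assumption, a page that mentions `process_payment` and `Idempotency` would be linked to both nodes, while a short name such as `pay` would not match inside longer identifiers.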

Set `GITHUB_TOKEN` in your environment to avoid rate limits and to access private wikis.

---

## Planopticon ingestion

[Planopticon](planopticon.md) is a video/meeting knowledge extraction tool. It produces structured knowledge graph output that navegador can ingest directly.

```bash
navegador planopticon ingest ./meeting-output/ --type auto
```

See the [Planopticon guide](planopticon.md) for the full input format reference and entity mapping details.