# Ingesting a Repo

Navegador builds the graph from four sources: code, manual knowledge curation, GitHub wikis, and Planopticon knowledge graph output.

---

## Code ingestion

```bash
navegador ingest ./repo
```

### Supported languages

Navegador walks all source files in the repo and uses tree-sitter to extract structure. Supported languages:

| Extension(s) | Language | Extra |
|---|---|---|
| `.py` | Python | — |
| `.ts`, `.tsx` | TypeScript | — |
| `.js`, `.jsx` | JavaScript | — |
| `.go` | Go | — |
| `.rs` | Rust | — |
| `.java` | Java | — |
| `.kt`, `.kts` | Kotlin | `[languages]` |
| `.cs` | C# | `[languages]` |
| `.php` | PHP | `[languages]` |
| `.rb` | Ruby | `[languages]` |
| `.swift` | Swift | `[languages]` |
| `.c`, `.h` | C | `[languages]` |
| `.cpp`, `.cc`, `.cxx`, `.hpp` | C++ | `[languages]` |

**Infrastructure-as-Code:**

| Extension(s) | Language | Extra |
|---|---|---|
| `.tf`, `.hcl` | HCL / Terraform | `[iac]` |
| `.pp` | Puppet | `[iac]` |
| `.sh`, `.bash`, `.zsh` | Bash / Shell | `[iac]` |
| `.yml`, `.yaml` | Ansible | `[iac]` (heuristic detection) |
| `.rb` (in Chef cookbooks) | Chef | `[iac]` (enricher on Ruby parser) |

Ansible files are not matched by extension; navegador detects them by directory structure (`roles/`, `playbooks/`, `group_vars/`, `host_vars/`) or content (`hosts:` + `tasks:` keys). Chef uses the existing Ruby parser; the Chef enricher promotes nodes with Chef-specific semantic types.
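
The Ansible heuristic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not navegador's actual detector: the helper name `looks_like_ansible` is hypothetical, and the content check uses a simple line-based regex rather than a real YAML parse.

```python
import re
from pathlib import PurePosixPath

# Directory names that suggest Ansible content (taken from the list above).
ANSIBLE_DIRS = {"roles", "playbooks", "group_vars", "host_vars"}

# Playbook keys may start a line or follow a "- " list marker.
HOSTS_KEY = re.compile(r"^\s*(?:-\s+)?hosts:", re.MULTILINE)
TASKS_KEY = re.compile(r"^\s*(?:-\s+)?tasks:", re.MULTILINE)


def looks_like_ansible(path: str, text: str) -> bool:
    """Hypothetical helper: guess whether a YAML file is Ansible."""
    p = PurePosixPath(path)
    if p.suffix not in {".yml", ".yaml"}:
        return False
    # Structural hint: any parent directory with an Ansible-style name.
    if ANSIBLE_DIRS.intersection(p.parts[:-1]):
        return True
    # Content hint: both `hosts:` and `tasks:` keys are present.
    return bool(HOSTS_KEY.search(text)) and bool(TASKS_KEY.search(text))
```

A heuristic like this is why a `docker-compose.yml` or a CI workflow file would not be treated as Ansible even though it shares the extension.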

Install language and IaC support:

```bash
pip install "navegador[languages]"
pip install "navegador[iac]"
```

The following directories are always skipped: `.git`, `.venv`, `venv`, `node_modules`, `__pycache__`, `dist`, `build`, `.next`, `target` (Rust/Java builds), `vendor` (Go modules), `.gradle`.
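
To make the skip list concrete, here is a minimal sketch of a directory walk that prunes those names. The function name `iter_source_files` is an assumption made for illustration; navegador's own walker may be organised differently.

```python
import os
from typing import Iterator

# Directory names listed above as always skipped.
SKIP_DIRS = {
    ".git", ".venv", "venv", "node_modules", "__pycache__",
    "dist", "build", ".next", "target", "vendor", ".gradle",
}


def iter_source_files(root: str) -> Iterator[str]:
    """Yield file paths under root, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to recurse into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Pruning in place means the contents of a large `node_modules` tree are never listed at all, which matters more for ingest time than filtering paths afterwards.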

### What gets extracted

| What | Graph nodes / edges created |
|---|---|
| Files | `File` node; `CONTAINS` edge from `Repository` |
| Classes, structs, interfaces | `Class` node with `name`, `file`, `line`, `docstring` |
| Functions and methods | `Function` / `Method` nodes with `name`, `docstring`, `line` |
| Imports / use declarations | `Import` node; `IMPORTS` edge from the importing file |
| Call relationships | `CALLS` edges between functions based on static call analysis |
| Inheritance | `INHERITS` edges from subclass to parent |

Doc comment formats supported per language: Python docstrings, JSDoc (`/** */`), Rust `///`, Java Javadoc.

### IaC extraction

IaC parsers map infrastructure constructs to the standard node labels with a `semantic_type` property for specificity:

| Language | Construct | Node label | `semantic_type` |
|---|---|---|---|
| Terraform | `resource` | `Class` | `terraform_resource` |
| Terraform | `variable` / `output` / `locals` | `Variable` | `terraform_variable` / `terraform_output` / `terraform_local` |
| Terraform | `module` | `Module` | `terraform_module` |
| Terraform | `data` / `provider` | `Class` | `terraform_data` / `terraform_provider` |
| Puppet | `class` / `define` / `node` | `Class` | `puppet_class` / `puppet_defined_type` / `puppet_node` |
| Puppet | resource declaration | `Function` | `puppet_resource` |
| Puppet | `include` | `Import` | `puppet_include` |
| Ansible | playbook file | `Module` | `ansible_playbook` |
| Ansible | play | `Class` | `ansible_play` |
| Ansible | task / handler | `Function` | `ansible_task` / `ansible_handler` |
| Ansible | role | `Import` | `ansible_role` |
| Bash | function | `Function` | `shell_function` |
| Bash | variable | `Variable` | `shell_variable` |
| Bash | `source` / `.` | `Import` | `shell_source` |

Cross-references are extracted where possible: Terraform `var.x`, `module.x`, and resource-to-resource dependencies become `REFERENCES` / `DEPENDS_ON` edges. Ansible `notify:` keys create `CALLS` edges to handlers. Puppet `include` creates `IMPORTS` edges.
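
As a rough illustration of the Terraform cross-reference step, the sketch below pulls `var.x` and `module.x` references out of an expression string with a regex. This is an assumption-laden stand-in: navegador presumably works on the parsed syntax tree rather than raw text, and `find_terraform_refs` is a hypothetical name. The sketch also does not decide which edge type each reference becomes.

```python
import re

# Matches `var.<name>` and `module.<name>` when not part of a longer identifier.
TF_REF = re.compile(r"(?<![\w.])(var|module)\.([A-Za-z_][\w-]*)")


def find_terraform_refs(expression: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs in order of first appearance, without duplicates."""
    seen: dict[tuple[str, str], None] = {}
    for kind, name in TF_REF.findall(expression):
        seen.setdefault((kind, name), None)
    return list(seen)
```

Each returned pair would correspond to one candidate edge from the node that contains the expression to the referenced variable or module.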

### Options

| Flag | Effect |
|---|---|
| `--clear` | Wipe the graph before ingesting (full rebuild) |
| `--incremental` | Only reprocess files whose content hash has changed |
| `--watch` | Keep running and re-ingest on file changes |
| `--redact` | Strip secrets (tokens, passwords, keys) from string literals |
| `--monorepo` | Traverse workspace sub-packages (Turborepo, Nx, Yarn, pnpm, Cargo, Go) |
| `--json` | Output a JSON summary of nodes and edges created |
| `--db <path>` | Use a specific database file |

### Re-ingesting

Re-run `navegador ingest` anytime to pick up changes. Nodes are upserted by identity (file path + name), so repeated ingestion is idempotent for unchanged nodes. Use `--incremental` for large repos to skip unchanged files. Use `--clear` when you need a clean slate (e.g., after a large rename refactor).
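
The upsert-by-identity behaviour can be pictured with a toy in-memory store. Everything here is an assumption made for illustration (the `GraphSketch` class and its `upsert` method are not navegador's API); the point is only that a node keyed by `(file path, name)` is updated in place rather than duplicated.

```python
from dataclasses import dataclass, field


@dataclass
class GraphSketch:
    """Toy in-memory stand-in for the graph store (not navegador's API)."""
    nodes: dict[tuple[str, str], dict] = field(default_factory=dict)

    def upsert(self, file_path: str, name: str, **props) -> bool:
        """Insert or update a node keyed by (file path, name).

        Returns True if the stored node changed, False if it was a no-op.
        """
        key = (file_path, name)
        merged = {**self.nodes.get(key, {}), **props}
        changed = self.nodes.get(key) != merged
        self.nodes[key] = merged
        return changed
```

Under this model a renamed function gets a new key while the node under the old key stays behind, which is one plausible reason a large rename refactor calls for `--clear`.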

### Incremental ingestion

`--incremental` uses SHA-256 content hashing to skip files that haven't changed since the last ingest. The hash is stored on each `File` node. On large repos this can reduce ingest time by 90%+ after the initial run.

```bash
navegador ingest ./repo --incremental
```
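
The content-hash check in this section can be sketched as follows. It is a minimal illustration, assuming the previously stored hashes are available as a plain dictionary (a stand-in for the hash kept on each `File` node); the names `sha256_of` and `changed_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash file contents in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths: list[Path], stored: dict[str, str]) -> list[Path]:
    """Return paths whose current hash differs from the stored one.

    `stored` maps path strings to the hash recorded at the last ingest
    and is updated in place for every path that changed.
    """
    changed = []
    for path in paths:
        current = sha256_of(path)
        if stored.get(str(path)) != current:
            changed.append(path)
            stored[str(path)] = current
    return changed
```

Hashing content rather than comparing modification times means a file that was touched or checked out again without edits is still skipped.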

### Watch mode

`--watch` starts a file-system watcher and automatically re-ingests any file that changes:

```bash
navegador ingest ./repo --watch
```

Press `Ctrl-C` to stop. Watch mode uses `--incremental` automatically.

### Sensitive content redaction

`--redact` scans string literals for patterns that look like API keys, tokens, and passwords, and replaces their values with `[REDACTED]` in the graph. Source files are never modified.

```bash
navegador ingest ./repo --redact
```
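
A pattern-based redaction pass might look like the sketch below. The patterns are illustrative assumptions, not the set navegador ships with, and `redact_literal` is a hypothetical helper; a production scanner would need many more patterns and some protection against false positives.

```python
import re

# Illustrative patterns only; these are assumptions, not navegador's rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),            # shape of an AWS access key ID
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),  # GitHub token prefixes
    re.compile(r"(?i)(?:password|passwd|secret|token)\s*[=:]\s*\S+"),
]


def redact_literal(value: str) -> str:
    """Return "[REDACTED]" if the string literal looks like a secret."""
    if any(p.search(value) for p in SECRET_PATTERNS):
        return "[REDACTED]"
    return value
```

Because the replacement happens on the value stored in the graph, the source file on disk is left untouched, matching the behaviour described above.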

### Monorepo support

`--monorepo` detects the workspace type and traverses all sub-packages:

```bash
navegador ingest ./monorepo --monorepo
```

Supported workspace formats: Turborepo, Nx, Yarn workspaces, pnpm workspaces, Cargo workspaces, Go workspaces.
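
Workspace detection of this kind is usually driven by marker files at the repository root. The sketch below shows one way to do it, assuming the conventional markers for each tool (`turbo.json`, `nx.json`, `pnpm-workspace.yaml`, `go.work`, a `[workspace]` table in `Cargo.toml`, and a `workspaces` key in `package.json`). The function name and the check order are assumptions, not navegador's documented behaviour.

```python
import json
from pathlib import Path
from typing import Optional


def detect_workspace(root: Path) -> Optional[str]:
    """Guess the workspace type from well-known marker files (hypothetical helper)."""
    # Tool-specific markers first: a Turborepo or Nx repo usually also has
    # a package.json "workspaces" entry or a pnpm-workspace.yaml.
    if (root / "turbo.json").is_file():
        return "turborepo"
    if (root / "nx.json").is_file():
        return "nx"
    if (root / "pnpm-workspace.yaml").is_file():
        return "pnpm"
    if (root / "go.work").is_file():
        return "go"
    cargo = root / "Cargo.toml"
    if cargo.is_file() and "[workspace]" in cargo.read_text(encoding="utf-8"):
        return "cargo"
    package_json = root / "package.json"
    if package_json.is_file():
        try:
            manifest = json.loads(package_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            return None
        # npm uses the same key; it is reported as "yarn" here for simplicity.
        if isinstance(manifest, dict) and "workspaces" in manifest:
            return "yarn"
    return None
```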

---

## Knowledge curation

Manual knowledge is added with `navegador add` commands and linked to code with `navegador annotate`.

### Concepts

A concept is a named idea or design pattern relevant to the codebase.

```bash
navegador add concept "Idempotency" \
  --desc "Operations safe to retry without side effects" \
  --domain Payments
```

### Rules

A rule is an enforceable constraint on code behaviour.

```bash
navegador add rule "RequireIdempotencyKey" \
  --desc "All write endpoints must accept an idempotency key header" \
  --domain Payments \
  --severity critical \
  --rationale "Prevents double-processing on client retries"
```

Severity values: `info`, `warning`, `critical`.

### Decisions

A decision is an architectural decision record (ADR) stored in the graph.

```bash
navegador add decision "UsePostgresForTransactions" \
  --desc "PostgreSQL is the primary datastore for transactional data" \
  --domain Infrastructure \
  --rationale "ACID guarantees required for financial data" \
  --alternatives "MySQL, CockroachDB" \
  --date 2025-03-01 \
  --status accepted
```

Status values: `proposed`, `accepted`, `deprecated`, `superseded`.

### People

```bash
navegador add person "Alice Chen" \
  --email [email protected] \
  --role "Lead Engineer" \
  --team Payments
```

### Domains

Domains are top-level groupings for concepts, rules, and decisions.

```bash
navegador add domain "Payments" \
  --desc "Everything related to payment processing and billing"
```

### Annotating code

Link a code node to a concept or rule:

```bash
navegador annotate process_payment \
  --type Function \
  --concept Idempotency \
  --rule RequireIdempotencyKey
```

`--type` accepts: `Function`, `Class`, `Method`, `File`, `Module`.

This creates `ANNOTATES` edges between the knowledge nodes and the code node. The code node then appears in results for `navegador concept Idempotency` and `navegador explain process_payment`.

---

## Wiki ingestion

Pull a GitHub wiki into the graph as `WikiPage` nodes.

```bash
# ingest from GitHub API
navegador wiki ingest --repo myorg/myrepo --token $GITHUB_TOKEN

# ingest from a locally cloned wiki directory
navegador wiki ingest --dir ./myrepo.wiki

# force API mode (bypass auto-detection)
navegador wiki ingest --repo myorg/myrepo --api
```

Each wiki page becomes a `WikiPage` node with `title`, `content`, `url`, and `updated_at` properties. Pages are linked to relevant `Concept`, `Domain`, or `Function` nodes with `DOCUMENTS` edges where names match.
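
The guide does not spell out how names are matched, so the following is only one plausible reading: a case-sensitive whole-word search of the page content for each known node name. The helper `match_page_to_nodes` is hypothetical.

```python
import re


def match_page_to_nodes(content: str, node_names: list[str]) -> list[str]:
    """Return node names that appear in the page as whole words."""
    matches = []
    for name in node_names:
        # Lookarounds act as word boundaries so that `process` is not
        # matched inside a longer identifier such as `process_payment`.
        pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
        if re.search(pattern, content):
            matches.append(name)
    return matches
```

Each name returned for a page would then correspond to one `DOCUMENTS` edge from that `WikiPage` node.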

Set `GITHUB_TOKEN` in your environment to avoid rate limits and to access private wikis.

---

## Planopticon ingestion

[Planopticon](planopticon.md) is a video/meeting knowledge extraction tool. It produces structured knowledge graph output that navegador can ingest directly.

```bash
navegador planopticon ingest ./meeting-output/ --type auto
```

See the [Planopticon guide](planopticon.md) for the full input format reference and entity mapping details.
