# Ingestion API

Every ingester takes a `GraphStore` instance in its constructor. Ingest methods return an `IngestionResult` dataclass. `KnowledgeIngester` is the exception: its `add_*` methods return the new node's ID and `annotate` returns `None`.

```python
from navegador.graph import GraphStore
from navegador.ingest import RepoIngester, KnowledgeIngester, WikiIngester, PlanopticonIngester
```

---

## IngestionResult

```python
from dataclasses import dataclass

@dataclass
class IngestionResult:
    nodes_created: int
    nodes_updated: int
    edges_created: int
    files_processed: int
    errors: list[str]
    duration_seconds: float
```
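
Callers will usually want to look at `errors` before trusting the counts. The sketch below redefines the dataclass locally so that it runs on its own; the `summarize` helper and the sample values are hypothetical and not part of navegador.

```python
from dataclasses import dataclass, field


@dataclass
class IngestionResult:
    # Local stand-in mirroring the fields documented above.
    nodes_created: int = 0
    nodes_updated: int = 0
    edges_created: int = 0
    files_processed: int = 0
    errors: list[str] = field(default_factory=list)
    duration_seconds: float = 0.0


def summarize(result: IngestionResult) -> str:
    """Hypothetical helper: one-line summary that flags any errors."""
    status = "ok" if not result.errors else f"{len(result.errors)} error(s)"
    return (
        f"{result.files_processed} files, "
        f"{result.nodes_created} created, {result.nodes_updated} updated, "
        f"{result.edges_created} edges in {result.duration_seconds:.1f}s [{status}]"
    )


# Made-up sample values for illustration.
result = IngestionResult(
    nodes_created=12,
    edges_created=30,
    files_processed=4,
    errors=["src/broken.py: parse error"],
    duration_seconds=0.42,
)
print(summarize(result))
```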

---

## RepoIngester

Parses a source tree and writes code layer nodes and edges.

```python
from pathlib import Path

class RepoIngester:
    def __init__(self, store: GraphStore) -> None: ...

    def ingest(
        self,
        path: str | Path,
        *,
        clear: bool = False,
        incremental: bool = False,
        redact: bool = False,
        monorepo: bool = False,
    ) -> IngestionResult: ...

    def ingest_file(
        self,
        path: str | Path,
        *,
        redact: bool = False,
    ) -> IngestionResult: ...
```

### Usage

```python
store = GraphStore.sqlite(".navegador/navegador.db")
ingester = RepoIngester(store)

# full repo ingest
result = ingester.ingest("./src")
print(f"{result.nodes_created} nodes, {result.edges_created} edges")

# incremental ingest: only reprocesses files whose content hash has changed
result = ingester.ingest("./src", incremental=True)

# re-ingest a single file
result = ingester.ingest_file("./src/auth/service.py")

# wipe + rebuild
result = ingester.ingest("./src", clear=True)

# redact sensitive content (strips tokens, passwords, keys from string literals)
result = ingester.ingest("./src", redact=True)

# monorepo: traverse workspace sub-packages
result = ingester.ingest("./monorepo", monorepo=True)
```
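
The incremental mode above is described as reprocessing only files whose content hash has changed. The following is a minimal sketch of that idea, assuming SHA-256 and an in-memory dict of previously seen hashes. How navegador actually computes and stores hashes is not specified here, and `content_hash` and `changed_files` are hypothetical names.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, seen: dict[str, str], pattern: str = "*.py") -> list[Path]:
    """Return files under root whose hash differs from the one recorded in seen.

    seen maps a file path string to its last known hash and is updated in place.
    """
    changed = []
    for path in sorted(root.rglob(pattern)):
        digest = content_hash(path)
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    return changed
```

On a first call with an empty `seen` dict every matching file counts as changed; a second call with the same dict returns only files edited in between.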

### Supported languages

| Language | File extensions | Parser | Extra required |
|---|---|---|---|
| Python | `.py` | tree-sitter-python | none (included) |
| TypeScript | `.ts`, `.tsx` | tree-sitter-typescript | none (included) |
| JavaScript | `.js`, `.jsx` | tree-sitter-javascript | none (included) |
| Go | `.go` | tree-sitter-go | none (included) |
| Rust | `.rs` | tree-sitter-rust | none (included) |
| Java | `.java` | tree-sitter-java | none (included) |
| Kotlin | `.kt`, `.kts` | tree-sitter-kotlin | `navegador[languages]` |
| C# | `.cs` | tree-sitter-c-sharp | `navegador[languages]` |
| PHP | `.php` | tree-sitter-php | `navegador[languages]` |
| Ruby | `.rb` | tree-sitter-ruby | `navegador[languages]` |
| Swift | `.swift` | tree-sitter-swift | `navegador[languages]` |
| C | `.c`, `.h` | tree-sitter-c | `navegador[languages]` |
| C++ | `.cpp`, `.cc`, `.cxx`, `.hpp` | tree-sitter-cpp | `navegador[languages]` |

### Adding a new language parser

1. Install the tree-sitter grammar: `pip install tree-sitter-<lang>`
2. Subclass `navegador.ingest.base.LanguageParser`:

```python
from navegador.ingest.base import LanguageParser, ParseResult

class RubyParser(LanguageParser):
    language = "ruby"
    extensions = [".rb"]

    def parse(self, source: str, file_path: str) -> ParseResult:
        # use self.tree_sitter_language to build the tree
        # return ParseResult with nodes and edges
        ...
```

3. Register in `navegador/ingest/registry.py`:

```python
from .ruby import RubyParser
PARSERS["ruby"] = RubyParser
```

`RepoIngester` dispatches to registered parsers by file extension.
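
Extension-based dispatch over a registry like the one above can be pictured as follows. This is a sketch under assumptions: the parser classes are local stand-ins, `parser_for` is a hypothetical lookup function, and the real dispatch logic inside `RepoIngester` may differ.

```python
from pathlib import Path


class LanguageParser:
    # Stand-in for navegador.ingest.base.LanguageParser.
    language = ""
    extensions: list[str] = []


class PythonParser(LanguageParser):
    language = "python"
    extensions = [".py"]


class RubyParser(LanguageParser):
    language = "ruby"
    extensions = [".rb"]


# Keyed by language name, as in the registration step above.
PARSERS = {"python": PythonParser, "ruby": RubyParser}


def parser_for(path: str):
    """Hypothetical lookup: return the parser class whose extensions match path, or None."""
    suffix = Path(path).suffix.lower()
    for parser_cls in PARSERS.values():
        if suffix in parser_cls.extensions:
            return parser_cls
    return None


print(parser_for("app/models/user.rb").language)  # ruby
print(parser_for("README.md"))  # None
```

A file whose extension no parser claims resolves to `None` in this sketch, so a caller would skip it rather than fail.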

### Framework enrichers

After parsing, `RepoIngester` runs framework-specific enrichers that annotate nodes with framework context. Enrichers are discovered automatically based on what frameworks are detected in the repo.

| Framework | What gets enriched |
|---|---|
| Django | Models, views, URL patterns, admin registrations |
| FastAPI | Route handlers, dependency injections, Pydantic schemas |
| React | Components, hooks, prop types |
| Express | Route handlers, middleware chains |
| React Native | Screens, navigators |
| Rails | Controllers, models, routes |
| Spring Boot | Beans, controllers, repositories |
| Laravel | Controllers, models, routes |

---

## KnowledgeIngester

Writes knowledge layer nodes. It is the programmatic equivalent of the `navegador add` commands.

```python
class KnowledgeIngester:
    def __init__(self, store: GraphStore) -> None: ...

    def add_concept(
        self,
        name: str,
        *,
        description: str = "",
        domain: str = "",
        status: str = "",
    ) -> str: ...  # returns node ID

    def add_rule(
        self,
        name: str,
        *,
        description: str = "",
        domain: str = "",
        severity: str = "info",
        rationale: str = "",
    ) -> str: ...

    def add_decision(
        self,
        name: str,
        *,
        description: str = "",
        domain: str = "",
        rationale: str = "",
        alternatives: str = "",
        date: str = "",
        status: str = "proposed",
    ) -> str: ...

    def add_person(
        self,
        name: str,
        *,
        email: str = "",
        role: str = "",
        team: str = "",
    ) -> str: ...

    def add_domain(
        self,
        name: str,
        *,
        description: str = "",
    ) -> str: ...

    def annotate(
        self,
        code_name: str,
        *,
        node_type: str = "Function",
        concept: str = "",
        rule: str = "",
    ) -> None: ...
```

### Usage

```python
store = GraphStore.sqlite(".navegador/navegador.db")
ingester = KnowledgeIngester(store)

ingester.add_domain("Payments", description="Payment processing and billing")
ingester.add_concept("Idempotency", domain="Payments",
    description="Operations safe to retry without side effects")
ingester.add_rule("RequireIdempotencyKey",
    domain="Payments", severity="critical",
    rationale="Card networks retry on timeout")
ingester.annotate("process_payment", node_type="Function",
    concept="Idempotency", rule="RequireIdempotencyKey")
```
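
When rules live in a list or a config file, a thin loop over `add_rule` is usually enough. The sketch below uses a recording stub in place of `KnowledgeIngester` so that it runs without a graph store. `load_rules`, `RecordingIngester`, the `NoRawSQL` rule, and the `rule:` ID format are all hypothetical.

```python
class RecordingIngester:
    """Stub with the add_rule signature documented above; records calls instead of writing."""

    def __init__(self):
        self.calls = []

    def add_rule(self, name, *, description="", domain="", severity="info", rationale=""):
        self.calls.append((name, domain, severity))
        return f"rule:{name}"  # made-up ID format for the stub


def load_rules(ingester, rules):
    """Hypothetical helper: add each rule dict and return the node IDs in order."""
    ids = []
    for rule in rules:
        fields = dict(rule)        # copy so the caller's dict is untouched
        name = fields.pop("name")  # name is positional, the rest are keyword-only
        ids.append(ingester.add_rule(name, **fields))
    return ids


rules = [
    {"name": "RequireIdempotencyKey", "domain": "Payments", "severity": "critical"},
    {"name": "NoRawSQL", "domain": "Payments"},
]
ingester = RecordingIngester()
print(load_rules(ingester, rules))
```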

---

## WikiIngester

Fetches GitHub wiki pages and writes `WikiPage` nodes.

```python
from pathlib import Path

class WikiIngester:
    def __init__(self, store: GraphStore) -> None: ...

    def ingest_repo(
        self,
        repo: str,
        *,
        token: str = "",
        use_api: bool = False,
    ) -> IngestionResult: ...

    def ingest_dir(
        self,
        path: str | Path,
    ) -> IngestionResult: ...
```

### Usage

```python
import os
store = GraphStore.sqlite(".navegador/navegador.db")
ingester = WikiIngester(store)

# from GitHub API
result = ingester.ingest_repo("myorg/myrepo", token=os.environ["GITHUB_TOKEN"])

# from local clone
result = ingester.ingest_dir("./myrepo.wiki")
```
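
Note that `os.environ["GITHUB_TOKEN"]` raises `KeyError` when the variable is unset. One way to handle that, sketched here with a stub in place of `WikiIngester`, is to fall back to a local clone when no token is available. `ingest_wiki` and `StubWikiIngester` are hypothetical names, and the stub returns strings rather than `IngestionResult` objects.

```python
import os


class StubWikiIngester:
    """Stand-in with the two WikiIngester methods documented above."""

    def ingest_repo(self, repo, *, token="", use_api=False):
        return f"repo:{repo}"

    def ingest_dir(self, path):
        return f"dir:{path}"


def ingest_wiki(ingester, repo, local_dir, env=os.environ):
    """Hypothetical helper: use the remote repo when a token is set, else a local clone."""
    token = env.get("GITHUB_TOKEN", "")
    if token:
        return ingester.ingest_repo(repo, token=token)
    return ingester.ingest_dir(local_dir)


stub = StubWikiIngester()
# An empty mapping simulates an environment without GITHUB_TOKEN.
print(ingest_wiki(stub, "myorg/myrepo", "./myrepo.wiki", env={}))
```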

---

## PlanopticonIngester

Ingests Planopticon knowledge graph output into the knowledge layer.

```python
from pathlib import Path

class PlanopticonIngester:
    def __init__(self, store: GraphStore) -> None: ...

    def ingest(
        self,
        path: str | Path,
        *,
        input_type: str = "auto",
        source: str = "",
    ) -> IngestionResult: ...

    def ingest_manifest(
        self,
        path: str | Path,
        *,
        source: str = "",
    ) -> IngestionResult: ...

    def ingest_kg(
        self,
        path: str | Path,
        *,
        source: str = "",
    ) -> IngestionResult: ...

    def ingest_interchange(
        self,
        path: str | Path,
        *,
        source: str = "",
    ) -> IngestionResult: ...

    def ingest_batch(
        self,
        path: str | Path,
        *,
        source: str = "",
    ) -> IngestionResult: ...
```

`input_type` values: `"auto"`, `"manifest"`, `"kg"`, `"interchange"`, `"batch"`.

See [Planopticon guide](../guide/planopticon.md) for format details and entity mapping.
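
The `input_type` values line up by name with the format-specific methods. The sketch below assumes that an explicit `input_type` routes to the matching `ingest_<type>` method, which this reference does not confirm. `dispatch` and `StubPlanopticonIngester` are hypothetical, and the detection behind `"auto"` is left out.

```python
INPUT_TYPES = ("auto", "manifest", "kg", "interchange", "batch")


class StubPlanopticonIngester:
    """Stand-in exposing the four format-specific methods documented above."""

    def ingest_manifest(self, path, *, source=""):
        return ("manifest", path, source)

    def ingest_kg(self, path, *, source=""):
        return ("kg", path, source)

    def ingest_interchange(self, path, *, source=""):
        return ("interchange", path, source)

    def ingest_batch(self, path, *, source=""):
        return ("batch", path, source)


def dispatch(ingester, path, *, input_type, source=""):
    """Hypothetical: route an explicit input_type to the ingest_<type> method."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"unknown input_type {input_type!r}; expected one of {INPUT_TYPES}")
    if input_type == "auto":
        raise NotImplementedError("format detection is not sketched here")
    return getattr(ingester, f"ingest_{input_type}")(path, source=source)


# The path and source label are made-up sample values.
print(dispatch(StubPlanopticonIngester(), "out/kg.json", input_type="kg", source="standup"))
```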

---

## Export and import

Navegador can export the full graph (or a subset) to JSONL for backup, migration, or sharing. The JSONL format is one JSON object per line, where each object is either a node or an edge.

```bash
navegador export > graph.jsonl
navegador export --nodes-only > nodes.jsonl
navegador import graph.jsonl
```

Python API:

```python
from navegador.graph import GraphStore

store = GraphStore.sqlite(".navegador/navegador.db")

# export
with open("graph.jsonl", "w") as f:
    store.export_jsonl(f)

# import into a new store
new_store = GraphStore.sqlite(".navegador/new.db")
with open("graph.jsonl") as f:
    new_store.import_jsonl(f)
```
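
Because the export is one JSON object per line, it can be inspected with the standard library alone. The sketch below counts records by a discriminator field. The field name `"kind"` and the sample records are assumptions, since the exact record schema is not given here.

```python
import io
import json
from collections import Counter


def count_records(lines, key="kind"):
    """Count JSONL records by the value of key ("kind" is an assumed field name)."""
    counts = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        counts[record.get(key, "unknown")] += 1
    return counts


# Made-up sample records; a real export may use different field names.
sample = io.StringIO(
    '{"kind": "node", "id": "a"}\n'
    '{"kind": "node", "id": "b"}\n'
    '{"kind": "edge", "from": "a", "to": "b"}\n'
)
print(dict(count_records(sample)))
```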

---

## Schema migrations

When upgrading navegador, run `navegador migrate` before re-ingesting to apply schema changes (new node properties, new edge types, index updates):

```bash
navegador migrate
```

Migrations are idempotent, so running the command more than once is safe. The migration state is stored in the graph itself under a `_MigrationState` node.

Python API:

```python
from navegador.graph import GraphStore, migrate

store = GraphStore.sqlite(".navegador/navegador.db")
migrate(store)  # applies any pending migrations
```
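
What idempotence means in practice can be shown with a toy runner. In this sketch a plain dict stands in for the graph and a set of applied migration names stands in for the `_MigrationState` node. The migration name, the `status` property, and `run_migrations` are hypothetical, and none of this is navegador's actual implementation.

```python
def add_status_default(graph):
    # Example migration: give every node a "status" property if it lacks one.
    for node in graph["nodes"]:
        node.setdefault("status", "")


MIGRATIONS = [("001_add_status_default", add_status_default)]


def run_migrations(graph):
    """Apply migrations not yet recorded in graph["applied"]; return the names applied."""
    applied = graph.setdefault("applied", set())
    ran = []
    for name, fn in MIGRATIONS:
        if name in applied:
            continue  # already applied, so a second run is a no-op
        fn(graph)
        applied.add(name)
        ran.append(name)
    return ran


graph = {"nodes": [{"name": "process_payment"}]}
print(run_migrations(graph))  # first run applies the pending migration
print(run_migrations(graph))  # second run finds nothing to do
```

Recording each applied migration alongside the data is what lets a repeated run skip work it has already done.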
