# Knowledge Graphs

PlanOpticon builds structured knowledge graphs from video analyses, document ingestion, and other content sources. A knowledge graph captures entities (people, technologies, concepts, organizations) and the relationships between them, providing a queryable representation of everything discussed or presented in your source material.

---

## Storage

Knowledge graphs are stored as SQLite databases (`knowledge_graph.db`) using Python's built-in `sqlite3` module. This means:

- Zero external dependencies. No database server to install or manage.
- Single-file portability. Copy the `.db` file to share a knowledge graph.
- WAL mode. SQLite Write-Ahead Logging is enabled for concurrent read performance.
- JSON fallback. Knowledge graphs can also be saved as `knowledge_graph.json` for interoperability, though SQLite is preferred for performance and querying.

### Database Schema

The SQLite store uses the following tables:

| Table | Purpose |
|---|---|
| `entities` | Core entity records with name, type, descriptions, source, and arbitrary properties |
| `occurrences` | Where and when each entity was mentioned (source, timestamp, text snippet) |
| `relationships` | Directed edges between entities with type, content source, timestamp, and properties |
| `sources` | Registered content sources with provenance metadata (source type, title, path, URL, MIME type, ingestion timestamp) |
| `source_locations` | Links between sources and specific entities/relationships, with location details (timestamp, page, section, line range, text snippet) |

All entity lookups are case-insensitive (indexed on `name_lower`). Entities and relationships are indexed on their source and target fields for efficient traversal.

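As a rough illustration of the case-insensitive lookup described above, the sketch below creates a throwaway SQLite file, enables WAL, and queries through an index on `name_lower`. Only the `entities` table name and the `name_lower` column come from this guide; the remaining columns, the index name, and the `find_entity` helper are assumptions made for the example, not PlanOpticon's actual schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified schema: only `entities` and `name_lower` are named in this guide.
db_path = os.path.join(tempfile.mkdtemp(), "sketch.db")
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")  # Write-Ahead Logging, as described above

conn.execute("CREATE TABLE entities (name TEXT, name_lower TEXT, type TEXT)")
conn.execute("CREATE INDEX idx_entities_name_lower ON entities (name_lower)")
conn.execute(
    "INSERT INTO entities VALUES (?, ?, ?)",
    ("Kubernetes", "kubernetes", "technology"),
)
conn.commit()


def find_entity(name):
    """Look up an entity by name, ignoring case, via the name_lower column."""
    return conn.execute(
        "SELECT name, type FROM entities WHERE name_lower = ?",
        (name.lower(),),
    ).fetchone()


match = find_entity("KUBERNETES")
```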
### Storage Backends

PlanOpticon supports two storage backends, selected automatically:

| Backend | When Used | Persistence |
|---|---|---|
| `SQLiteStore` | When a `db_path` is provided | Persistent on disk |
| `InMemoryStore` | When no path is given, or as fallback | In-memory only |

Both backends implement the same `GraphStore` abstract interface, so all query and manipulation code works identically regardless of backend.

```python
from video_processor.integrators.graph_store import create_store

# Persistent SQLite store
store = create_store("/path/to/knowledge_graph.db")

# In-memory store (for temporary operations)
store = create_store()
```

---

## Entity Types

Entities extracted from content are assigned one of the following base types:

| Type | Description | Specificity Rank |
|---|---|---|
| `person` | People mentioned or participating | 3 (highest) |
| `technology` | Tools, languages, frameworks, platforms | 3 |
| `organization` | Companies, teams, departments | 2 |
| `time` | Dates, deadlines, time references | 1 |
| `diagram` | Visual diagrams extracted from video frames | 1 |
| `concept` | General concepts, topics, ideas (default) | 0 (lowest) |

The specificity rank is used during merge operations: when two entities are matched as duplicates, the more specific type wins (e.g., `technology` overrides `concept`).

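The rank rule can be sketched in a few lines. The ranks are copied from the table above; the `resolve_type` name and the choice to treat unknown types as rank 0 are assumptions for illustration, and ties keep the existing type, matching the "keeps whichever was first" behavior described in the merge section.

```python
# Ranks copied from the table above.
SPECIFICITY_RANK = {
    "person": 3,
    "technology": 3,
    "organization": 2,
    "time": 1,
    "diagram": 1,
    "concept": 0,
}


def resolve_type(existing_type, incoming_type):
    """Return the more specific of two types; on a tie, keep the existing one."""
    # Assumption: types missing from the table are treated like `concept` (rank 0).
    existing_rank = SPECIFICITY_RANK.get(existing_type, 0)
    incoming_rank = SPECIFICITY_RANK.get(incoming_type, 0)
    return incoming_type if incoming_rank > existing_rank else existing_type
```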
### Planning Taxonomy

Beyond the base entity types, PlanOpticon includes a planning taxonomy for classifying entities into project-planning categories. The `TaxonomyClassifier` maps extracted entities into these types:

| Planning Type | Keywords Matched |
|---|---|
| `goal` | goal, objective, aim, target outcome |
| `requirement` | must, should, requirement, need, required |
| `constraint` | constraint, limitation, restrict, cannot, must not |
| `decision` | decided, decision, chose, selected, agreed |
| `risk` | risk, concern, worry, danger, threat |
| `assumption` | assume, assumption, expecting, presume |
| `dependency` | depends, dependency, relies on, prerequisite, blocked |
| `milestone` | milestone, deadline, deliverable, release, launch |
| `task` | task, todo, action item, work item, implement |
| `feature` | feature, capability, functionality |

Classification works in two stages:

1. Heuristic classification. Entity descriptions are scanned for the keywords listed above. First match wins.
2. LLM refinement. If an LLM provider is available, entities are sent to the LLM for more nuanced classification with priority assignment (`high`, `medium`, `low`). LLM results override heuristic results on conflicts.

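The heuristic stage can be approximated as follows. The keyword lists are copied from the table above, but the `classify_heuristic` name, the lowercase substring matching, and the use of table order as match priority are assumptions; the real `TaxonomyClassifier` may tokenize or order its checks differently.

```python
# Keyword lists copied from the table above; dict order follows the table rows.
PLANNING_KEYWORDS = {
    "goal": ["goal", "objective", "aim", "target outcome"],
    "requirement": ["must", "should", "requirement", "need", "required"],
    "constraint": ["constraint", "limitation", "restrict", "cannot", "must not"],
    "decision": ["decided", "decision", "chose", "selected", "agreed"],
    "risk": ["risk", "concern", "worry", "danger", "threat"],
    "assumption": ["assume", "assumption", "expecting", "presume"],
    "dependency": ["depends", "dependency", "relies on", "prerequisite", "blocked"],
    "milestone": ["milestone", "deadline", "deliverable", "release", "launch"],
    "task": ["task", "todo", "action item", "work item", "implement"],
    "feature": ["feature", "capability", "functionality"],
}


def classify_heuristic(description):
    """Return the first planning type whose keyword occurs in the description, else None."""
    text = description.lower()
    for planning_type, keywords in PLANNING_KEYWORDS.items():
        # Assumption: plain substring matching, so "restrict" also matches "restricted".
        if any(keyword in text for keyword in keywords):
            return planning_type
    return None
```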
Classified entities are used by planning agent skills (project_plan, prd, roadmap, task_breakdown) to produce targeted, context-aware artifacts.

---

## Relationship Types

Relationships are directed edges between entities. The `type` field is a free-text string determined by the LLM during extraction. Common relationship types include:

- `related_to` (default)
- `works_with`
- `uses`
- `depends_on`
- `proposed`
- `discussed_by`
- `employed_by`
- `collaborates_with`
- `expert_in`

### Typed Relationships

The `add_typed_relationship()` method creates edges with custom labels and optional properties, enabling richer graph semantics:

```python
store.add_typed_relationship(
    source="Authentication Service",
    target="PostgreSQL",
    edge_label="USES_SYSTEM",
    properties={"purpose": "user credential storage", "version": "15"},
)
```

### Relationship Checks

You can check whether a relationship exists between two entities:

```python
# Check for any relationship
store.has_relationship("Alice", "Kubernetes")

# Check for a specific relationship type
store.has_relationship("Alice", "Kubernetes", edge_label="expert_in")
```

---

## Building a Knowledge Graph

### From Video Analysis

The primary path for building a knowledge graph is through video analysis. When you run `planopticon analyze`, the pipeline extracts entities and relationships from:

- Transcript segments -- batched in groups of 10 for efficient API usage, with speaker identification
- Diagram content -- text extracted from visual diagrams detected in video frames

```bash
planopticon analyze -i meeting.mp4 -o results/
# Creates results/knowledge_graph.db
```

### From Document Ingestion

Documents (Markdown, PDF, DOCX) can be ingested directly into a knowledge graph:

```bash
# Ingest a single file
planopticon ingest -i requirements.pdf -o results/

# Ingest a directory recursively
planopticon ingest -i docs/ -o results/ --recursive

# Ingest into an existing knowledge graph
planopticon ingest -i notes.md --db results/knowledge_graph.db
```

### From Batch Processing

Multiple videos can be processed in batch mode, with all results merged into a single knowledge graph:

```bash
planopticon batch -i videos/ -o results/
```

### Programmatic Construction

```python
from video_processor.integrators.knowledge_graph import KnowledgeGraph

# Create a new knowledge graph with LLM extraction
from video_processor.providers.manager import ProviderManager
pm = ProviderManager()
kg = KnowledgeGraph(provider_manager=pm, db_path="knowledge_graph.db")

# Add content (entities and relationships are extracted by LLM)
kg.add_content(
    text="Alice proposed using Kubernetes for container orchestration.",
    source="meeting_notes",
    timestamp=120.5,
)

# Process a full transcript
kg.process_transcript(transcript_data, batch_size=10)

# Process diagram results
kg.process_diagrams(diagram_results)

# Save
kg.save("knowledge_graph.db")
```

---

## Merge and Deduplication

When combining knowledge graphs from multiple sources, PlanOpticon performs an intelligent merge with deduplication.

### Fuzzy Name Matching

Entity names are compared using Python's `SequenceMatcher` with a threshold of 0.85. This means "Kubernetes" and "kubernetes" match exactly (the comparison is case-insensitive), while near-duplicates such as "React.js" and "ReactJS" are treated as the same entity if their similarity ratio meets the threshold.

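To get a feel for the threshold, the sketch below scores name pairs with `difflib.SequenceMatcher` from the standard library. The 0.85 threshold comes from the text above; the `is_fuzzy_match` helper and the decision to lowercase both names before scoring are assumptions, so the actual merge code may compare names differently.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.85  # threshold stated above


def is_fuzzy_match(name_a, name_b, threshold=THRESHOLD):
    """Return True if two names are equal ignoring case or similar enough to merge."""
    # Assumption: both names are lowercased before they are compared.
    a, b = name_a.lower(), name_b.lower()
    if a == b:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


# "react.js" and "reactjs" share "react" and "js": 2 * 7 / (8 + 7), about 0.93.
ratio = SequenceMatcher(None, "react.js", "reactjs").ratio()
```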
### Type Conflict Resolution

When two entities match but have different types, the more specific type wins based on the specificity ranking:

| Scenario | Result |
|---|---|
| `concept` vs `technology` | `technology` wins (rank 3 > rank 0) |
| `person` vs `concept` | `person` wins (rank 3 > rank 0) |
| `organization` vs `concept` | `organization` wins (rank 2 > rank 0) |
| `person` vs `technology` | Keeps whichever was first (equal rank) |

### Provenance Tracking

Merged entities receive a `merged_from:<original_name>` description entry, preserving the audit trail of which entities were unified.

### Programmatic Merge

```python
from video_processor.integrators.knowledge_graph import KnowledgeGraph

# Load two knowledge graphs
kg1 = KnowledgeGraph(db_path="project_a.db")
kg2 = KnowledgeGraph(db_path="project_b.db")

# Merge kg2 into kg1
kg1.merge(kg2)

# Save the merged result
kg1.save("merged.db")
```

The merge operation also copies all registered sources and occurrences, so provenance information is preserved across merges.

---

## Querying

PlanOpticon provides two query modes: direct mode (no LLM required) and agentic mode (LLM-powered natural language).

### Direct Mode

Direct mode queries are fast, deterministic, and require no API key. They are the right choice for structured lookups.

#### Stats

Return entity count, relationship count, and entity type breakdown:

```bash
planopticon query
```

```python
engine.stats()
# QueryResult with data: {
#   "entity_count": 42,
#   "relationship_count": 87,
#   "entity_types": {"technology": 15, "person": 12, ...}
# }
```

#### Entities

Filter entities by name substring and/or type:

```bash
planopticon query "entities --type technology"
planopticon query "entities --name python"
```

```python
engine.entities(entity_type="technology")
engine.entities(name="python")
engine.entities(name="auth", entity_type="concept", limit=10)
```

All filtering is case-insensitive. Results are capped at 50 by default (configurable via `limit`).

#### Neighbors

Get an entity and all directly connected nodes and relationships:

```bash
planopticon query "neighbors Alice"
```

```python
engine.neighbors("Alice", depth=1)
```

The `depth` parameter controls how many hops to traverse (default 1). The result includes both entity objects and relationship objects.

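The effect of `depth` can be pictured as a breadth-first traversal that stops after a fixed number of hops. The sketch below runs over a made-up in-memory edge list (the "Docker" edge is sample data, not from this guide) and follows edges in either direction; both the data layout and the bidirectional traversal are assumptions, not a description of how `GraphQueryEngine.neighbors()` is implemented.

```python
from collections import deque

# Hypothetical sample data: (source, relationship type, target).
EDGES = [
    ("Alice", "expert_in", "Kubernetes"),
    ("Alice", "works_with", "Python"),
    ("Kubernetes", "depends_on", "Docker"),
]


def neighbors(start, depth=1):
    """Collect entities and edges within `depth` hops of `start`."""
    seen = {start}
    found_edges = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand nodes that are already `depth` hops away
        for edge in EDGES:
            source, _, target = edge
            if node not in (source, target):
                continue
            if edge not in found_edges:
                found_edges.append(edge)
            other = target if node == source else source
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return seen, found_edges
```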
#### Relationships

Filter relationships by source, target, and/or type:

```bash
planopticon query "relationships --source Alice"
```

```python
engine.relationships(source="Alice")
engine.relationships(target="Kubernetes", rel_type="uses")
```

#### Sources

List all registered content sources:

```python
engine.sources()
```

#### Provenance

Get all source locations for a specific entity, showing exactly where it was mentioned:

```python
engine.provenance("Kubernetes")
# Returns source locations with timestamps, pages, sections, and text snippets
```

#### Raw SQL

Execute arbitrary SQL against the SQLite backend (SQLite stores only):

```python
engine.sql("SELECT name, type FROM entities WHERE type = 'technology' ORDER BY name")
```

### Agentic Mode

Agentic mode accepts natural-language questions and uses the LLM to plan and execute queries. It requires a configured LLM provider.

```bash
planopticon query "What technologies were discussed?"
planopticon query "Who are the key people mentioned?"
planopticon query "What depends on the authentication service?"
```

The agentic query pipeline:

1. Plan. The LLM receives graph stats and available actions (entities, relationships, neighbors, stats). It selects exactly one action and its parameters.
2. Execute. The chosen action is run through the direct-mode engine.
3. Synthesize. The LLM receives the raw query results and the original question, then produces a concise natural-language answer.

This design ensures the LLM never generates arbitrary code -- it only selects from a fixed set of known query actions.

```bash
# Requires an API key
planopticon query "What technologies were discussed?" -p openai

# Use the interactive REPL for multiple queries
planopticon query -I
```

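The "fixed set of known query actions" idea from the agentic pipeline can be sketched as an allow-list dispatch. Everything here is hypothetical scaffolding: `StubEngine` stands in for the direct-mode engine, and the `{"action": ..., "params": ...}` plan shape is an assumed format for the LLM's planning output, not PlanOpticon's actual one.

```python
# The four actions listed in the Plan step above.
ALLOWED_ACTIONS = {"entities", "relationships", "neighbors", "stats"}


class StubEngine:
    """Hypothetical stand-in for the direct-mode query engine."""

    def stats(self):
        return {"entity_count": 42, "relationship_count": 87}

    def entities(self, entity_type=None, name=None):
        return [{"name": "Python", "type": "technology"}]


def execute_plan(engine, plan):
    """Run one planned action, rejecting anything outside the fixed action set."""
    action = plan.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Unsupported action: {action!r}")
    return getattr(engine, action)(**plan.get("params", {}))


# Assumed plan format, as if returned by the LLM in the Plan step.
result = execute_plan(
    StubEngine(),
    {"action": "entities", "params": {"entity_type": "technology"}},
)
```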
---

## Graph Query Engine Python API

The `GraphQueryEngine` class provides the programmatic interface for all query operations.

### Initialization

```python
from video_processor.integrators.graph_query import GraphQueryEngine
from video_processor.integrators.graph_discovery import find_nearest_graph

# From a .db file
path = find_nearest_graph()
engine = GraphQueryEngine.from_db_path(path)

# From a .json file
engine = GraphQueryEngine.from_json_path("knowledge_graph.json")

# With an LLM provider for agentic mode
from video_processor.providers.manager import ProviderManager
pm = ProviderManager()
engine = GraphQueryEngine.from_db_path(path, provider_manager=pm)
```

### QueryResult

All query methods return a `QueryResult` dataclass with multiple output formats:

```python
result = engine.stats()

# Human-readable text
print(result.to_text())

# JSON string
print(result.to_json())

# Mermaid diagram (for graph results)
result = engine.neighbors("Alice")
print(result.to_mermaid())
```

The `QueryResult` contains:

| Field | Type | Description |
|---|---|---|
| `data` | Any | The raw result data (dict, list, or scalar) |
| `query_type` | str | `"filter"` for direct mode, `"agentic"` for LLM mode, `"sql"` for raw SQL |
| `raw_query` | str | String representation of the executed query |
| `explanation` | str | Human-readable explanation or LLM-synthesized answer |

---

## The Self-Contained HTML Viewer

PlanOpticon includes a zero-dependency HTML knowledge graph viewer at `knowledge-base/viewer.html`. This file is fully self-contained -- it inlines D3.js and requires no build step, no server, and no internet connection.

To use it, open `viewer.html` in a browser. It will load and visualize a `knowledge_graph.json` file (place it in the same directory, or use the file picker in the viewer).

The viewer provides:

- Interactive force-directed graph layout
- Zoom and pan navigation
- Entity nodes colored by type
- Relationship edges with labels
- Click-to-focus on individual entities
- Entity detail panel showing descriptions and connections

This covers approximately 80% of graph exploration needs with zero infrastructure.

---

## KG Management Commands

The `planopticon kg` command group provides utilities for managing knowledge graph files.

### kg convert

Convert a knowledge graph between SQLite and JSON formats:

```bash
# SQLite to JSON
planopticon kg convert results/knowledge_graph.db output.json

# JSON to SQLite
planopticon kg convert knowledge_graph.json knowledge_graph.db
```

The output format is inferred from the destination file extension. Source and destination must be different formats.

### kg sync

Synchronize a `.db` and `.json` knowledge graph, updating the stale one:

```bash
# Auto-detect which is newer and sync
planopticon kg sync results/knowledge_graph.db

# Explicit JSON path
planopticon kg sync knowledge_graph.db knowledge_graph.json

# Force a specific direction
planopticon kg sync knowledge_graph.db knowledge_graph.json --direction db-to-json
planopticon kg sync knowledge_graph.db knowledge_graph.json --direction json-to-db
```

If `JSON_PATH` is omitted, the `.json` path is derived from the `.db` path (same name, different extension). In `auto` mode (the default), the newer file is used as the source.

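One plausible way to implement the `auto` behavior is to compare file modification times. The sketch below is a guess at that logic: the helper names, the use of `os.path.getmtime` as the definition of "newer", and the handling of missing files and ties are all assumptions rather than the command's documented behavior.

```python
import os


def derive_json_path(db_path):
    """Same name as the .db file, with a .json extension."""
    return os.path.splitext(db_path)[0] + ".json"


def pick_sync_direction(db_path, json_path):
    """Return "db-to-json" or "json-to-db" depending on which file is newer."""
    db_exists = os.path.exists(db_path)
    json_exists = os.path.exists(json_path)
    if not db_exists and not json_exists:
        raise FileNotFoundError("Neither knowledge graph file exists")
    # Assumption: a missing file is always treated as the stale side.
    if not json_exists:
        return "db-to-json"
    if not db_exists:
        return "json-to-db"
    # Assumption: "newer" means a later modification time; ties favor the .db file.
    db_is_newer = os.path.getmtime(db_path) >= os.path.getmtime(json_path)
    return "db-to-json" if db_is_newer else "json-to-db"
```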
### kg inspect

Show summary statistics for a knowledge graph file:

```bash
planopticon kg inspect results/knowledge_graph.db
```

Output:

```
File: results/knowledge_graph.db
Store: sqlite
Entities: 42
Relationships: 87
Entity types:
  technology: 15
  person: 12
  concept: 10
  organization: 5
```

Works with both `.db` and `.json` files.

### kg classify

Classify knowledge graph entities into planning taxonomy types:

```bash
# Heuristic + LLM classification
planopticon kg classify results/knowledge_graph.db

# Heuristic only (no API key needed)
planopticon kg classify results/knowledge_graph.db -p none

# JSON output
planopticon kg classify results/knowledge_graph.db --format json
```

Text output groups entities by planning type:

```
GOALS (3)
  - Improve system reliability [high]
    Must achieve 99.9% uptime
  - Reduce deployment time [medium]
    Automate the deployment pipeline

RISKS (2)
  - Data migration complexity [high]
    Legacy schema incompatibilities
  ...

TASKS (5)
  - Implement OAuth2 flow
    Set up authentication service
  ...
```

JSON output returns an array of `PlanningEntity` objects with `name`, `planning_type`, `priority`, `description`, and `source_entities` fields.

### kg from-exchange

Import a PlanOpticonExchange JSON file into a knowledge graph database:

```bash
# Import to default location (./knowledge_graph.db)
planopticon kg from-exchange exchange.json

# Import to a specific path
planopticon kg from-exchange exchange.json -o project.db
```

The PlanOpticonExchange format is a standardized interchange format that includes entities, relationships, and source records.

---

## Output Formats

Query results can be output in three formats:

### Text (default)

Human-readable format with entity types in brackets, relationship arrows, and indented details:

```
Found 15 entities
  [technology] Python -- General-purpose programming language
  [person] Alice -- Lead engineer on the project
  [concept] Microservices -- Architectural pattern discussed
```

### JSON

Full structured output including query metadata:

```bash
planopticon query --format json stats
```

```json
{
  "query_type": "filter",
  "raw_query": "stats()",
  "explanation": "Knowledge graph statistics",
  "data": {
    "entity_count": 42,
    "relationship_count": 87,
    "entity_types": {
      "technology": 15,
      "person": 12
    }
  }
}
```

### Mermaid

Graph results rendered as Mermaid diagram syntax, ready for embedding in markdown:

```bash
planopticon query --format mermaid "neighbors Alice"
```

```
graph LR
    Alice["Alice"]:::person
    Python["Python"]:::technology
    Kubernetes["Kubernetes"]:::technology
    Alice -- "expert_in" --> Kubernetes
    Alice -- "works_with" --> Python
    classDef person fill:#f9d5e5,stroke:#333
    classDef concept fill:#eeeeee,stroke:#333
    classDef technology fill:#d5e5f9,stroke:#333
    classDef organization fill:#f9e5d5,stroke:#333
```

The `KnowledgeGraph.generate_mermaid()` method also produces full-graph Mermaid diagrams, capped at the top 30 most-connected nodes by default.

---

## Auto-Discovery

PlanOpticon automatically locates knowledge graph files using the `find_nearest_graph()` function. The search order is:

1. Current directory -- check for `knowledge_graph.db` and `knowledge_graph.json`
2. Common subdirectories -- `results/`, `output/`, `knowledge-base/`
3. Recursive downward walk -- up to 4 levels deep, skipping hidden directories
4. Parent directory walk -- upward through the directory tree, checking each level and its common subdirectories

Within each search phase, `.db` files are preferred over `.json` files. Results are sorted by proximity (closest first).

```python
from video_processor.integrators.graph_discovery import (
    find_nearest_graph,
    find_knowledge_graphs,
    describe_graph,
)

# Find the single closest knowledge graph
path = find_nearest_graph()

# Find all knowledge graphs, sorted by proximity
paths = find_knowledge_graphs()

# Find graphs starting from a specific directory
paths = find_knowledge_graphs(start_dir="/path/to/project")

# Disable upward walking
paths = find_knowledge_graphs(walk_up=False)

# Get summary stats without loading the full graph
info = describe_graph(path)
# {"entity_count": 42, "relationship_count": 87,
#  "entity_types": {...}, "store_type": "sqlite"}
```

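For intuition about the search order, here is a deliberately simplified sketch covering phases 1, 2, and 4 (the recursive downward walk is omitted for brevity). The function names are hypothetical and the code is an independent approximation built on `pathlib`, not the implementation behind `find_nearest_graph()`.

```python
from pathlib import Path

GRAPH_NAMES = ("knowledge_graph.db", "knowledge_graph.json")  # .db is checked first
COMMON_SUBDIRS = ("results", "output", "knowledge-base")


def check_dir(directory):
    """Check a directory (phase 1), then its common subdirectories (phase 2)."""
    phases = [[directory], [directory / sub for sub in COMMON_SUBDIRS]]
    for folders in phases:
        for name in GRAPH_NAMES:  # within a phase, .db wins over .json
            for folder in folders:
                candidate = folder / name
                if candidate.is_file():
                    return candidate
    return None


def find_nearest_graph_sketch(start_dir="."):
    """Hypothetical simplification: check the start directory, then each parent."""
    start = Path(start_dir).resolve()
    for directory in (start, *start.parents):
        found = check_dir(directory)
        if found is not None:
            return found
    return None
```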
Auto-discovery is used by the Companion REPL, the `planopticon query` command, and the planning agent when no explicit `--kb` path is provided.