# Knowledge Graphs

PlanOpticon builds structured knowledge graphs from video analyses, document ingestion, and other content sources. A knowledge graph captures **entities** (people, technologies, concepts, organizations) and the **relationships** between them, providing a queryable representation of everything discussed or presented in your source material.

---

## Storage

Knowledge graphs are stored as SQLite databases (`knowledge_graph.db`) using Python's built-in `sqlite3` module. This means:

- **Zero external dependencies.** No database server to install or manage.
- **Single-file portability.** Copy the `.db` file to share a knowledge graph.
- **WAL mode.** SQLite Write-Ahead Logging is enabled for concurrent read performance.
- **JSON fallback.** Knowledge graphs can also be saved as `knowledge_graph.json` for interoperability, though SQLite is preferred for performance and querying.
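
As a rough illustration of the WAL point above, the following sketch opens a SQLite file with the standard `sqlite3` module and switches it to Write-Ahead Logging. The helper name `open_graph_db` is hypothetical; how PlanOpticon itself opens its connection is not shown in this guide.

```python
import os
import sqlite3
import tempfile


def open_graph_db(db_path: str) -> sqlite3.Connection:
    """Open (or create) a SQLite file and switch it to WAL journaling.

    Hypothetical helper for illustration, not PlanOpticon's own code.
    """
    conn = sqlite3.connect(db_path)
    # PRAGMA journal_mode returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"WAL not enabled, journal mode is {mode!r}")
    return conn


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "knowledge_graph.db")
    conn = open_graph_db(path)
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
```

WAL is a property of the database file, so the setting persists once it has been applied to a file on disk.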

### Database Schema

The SQLite store uses the following tables:

| Table | Purpose |
|---|---|
| `entities` | Core entity records with name, type, descriptions, source, and arbitrary properties |
| `occurrences` | Where and when each entity was mentioned (source, timestamp, text snippet) |
| `relationships` | Directed edges between entities with type, content source, timestamp, and properties |
| `sources` | Registered content sources with provenance metadata (source type, title, path, URL, MIME type, ingestion timestamp) |
| `source_locations` | Links between sources and specific entities/relationships, with location details (timestamp, page, section, line range, text snippet) |

All entity lookups are case-insensitive (indexed on `name_lower`). Entities and relationships are indexed on their source and target fields for efficient traversal.
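
The case-insensitive lookup can be pictured with a small in-memory example. Only the `entities` table name and the `name_lower` column come from the description above; the remaining columns, the index name, and the helper functions are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed, simplified layout: the real table has more columns.
    CREATE TABLE entities (
        name TEXT NOT NULL,
        name_lower TEXT NOT NULL,
        type TEXT NOT NULL DEFAULT 'concept'
    );
    CREATE INDEX idx_entities_name_lower ON entities (name_lower);
    """
)


def add_entity(name: str, entity_type: str = "concept") -> None:
    """Store the display name plus a lowercased copy used as the lookup key."""
    conn.execute(
        "INSERT INTO entities (name, name_lower, type) VALUES (?, ?, ?)",
        (name, name.lower(), entity_type),
    )


def get_entity(name: str):
    """Look up an entity regardless of the caller's capitalization."""
    return conn.execute(
        "SELECT name, type FROM entities WHERE name_lower = ?",
        (name.lower(),),
    ).fetchone()


add_entity("Kubernetes", "technology")
found = get_entity("KUBERNETES")
```

Normalizing once at write time lets the index serve equality lookups directly, instead of applying `LOWER()` to every row at query time.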

### Storage Backends

PlanOpticon supports two storage backends, selected automatically:

| Backend | When Used | Persistence |
|---|---|---|
| `SQLiteStore` | When a `db_path` is provided | Persistent on disk |
| `InMemoryStore` | When no path is given, or as fallback | In-memory only |

Both backends implement the same `GraphStore` abstract interface, so all query and manipulation code works identically regardless of backend.

```python
from video_processor.integrators.graph_store import create_store

# Persistent SQLite store
store = create_store("/path/to/knowledge_graph.db")

# In-memory store (for temporary operations)
store = create_store()
```

---

## Entity Types

Entities extracted from content are assigned one of the following base types:

| Type | Description | Specificity Rank |
|---|---|---|
| `person` | People mentioned or participating | 3 (highest) |
| `technology` | Tools, languages, frameworks, platforms | 3 |
| `organization` | Companies, teams, departments | 2 |
| `time` | Dates, deadlines, time references | 1 |
| `diagram` | Visual diagrams extracted from video frames | 1 |
| `concept` | General concepts, topics, ideas (default) | 0 (lowest) |

The specificity rank is used during merge operations: when two entities are matched as duplicates, the more specific type wins (e.g., `technology` overrides `concept`).

### Planning Taxonomy

Beyond the base entity types, PlanOpticon includes a planning taxonomy for classifying entities into project-planning categories. The `TaxonomyClassifier` maps extracted entities into these types:

| Planning Type | Keywords Matched |
|---|---|
| `goal` | goal, objective, aim, target outcome |
| `requirement` | must, should, requirement, need, required |
| `constraint` | constraint, limitation, restrict, cannot, must not |
| `decision` | decided, decision, chose, selected, agreed |
| `risk` | risk, concern, worry, danger, threat |
| `assumption` | assume, assumption, expecting, presume |
| `dependency` | depends, dependency, relies on, prerequisite, blocked |
| `milestone` | milestone, deadline, deliverable, release, launch |
| `task` | task, todo, action item, work item, implement |
| `feature` | feature, capability, functionality |

Classification works in two stages:

1. **Heuristic classification.** Entity descriptions are scanned for the keywords listed above. First match wins.
2. **LLM refinement.** If an LLM provider is available, entities are sent to the LLM for more nuanced classification with priority assignment (`high`, `medium`, `low`). LLM results override heuristic results on conflicts.
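
The two stages can be sketched as follows. This is not the `TaxonomyClassifier` source: the function names are hypothetical, and two behaviors are assumptions, namely that keywords are matched as lowercase substrings and that types are tried in the order of the table above.

```python
from typing import Optional

# Keyword lists copied from the table above. Dict order is the assumed match order.
PLANNING_KEYWORDS = {
    "goal": ["goal", "objective", "aim", "target outcome"],
    "requirement": ["must", "should", "requirement", "need", "required"],
    "constraint": ["constraint", "limitation", "restrict", "cannot", "must not"],
    "decision": ["decided", "decision", "chose", "selected", "agreed"],
    "risk": ["risk", "concern", "worry", "danger", "threat"],
    "assumption": ["assume", "assumption", "expecting", "presume"],
    "dependency": ["depends", "dependency", "relies on", "prerequisite", "blocked"],
    "milestone": ["milestone", "deadline", "deliverable", "release", "launch"],
    "task": ["task", "todo", "action item", "work item", "implement"],
    "feature": ["feature", "capability", "functionality"],
}


def classify_heuristic(description: str) -> Optional[str]:
    """Stage 1: return the first planning type whose keyword appears in the text."""
    text = description.lower()
    for planning_type, keywords in PLANNING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return planning_type  # first match wins
    return None


def apply_llm_overrides(heuristic: dict, llm: dict) -> dict:
    """Stage 2: merge per-entity results, letting LLM answers win on conflicts."""
    return {**heuristic, **llm}
```

Under these assumptions, "The service must not store passwords" is classified as `requirement`, because `must` is tried before `must not`. This kind of ambiguity is what the LLM refinement stage is meant to correct.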

Classified entities are used by planning agent skills (`project_plan`, `prd`, `roadmap`, `task_breakdown`) to produce targeted, context-aware artifacts.

---

## Relationship Types

Relationships are directed edges between entities. The `type` field is a free-text string determined by the LLM during extraction. Common relationship types include:

- `related_to` (default)
- `works_with`
- `uses`
- `depends_on`
- `proposed`
- `discussed_by`
- `employed_by`
- `collaborates_with`
- `expert_in`

### Typed Relationships

The `add_typed_relationship()` method creates edges with custom labels and optional properties, enabling richer graph semantics:

```python
store.add_typed_relationship(
    source="Authentication Service",
    target="PostgreSQL",
    edge_label="USES_SYSTEM",
    properties={"purpose": "user credential storage", "version": "15"},
)
```

### Relationship Checks

You can check whether a relationship exists between two entities:

```python
# Check for any relationship
store.has_relationship("Alice", "Kubernetes")

# Check for a specific relationship type
store.has_relationship("Alice", "Kubernetes", edge_label="expert_in")
```

---

## Building a Knowledge Graph

### From Video Analysis

The primary path for building a knowledge graph is through video analysis. When you run `planopticon analyze`, the pipeline extracts entities and relationships from:

- **Transcript segments** -- batched in groups of 10 for efficient API usage, with speaker identification
- **Diagram content** -- text extracted from visual diagrams detected in video frames

```bash
planopticon analyze -i meeting.mp4 -o results/
# Creates results/knowledge_graph.db
```

### From Document Ingestion

Documents (Markdown, PDF, DOCX) can be ingested directly into a knowledge graph:

```bash
# Ingest a single file
planopticon ingest -i requirements.pdf -o results/

# Ingest a directory recursively
planopticon ingest -i docs/ -o results/ --recursive

# Ingest into an existing knowledge graph
planopticon ingest -i notes.md --db results/knowledge_graph.db
```

### From Batch Processing

Multiple videos can be processed in batch mode, with all results merged into a single knowledge graph:

```bash
planopticon batch -i videos/ -o results/
```

### Programmatic Construction

```python
from video_processor.integrators.knowledge_graph import KnowledgeGraph
from video_processor.providers.manager import ProviderManager

# Create a new knowledge graph with LLM extraction
pm = ProviderManager()
kg = KnowledgeGraph(provider_manager=pm, db_path="knowledge_graph.db")

# Add content (entities and relationships are extracted by LLM)
kg.add_content(
    text="Alice proposed using Kubernetes for container orchestration.",
    source="meeting_notes",
    timestamp=120.5,
)

# Process a full transcript
# (transcript_data is a placeholder for transcript output from an earlier step)
kg.process_transcript(transcript_data, batch_size=10)

# Process diagram results
# (diagram_results is a placeholder for diagram detection output)
kg.process_diagrams(diagram_results)

# Save
kg.save("knowledge_graph.db")
```

---

## Merge and Deduplication

When combining knowledge graphs from multiple sources, PlanOpticon merges them and deduplicates entities using fuzzy name matching and type conflict resolution.

### Fuzzy Name Matching

Entity names are compared using `SequenceMatcher` from Python's `difflib` module, with a similarity threshold of 0.85. "Kubernetes" and "kubernetes" match exactly because the comparison is case-insensitive, while near-duplicates such as "React.js" and "ReactJS" are treated as the same entity only if their similarity ratio meets the threshold.
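
A minimal sketch of this comparison, assuming both names are lowercased before being passed to `SequenceMatcher` (the helper name `names_match` is hypothetical):

```python
from difflib import SequenceMatcher


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True if two entity names should be treated as duplicates."""
    a_norm, b_norm = a.lower(), b.lower()
    if a_norm == b_norm:
        return True  # exact case-insensitive match
    # ratio() is 2 * matches / total length, a value between 0.0 and 1.0.
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold


same_case_variant = names_match("Kubernetes", "kubernetes")
same_spelling_variant = names_match("React.js", "ReactJS")
different_names = names_match("Kubernetes", "Kafka")
```

With lowercasing, "react.js" and "reactjs" share 7 matching characters out of 15 in total, giving a ratio of 14/15 (about 0.93), which clears the 0.85 threshold.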

### Type Conflict Resolution

When two entities match but have different types, the more specific type wins based on the specificity ranking:

| Scenario | Result |
|---|---|
| `concept` vs `technology` | `technology` wins (rank 3 > rank 0) |
| `person` vs `concept` | `person` wins (rank 3 > rank 0) |
| `organization` vs `concept` | `organization` wins (rank 2 > rank 0) |
| `person` vs `technology` | Keeps whichever was first (equal rank) |
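
The rule in this table reduces to a rank comparison. The sketch below uses the ranks from the Entity Types table; the function name `resolve_type` and the choice to rank unknown types as 0 are assumptions.

```python
# Specificity ranks from the Entity Types table.
SPECIFICITY = {
    "person": 3,
    "technology": 3,
    "organization": 2,
    "time": 1,
    "diagram": 1,
    "concept": 0,
}


def resolve_type(existing: str, incoming: str) -> str:
    """Pick the type to keep when two matched entities disagree.

    The incoming type replaces the existing one only if it is strictly more
    specific, so ties keep whichever type was there first.
    Unknown types are ranked 0 here, which is an assumption of this sketch.
    """
    if SPECIFICITY.get(incoming, 0) > SPECIFICITY.get(existing, 0):
        return incoming
    return existing
```

The strict `>` comparison is what produces the "keeps whichever was first" behavior for equal ranks.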

### Provenance Tracking

Merged entities receive a `merged_from:<original_name>` description entry, preserving the audit trail of which entities were unified.

### Programmatic Merge

```python
from video_processor.integrators.knowledge_graph import KnowledgeGraph

# Load two knowledge graphs
kg1 = KnowledgeGraph(db_path="project_a.db")
kg2 = KnowledgeGraph(db_path="project_b.db")

# Merge kg2 into kg1
kg1.merge(kg2)

# Save the merged result
kg1.save("merged.db")
```

The merge operation also copies all registered sources and occurrences, so provenance information is preserved across merges.

---

## Querying

PlanOpticon provides two query modes: direct mode (no LLM required) and agentic mode (LLM-powered natural language).

### Direct Mode

Direct mode queries are fast, deterministic, and require no API key. They are the right choice for structured lookups.

#### Stats

Return entity count, relationship count, and entity type breakdown:

```bash
planopticon query
```

```python
engine.stats()
# QueryResult with data: {
#   "entity_count": 42,
#   "relationship_count": 87,
#   "entity_types": {"technology": 15, "person": 12, ...}
# }
```

#### Entities

Filter entities by name substring and/or type:

```bash
planopticon query "entities --type technology"
planopticon query "entities --name python"
```

```python
engine.entities(entity_type="technology")
engine.entities(name="python")
engine.entities(name="auth", entity_type="concept", limit=10)
```

All filtering is case-insensitive. Results are capped at 50 by default (configurable via `limit`).

#### Neighbors

Get an entity and all directly connected nodes and relationships:

```bash
planopticon query "neighbors Alice"
```

```python
engine.neighbors("Alice", depth=1)
```

The `depth` parameter controls how many hops to traverse (default 1). The result includes both entity objects and relationship objects.
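
To make the hop count concrete, here is a breadth-first traversal over a plain edge list. It is a sketch rather than the engine's implementation: the tuple layout, the function name `collect_neighbors`, and the choice to follow edges in both directions are assumptions.

```python
from collections import deque

# Each edge is (source, relationship_type, target).
EDGES = [
    ("Alice", "expert_in", "Kubernetes"),
    ("Kubernetes", "runs_on", "AWS"),
    ("Bob", "works_with", "AWS"),
]


def collect_neighbors(edges, start, depth=1):
    """Return (entity names, edges) reachable within `depth` hops of `start`."""
    adjacency = {}
    for edge in edges:
        source, _, target = edge
        # Follow edges in both directions (an assumption of this sketch).
        adjacency.setdefault(source, []).append((target, edge))
        adjacency.setdefault(target, []).append((source, edge))

    seen = {start}
    found_edges = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand beyond the requested depth
        for other, edge in adjacency.get(node, []):
            if edge not in found_edges:
                found_edges.append(edge)
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    return seen, found_edges


one_hop_nodes, one_hop_edges = collect_neighbors(EDGES, "Alice", depth=1)
two_hop_nodes, two_hop_edges = collect_neighbors(EDGES, "Alice", depth=2)
```

With this sample data, one hop from Alice reaches only Kubernetes, while two hops also reach AWS. Bob is three hops away and is excluded in both cases.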

#### Relationships

Filter relationships by source, target, and/or type:

```bash
planopticon query "relationships --source Alice"
```

```python
engine.relationships(source="Alice")
engine.relationships(target="Kubernetes", rel_type="uses")
```

#### Sources

List all registered content sources:

```python
engine.sources()
```

#### Provenance

Get all source locations for a specific entity, showing exactly where it was mentioned:

```python
engine.provenance("Kubernetes")
# Returns source locations with timestamps, pages, sections, and text snippets
```

#### Raw SQL

Execute arbitrary SQL against the SQLite backend (SQLite stores only):

```python
engine.sql("SELECT name, type FROM entities WHERE type = 'technology' ORDER BY name")
```

### Agentic Mode

Agentic mode accepts natural-language questions and uses the LLM to plan and execute queries. It requires a configured LLM provider.

```bash
planopticon query "What technologies were discussed?"
planopticon query "Who are the key people mentioned?"
planopticon query "What depends on the authentication service?"
```

The agentic query pipeline:

1. **Plan.** The LLM receives graph stats and available actions (entities, relationships, neighbors, stats). It selects exactly one action and its parameters.
2. **Execute.** The chosen action is run through the direct-mode engine.
3. **Synthesize.** The LLM receives the raw query results and the original question, then produces a concise natural-language answer.

This design ensures the LLM never generates arbitrary code -- it only selects from a fixed set of known query actions.
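
The plan / execute / synthesize loop can be sketched with a stubbed LLM. Everything here is illustrative: `answer_question`, the prompt wording, the JSON plan format, and the `fake_llm` stub are assumptions, and the real engine's prompts and parsing are not shown in this guide.

```python
import json
from typing import Callable, Dict

ALLOWED_ACTIONS = ("entities", "relationships", "neighbors", "stats")


def answer_question(question: str, actions: Dict[str, Callable], llm: Callable[[str], str]) -> str:
    """Run one plan / execute / synthesize cycle using a fixed action whitelist."""
    # 1. Plan: show the LLM the graph stats and the allowed actions.
    stats = actions["stats"]()
    plan_prompt = (
        "PLAN\n"
        f"Graph stats: {json.dumps(stats)}\n"
        f"Available actions: {', '.join(ALLOWED_ACTIONS)}\n"
        f"Question: {question}\n"
        'Reply with JSON: {"action": ..., "params": {...}}'
    )
    plan = json.loads(llm(plan_prompt))
    action = plan.get("action")
    if action not in ALLOWED_ACTIONS or action not in actions:
        raise ValueError(f"unsupported action: {action!r}")

    # 2. Execute: only whitelisted direct-mode queries are ever run.
    result = actions[action](**plan.get("params", {}))

    # 3. Synthesize: turn the raw result into a natural-language answer.
    synth_prompt = f"SYNTHESIZE\nQuestion: {question}\nResults: {json.dumps(result)}"
    return llm(synth_prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider so the sketch runs offline."""
    if prompt.startswith("PLAN"):
        return json.dumps({"action": "entities", "params": {"entity_type": "technology"}})
    return "Kubernetes and Python were discussed."


demo_actions = {
    "stats": lambda: {"entity_count": 2, "relationship_count": 1},
    "entities": lambda entity_type=None: ["Kubernetes", "Python"],
}
answer = answer_question("What technologies were discussed?", demo_actions, fake_llm)
```

Because the LLM's reply is only used as a key into a fixed dispatch table, an unexpected action name fails with an error instead of being executed.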

```bash
# Requires an API key
planopticon query "What technologies were discussed?" -p openai

# Use the interactive REPL for multiple queries
planopticon query -I
```

---

## Graph Query Engine Python API

The `GraphQueryEngine` class provides the programmatic interface for all query operations.

### Initialization

```python
from video_processor.integrators.graph_query import GraphQueryEngine
from video_processor.integrators.graph_discovery import find_nearest_graph

# From a .db file
path = find_nearest_graph()
engine = GraphQueryEngine.from_db_path(path)

# From a .json file
engine = GraphQueryEngine.from_json_path("knowledge_graph.json")

# With an LLM provider for agentic mode
from video_processor.providers.manager import ProviderManager
pm = ProviderManager()
engine = GraphQueryEngine.from_db_path(path, provider_manager=pm)
```

### QueryResult

All query methods return a `QueryResult` dataclass with multiple output formats:

```python
result = engine.stats()

# Human-readable text
print(result.to_text())

# JSON string
print(result.to_json())

# Mermaid diagram (for graph results)
result = engine.neighbors("Alice")
print(result.to_mermaid())
```

The `QueryResult` contains:

| Field | Type | Description |
|---|---|---|
| `data` | Any | The raw result data (dict, list, or scalar) |
| `query_type` | str | `"filter"` for direct mode, `"agentic"` for LLM mode, `"sql"` for raw SQL |
| `raw_query` | str | String representation of the executed query |
| `explanation` | str | Human-readable explanation or LLM-synthesized answer |

---

## The Self-Contained HTML Viewer

PlanOpticon includes a zero-dependency HTML knowledge graph viewer at `knowledge-base/viewer.html`. This file is fully self-contained -- it inlines D3.js and requires no build step, no server, and no internet connection.

To use it, open `viewer.html` in a browser. It will load and visualize a `knowledge_graph.json` file (place it in the same directory, or use the file picker in the viewer).

The viewer provides:

- Interactive force-directed graph layout
- Zoom and pan navigation
- Entity nodes colored by type
- Relationship edges with labels
- Click-to-focus on individual entities
- Entity detail panel showing descriptions and connections

This covers approximately 80% of graph exploration needs with zero infrastructure.

---

## KG Management Commands

The `planopticon kg` command group provides utilities for managing knowledge graph files.

### kg convert

Convert a knowledge graph between SQLite and JSON formats:

```bash
# SQLite to JSON
planopticon kg convert results/knowledge_graph.db output.json

# JSON to SQLite
planopticon kg convert knowledge_graph.json knowledge_graph.db
```

The output format is inferred from the destination file extension. Source and destination must be different formats.

### kg sync

Synchronize a `.db` and `.json` knowledge graph, updating the stale one:

```bash
# Auto-detect which is newer and sync
planopticon kg sync results/knowledge_graph.db

# Explicit JSON path
planopticon kg sync knowledge_graph.db knowledge_graph.json

# Force a specific direction
planopticon kg sync knowledge_graph.db knowledge_graph.json --direction db-to-json
planopticon kg sync knowledge_graph.db knowledge_graph.json --direction json-to-db
```

If `JSON_PATH` is omitted, the `.json` path is derived from the `.db` path (same name, different extension). In `auto` mode (the default), the newer file is used as the source.
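
The `auto` rule can be pictured as a comparison of file modification times. This sketch is a guess at the logic, not the command's source: the helper names are hypothetical, and both the use of `st_mtime` and the tie-breaking toward `db-to-json` are assumptions.

```python
import os
import tempfile
from pathlib import Path


def default_json_path(db_path: Path) -> Path:
    """Derive the .json path from the .db path (same name, different extension)."""
    return db_path.with_suffix(".json")


def pick_sync_direction(db_path: Path, json_path: Path) -> str:
    """Choose a sync direction by treating the newer file as the source."""
    if db_path.exists() and not json_path.exists():
        return "db-to-json"
    if json_path.exists() and not db_path.exists():
        return "json-to-db"
    if not db_path.exists():
        raise FileNotFoundError("neither knowledge graph file exists")
    # Both exist: compare modification times. Ties favor the database (assumption).
    if db_path.stat().st_mtime >= json_path.stat().st_mtime:
        return "db-to-json"
    return "json-to-db"


with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "knowledge_graph.db"
    json_file = default_json_path(db_file)
    db_file.touch()
    json_file.touch()
    # Make the JSON file look newer than the database.
    os.utime(db_file, (1_000, 1_000))
    os.utime(json_file, (2_000, 2_000))
    direction = pick_sync_direction(db_file, json_file)
```

Modification times can be misleading after a copy or a checkout, which is one reason to pass `--direction` explicitly when the intended source is known.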

### kg inspect

Show summary statistics for a knowledge graph file:

```bash
planopticon kg inspect results/knowledge_graph.db
```

Output:

```
File: results/knowledge_graph.db
Store: sqlite
Entities: 42
Relationships: 87
Entity types:
  technology: 15
  person: 12
  concept: 10
  organization: 5
```

Works with both `.db` and `.json` files.

### kg classify

Classify knowledge graph entities into planning taxonomy types:

```bash
# Heuristic + LLM classification
planopticon kg classify results/knowledge_graph.db

# Heuristic only (no API key needed)
planopticon kg classify results/knowledge_graph.db -p none

# JSON output
planopticon kg classify results/knowledge_graph.db --format json
```

Text output groups entities by planning type:

```
GOALS (3)
  - Improve system reliability [high]
      Must achieve 99.9% uptime
  - Reduce deployment time [medium]
      Automate the deployment pipeline

RISKS (2)
  - Data migration complexity [high]
      Legacy schema incompatibilities
  ...

TASKS (5)
  - Implement OAuth2 flow
      Set up authentication service
  ...
```

JSON output returns an array of `PlanningEntity` objects with `name`, `planning_type`, `priority`, `description`, and `source_entities` fields.

### kg from-exchange

Import a PlanOpticonExchange JSON file into a knowledge graph database:

```bash
# Import to default location (./knowledge_graph.db)
planopticon kg from-exchange exchange.json

# Import to a specific path
planopticon kg from-exchange exchange.json -o project.db
```

The PlanOpticonExchange format is a standardized interchange format that includes entities, relationships, and source records.

---

## Output Formats

Query results can be output in three formats:

### Text (default)

Human-readable format with entity types in brackets, relationship arrows, and indented details:

```
Found 15 entities
  [technology] Python -- General-purpose programming language
  [person] Alice -- Lead engineer on the project
  [concept] Microservices -- Architectural pattern discussed
```

### JSON

Full structured output including query metadata:

```bash
planopticon query --format json stats
```

```json
{
  "query_type": "filter",
  "raw_query": "stats()",
  "explanation": "Knowledge graph statistics",
  "data": {
    "entity_count": 42,
    "relationship_count": 87,
    "entity_types": {
      "technology": 15,
      "person": 12
    }
  }
}
```

### Mermaid

Graph results rendered as Mermaid diagram syntax, ready for embedding in markdown:

```bash
planopticon query --format mermaid "neighbors Alice"
```

```
graph LR
    Alice["Alice"]:::person
    Python["Python"]:::technology
    Kubernetes["Kubernetes"]:::technology
    Alice -- "expert_in" --> Kubernetes
    Alice -- "works_with" --> Python
    classDef person fill:#f9d5e5,stroke:#333
    classDef concept fill:#eeeeee,stroke:#333
    classDef technology fill:#d5e5f9,stroke:#333
    classDef organization fill:#f9e5d5,stroke:#333
```

The `KnowledgeGraph.generate_mermaid()` method also produces full-graph Mermaid diagrams, capped at the top 30 most-connected nodes by default.
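
For readers who want to see how such output could be assembled, the following sketch builds Mermaid text from plain Python data and picks the most-connected nodes by degree. It is not the `generate_mermaid()` implementation: the function names, the ID sanitizing rule, and the omission of `classDef` lines are choices made for this illustration.

```python
import re
from collections import Counter


def top_connected(edges, limit=30):
    """Return the `limit` node names with the highest degree."""
    degree = Counter()
    for source, _, target in edges:
        degree[source] += 1
        degree[target] += 1
    return [name for name, _ in degree.most_common(limit)]


def to_mermaid(entities, edges):
    """Render {name: type} entities and (source, type, target) edges as Mermaid."""

    def node_id(name):
        # Mermaid node IDs cannot contain spaces, so non-word characters become "_".
        return re.sub(r"\W", "_", name)

    lines = ["graph LR"]
    for name, entity_type in entities.items():
        lines.append(f'    {node_id(name)}["{name}"]:::{entity_type}')
    for source, rel_type, target in edges:
        lines.append(f'    {node_id(source)} -- "{rel_type}" --> {node_id(target)}')
    return "\n".join(lines)


sample_entities = {"Alice": "person", "Kubernetes": "technology", "Python": "technology"}
sample_edges = [
    ("Alice", "expert_in", "Kubernetes"),
    ("Alice", "works_with", "Python"),
]
diagram = to_mermaid(sample_entities, sample_edges)
busiest = top_connected(sample_edges, limit=1)
```

Capping by degree keeps large graphs readable, since Mermaid layouts become hard to follow well before they become slow to render.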

---

## Auto-Discovery

PlanOpticon automatically locates knowledge graph files using the `find_nearest_graph()` function. The search order is:

1. **Current directory** -- check for `knowledge_graph.db` and `knowledge_graph.json`
2. **Common subdirectories** -- `results/`, `output/`, `knowledge-base/`
3. **Recursive downward walk** -- up to 4 levels deep, skipping hidden directories
4. **Parent directory walk** -- upward through the directory tree, checking each level and its common subdirectories

Within each search phase, `.db` files are preferred over `.json` files. Results are sorted by proximity (closest first).
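
The search order above can be approximated with the standard library. This is a simplified sketch under stated assumptions, not the `graph_discovery` module: `find_graphs` and `graphs_in` are hypothetical names, "proximity" is modeled only by the order in which phases run, and depth is counted in directory levels below the start directory.

```python
import os
import tempfile
from pathlib import Path

GRAPH_FILES = ("knowledge_graph.db", "knowledge_graph.json")  # .db is checked first
COMMON_SUBDIRS = ("results", "output", "knowledge-base")


def graphs_in(directory: Path) -> list:
    """Return the knowledge graph files that exist directly in `directory`."""
    return [directory / name for name in GRAPH_FILES if (directory / name).is_file()]


def find_graphs(start: Path, max_depth: int = 4, walk_up: bool = True) -> list:
    found = []

    def add(paths):
        for path in paths:
            if path not in found:
                found.append(path)

    # Phases 1 and 2: the start directory, then its common subdirectories.
    add(graphs_in(start))
    for sub in COMMON_SUBDIRS:
        add(graphs_in(start / sub))

    # Phase 3: downward walk, pruning hidden directories and limiting depth.
    for root, dirs, _ in os.walk(start):
        depth = len(Path(root).relative_to(start).parts)
        if depth >= max_depth:
            dirs[:] = []
        else:
            dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        add(graphs_in(Path(root)))

    # Phase 4: upward walk through parents and their common subdirectories.
    if walk_up:
        for parent in start.parents:
            add(graphs_in(parent))
            for sub in COMMON_SUBDIRS:
                add(graphs_in(parent / sub))
    return found


with tempfile.TemporaryDirectory() as tmp:
    start = Path(tmp)
    for rel in ("results/knowledge_graph.db", "a/b/knowledge_graph.json", ".hidden/knowledge_graph.db"):
        target = start / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    discovered = [p.relative_to(start).as_posix() for p in find_graphs(start, walk_up=False)]
```

In this demo the graph under `results/` is found first, the one nested under `a/b/` second, and the copy inside the hidden directory is skipped.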

```python
from video_processor.integrators.graph_discovery import (
    find_nearest_graph,
    find_knowledge_graphs,
    describe_graph,
)

# Find the single closest knowledge graph
path = find_nearest_graph()

# Find all knowledge graphs, sorted by proximity
paths = find_knowledge_graphs()

# Find graphs starting from a specific directory
paths = find_knowledge_graphs(start_dir="/path/to/project")

# Disable upward walking
paths = find_knowledge_graphs(walk_up=False)

# Get summary stats without loading the full graph
info = describe_graph(path)
# {"entity_count": 42, "relationship_count": 87,
#  "entity_types": {...}, "store_type": "sqlite"}
```

Auto-discovery is used by the Companion REPL, the `planopticon query` command, and the planning agent when no explicit `--kb` path is provided.