# Knowledge Graphs
PlanOpticon builds structured knowledge graphs from video analyses, document ingestion, and other content sources. A knowledge graph captures entities (people, technologies, concepts, organizations) and the relationships between them, providing a queryable representation of everything discussed or presented in your source material.
## Storage
Knowledge graphs are stored as SQLite databases (`knowledge_graph.db`) using Python's built-in `sqlite3` module. This means:

- **Zero external dependencies.** No database server to install or manage.
- **Single-file portability.** Copy the `.db` file to share a knowledge graph.
- **WAL mode.** SQLite Write-Ahead Logging is enabled for concurrent read performance.
- **JSON fallback.** Knowledge graphs can also be saved as `knowledge_graph.json` for interoperability, though SQLite is preferred for performance and querying.
### Database Schema
The SQLite store uses the following tables:
| Table | Purpose |
|---|---|
| `entities` | Core entity records with name, type, descriptions, source, and arbitrary properties |
| `occurrences` | Where and when each entity was mentioned (source, timestamp, text snippet) |
| `relationships` | Directed edges between entities with type, content source, timestamp, and properties |
| `sources` | Registered content sources with provenance metadata (source type, title, path, URL, MIME type, ingestion timestamp) |
| `source_locations` | Links between sources and specific entities/relationships, with location details (timestamp, page, section, line range, text snippet) |
All entity lookups are case-insensitive (indexed on `name_lower`). Entities are indexed on their `source` field, and relationships on their `source` and `target` fields, for efficient traversal.
### Storage Backends
PlanOpticon supports two storage backends, selected automatically:
| Backend | When Used | Persistence |
|---|---|---|
| `SQLiteStore` | When a `db_path` is provided | Persistent on disk |
| `InMemoryStore` | When no path is given, or as fallback | In-memory only |
Both backends implement the same GraphStore abstract interface, so all query and manipulation code works identically regardless of backend.
```python
from video_processor.integrators.graph_store import create_store

# Persistent SQLite store
store = create_store("/path/to/knowledge_graph.db")

# In-memory store (for temporary operations)
store = create_store()
```
## Entity Types
Entities extracted from content are assigned one of the following base types:
| Type | Description | Specificity Rank |
|---|---|---|
| `person` | People mentioned or participating | 3 (highest) |
| `technology` | Tools, languages, frameworks, platforms | 3 |
| `organization` | Companies, teams, departments | 2 |
| `time` | Dates, deadlines, time references | 1 |
| `diagram` | Visual diagrams extracted from video frames | 1 |
| `concept` | General concepts, topics, ideas (default) | 0 (lowest) |
The specificity rank is used during merge operations: when two entities are matched as duplicates, the more specific type wins (e.g., `technology` overrides `concept`).
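As a minimal sketch of this rule (ranks taken from the table above; the function name is illustrative, not PlanOpticon's API):

```python
# Specificity ranks from the Entity Types table above.
SPECIFICITY = {
    "person": 3, "technology": 3,
    "organization": 2,
    "time": 1, "diagram": 1,
    "concept": 0,
}

def resolve_type(existing: str, incoming: str) -> str:
    """Return the winning type for a merged entity: the higher rank
    wins, and ties keep the existing (first-seen) type."""
    if SPECIFICITY.get(incoming, 0) > SPECIFICITY.get(existing, 0):
        return incoming
    return existing
```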
## Planning Taxonomy
Beyond the base entity types, PlanOpticon includes a planning taxonomy for classifying entities into project-planning categories. The `TaxonomyClassifier` maps extracted entities into these types:
| Planning Type | Keywords Matched |
|---|---|
| `goal` | goal, objective, aim, target outcome |
| `requirement` | must, should, requirement, need, required |
| `constraint` | constraint, limitation, restrict, cannot, must not |
| `decision` | decided, decision, chose, selected, agreed |
| `risk` | risk, concern, worry, danger, threat |
| `assumption` | assume, assumption, expecting, presume |
| `dependency` | depends, dependency, relies on, prerequisite, blocked |
| `milestone` | milestone, deadline, deliverable, release, launch |
| `task` | task, todo, action item, work item, implement |
| `feature` | feature, capability, functionality |
Classification works in two stages:

1. **Heuristic classification.** Entity descriptions are scanned for the keywords listed above. First match wins.
2. **LLM refinement.** If an LLM provider is available, entities are sent to the LLM for more nuanced classification with priority assignment (`high`, `medium`, `low`). LLM results override heuristic results on conflicts.
Classified entities are used by planning agent skills (`project_plan`, `prd`, `roadmap`, `task_breakdown`) to produce targeted, context-aware artifacts.
## Relationship Types
Relationships are directed edges between entities. The `type` field is a free-text string determined by the LLM during extraction. Common relationship types include:
- `related_to` (default)
- `works_with`
- `uses`
- `depends_on`
- `proposed`
- `discussed_by`
- `employed_by`
- `collaborates_with`
- `expert_in`
### Typed Relationships
The `add_typed_relationship()` method creates edges with custom labels and optional properties, enabling richer graph semantics:
```python
store.add_typed_relationship(
    source="Authentication Service",
    target="PostgreSQL",
    edge_label="USES_SYSTEM",
    properties={"purpose": "user credential storage", "version": "15"},
)
```
### Relationship Checks
You can check whether a relationship exists between two entities:
```python
# Check for any relationship
store.has_relationship("Alice", "Kubernetes")

# Check for a specific relationship type
store.has_relationship("Alice", "Kubernetes", edge_label="expert_in")
```
## Building a Knowledge Graph
### From Video Analysis
The primary path for building a knowledge graph is through video analysis. When you run `planopticon analyze`, the pipeline extracts entities and relationships from:
- **Transcript segments** -- batched in groups of 10 for efficient API usage, with speaker identification
- **Diagram content** -- text extracted from visual diagrams detected in video frames
### From Document Ingestion
Documents (Markdown, PDF, DOCX) can be ingested directly into a knowledge graph:
```bash
# Ingest a single file
planopticon ingest -i requirements.pdf -o results/

# Ingest a directory recursively
planopticon ingest -i docs/ -o results/ --recursive

# Ingest into an existing knowledge graph
planopticon ingest -i notes.md --db results/knowledge_graph.db
```
### From Batch Processing

Multiple videos can be processed in batch mode, with all results merged into a single knowledge graph.
### Programmatic Construction
```python
from video_processor.integrators.knowledge_graph import KnowledgeGraph
from video_processor.providers.manager import ProviderManager

# Create a new knowledge graph with LLM extraction
pm = ProviderManager()
kg = KnowledgeGraph(provider_manager=pm, db_path="knowledge_graph.db")

# Add content (entities and relationships are extracted by LLM)
kg.add_content(
    text="Alice proposed using Kubernetes for container orchestration.",
    source="meeting_notes",
    timestamp=120.5,
)

# Process a full transcript
kg.process_transcript(transcript_data, batch_size=10)

# Process diagram results
kg.process_diagrams(diagram_results)

# Save
kg.save("knowledge_graph.db")
```
## Merge and Deduplication
When combining knowledge graphs from multiple sources, PlanOpticon performs intelligent merge with deduplication.
### Fuzzy Name Matching
Entity names are compared using Python's `SequenceMatcher` with a threshold of 0.85. This means "Kubernetes" and "kubernetes" are matched exactly (case-insensitive), while "React.js" and "ReactJS" may be matched as duplicates if their similarity ratio meets the threshold.
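This comparison can be reproduced with `difflib` directly (a sketch of the behavior described above, not PlanOpticon's internal code; the helper name is illustrative):

```python
from difflib import SequenceMatcher

def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    # Case-insensitive comparison, mirroring the documented behavior.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

names_match("Kubernetes", "kubernetes")  # identical after lowercasing
names_match("React.js", "ReactJS")       # ratio ~0.93, above the threshold
names_match("Python", "Rust")            # well below the threshold
```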
### Type Conflict Resolution
When two entities match but have different types, the more specific type wins based on the specificity ranking:
| Scenario | Result |
|---|---|
| `concept` vs `technology` | `technology` wins (rank 3 > rank 0) |
| `person` vs `concept` | `person` wins (rank 3 > rank 0) |
| `organization` vs `concept` | `organization` wins (rank 2 > rank 0) |
| `person` vs `technology` | Keeps whichever was first (equal rank) |
### Provenance Tracking
Merged entities receive a `merged_from:<original_name>` description entry, preserving the audit trail of which entities were unified.
### Programmatic Merge
```python
from video_processor.integrators.knowledge_graph import KnowledgeGraph

# Load two knowledge graphs
kg1 = KnowledgeGraph(db_path="project_a.db")
kg2 = KnowledgeGraph(db_path="project_b.db")

# Merge kg2 into kg1
kg1.merge(kg2)

# Save the merged result
kg1.save("merged.db")
```
The merge operation also copies all registered sources and occurrences, so provenance information is preserved across merges.
## Querying
PlanOpticon provides two query modes: direct mode (no LLM required) and agentic mode (LLM-powered natural language).
### Direct Mode
Direct mode queries are fast, deterministic, and require no API key. They are the right choice for structured lookups.
#### Stats
Return entity count, relationship count, and entity type breakdown:
```python
engine.stats()
# QueryResult with data: {
#   "entity_count": 42,
#   "relationship_count": 87,
#   "entity_types": {"technology": 15, "person": 12, ...}
# }
```
#### Entities
Filter entities by name substring and/or type:
```python
engine.entities(entity_type="technology")
engine.entities(name="python")
engine.entities(name="auth", entity_type="concept", limit=10)
```
All filtering is case-insensitive. Results are capped at 50 by default (configurable via `limit`).
#### Neighbors

Get an entity and all directly connected nodes and relationships. The `depth` parameter controls how many hops to traverse (default 1). The result includes both entity objects and relationship objects.
#### Relationships

Filter relationships by source, target, and/or type.
#### Sources

List all registered content sources.
#### Provenance
Get all source locations for a specific entity, showing exactly where it was mentioned:
```python
engine.provenance("Kubernetes")
# Returns source locations with timestamps, pages, sections, and text snippets
```
#### Raw SQL

Execute arbitrary SQL against the SQLite backend (SQLite stores only).
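As a self-contained illustration of what raw SQL can reach, the following builds a miniature version of the documented schema and runs a case-insensitive lookup (the column names are assumptions based on the Database Schema section, not the exact production schema):

```python
import sqlite3

# Miniature stand-in for the documented entities table (column names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT, name_lower TEXT, type TEXT)")
conn.execute("CREATE INDEX idx_entities_name_lower ON entities(name_lower)")
conn.execute(
    "INSERT INTO entities VALUES (?, ?, ?)",
    ("Kubernetes", "kubernetes", "technology"),
)

# Case-insensitive lookup via the indexed name_lower column
rows = conn.execute(
    "SELECT name, type FROM entities WHERE name_lower = ?",
    ("Kubernetes".lower(),),
).fetchall()
print(rows)  # [('Kubernetes', 'technology')]
```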
### Agentic Mode
Agentic mode accepts natural-language questions and uses the LLM to plan and execute queries. It requires a configured LLM provider.
```bash
planopticon query "What technologies were discussed?"
planopticon query "Who are the key people mentioned?"
planopticon query "What depends on the authentication service?"
```
The agentic query pipeline:
1. **Plan.** The LLM receives graph stats and the available actions (`entities`, `relationships`, `neighbors`, `stats`). It selects exactly one action and its parameters.
2. **Execute.** The chosen action is run through the direct-mode engine.
3. **Synthesize.** The LLM receives the raw query results and the original question, then produces a concise natural-language answer.
This design ensures the LLM never generates arbitrary code -- it only selects from a fixed set of known query actions.
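A minimal sketch of that fixed-action dispatch (the action names come from the docs above; the function and error handling are illustrative, not the actual implementation):

```python
# The only actions the LLM may select; anything else is rejected.
ALLOWED_ACTIONS = {"entities", "relationships", "neighbors", "stats"}

def execute_plan(engine, action: str, params: dict):
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    # Dispatch to the direct-mode engine method of the same name.
    return getattr(engine, action)(**params)
```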
```bash
# Requires an API key
planopticon query "What technologies were discussed?" -p openai

# Use the interactive REPL for multiple queries
planopticon query -I
```
## Graph Query Engine Python API
The `GraphQueryEngine` class provides the programmatic interface for all query operations.
### Initialization
```python
from video_processor.integrators.graph_query import GraphQueryEngine
from video_processor.integrators.graph_discovery import find_nearest_graph

# From a .db file
path = find_nearest_graph()
engine = GraphQueryEngine.from_db_path(path)

# From a .json file
engine = GraphQueryEngine.from_json_path("knowledge_graph.json")

# With an LLM provider for agentic mode
from video_processor.providers.manager import ProviderManager
pm = ProviderManager()
engine = GraphQueryEngine.from_db_path(path, provider_manager=pm)
```
### QueryResult
All query methods return a `QueryResult` dataclass with multiple output formats:
```python
result = engine.stats()

# Human-readable text
print(result.to_text())

# JSON string
print(result.to_json())

# Mermaid diagram (for graph results)
result = engine.neighbors("Alice")
print(result.to_mermaid())
```
The `QueryResult` contains:

| Field | Type | Description |
|---|---|---|
| `data` | `Any` | The raw result data (dict, list, or scalar) |
| `query_type` | `str` | `"filter"` for direct mode, `"agentic"` for LLM mode, `"sql"` for raw SQL |
| `raw_query` | `str` | String representation of the executed query |
| `explanation` | `str` | Human-readable explanation or LLM-synthesized answer |
## The Self-Contained HTML Viewer
PlanOpticon includes a zero-dependency HTML knowledge graph viewer at `knowledge-base/viewer.html`. This file is fully self-contained -- it inlines D3.js and requires no build step, no server, and no internet connection.

To use it, open `viewer.html` in a browser. It will load and visualize a `knowledge_graph.json` file (place it in the same directory, or use the file picker in the viewer).
The viewer provides:
- Interactive force-directed graph layout
- Zoom and pan navigation
- Entity nodes colored by type
- Relationship edges with labels
- Click-to-focus on individual entities
- Entity detail panel showing descriptions and connections
This covers approximately 80% of graph exploration needs with zero infrastructure.
## KG Management Commands
The `planopticon kg` command group provides utilities for managing knowledge graph files.
### kg convert
Convert a knowledge graph between SQLite and JSON formats:
```bash
# SQLite to JSON
planopticon kg convert results/knowledge_graph.db output.json

# JSON to SQLite
planopticon kg convert knowledge_graph.json knowledge_graph.db
```
The output format is inferred from the destination file extension. Source and destination must be different formats.
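The inference rule can be sketched as follows (an illustrative helper, not the actual CLI code):

```python
from pathlib import Path

def infer_format(dest: str) -> str:
    # Map the destination extension to a storage format, as described above.
    suffix = Path(dest).suffix.lower()
    formats = {".db": "sqlite", ".json": "json"}
    if suffix not in formats:
        raise ValueError(f"cannot infer format from {suffix!r}")
    return formats[suffix]
```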
### kg sync
Synchronize a `.db` and `.json` knowledge graph, updating the stale one:
```bash
# Auto-detect which is newer and sync
planopticon kg sync results/knowledge_graph.db

# Explicit JSON path
planopticon kg sync knowledge_graph.db knowledge_graph.json

# Force a specific direction
planopticon kg sync knowledge_graph.db knowledge_graph.json --direction db-to-json
planopticon kg sync knowledge_graph.db knowledge_graph.json --direction json-to-db
```
If `JSON_PATH` is omitted, the `.json` path is derived from the `.db` path (same name, different extension). In auto mode (the default), the newer file is used as the source.
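Auto mode can be sketched as follows (an illustrative helper; the real command's internals may differ):

```python
import os
from pathlib import Path

def sync_plan(db_path, json_path=None):
    # Derive the JSON path from the DB path when omitted (same name,
    # different extension), then pick the newer file as the sync source.
    json_path = json_path or str(Path(db_path).with_suffix(".json"))
    if os.path.getmtime(db_path) >= os.path.getmtime(json_path):
        return "db-to-json", json_path
    return "json-to-db", json_path
```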
### kg inspect

Show summary statistics for a knowledge graph file, e.g. `planopticon kg inspect results/knowledge_graph.db`:
Output:

```
File: results/knowledge_graph.db
Store: sqlite
Entities: 42
Relationships: 87
Entity types:
  technology: 15
  person: 12
  concept: 10
  organization: 5
```
Works with both `.db` and `.json` files.
### kg classify
Classify knowledge graph entities into planning taxonomy types:
```bash
# Heuristic + LLM classification
planopticon kg classify results/knowledge_graph.db

# Heuristic only (no API key needed)
planopticon kg classify results/knowledge_graph.db -p none

# JSON output
planopticon kg classify results/knowledge_graph.db --format json
```
Text output groups entities by planning type:
```
GOALS (3)
  - Improve system reliability [high]
    Must achieve 99.9% uptime
  - Reduce deployment time [medium]
    Automate the deployment pipeline

RISKS (2)
  - Data migration complexity [high]
    Legacy schema incompatibilities
  ...

TASKS (5)
  - Implement OAuth2 flow
    Set up authentication service
  ...
```
JSON output returns an array of `PlanningEntity` objects with `name`, `planning_type`, `priority`, `description`, and `source_entities` fields.
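For example, an entry might look like this (the values are illustrative; only the field names come from the description above):

```json
[
  {
    "name": "Improve system reliability",
    "planning_type": "goal",
    "priority": "high",
    "description": "Must achieve 99.9% uptime",
    "source_entities": ["system reliability", "uptime target"]
  }
]
```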
### kg from-exchange
Import a PlanOpticonExchange JSON file into a knowledge graph database:
```bash
# Import to default location (./knowledge_graph.db)
planopticon kg from-exchange exchange.json

# Import to a specific path
planopticon kg from-exchange exchange.json -o project.db
```
The PlanOpticonExchange format is a standardized interchange format that includes entities, relationships, and source records.
## Output Formats
Query results can be output in three formats:
### Text (default)
Human-readable format with entity types in brackets, relationship arrows, and indented details:
```
Found 15 entities
[technology] Python -- General-purpose programming language
[person] Alice -- Lead engineer on the project
[concept] Microservices -- Architectural pattern discussed
```
### JSON
Full structured output including query metadata:
```json
{
  "query_type": "filter",
  "raw_query": "stats()",
  "explanation": "Knowledge graph statistics",
  "data": {
    "entity_count": 42,
    "relationship_count": 87,
    "entity_types": {
      "technology": 15,
      "person": 12
    }
  }
}
```
### Mermaid
Graph results rendered as Mermaid diagram syntax, ready for embedding in markdown:
```mermaid
graph LR
    Alice["Alice"]:::person
    Python["Python"]:::technology
    Kubernetes["Kubernetes"]:::technology
    Alice -- "expert_in" --> Kubernetes
    Alice -- "works_with" --> Python
    classDef person fill:#f9d5e5,stroke:#333
    classDef concept fill:#eeeeee,stroke:#333
    classDef technology fill:#d5e5f9,stroke:#333
    classDef organization fill:#f9e5d5,stroke:#333
```
The `KnowledgeGraph.generate_mermaid()` method also produces full-graph Mermaid diagrams, capped at the top 30 most-connected nodes by default.
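The "most-connected" cap can be sketched as a simple degree ranking (an illustrative helper, not the actual implementation):

```python
from collections import Counter

def top_connected(edges, n=30):
    # Rank nodes by degree (number of incident edges) and keep the top n.
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    return [node for node, _ in degree.most_common(n)]
```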
## Auto-Discovery
PlanOpticon automatically locates knowledge graph files using the `find_nearest_graph()` function. The search order is:
1. **Current directory** -- check for `knowledge_graph.db` and `knowledge_graph.json`
2. **Common subdirectories** -- `results/`, `output/`, `knowledge-base/`
3. **Recursive downward walk** -- up to 4 levels deep, skipping hidden directories
4. **Parent directory walk** -- upward through the directory tree, checking each level and its common subdirectories
Within each search phase, `.db` files are preferred over `.json` files. Results are sorted by proximity (closest first).
```python
from video_processor.integrators.graph_discovery import (
    find_nearest_graph,
    find_knowledge_graphs,
    describe_graph,
)

# Find the single closest knowledge graph
path = find_nearest_graph()

# Find all knowledge graphs, sorted by proximity
paths = find_knowledge_graphs()

# Find graphs starting from a specific directory
paths = find_knowledge_graphs(start_dir="/path/to/project")

# Disable upward walking
paths = find_knowledge_graphs(walk_up=False)

# Get summary stats without loading the full graph
info = describe_graph(path)
# {"entity_count": 42, "relationship_count": 87,
#  "entity_types": {...}, "store_type": "sqlite"}
```
Auto-discovery is used by the Companion REPL, the `planopticon query` command, and the planning agent when no explicit `--kb` path is provided.