PlanOpticon

docs: comprehensive v0.4.0 documentation — 27 pages, use cases, FAQ

New pages (11):

- guide/companion.md — Interactive Companion REPL
- guide/planning-agent.md — Planning Agent and 11 skills
- guide/knowledge-graphs.md — KG storage, querying, taxonomy, viewer
- guide/authentication.md — OAuth setup for 6 services
- guide/document-ingestion.md — PDF, Markdown, plaintext ingestion
- guide/export.md — 7 markdown doc types, Obsidian, Notion, Wiki, Exchange
- api/agent.md — PlanningAgent, AgentContext, Skills API
- api/sources.md — BaseSource, 21 source connectors
- api/auth.md — AuthConfig, OAuthManager API
- use-cases.md — 10 real-world workflows with full commands
- faq.md — FAQ and troubleshooting guide

Updated pages (10):

- guide/output-formats.md — all output formats including SQLite KG, Exchange
- guide/single-video.md — taxonomy, --speakers, --output-format, post-analysis
- guide/batch.md — fuzzy merge, querying results, incremental processing
- architecture/pipeline.md — 5 mermaid diagrams, all pipelines
- contributing.md — ruff, ProviderRegistry, skills, processors, exporters
- getting-started/configuration.md — full .env example with OAuth walkthroughs
- api/models.md — all 17+ Pydantic models documented
- api/providers.md — BaseProvider, ProviderRegistry, ProviderManager
- api/analyzers.md — DiagramAnalyzer, ContentAnalyzer, ActionDetector
- mkdocs.yml — nav updated with all new pages

Also fixes check-yaml pre-commit hook to handle mkdocs.yml Python tags.

lmata 2026-03-08 00:17 trunk
Commit 3da1f8f9af3d2ae023942b141853a68b784a7fd986e8a1e92f650e182fe78dbd
--- .pre-commit-config.yaml
+++ .pre-commit-config.yaml
@@ -9,9 +9,10 @@
     rev: v5.0.0
     hooks:
       - id: trailing-whitespace
       - id: end-of-file-fixer
      - id: check-yaml
+        args: [--unsafe]
       - id: check-added-large-files
         args: [--maxkb=500]
       - id: check-merge-conflict
       - id: detect-private-key
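For context on the fix above: by default `check-yaml` fully loads each file (equivalent to `yaml.safe_load`), which fails on custom tags, while `--unsafe` switches the hook to a syntax-only parse. An illustrative `mkdocs.yml` fragment of the kind that needs this (a common Material for MkDocs emoji setup, not necessarily this project's exact config):

```yaml
# Custom !!python/name: tags fail yaml.safe_load, so the default
# check-yaml hook rejects them; --unsafe parses for syntax only.
markdown_extensions:
  - pymdownx.emoji:
      emoji_index: !!python/name:material.extensions.emoji.twemoji
      emoji_generator: !!python/name:material.extensions.emoji.to_svg
```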
ADDED docs/api/agent.md
--- a/docs/api/agent.md
+++ b/docs/api/agent.md
@@ -0,0 +1,407 @@
+# Agent API Reference
+
+::: video_processor.agent.agent_loop
+
+::: video_processor.agent.skills.base
+
+::: video_processor.agent.kb_context
+
+---
+
+## Overview
+
+The agent module implements a planning agent that synthesizes knowledge from processed video content into actionable artifacts such as project plans, PRDs, task breakdowns, and roadmaps. The agent operates on knowledge graphs loaded via `KBContext` and uses a skill-based architecture for extensibility.
+
+**Key components:**
+
+- **`PlanningAgent`** -- orchestrates skill selection and execution based on user requests
+- **`AgentContext`** -- shared state passed between skills during execution
+- **`Skill`** (ABC) -- base class for pluggable agent capabilities
+- **`Artifact`** -- output produced by skill execution
+- **`KBContext`** -- loads and merges multiple knowledge graph sources
+
+---
+
+## PlanningAgent
+
+```python
+from video_processor.agent.agent_loop import PlanningAgent
+```
+
+AI agent that synthesizes knowledge into planning artifacts. Uses an LLM to select which skills to execute for a given request, or falls back to keyword matching when no LLM is available.
+
+### Constructor
+
+```python
+def __init__(self, context: AgentContext)
+```
+
+| Parameter | Type | Description |
+|---|---|---|
+| `context` | `AgentContext` | Shared context containing knowledge graph, query engine, and provider |
+
+### from_kb_paths()
+
+```python
+@classmethod
+def from_kb_paths(
+    cls,
+    kb_paths: List[Path],
+    provider_manager=None,
+) -> PlanningAgent
+```
+
+Factory method that creates an agent from one or more knowledge base file paths. Handles loading and merging knowledge graphs automatically.
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `kb_paths` | `List[Path]` | *required* | Paths to `.db` or `.json` knowledge graph files, or directories to search |
+| `provider_manager` | `ProviderManager` | `None` | LLM provider for agent operations |
+
+**Returns:** `PlanningAgent` -- configured agent with loaded knowledge base.
+
+```python
+from pathlib import Path
+from video_processor.agent.agent_loop import PlanningAgent
+from video_processor.providers.manager import ProviderManager
+
+agent = PlanningAgent.from_kb_paths(
+    kb_paths=[Path("results/knowledge_graph.db")],
+    provider_manager=ProviderManager(),
+)
+```
+
+### execute()
+
+```python
+def execute(self, request: str) -> List[Artifact]
+```
+
+Execute a user request by selecting and running appropriate skills.
+
+**Process:**
+
+1. Build a context summary from the knowledge base statistics
+2. Format available skills with their descriptions
+3. Ask the LLM to select skills and parameters (or use keyword matching as fallback)
+4. Execute selected skills in order, accumulating artifacts
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `request` | `str` | Natural language request (e.g., "Generate a project plan") |
+
+**Returns:** `List[Artifact]` -- generated artifacts from skill execution.
+
+**LLM mode:** The LLM receives the knowledge base summary, available skills, and user request, then returns a JSON array of `{"skill": "name", "params": {}}` objects to execute.
+
+**Keyword fallback:** Without an LLM, skills are matched by splitting the skill name into words and checking if any appear in the request text.
+
+```python
+artifacts = agent.execute("Create a PRD and task breakdown")
+for artifact in artifacts:
+    print(f"--- {artifact.name} ({artifact.artifact_type}) ---")
+    print(artifact.content[:500])
+```
+
+### chat()
+
+```python
+def chat(self, message: str) -> str
+```
+
+Interactive chat mode. Maintains conversation history and provides contextual responses about the loaded knowledge base.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `message` | `str` | User message |
+
+**Returns:** `str` -- assistant response.
+
+The chat mode provides the LLM with:
+
+- Knowledge base statistics (entity counts, relationship counts)
+- List of previously generated artifacts
+- Full conversation history
+- Available REPL commands (e.g., `/entities`, `/search`, `/plan`, `/export`)
+
+**Requires** a configured `provider_manager`. Returns a static error message if no LLM is available.
+
+```python
+response = agent.chat("What technologies were discussed in the meetings?")
+print(response)
+
+response = agent.chat("Which of those have the most dependencies?")
+print(response)
+```
+
+---
+
+## AgentContext
+
+```python
+from video_processor.agent.skills.base import AgentContext
+```
+
+Shared state dataclass passed to all skills during execution. Accumulates artifacts and conversation history across the agent session.
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `knowledge_graph` | `Any` | `None` | `KnowledgeGraph` instance |
+| `query_engine` | `Any` | `None` | `GraphQueryEngine` instance for querying the KG |
+| `provider_manager` | `Any` | `None` | `ProviderManager` instance for LLM calls |
+| `planning_entities` | `List[Any]` | `[]` | Extracted `PlanningEntity` instances |
+| `user_requirements` | `Dict[str, Any]` | `{}` | User-specified requirements and constraints |
+| `conversation_history` | `List[Dict[str, str]]` | `[]` | Chat message history (`role`, `content` dicts) |
+| `artifacts` | `List[Artifact]` | `[]` | Previously generated artifacts |
+| `config` | `Dict[str, Any]` | `{}` | Additional configuration |
+
+```python
+from video_processor.agent.skills.base import AgentContext
+
+context = AgentContext(
+    knowledge_graph=kg,
+    query_engine=engine,
+    provider_manager=pm,
+    config={"output_format": "markdown"},
+)
+```
+
+---
+
+## Skill (ABC)
+
+```python
+from video_processor.agent.skills.base import Skill
+```
+
+Base class for agent skills. Each skill represents a discrete capability that produces an artifact from the agent context.
+
+**Class attributes:**
+
+| Attribute | Type | Description |
+|---|---|---|
+| `name` | `str` | Skill identifier (e.g., `"project_plan"`, `"prd"`) |
+| `description` | `str` | Human-readable description shown to the LLM for skill selection |
+
+### execute()
+
+```python
+@abstractmethod
+def execute(self, context: AgentContext, **kwargs) -> Artifact
+```
+
+Execute this skill and return an artifact. Receives the shared agent context and any parameters selected by the LLM planner.
+
+### can_execute()
+
+```python
+def can_execute(self, context: AgentContext) -> bool
+```
+
+Check if this skill can execute given the current context. The default implementation requires both `knowledge_graph` and `provider_manager` to be set. Override for skills with different requirements.
+
+**Returns:** `bool`
+
+### Implementing a custom skill
+
+```python
+from video_processor.agent.skills.base import Skill, Artifact, AgentContext, register_skill
+
+class SummarySkill(Skill):
+    name = "summary"
+    description = "Generate a concise summary of the knowledge base"
+
+    def execute(self, context: AgentContext, **kwargs) -> Artifact:
+        stats = context.query_engine.stats()
+        prompt = f"Summarize this knowledge base:\n{stats.to_text()}"
+        content = context.provider_manager.chat(
+            [{"role": "user", "content": prompt}]
+        )
+        return Artifact(
+            name="Knowledge Base Summary",
+            content=content,
+            artifact_type="document",
+            format="markdown",
+        )
+
+    def can_execute(self, context: AgentContext) -> bool:
+        return context.query_engine is not None and context.provider_manager is not None
+
+# Register the skill so the agent can discover it
+register_skill(SummarySkill())
+```
+
+---
+
+## Artifact
+
+```python
+from video_processor.agent.skills.base import Artifact
+```
+
+Dataclass representing the output of a skill execution.
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `name` | `str` | *required* | Human-readable artifact name |
+| `content` | `str` | *required* | Generated content (Markdown, JSON, Mermaid, etc.) |
+| `artifact_type` | `str` | *required* | Type: `"project_plan"`, `"prd"`, `"roadmap"`, `"task_list"`, `"document"`, `"issues"` |
+| `format` | `str` | `"markdown"` | Content format: `"markdown"`, `"json"`, `"mermaid"` |
+| `metadata` | `Dict[str, Any]` | `{}` | Additional metadata |
+
+---
+
+## Skill Registry Functions
+
+### register_skill()
+
+```python
+def register_skill(skill: Skill) -> None
+```
+
+Register a skill instance in the global registry. Skills must be registered before the agent can discover and execute them.
+
+### get_skill()
+
+```python
+def get_skill(name: str) -> Optional[Skill]
+```
+
+Look up a registered skill by name.
+
+**Returns:** `Optional[Skill]` -- the skill instance, or `None` if not found.
+
+### list_skills()
+
+```python
+def list_skills() -> List[Skill]
+```
+
+Return all registered skill instances.
+
+---
+
+## KBContext
+
+```python
+from video_processor.agent.kb_context import KBContext
+```
+
+Loads and merges multiple knowledge graph sources into a unified context for agent consumption. Supports both FalkorDB (`.db`) and JSON (`.json`) formats, and can auto-discover graphs in a directory tree.
+
+### Constructor
+
+```python
+def __init__(self)
+```
+
+Creates an empty context. Use `add_source()` to add knowledge graph paths, then `load()` to initialize.
+
+### add_source()
+
+```python
+def add_source(self, path) -> None
+```
+
+Add a knowledge graph source.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `path` | `str \| Path` | Path to a `.db` file, `.json` file, or directory to search for knowledge graphs |
+
+If `path` is a directory, it is searched recursively for knowledge graph files using `find_knowledge_graphs()`.
+
+**Raises:** `FileNotFoundError` if the path does not exist.
+
+### load()
+
+```python
+def load(self, provider_manager=None) -> KBContext
+```
+
+Load and merge all added sources into a single knowledge graph and query engine.
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `provider_manager` | `ProviderManager` | `None` | LLM provider for the knowledge graph and query engine |
+
+**Returns:** `KBContext` -- self, for method chaining.
+
+### Properties
+
+| Property | Type | Description |
+|---|---|---|
+| `knowledge_graph` | `KnowledgeGraph` | The merged knowledge graph (raises `RuntimeError` if not loaded) |
+| `query_engine` | `GraphQueryEngine` | Query engine for the merged graph (raises `RuntimeError` if not loaded) |
+| `sources` | `List[Path]` | List of resolved source paths |
+
+### summary()
+
+```python
+def summary(self) -> str
+```
+
+Generate a brief text summary of the loaded knowledge base, including entity counts by type and relationship counts.
+
+**Returns:** `str` -- multi-line summary text.
+
+### auto_discover()
+
+```python
+@classmethod
+def auto_discover(
+    cls,
+    start_dir: Optional[Path] = None,
+    provider_manager=None,
+) -> KBContext
+```
+
+Factory method that creates a `KBContext` by auto-discovering knowledge graphs near `start_dir` (defaults to current directory).
+
+**Returns:** `KBContext` -- loaded context (may have zero sources if none found).
+
+### Usage examples
+
+```python
+from pathlib import Path
+from video_processor.agent.kb_context import KBContext
+
+# Manual source management
+kb = KBContext()
+kb.add_source(Path("project_a/knowledge_graph.db"))
+kb.add_source(Path("project_b/results/"))  # searches directory
+kb.load(provider_manager=pm)
+
+print(kb.summary())
+# Knowledge base: 3 source(s)
+# Entities: 142
+# Relationships: 89
+# Entity types:
+#   technology: 45
+#   person: 23
+#   concept: 74
+
+# Auto-discover from current directory
+kb = KBContext.auto_discover()
+
+# Use with the agent
+from video_processor.agent.agent_loop import PlanningAgent
+from video_processor.agent.skills.base import AgentContext
+
+context = AgentContext(
+    knowledge_graph=kb.knowledge_graph,
+    query_engine=kb.query_engine,
+    provider_manager=pm,
+)
+agent = PlanningAgent(context)
+```
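The keyword fallback documented for `execute()` above can be sketched in a few lines of Python. This is a hypothetical standalone helper (`match_skills_by_keyword` is not part of the API; the real selection logic lives inside `PlanningAgent`):

```python
def match_skills_by_keyword(request: str, skill_names: list[str]) -> list[str]:
    """Select skills whose name words appear in the request text.

    Mirrors the documented fallback: split each skill name (e.g.
    "project_plan") into words and keep the skill if any word
    occurs in the lowercased request.
    """
    request_lower = request.lower()
    selected = []
    for name in skill_names:
        words = name.replace("_", " ").split()
        if any(word in request_lower for word in words):
            selected.append(name)
    return selected

# "Create a PRD and task breakdown" matches "prd" and "task_list"
matches = match_skills_by_keyword(
    "Create a PRD and task breakdown",
    ["project_plan", "prd", "roadmap", "task_list"],
)
```

Note the substring test is deliberately loose: "plan" would also match inside "planning", which is the trade-off of a zero-dependency fallback.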
--- docs/api/analyzers.md
+++ docs/api/analyzers.md
@@ -3,5 +3,387 @@
33
::: video_processor.analyzers.diagram_analyzer
44
55
::: video_processor.analyzers.content_analyzer
66
77
::: video_processor.analyzers.action_detector
8
+
9
+---
10
+
11
+## Overview
12
+
13
+The analyzers module contains the core content extraction logic for PlanOpticon. These analyzers process video frames and transcripts to extract structured knowledge: diagrams, key points, action items, and cross-referenced entities.
14
+
15
+All analyzers accept an optional `ProviderManager` instance. When provided, they use LLM capabilities for richer extraction. Without one, they fall back to heuristic/pattern-based methods where possible.
16
+
17
+---
18
+
19
+## DiagramAnalyzer
20
+
21
+```python
22
+from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
23
+```
24
+
25
+Vision model-based diagram detection and analysis. Classifies video frames as diagrams, slides, screenshots, or other content, then performs full extraction on high-confidence frames.
+
+### Constructor
+
+```python
+def __init__(
+    self,
+    provider_manager: Optional[ProviderManager] = None,
+    confidence_threshold: float = 0.3,
+)
+```
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `provider_manager` | `Optional[ProviderManager]` | `None` | LLM provider (creates a default if not provided) |
+| `confidence_threshold` | `float` | `0.3` | Minimum confidence to process a frame at all |
+
+### classify_frame()
+
+```python
+def classify_frame(self, image_path: Union[str, Path]) -> dict
+```
+
+Classify a single frame using a vision model. Determines whether the frame contains a diagram, slide, or other visual content worth extracting.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `image_path` | `Union[str, Path]` | Path to the frame image file |
+
+**Returns:** `dict` with the following keys:
+
+| Key | Type | Description |
+|---|---|---|
+| `is_diagram` | `bool` | Whether the frame contains extractable content |
+| `diagram_type` | `str` | One of: `flowchart`, `sequence`, `architecture`, `whiteboard`, `chart`, `table`, `slide`, `screenshot`, `unknown` |
+| `confidence` | `float` | Detection confidence from 0.0 to 1.0 |
+| `content_type` | `str` | Content category: `slide`, `diagram`, `document`, `screen_share`, `whiteboard`, `chart`, `person`, `other` |
+| `brief_description` | `str` | One-sentence description of the frame content |
+
+**Important:** Frames showing people, webcam feeds, or video conference participant views return `confidence: 0.0`. The classifier is tuned to detect only shared/presented content.
+
+```python
+analyzer = DiagramAnalyzer()
+result = analyzer.classify_frame("/path/to/frame_042.jpg")
+if result["confidence"] >= 0.7:
+    print(f"Diagram detected: {result['diagram_type']}")
+```
+
+### analyze_diagram_single_pass()
+
+```python
+def analyze_diagram_single_pass(self, image_path: Union[str, Path]) -> dict
+```
+
+Full single-pass diagram analysis. Extracts description, text content, elements, relationships, Mermaid syntax, and chart data in a single LLM call.
+
+**Returns:** `dict` with the following keys:
+
+| Key | Type | Description |
+|---|---|---|
+| `diagram_type` | `str` | Diagram classification |
+| `description` | `str` | Detailed description of the visual content |
+| `text_content` | `str` | All visible text, preserving structure |
+| `elements` | `list[str]` | Identified elements/components |
+| `relationships` | `list[str]` | Relationships in `"A -> B: label"` format |
+| `mermaid` | `str` | Valid Mermaid diagram syntax |
+| `chart_data` | `dict \| None` | Chart data with `labels`, `values`, `chart_type` (only for data charts) |
+
+Returns an empty `dict` on failure.
+
+### caption_frame()
+
+```python
+def caption_frame(self, image_path: Union[str, Path]) -> str
+```
+
+Get a brief 1-2 sentence caption for a frame. Used as a fallback when full diagram analysis is not warranted.
+
+**Returns:** `str` -- a brief description of the frame content.
+
+### process_frames()
+
+```python
+def process_frames(
+    self,
+    frame_paths: List[Union[str, Path]],
+    diagrams_dir: Optional[Path] = None,
+    captures_dir: Optional[Path] = None,
+) -> Tuple[List[DiagramResult], List[ScreenCapture]]
+```
+
+Process a batch of extracted video frames through the full classification and analysis pipeline.
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `frame_paths` | `List[Union[str, Path]]` | *required* | Paths to frame images |
+| `diagrams_dir` | `Optional[Path]` | `None` | Output directory for diagram files (images, mermaid, JSON) |
+| `captures_dir` | `Optional[Path]` | `None` | Output directory for screengrab fallback files |
+
+**Returns:** `Tuple[List[DiagramResult], List[ScreenCapture]]`
+
+**Confidence thresholds:**
+
+| Confidence Range | Action |
+|---|---|
+| >= 0.7 | Full diagram analysis -- extracts elements, relationships, Mermaid syntax |
+| 0.3 to 0.7 | Screengrab fallback -- saves frame with a brief caption |
+| < 0.3 | Skipped entirely |
+
+**Output files (when directories are provided):**
+
+For diagrams (`diagrams_dir`):
+
+- `diagram_N.jpg` -- original frame image
+- `diagram_N.mermaid` -- Mermaid source (if generated)
+- `diagram_N.json` -- full DiagramResult as JSON
+
+For screen captures (`captures_dir`):
+
+- `capture_N.jpg` -- original frame image
+- `capture_N.json` -- ScreenCapture metadata as JSON
+
+```python
+from pathlib import Path
+from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
+from video_processor.providers.manager import ProviderManager
+
+analyzer = DiagramAnalyzer(
+    provider_manager=ProviderManager(),
+    confidence_threshold=0.3,
+)
+
+frame_paths = list(Path("output/frames").glob("*.jpg"))
+diagrams, captures = analyzer.process_frames(
+    frame_paths,
+    diagrams_dir=Path("output/diagrams"),
+    captures_dir=Path("output/captures"),
+)
+
+print(f"Found {len(diagrams)} diagrams, {len(captures)} screengrabs")
+for d in diagrams:
+    print(f"  [{d.diagram_type.value}] {d.description}")
+```
+
+---
+
+## ContentAnalyzer
+
+```python
+from video_processor.analyzers.content_analyzer import ContentAnalyzer
+```
+
+Cross-references transcript and diagram entities for richer knowledge extraction. Merges entities found in different sources and enriches key points with diagram links.
+
+### Constructor
+
+```python
+def __init__(self, provider_manager: Optional[ProviderManager] = None)
+```
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `provider_manager` | `Optional[ProviderManager]` | `None` | Required for LLM-based fuzzy matching |
+
+### cross_reference()
+
+```python
+def cross_reference(
+    self,
+    transcript_entities: List[Entity],
+    diagram_entities: List[Entity],
+) -> List[Entity]
+```
+
+Merge entities from transcripts and diagrams into a unified list with source attribution.
+
+**Merge strategy:**
+
+1. Index all transcript entities by lowercase name, marked with `source="transcript"`
+2. Merge diagram entities: if a name matches, set `source="both"` and combine descriptions/occurrences; otherwise add as `source="diagram"`
+3. If a `ProviderManager` is available, use LLM fuzzy matching to find additional matches among unmatched entities (e.g., "PostgreSQL" from transcript matching "Postgres" from diagram)
+
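Steps 1 and 2 of this strategy are essentially a name-keyed dictionary merge. A minimal standalone sketch (plain dicts stand in for the real `Entity` model, and the description/occurrence combining is omitted):

```python
def merge_entities(transcript_entities, diagram_entities):
    # Step 1: index transcript entities by lowercase name
    merged = {}
    for e in transcript_entities:
        merged[e["name"].lower()] = {**e, "source": "transcript"}
    # Step 2: exact-name matches become source="both";
    # everything else is added as diagram-only
    for e in diagram_entities:
        key = e["name"].lower()
        if key in merged:
            merged[key]["source"] = "both"
        else:
            merged[key] = {**e, "source": "diagram"}
    return list(merged.values())

entities = merge_entities(
    [{"name": "Redis"}, {"name": "Alice"}],
    [{"name": "redis"}, {"name": "Kafka"}],
)
# "Redis"/"redis" collapse to source="both"; "Kafka" stays diagram-only
```

Step 3 (LLM fuzzy matching) only runs over the entities left unmatched after this exact-name pass.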
+**Parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `transcript_entities` | `List[Entity]` | Entities extracted from transcript |
+| `diagram_entities` | `List[Entity]` | Entities extracted from diagrams |
+
+**Returns:** `List[Entity]` -- merged entity list with `source` attribution.
+
+```python
+from video_processor.analyzers.content_analyzer import ContentAnalyzer
+from video_processor.models import Entity
+
+analyzer = ContentAnalyzer(provider_manager=pm)
+
+transcript_entities = [
+    Entity(name="PostgreSQL", type="technology"),
+    Entity(name="Alice", type="person"),
+]
+diagram_entities = [
+    Entity(name="Postgres", type="technology"),
+    Entity(name="Redis", type="technology"),
+]
+
+merged = analyzer.cross_reference(transcript_entities, diagram_entities)
+# "PostgreSQL" and "Postgres" may be fuzzy-matched and merged
+```
+
+### enrich_key_points()
+
+```python
+def enrich_key_points(
+    self,
+    key_points: List[KeyPoint],
+    diagrams: list,
+    transcript_text: str,
+) -> List[KeyPoint]
+```
+
+Link key points to relevant diagrams by entity overlap. Examines word overlap between key point text and diagram elements/text content.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `key_points` | `List[KeyPoint]` | Key points to enrich |
+| `diagrams` | `list` | List of `DiagramResult` objects or dicts |
+| `transcript_text` | `str` | Full transcript text (reserved for future use) |
+
+**Returns:** `List[KeyPoint]` -- key points with `related_diagrams` indices populated.
+
+A key point is linked to a diagram when they share 2 or more words (excluding short words) between the key point text/details and the diagram's elements/text content.
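That overlap rule can be approximated in a few lines (a simplified sketch; the exact tokenization and short-word cutoff in the implementation may differ):

```python
def significant_words(text, min_len=4):
    # Drop short words so "the"/"and" cannot create spurious links
    return {w.lower() for w in text.split() if len(w) >= min_len}

def related_diagram_indices(key_point_text, diagram_texts, min_overlap=2):
    # Link the key point to every diagram sharing >= min_overlap words
    kp_words = significant_words(key_point_text)
    return [
        i for i, text in enumerate(diagram_texts)
        if len(kp_words & significant_words(text)) >= min_overlap
    ]

related_diagram_indices(
    "Deploy the ingestion service behind the gateway",
    ["ingestion service gateway diagram", "billing database schema"],
)
# -> [0]
```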
+
+---
+
+## ActionDetector
+
+```python
+from video_processor.analyzers.action_detector import ActionDetector
+```
+
+Detects action items from transcripts and diagram content using LLM extraction with a regex pattern fallback.
+
+### Constructor
+
+```python
+def __init__(self, provider_manager: Optional[ProviderManager] = None)
+```
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `provider_manager` | `Optional[ProviderManager]` | `None` | Required for LLM-based extraction |
+
+### detect_from_transcript()
+
+```python
+def detect_from_transcript(
+    self,
+    text: str,
+    segments: Optional[List[TranscriptSegment]] = None,
+) -> List[ActionItem]
+```
+
+Detect action items from transcript text.
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `text` | `str` | *required* | Transcript text to analyze |
+| `segments` | `Optional[List[TranscriptSegment]]` | `None` | Transcript segments for timestamp attachment |
+
+**Returns:** `List[ActionItem]` -- detected action items with `source="transcript"`.
+
+**Extraction modes:**
+
+- **LLM mode** (when `provider_manager` is set): Sends the transcript to the LLM with a structured extraction prompt. Extracts action, assignee, deadline, priority, and context.
+- **Pattern mode** (fallback): Matches sentences against regex patterns for action-oriented language.
+
+**Pattern matching** detects sentences containing:
+
+- "need/needs to", "should/must/shall"
+- "will/going to", "action item/todo/follow-up"
+- "assigned to/responsible for", "deadline/due by"
+- "let's/let us", "make sure/ensure"
+- "can you/could you/please"
+
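A rough regex equivalent of these triggers (illustrative only; the actual patterns in the source may be structured and anchored differently):

```python
import re

# Hypothetical approximation of the fallback trigger phrases listed above
ACTION_PATTERN = re.compile(
    r"\b(?:needs? to|should|must|shall|will|going to|action item|todo|"
    r"follow-up|assigned to|responsible for|deadline|due by|let's|let us|"
    r"make sure|ensure|can you|could you|please)\b",
    re.IGNORECASE,
)

def looks_like_action(sentence):
    # A sentence is a candidate action item if any trigger phrase appears
    return bool(ACTION_PATTERN.search(sentence))

looks_like_action("Bob should review the PR")  # -> True
looks_like_action("The demo went well")        # -> False
```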
+**Timestamp attachment:** When `segments` are provided, each action item is matched to the most relevant transcript segment (by word overlap, minimum 3 matching words), and a timestamp is added to `context`.
+
+### detect_from_diagrams()
+
+```python
+def detect_from_diagrams(self, diagrams: list) -> List[ActionItem]
+```
+
+Extract action items from diagram text content and elements. Processes each diagram's combined text using either LLM or pattern extraction.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|---|---|---|
+| `diagrams` | `list` | List of `DiagramResult` objects or dicts |
+
+**Returns:** `List[ActionItem]` -- action items with `source="diagram"`.
+
+### merge_action_items()
+
+```python
+def merge_action_items(
+    self,
+    transcript_items: List[ActionItem],
+    diagram_items: List[ActionItem],
+) -> List[ActionItem]
+```
+
+Merge action items from multiple sources, deduplicating by action text (case-insensitive, whitespace-normalized).
+
+**Returns:** `List[ActionItem]` -- deduplicated merged list.
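The dedup key amounts to lowercasing and collapsing whitespace; sketched standalone here with plain strings standing in for `ActionItem` objects:

```python
def dedup_key(action_text):
    # Case-insensitive, whitespace-normalized comparison key
    return " ".join(action_text.lower().split())

def merge_actions(*item_lists):
    # Keep the first occurrence of each normalized action text
    seen, merged = set(), []
    for items in item_lists:
        for action in items:
            key = dedup_key(action)
            if key not in seen:
                seen.add(key)
                merged.append(action)
    return merged

merge_actions(
    ["Update the API docs", "Review the PR"],
    ["update  the api docs", "Ship v0.4.0"],
)
# -> ['Update the API docs', 'Review the PR', 'Ship v0.4.0']
```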
+
+### Usage example
+
+```python
+from video_processor.analyzers.action_detector import ActionDetector
+from video_processor.providers.manager import ProviderManager
+
+detector = ActionDetector(provider_manager=ProviderManager())
+
+# From transcript
+transcript_items = detector.detect_from_transcript(
+    text="Alice needs to update the API docs by Friday. "
+         "Bob should review the PR before merging.",
+    segments=transcript_segments,
+)
+
+# From diagrams
+diagram_items = detector.detect_from_diagrams(diagram_results)
+
+# Merge and deduplicate
+all_items = detector.merge_action_items(transcript_items, diagram_items)
+
+for item in all_items:
+    print(f"[{item.priority or 'unset'}] {item.action}")
+    if item.assignee:
+        print(f"  Assignee: {item.assignee}")
+    if item.deadline:
+        print(f"  Deadline: {item.deadline}")
+```
+
+### Pattern fallback (no LLM)
+
+```python
+# Works without any API keys
+detector = ActionDetector()  # No provider_manager
+items = detector.detect_from_transcript(
+    "We need to finalize the database schema. "
+    "Please update the deployment scripts."
+)
+# Returns ActionItems matched by regex patterns
+```
ADDED docs/api/auth.md
--- a/docs/api/auth.md
+++ b/docs/api/auth.md
@@ -0,0 +1,377 @@
+# Auth API Reference
+
+::: video_processor.auth
+
+---
+
+## Overview
+
+The `video_processor.auth` module provides a unified OAuth and authentication strategy for all PlanOpticon source connectors. It supports multiple authentication methods tried in a consistent order:
+
+1. **Saved token** -- load from disk, auto-refresh if expired
+2. **Client Credentials** -- server-to-server OAuth (e.g., Zoom S2S)
+3. **OAuth 2.0 PKCE** -- interactive Authorization Code flow with PKCE
+4. **API key fallback** -- environment variable lookup
+
+Tokens are persisted to `~/.planopticon/` and automatically refreshed on expiry.
+
+---
+
+## AuthConfig
+
+```python
+from video_processor.auth import AuthConfig
+```
+
+Dataclass configuring authentication for a specific service. Defines OAuth endpoints, client credentials, API key fallback, scopes, and token storage.
+
+### Fields
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `service` | `str` | *required* | Service identifier (e.g., `"zoom"`, `"notion"`) |
+| `oauth_authorize_url` | `Optional[str]` | `None` | OAuth authorization endpoint URL |
+| `oauth_token_url` | `Optional[str]` | `None` | OAuth token exchange endpoint URL |
+| `client_id` | `Optional[str]` | `None` | OAuth client ID (direct value) |
+| `client_secret` | `Optional[str]` | `None` | OAuth client secret (direct value) |
+| `client_id_env` | `Optional[str]` | `None` | Environment variable for client ID |
+| `client_secret_env` | `Optional[str]` | `None` | Environment variable for client secret |
+| `api_key_env` | `Optional[str]` | `None` | Environment variable for API key fallback |
+| `scopes` | `List[str]` | `[]` | OAuth scopes to request |
+| `redirect_uri` | `str` | `"urn:ietf:wg:oauth:2.0:oob"` | Redirect URI for auth code flow |
+| `account_id` | `Optional[str]` | `None` | Account ID for client credentials grant (direct value) |
+| `account_id_env` | `Optional[str]` | `None` | Environment variable for account ID |
+| `token_path` | `Optional[Path]` | `None` | Custom token storage path |
+
+### Resolved Properties
+
+These properties resolve values by checking the direct field first, then falling back to the environment variable.
+
+| Property | Return Type | Description |
+|---|---|---|
+| `resolved_client_id` | `Optional[str]` | Client ID from `client_id` or `os.environ[client_id_env]` |
+| `resolved_client_secret` | `Optional[str]` | Client secret from `client_secret` or `os.environ[client_secret_env]` |
+| `resolved_api_key` | `Optional[str]` | API key from `os.environ[api_key_env]` |
+| `resolved_account_id` | `Optional[str]` | Account ID from `account_id` or `os.environ[account_id_env]` |
+| `resolved_token_path` | `Path` | Token file path: `token_path` or `~/.planopticon/{service}_token.json` |
+| `supports_oauth` | `bool` | `True` if both `oauth_authorize_url` and `oauth_token_url` are set |
+
+```python
+from video_processor.auth import AuthConfig
+
+config = AuthConfig(
+    service="notion",
+    oauth_authorize_url="https://api.notion.com/v1/oauth/authorize",
+    oauth_token_url="https://api.notion.com/v1/oauth/token",
+    client_id_env="NOTION_CLIENT_ID",
+    client_secret_env="NOTION_CLIENT_SECRET",
+    api_key_env="NOTION_API_KEY",
+    scopes=["read_content"],
+)
+
+# Check resolved values
+print(config.resolved_client_id)   # From NOTION_CLIENT_ID env var
+print(config.supports_oauth)       # True
+print(config.resolved_token_path)  # ~/.planopticon/notion_token.json
+```
+
+---
+
+## AuthResult
+
+```python
+from video_processor.auth import AuthResult
+```
+
+Dataclass representing the result of an authentication attempt.
+
+| Field | Type | Default | Description |
+|---|---|---|---|
+| `success` | `bool` | *required* | Whether authentication succeeded |
+| `access_token` | `Optional[str]` | `None` | The access token (if successful) |
+| `method` | `Optional[str]` | `None` | Auth method used: `"saved_token"`, `"oauth_pkce"`, `"client_credentials"`, `"api_key"` |
+| `expires_at` | `Optional[float]` | `None` | Token expiration as Unix timestamp |
+| `refresh_token` | `Optional[str]` | `None` | OAuth refresh token (if available) |
+| `error` | `Optional[str]` | `None` | Error message (if failed) |
+
+```python
+result = manager.authenticate()
+if result.success:
+    print(f"Authenticated via {result.method}")
+    print(f"Token: {result.access_token[:20]}...")
+    if result.expires_at:
+        import time
+        remaining = result.expires_at - time.time()
+        print(f"Expires in {remaining/60:.0f} minutes")
+else:
+    print(f"Auth failed: {result.error}")
+```
109
+
110
+---
111
+
112
+## OAuthManager
113
+
114
+```python
115
+from video_processor.auth import OAuthManager
116
+```
117
+
118
+Manages the full authentication lifecycle for a service. Tries auth methods in priority order and handles token persistence, refresh, and PKCE flow.
119
+
120
+### Constructor
121
+
122
+```python
123
+def __init__(self, config: AuthConfig)
124
+```
125
+
126
+| Parameter | Type | Description |
127
+|---|---|---|
128
+| `config` | `AuthConfig` | Authentication configuration for the target service |
129
+
130
+### authenticate()
131
+
132
+```python
133
+def authenticate(self) -> AuthResult
134
+```
135
+
136
+Run the full auth chain and return the result. Methods are tried in order:
137
+
138
+1. **Saved token** -- checks `~/.planopticon/{service}_token.json`, refreshes if expired
139
+2. **Client Credentials** -- if `account_id` is set and OAuth is configured, uses the client credentials grant (server-to-server)
140
+3. **OAuth PKCE** -- if OAuth is configured and client ID is available, opens a browser for interactive authorization with PKCE
141
+4. **API key** -- falls back to the environment variable specified in `api_key_env`
142
+
143
+**Returns:** `AuthResult` -- success/failure with token and method details.
144
+
145
+If all methods fail, returns an `AuthResult` with `success=False` and a helpful error message listing which environment variables to set.
146
+
147
+### get_token()
148
+
149
+```python
150
+def get_token(self) -> Optional[str]
151
+```
152
+
153
+Convenience method: run `authenticate()` and return just the access token string.
154
+
155
+**Returns:** `Optional[str]` -- the access token, or `None` if authentication failed.
156
+
157
+### clear_token()
158
+
159
+```python
160
+def clear_token(self) -> None
161
+```
162
+
163
+Remove the saved token file for this service (effectively a logout). The next `authenticate()` call will require re-authentication.
164
+
165
+---
166
+
167
+## Authentication Flows
168
+
169
+### Saved Token (auto-refresh)
170
+
171
+Tokens are saved to `~/.planopticon/{service}_token.json` as JSON. On each `authenticate()` call, the saved token is loaded and checked:
172
+
173
+- If the token has not expired (`time.time() < expires_at`), it is returned immediately
174
+- If expired but a refresh token is available, the manager attempts to refresh using the OAuth token endpoint
175
+- The refreshed token is saved back to disk
176
+
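The expiry check above is small enough to sketch with the standard library. A simplified, stand-alone illustration (the helper name `load_valid_token` is ours, not part of the module):

```python
import json
import time
from pathlib import Path
from typing import Optional

def load_valid_token(token_path: Path) -> Optional[str]:
    """Return the saved access token if the file exists and is unexpired."""
    if not token_path.exists():
        return None
    data = json.loads(token_path.read_text())
    expires_at = data.get("expires_at")
    if expires_at is not None and time.time() >= expires_at:
        return None  # Expired: the manager would attempt a refresh next
    return data.get("access_token")
```

Treating a missing `expires_at` as non-expiring is an assumption of this sketch.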
### Client Credentials Grant

Used for server-to-server authentication (e.g., Zoom Server-to-Server OAuth). Requires `account_id`, `client_id`, and `client_secret`. Sends a POST to the token endpoint with `grant_type=account_credentials`.

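The token request can be sketched as follows. Where the client credentials go (an HTTP Basic header, as Zoom expects, vs. the form body) varies by provider, so treat the header scheme here as an assumption; the helper name is illustrative:

```python
import base64
from urllib.parse import urlencode

def build_client_credentials_request(token_url: str, client_id: str,
                                     client_secret: str, account_id: str):
    """Build (url, headers, body) for an account-credentials token request."""
    body = urlencode({"grant_type": "account_credentials", "account_id": account_id})
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return token_url, headers, body
```

POST this body to the token endpoint and parse `access_token` and `expires_in` from the JSON response.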
### OAuth 2.0 Authorization Code with PKCE

Interactive flow for user authentication:

1. Generates a PKCE code verifier and S256 challenge
2. Constructs the authorization URL with client ID, redirect URI, scopes, and PKCE challenge
3. Opens the URL in the user's browser
4. Prompts the user to paste the authorization code
5. Exchanges the code for tokens at the token endpoint
6. Saves the tokens to disk

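Step 1 is fully specified by RFC 7636 and can be sketched with the standard library (the helper name is illustrative, not the module's actual function):

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code verifier and its S256 challenge (RFC 7636)."""
    # 32 random bytes, base64url-encoded without padding -> 43-char verifier
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    # S256 challenge: base64url(SHA-256(verifier)), again without padding
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge
```

The challenge goes into the authorization URL (`code_challenge`, `code_challenge_method=S256`); the verifier is sent in step 5 when exchanging the code.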
### API Key Fallback

If no OAuth flow succeeds, falls back to checking the environment variable specified in `api_key_env`. Returns the value directly as the access token.

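A minimal sketch of the fallback (illustrative only; treating an empty variable as unset is an assumption of this sketch):

```python
import os
from typing import Optional

def api_key_fallback(api_key_env: Optional[str]) -> Optional[str]:
    """Return the configured API key env var's value, or None if unset/empty."""
    if not api_key_env:
        return None
    return os.environ.get(api_key_env) or None
```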
---

## KNOWN_CONFIGS

```python
from video_processor.auth import KNOWN_CONFIGS
```

Pre-built `AuthConfig` instances for supported services. These cover the most common cloud integrations and can be used directly or as templates for custom configurations.

| Service Key | Service | OAuth Endpoints | Client ID Env | API Key Env |
|---|---|---|---|---|
| `"zoom"` | Zoom | `zoom.us/oauth/...` | `ZOOM_CLIENT_ID` | -- |
| `"notion"` | Notion | `api.notion.com/v1/oauth/...` | `NOTION_CLIENT_ID` | `NOTION_API_KEY` |
| `"dropbox"` | Dropbox | `dropbox.com/oauth2/...` | `DROPBOX_APP_KEY` | `DROPBOX_ACCESS_TOKEN` |
| `"github"` | GitHub | `github.com/login/oauth/...` | `GITHUB_CLIENT_ID` | `GITHUB_TOKEN` |
| `"google"` | Google | `accounts.google.com/o/oauth2/...` | `GOOGLE_CLIENT_ID` | `GOOGLE_API_KEY` |
| `"microsoft"` | Microsoft | `login.microsoftonline.com/.../oauth2/...` | `MICROSOFT_CLIENT_ID` | -- |

### Zoom

Supports both Server-to-Server (via `ZOOM_ACCOUNT_ID`) and OAuth PKCE flows.

```bash
# Server-to-Server
export ZOOM_CLIENT_ID="..."
export ZOOM_CLIENT_SECRET="..."
export ZOOM_ACCOUNT_ID="..."

# Or interactive OAuth (omit ZOOM_ACCOUNT_ID)
export ZOOM_CLIENT_ID="..."
export ZOOM_CLIENT_SECRET="..."
```

### Google (Drive, Meet, Workspace)

Supports OAuth PKCE and API key fallback. Scopes include Drive and Docs read-only access.

```bash
export GOOGLE_CLIENT_ID="..."
export GOOGLE_CLIENT_SECRET="..."
# Or for API-key-only access:
export GOOGLE_API_KEY="..."
```

### GitHub

Supports OAuth PKCE and personal access tokens. Requests `repo` and `read:org` scopes.

```bash
# OAuth
export GITHUB_CLIENT_ID="..."
export GITHUB_CLIENT_SECRET="..."
# Or personal access token
export GITHUB_TOKEN="ghp_..."
```

---

## Helper Functions

### get_auth_config()

```python
def get_auth_config(service: str) -> Optional[AuthConfig]
```

Get a pre-built `AuthConfig` for a known service.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `service` | `str` | Service name (e.g., `"zoom"`, `"notion"`, `"github"`) |

**Returns:** `Optional[AuthConfig]` -- the config, or `None` if the service is not in `KNOWN_CONFIGS`.

### get_auth_manager()

```python
def get_auth_manager(service: str) -> Optional[OAuthManager]
```

Get an `OAuthManager` for a known service. Convenience wrapper that looks up the config and creates the manager in one call.

**Returns:** `Optional[OAuthManager]` -- the manager, or `None` if the service is not known.

---

## Usage Examples

### Quick authentication for a known service

```python
from video_processor.auth import get_auth_manager

manager = get_auth_manager("zoom")
if manager:
    result = manager.authenticate()
    if result.success:
        print(f"Authenticated via {result.method}")
        # Use result.access_token for API calls
    else:
        print(f"Failed: {result.error}")
```

### Custom service configuration

```python
from video_processor.auth import AuthConfig, OAuthManager

config = AuthConfig(
    service="my_service",
    oauth_authorize_url="https://my-service.com/oauth/authorize",
    oauth_token_url="https://my-service.com/oauth/token",
    client_id_env="MY_SERVICE_CLIENT_ID",
    client_secret_env="MY_SERVICE_CLIENT_SECRET",
    api_key_env="MY_SERVICE_API_KEY",
    scopes=["read", "write"],
)

manager = OAuthManager(config)
token = manager.get_token()  # Returns str or None
```

### Using auth in a custom source connector

```python
from pathlib import Path
from typing import List, Optional

from video_processor.auth import OAuthManager, AuthConfig
from video_processor.sources.base import BaseSource, SourceFile

class CustomSource(BaseSource):
    def __init__(self):
        self._config = AuthConfig(
            service="custom",
            api_key_env="CUSTOM_API_KEY",
        )
        self._manager = OAuthManager(self._config)
        self._token: Optional[str] = None

    def authenticate(self) -> bool:
        self._token = self._manager.get_token()
        return self._token is not None

    def list_videos(self, **kwargs) -> List[SourceFile]:
        # Use self._token to query the API
        ...

    def download(self, file: SourceFile, destination: Path) -> Path:
        # Use self._token for authenticated downloads
        ...
```

### Logout / clear saved token

```python
from video_processor.auth import get_auth_manager

manager = get_auth_manager("zoom")
if manager:
    manager.clear_token()
    print("Zoom token cleared")
```

### Token storage location

All tokens are stored under `~/.planopticon/`:

```
~/.planopticon/
    zoom_token.json
    notion_token.json
    github_token.json
    google_token.json
    microsoft_token.json
    dropbox_token.json
```

Each file contains a JSON object with `access_token`, `refresh_token` (if applicable), `expires_at`, and client credentials for refresh.
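A saved token file looks roughly like this (illustrative values; the exact fields present vary by service and auth method):

```json
{
  "access_token": "eyJhbGciOi...",
  "refresh_token": "rt_1a2b3c...",
  "expires_at": 1767225600.0,
  "client_id": "AbCdEf123",
  "client_secret": "s3cr3t"
}
```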
--- a/docs/api/auth.md
+++ b/docs/api/auth.md
@@ -0,0 +1,377 @@
--- docs/api/models.md
+++ docs/api/models.md
@@ -1,3 +1,501 @@
# Models API Reference

::: video_processor.models

---

## Overview

The `video_processor.models` module defines all Pydantic data models used throughout PlanOpticon for structured output, serialization, and validation. These models represent everything from individual transcript segments to complete batch processing manifests.

All models inherit from `pydantic.BaseModel` and support JSON serialization via `.model_dump_json()` and deserialization via `.model_validate_json()`.

---

## Enumerations

### DiagramType

Types of visual content detected in video frames.

```python
from video_processor.models import DiagramType
```

| Value | Description |
|---|---|
| `flowchart` | Process flow or decision tree diagrams |
| `sequence` | Sequence or interaction diagrams |
| `architecture` | System architecture diagrams |
| `whiteboard` | Whiteboard drawings or sketches |
| `chart` | Data charts (bar, line, pie, scatter) |
| `table` | Tabular data |
| `slide` | Presentation slides |
| `screenshot` | Application screenshots or screen shares |
| `unknown` | Unclassified visual content |

### OutputFormat

Available output formats for processing results.

| Value | Description |
|---|---|
| `markdown` | Markdown text |
| `json` | JSON data |
| `html` | HTML document |
| `pdf` | PDF document |
| `svg` | SVG vector graphic |
| `png` | PNG raster image |

### PlanningEntityType

Classification types for entities in a planning taxonomy.

| Value | Description |
|---|---|
| `goal` | Project goals or objectives |
| `requirement` | Functional or non-functional requirements |
| `constraint` | Limitations or constraints |
| `decision` | Decisions made during planning |
| `risk` | Identified risks |
| `assumption` | Planning assumptions |
| `dependency` | External or internal dependencies |
| `milestone` | Project milestones |
| `task` | Actionable tasks |
| `feature` | Product features |

### PlanningRelationshipType

Relationship types within a planning taxonomy.

| Value | Description |
|---|---|
| `requires` | Entity A requires entity B |
| `blocked_by` | Entity A is blocked by entity B |
| `has_risk` | Entity A has an associated risk B |
| `depends_on` | Entity A depends on entity B |
| `addresses` | Entity A addresses entity B |
| `has_tradeoff` | Entity A involves a tradeoff with entity B |
| `delivers` | Entity A delivers entity B |
| `implements` | Entity A implements entity B |
| `parent_of` | Entity A is the parent of entity B |

---

## Protocols

### ProgressCallback

A runtime-checkable protocol for receiving pipeline progress updates. Implement this interface to integrate custom progress reporting (e.g., web UI, logging).

```python
from video_processor.models import ProgressCallback

class MyProgress:
    def on_step_start(self, step: str, index: int, total: int) -> None:
        print(f"Starting {step} ({index}/{total})")

    def on_step_complete(self, step: str, index: int, total: int) -> None:
        print(f"Completed {step} ({index}/{total})")

    def on_progress(self, step: str, percent: float, message: str = "") -> None:
        print(f"{step}: {percent:.0f}% {message}")

assert isinstance(MyProgress(), ProgressCallback)  # True
```

**Methods:**

| Method | Parameters | Description |
|---|---|---|
| `on_step_start` | `step: str`, `index: int`, `total: int` | Called when a pipeline step begins |
| `on_step_complete` | `step: str`, `index: int`, `total: int` | Called when a pipeline step finishes |
| `on_progress` | `step: str`, `percent: float`, `message: str` | Called with incremental progress updates |

---

## Transcript Models

### TranscriptSegment

A single segment of transcribed audio with timing and optional speaker identification.

| Field | Type | Default | Description |
|---|---|---|---|
| `start` | `float` | *required* | Start time in seconds |
| `end` | `float` | *required* | End time in seconds |
| `text` | `str` | *required* | Transcribed text content |
| `speaker` | `Optional[str]` | `None` | Speaker identifier (e.g., "Speaker 1") |
| `confidence` | `Optional[float]` | `None` | Transcription confidence score (0.0 to 1.0) |

```json
{
  "start": 12.5,
  "end": 15.3,
  "text": "We should migrate to the new API by next quarter.",
  "speaker": "Alice",
  "confidence": 0.95
}
```

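The serialization round-trip described in the Overview looks like this. To keep the snippet self-contained it defines a stand-in model with the same fields; in real code, import `TranscriptSegment` from `video_processor.models` instead:

```python
from typing import Optional

from pydantic import BaseModel

class TranscriptSegment(BaseModel):
    """Stand-in mirroring the documented fields of the real model."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None
    confidence: Optional[float] = None

seg = TranscriptSegment(start=12.5, end=15.3, text="We should migrate.", speaker="Alice")
payload = seg.model_dump_json()                     # serialize to a JSON string
restored = TranscriptSegment.model_validate_json(payload)  # parse it back
```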
---

## Content Extraction Models

### ActionItem

An action item extracted from transcript or diagram content.

| Field | Type | Default | Description |
|---|---|---|---|
| `action` | `str` | *required* | The action to be taken |
| `assignee` | `Optional[str]` | `None` | Person responsible for the action |
| `deadline` | `Optional[str]` | `None` | Deadline or timeframe |
| `priority` | `Optional[str]` | `None` | Priority level (e.g., "high", "medium", "low") |
| `context` | `Optional[str]` | `None` | Additional context or notes |
| `source` | `Optional[str]` | `None` | Where this was found: `"transcript"`, `"diagram"`, or `"both"` |

```json
{
  "action": "Migrate authentication service to OAuth 2.0",
  "assignee": "Bob",
  "deadline": "Q2 2026",
  "priority": "high",
  "context": "at 245s",
  "source": "transcript"
}
```

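Downstream code often orders extracted items by priority. A sketch using plain dicts with the documented fields (in the real pipeline these would be `ActionItem` instances):

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def sort_by_priority(items):
    """Order action items high -> medium -> low; unknown or missing priority last."""
    return sorted(items, key=lambda item: PRIORITY_ORDER.get(item.get("priority"), 3))

items = [
    {"action": "Write docs", "priority": "low"},
    {"action": "Migrate auth service", "priority": "high", "assignee": "Bob"},
    {"action": "Review PR", "priority": None},
]
ordered = sort_by_priority(items)  # "Migrate auth service" first, "Review PR" last
```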
### KeyPoint

A key point extracted from content, optionally linked to diagrams.

| Field | Type | Default | Description |
|---|---|---|---|
| `point` | `str` | *required* | The key point text |
| `topic` | `Optional[str]` | `None` | Topic or category |
| `details` | `Optional[str]` | `None` | Supporting details |
| `timestamp` | `Optional[float]` | `None` | Timestamp in video (seconds) |
| `source` | `Optional[str]` | `None` | Where this was found |
| `related_diagrams` | `List[int]` | `[]` | Indices of related diagrams in the manifest |

```json
{
  "point": "Team decided to use FalkorDB for graph storage",
  "topic": "Architecture",
  "details": "Embedded database avoids infrastructure overhead for CLI use",
  "timestamp": 342.0,
  "source": "transcript",
  "related_diagrams": [0, 2]
}
```

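`related_diagrams` holds positions in the manifest's diagram list, so linking a key point to its diagrams is a plain index lookup (sketched here with dicts standing in for `DiagramResult` entries):

```python
# Hypothetical manifest data for illustration
diagrams = [
    {"frame_index": 5, "diagram_type": "architecture"},
    {"frame_index": 7, "diagram_type": "flowchart"},
    {"frame_index": 9, "diagram_type": "chart"},
]

key_point = {
    "point": "Team decided to use FalkorDB for graph storage",
    "related_diagrams": [0, 2],
}

# Resolve the indices into the actual diagram records
linked = [diagrams[i] for i in key_point["related_diagrams"]]
```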
---

## Diagram Models

### DiagramResult

Result from diagram extraction and analysis. Contains structured data extracted from visual content, along with paths to output files.

| Field | Type | Default | Description |
|---|---|---|---|
| `frame_index` | `int` | *required* | Index of the source frame |
| `timestamp` | `Optional[float]` | `None` | Timestamp in video (seconds) |
| `diagram_type` | `DiagramType` | `unknown` | Type of diagram detected |
| `confidence` | `float` | `0.0` | Detection confidence (0.0 to 1.0) |
| `description` | `Optional[str]` | `None` | Detailed description of the diagram |
| `text_content` | `Optional[str]` | `None` | All visible text, preserving structure |
| `elements` | `List[str]` | `[]` | Identified elements or components |
| `relationships` | `List[str]` | `[]` | Identified relationships (e.g., `"A -> B: connects"`) |
| `mermaid` | `Optional[str]` | `None` | Mermaid syntax representation |
| `chart_data` | `Optional[Dict[str, Any]]` | `None` | Extractable chart data (`labels`, `values`, `chart_type`) |
| `image_path` | `Optional[str]` | `None` | Relative path to original frame image |
| `svg_path` | `Optional[str]` | `None` | Relative path to rendered SVG |
| `png_path` | `Optional[str]` | `None` | Relative path to rendered PNG |
| `mermaid_path` | `Optional[str]` | `None` | Relative path to mermaid source file |

```json
{
  "frame_index": 5,
  "timestamp": 120.0,
  "diagram_type": "architecture",
  "confidence": 0.92,
  "description": "Microservices architecture showing API gateway, auth service, and database layer",
  "text_content": "API Gateway\nAuth Service\nUser DB\nPostgreSQL",
  "elements": ["API Gateway", "Auth Service", "User DB", "PostgreSQL"],
  "relationships": ["API Gateway -> Auth Service: authenticates", "Auth Service -> User DB: queries"],
  "mermaid": "graph LR\n  A[API Gateway] --> B[Auth Service]\n  B --> C[User DB]",
  "chart_data": null,
  "image_path": "diagrams/diagram_0.jpg",
  "svg_path": null,
  "png_path": null,
  "mermaid_path": "diagrams/diagram_0.mermaid"
}
```

### ScreenCapture

A screengrab fallback created when diagram extraction fails or confidence is too low for full analysis.

| Field | Type | Default | Description |
|---|---|---|---|
| `frame_index` | `int` | *required* | Index of the source frame |
| `timestamp` | `Optional[float]` | `None` | Timestamp in video (seconds) |
| `caption` | `Optional[str]` | `None` | Brief description of the content |
| `image_path` | `Optional[str]` | `None` | Relative path to screenshot image |
| `confidence` | `float` | `0.0` | Detection confidence that triggered fallback |

```json
{
  "frame_index": 8,
  "timestamp": 195.0,
  "caption": "Code editor showing a Python function definition",
  "image_path": "captures/capture_0.jpg",
  "confidence": 0.45
}
```

---

## Knowledge Graph Models

### Entity

An entity in the knowledge graph, representing a person, concept, technology, or other named item extracted from content.

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | Entity name |
| `type` | `str` | `"concept"` | Entity type: `"person"`, `"concept"`, `"technology"`, `"time"`, `"diagram"` |
| `descriptions` | `List[str]` | `[]` | Accumulated descriptions of this entity |
| `source` | `Optional[str]` | `None` | Source attribution: `"transcript"`, `"diagram"`, or `"both"` |
| `occurrences` | `List[Dict[str, Any]]` | `[]` | Occurrences with source, timestamp, and text context |

```json
{
  "name": "FalkorDB",
  "type": "technology",
  "descriptions": ["Embedded graph database", "Supports Cypher queries"],
  "source": "both",
  "occurrences": [
    {"source": "transcript", "timestamp": 120.0, "text": "We chose FalkorDB for graph storage"},
283
+ {"source": "diagram", "text": "FalkorDB Lite"}
284
+ ]
285
+}
286
+```
287
+
288
+### Relationship
289
+
290
+A directed relationship between two entities in the knowledge graph.
291
+
292
+| Field | Type | Default | Description |
293
+|---|---|---|---|
294
+| `source` | `str` | *required* | Source entity name |
295
+| `target` | `str` | *required* | Target entity name |
296
+| `type` | `str` | `"related_to"` | Relationship type (e.g., `"uses"`, `"manages"`, `"related_to"`) |
297
+| `content_source` | `Optional[str]` | `None` | Content source identifier |
298
+| `timestamp` | `Optional[float]` | `None` | Timestamp in seconds |
299
+
300
+```json
301
+{
302
+ "source": "PlanOpticon",
303
+ "target": "FalkorDB",
304
+ "type": "uses",
305
+ "content_source": "transcript",
306
+ "timestamp": 125.0
307
+}
308
+```
309
+
310
+### SourceRecord
311
+
312
+A content source registered in the knowledge graph for provenance tracking.
313
+
314
+| Field | Type | Default | Description |
315
+|---|---|---|---|
316
+| `source_id` | `str` | *required* | Unique identifier for this source |
317
+| `source_type` | `str` | *required* | Source type: `"video"`, `"document"`, `"url"`, `"api"`, `"manual"` |
318
+| `title` | `str` | *required* | Human-readable title |
319
+| `path` | `Optional[str]` | `None` | Local file path |
320
+| `url` | `Optional[str]` | `None` | URL if applicable |
321
+| `mime_type` | `Optional[str]` | `None` | MIME type of the source |
322
+| `ingested_at` | `str` | *auto* | ISO format ingestion timestamp (auto-generated) |
323
+| `metadata` | `Dict[str, Any]` | `{}` | Additional source metadata |
324
+
325
+```json
326
+{
327
+ "source_id": "vid_abc123",
328
+ "source_type": "video",
329
+ "title": "Sprint Planning Meeting - Jan 15",
330
+ "path": "/recordings/sprint-planning.mp4",
331
+ "url": null,
332
+ "mime_type": "video/mp4",
333
+ "ingested_at": "2026-01-15T10:30:00",
334
+ "metadata": {"duration": 3600, "resolution": "1920x1080"}
335
+}
336
+```
337
+
338
+### KnowledgeGraphData
339
+
340
+Serializable knowledge graph data containing all nodes, relationships, and source provenance.
341
+
342
+| Field | Type | Default | Description |
343
+|---|---|---|---|
344
+| `nodes` | `List[Entity]` | `[]` | Graph nodes/entities |
345
+| `relationships` | `List[Relationship]` | `[]` | Graph relationships |
346
+| `sources` | `List[SourceRecord]` | `[]` | Content sources for provenance tracking |
347
+
348
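The `KnowledgeGraphData` shape round-trips cleanly through plain JSON. A minimal sketch using the stdlib `json` module, with plain dicts standing in for the Pydantic models (field names follow the tables above; the values are illustrative):

```python
import json

# Plain-dict stand-ins for Entity and Relationship; field names
# follow the model tables above, values are illustrative.
entity = {
    "name": "FalkorDB",
    "type": "technology",
    "descriptions": ["Embedded graph database"],
    "source": "transcript",
    "occurrences": [],
}
relationship = {
    "source": "PlanOpticon",
    "target": "FalkorDB",
    "type": "uses",
    "content_source": "transcript",
    "timestamp": 125.0,
}
kg = {"nodes": [entity], "relationships": [relationship], "sources": []}

# Round-trip through JSON, as a model_dump_json() / model_validate_json() pair would.
restored = json.loads(json.dumps(kg, indent=2))
print(restored["relationships"][0]["type"])  # uses
```
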
---

## Planning Models

### PlanningEntity

An entity classified for planning purposes, with priority and status tracking.

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | Entity name |
| `planning_type` | `PlanningEntityType` | *required* | Planning classification |
| `description` | `str` | `""` | Detailed description |
| `priority` | `Optional[str]` | `None` | Priority: `"high"`, `"medium"`, `"low"` |
| `status` | `Optional[str]` | `None` | Status: `"identified"`, `"confirmed"`, `"resolved"` |
| `source_entities` | `List[str]` | `[]` | Names of source KG entities this was derived from |
| `metadata` | `Dict[str, Any]` | `{}` | Additional metadata |

```json
{
  "name": "Migrate to OAuth 2.0",
  "planning_type": "task",
  "description": "Replace custom auth with OAuth 2.0 across all services",
  "priority": "high",
  "status": "identified",
  "source_entities": ["OAuth", "Authentication Service"],
  "metadata": {}
}
```

---

## Processing and Metadata Models

### ProcessingStats

Statistics about a processing run, including model usage tracking.

| Field | Type | Default | Description |
|---|---|---|---|
| `start_time` | `Optional[str]` | `None` | ISO format start time |
| `end_time` | `Optional[str]` | `None` | ISO format end time |
| `duration_seconds` | `Optional[float]` | `None` | Total processing time |
| `frames_extracted` | `int` | `0` | Number of frames extracted from video |
| `people_frames_filtered` | `int` | `0` | Frames filtered out (contained people/webcam) |
| `diagrams_detected` | `int` | `0` | Number of diagrams detected |
| `screen_captures` | `int` | `0` | Number of screen captures saved |
| `transcript_duration_seconds` | `Optional[float]` | `None` | Duration of transcribed audio |
| `models_used` | `Dict[str, str]` | `{}` | Map of task to model used (e.g., `{"vision": "gpt-4o"}`) |

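Since `start_time` and `end_time` are ISO-format strings, `duration_seconds` can be recomputed with the stdlib alone; a quick sketch (the timestamps below are illustrative):

```python
from datetime import datetime

# ISO-format strings, as stored in ProcessingStats (illustrative values)
stats = {"start_time": "2026-01-15T10:30:00", "end_time": "2026-01-15T10:42:30"}

start = datetime.fromisoformat(stats["start_time"])
end = datetime.fromisoformat(stats["end_time"])
duration_seconds = (end - start).total_seconds()
print(duration_seconds)  # 750.0
```
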
### VideoMetadata

Metadata about the source video file.

| Field | Type | Default | Description |
|---|---|---|---|
| `title` | `str` | *required* | Video title |
| `source_path` | `Optional[str]` | `None` | Original video file path |
| `duration_seconds` | `Optional[float]` | `None` | Video duration in seconds |
| `resolution` | `Optional[str]` | `None` | Video resolution (e.g., `"1920x1080"`) |
| `processed_at` | `str` | *auto* | ISO format processing timestamp |

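Like the other models in this module, `VideoMetadata` serializes to a flat JSON object; a representative example (values are illustrative):

```json
{
  "title": "Sprint Planning Meeting - Jan 15",
  "source_path": "/recordings/sprint-planning.mp4",
  "duration_seconds": 3600.0,
  "resolution": "1920x1080",
  "processed_at": "2026-01-15T10:30:00"
}
```
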
---

## Manifest Models

### VideoManifest

The single source of truth for a video processing run. Contains all output paths, inline structured data, and processing statistics.

| Field | Type | Default | Description |
|---|---|---|---|
| `version` | `str` | `"1.0"` | Manifest schema version |
| `video` | `VideoMetadata` | *required* | Source video metadata |
| `stats` | `ProcessingStats` | *default* | Processing statistics |
| `transcript_json` | `Optional[str]` | `None` | Relative path to transcript JSON |
| `transcript_txt` | `Optional[str]` | `None` | Relative path to transcript text |
| `transcript_srt` | `Optional[str]` | `None` | Relative path to SRT subtitles |
| `analysis_md` | `Optional[str]` | `None` | Relative path to analysis Markdown |
| `analysis_html` | `Optional[str]` | `None` | Relative path to analysis HTML |
| `analysis_pdf` | `Optional[str]` | `None` | Relative path to analysis PDF |
| `knowledge_graph_json` | `Optional[str]` | `None` | Relative path to knowledge graph JSON |
| `knowledge_graph_db` | `Optional[str]` | `None` | Relative path to knowledge graph DB |
| `key_points_json` | `Optional[str]` | `None` | Relative path to key points JSON |
| `action_items_json` | `Optional[str]` | `None` | Relative path to action items JSON |
| `key_points` | `List[KeyPoint]` | `[]` | Inline key points data |
| `action_items` | `List[ActionItem]` | `[]` | Inline action items data |
| `diagrams` | `List[DiagramResult]` | `[]` | Inline diagram results |
| `screen_captures` | `List[ScreenCapture]` | `[]` | Inline screen captures |
| `frame_paths` | `List[str]` | `[]` | Relative paths to extracted frames |

```python
from pathlib import Path

from video_processor.models import VideoManifest, VideoMetadata

manifest = VideoManifest(
    video=VideoMetadata(title="Sprint Planning"),
    key_points=[...],
    action_items=[...],
    diagrams=[...],
)

# Serialize to JSON
manifest.model_dump_json(indent=2)

# Load from file
loaded = VideoManifest.model_validate_json(Path("manifest.json").read_text())
```

### BatchVideoEntry

Summary of a single video within a batch processing run.

| Field | Type | Default | Description |
|---|---|---|---|
| `video_name` | `str` | *required* | Video file name |
| `manifest_path` | `str` | *required* | Relative path to the video's manifest file |
| `status` | `str` | `"pending"` | Processing status: `"pending"`, `"completed"`, `"failed"` |
| `error` | `Optional[str]` | `None` | Error message if processing failed |
| `diagrams_count` | `int` | `0` | Number of diagrams detected |
| `action_items_count` | `int` | `0` | Number of action items extracted |
| `key_points_count` | `int` | `0` | Number of key points extracted |
| `duration_seconds` | `Optional[float]` | `None` | Processing duration |

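The batch-level totals in `BatchManifest` are simple aggregates over these per-video entries. A sketch with plain dicts standing in for `BatchVideoEntry` (file names and counts are illustrative):

```python
# Plain-dict stand-ins for BatchVideoEntry; field names follow the
# table above, values are illustrative.
entries = [
    {"video_name": "standup.mp4", "status": "completed", "diagrams_count": 3},
    {"video_name": "retro.mp4", "status": "completed", "diagrams_count": 1},
    {"video_name": "demo.mp4", "status": "failed", "diagrams_count": 0},
]

# Aggregates as they would appear on the batch manifest
completed_videos = sum(1 for e in entries if e["status"] == "completed")
failed_videos = sum(1 for e in entries if e["status"] == "failed")
total_diagrams = sum(e["diagrams_count"] for e in entries)
print(completed_videos, failed_videos, total_diagrams)  # 2 1 4
```
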
### BatchManifest

Manifest for a batch processing run across multiple videos.

| Field | Type | Default | Description |
|---|---|---|---|
| `version` | `str` | `"1.0"` | Manifest schema version |
| `title` | `str` | `"Batch Processing Results"` | Batch title |
| `processed_at` | `str` | *auto* | ISO format timestamp |
| `stats` | `ProcessingStats` | *default* | Aggregated processing statistics |
| `videos` | `List[BatchVideoEntry]` | `[]` | Per-video summaries |
| `total_videos` | `int` | `0` | Total number of videos in batch |
| `completed_videos` | `int` | `0` | Successfully processed videos |
| `failed_videos` | `int` | `0` | Videos that failed processing |
| `total_diagrams` | `int` | `0` | Total diagrams across all videos |
| `total_action_items` | `int` | `0` | Total action items across all videos |
| `total_key_points` | `int` | `0` | Total key points across all videos |
| `batch_summary_md` | `Optional[str]` | `None` | Relative path to batch summary Markdown |
| `merged_knowledge_graph_json` | `Optional[str]` | `None` | Relative path to merged KG JSON |
| `merged_knowledge_graph_db` | `Optional[str]` | `None` | Relative path to merged KG database |

```python
from video_processor.models import BatchManifest

batch = BatchManifest(
    title="Weekly Recordings",
    total_videos=5,
    completed_videos=4,
    failed_videos=1,
)
```

--- docs/api/models.md
+++ docs/api/models.md
@@ -1,3 +1,501 @@
1 # Models API Reference
2
3 ::: video_processor.models
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4
--- docs/api/models.md
+++ docs/api/models.md
@@ -1,3 +1,501 @@
1 # Models API Reference
2
3 ::: video_processor.models
4
5 ---
6
7 ## Overview
8
9 The `video_processor.models` module defines all Pydantic data models used throughout PlanOpticon for structured output, serialization, and validation. These models represent everything from individual transcript segments to complete batch processing manifests.
10
11 All models inherit from `pydantic.BaseModel` and support JSON serialization via `.model_dump_json()` and deserialization via `.model_validate_json()`.
12
13 ---
14
15 ## Enumerations
16
17 ### DiagramType
18
19 Types of visual content detected in video frames.
20
21 ```python
22 from video_processor.models import DiagramType
23 ```
24
25 | Value | Description |
26 |---|---|
27 | `flowchart` | Process flow or decision tree diagrams |
28 | `sequence` | Sequence or interaction diagrams |
29 | `architecture` | System architecture diagrams |
30 | `whiteboard` | Whiteboard drawings or sketches |
31 | `chart` | Data charts (bar, line, pie, scatter) |
32 | `table` | Tabular data |
33 | `slide` | Presentation slides |
34 | `screenshot` | Application screenshots or screen shares |
35 | `unknown` | Unclassified visual content |
36
37 ### OutputFormat
38
39 Available output formats for processing results.
40
41 | Value | Description |
42 |---|---|
43 | `markdown` | Markdown text |
44 | `json` | JSON data |
45 | `html` | HTML document |
46 | `pdf` | PDF document |
47 | `svg` | SVG vector graphic |
48 | `png` | PNG raster image |
49
50 ### PlanningEntityType
51
52 Classification types for entities in a planning taxonomy.
53
54 | Value | Description |
55 |---|---|
56 | `goal` | Project goals or objectives |
57 | `requirement` | Functional or non-functional requirements |
58 | `constraint` | Limitations or constraints |
59 | `decision` | Decisions made during planning |
60 | `risk` | Identified risks |
61 | `assumption` | Planning assumptions |
62 | `dependency` | External or internal dependencies |
63 | `milestone` | Project milestones |
64 | `task` | Actionable tasks |
65 | `feature` | Product features |
66
67 ### PlanningRelationshipType
68
69 Relationship types within a planning taxonomy.
70
71 | Value | Description |
72 |---|---|
73 | `requires` | Entity A requires entity B |
74 | `blocked_by` | Entity A is blocked by entity B |
75 | `has_risk` | Entity A has an associated risk B |
76 | `depends_on` | Entity A depends on entity B |
77 | `addresses` | Entity A addresses entity B |
78 | `has_tradeoff` | Entity A involves a tradeoff with entity B |
79 | `delivers` | Entity A delivers entity B |
80 | `implements` | Entity A implements entity B |
81 | `parent_of` | Entity A is the parent of entity B |
82
83 ---
84
85 ## Protocols
86
87 ### ProgressCallback
88
89 A runtime-checkable protocol for receiving pipeline progress updates. Implement this interface to integrate custom progress reporting (e.g., web UI, logging).
90
91 ```python
92 from video_processor.models import ProgressCallback
93
94 class MyProgress:
95 def on_step_start(self, step: str, index: int, total: int) -> None:
96 print(f"Starting {step} ({index}/{total})")
97
98 def on_step_complete(self, step: str, index: int, total: int) -> None:
99 print(f"Completed {step} ({index}/{total})")
100
101 def on_progress(self, step: str, percent: float, message: str = "") -> None:
102 print(f"{step}: {percent:.0f}% {message}")
103
104 assert isinstance(MyProgress(), ProgressCallback) # True
105 ```
106
107 **Methods:**
108
109 | Method | Parameters | Description |
110 |---|---|---|
111 | `on_step_start` | `step: str`, `index: int`, `total: int` | Called when a pipeline step begins |
112 | `on_step_complete` | `step: str`, `index: int`, `total: int` | Called when a pipeline step finishes |
113 | `on_progress` | `step: str`, `percent: float`, `message: str` | Called with incremental progress updates |
114
115 ---
116
117 ## Transcript Models
118
119 ### TranscriptSegment
120
121 A single segment of transcribed audio with timing and optional speaker identification.
122
123 | Field | Type | Default | Description |
124 |---|---|---|---|
125 | `start` | `float` | *required* | Start time in seconds |
126 | `end` | `float` | *required* | End time in seconds |
127 | `text` | `str` | *required* | Transcribed text content |
128 | `speaker` | `Optional[str]` | `None` | Speaker identifier (e.g., "Speaker 1") |
129 | `confidence` | `Optional[float]` | `None` | Transcription confidence score (0.0 to 1.0) |
130
131 ```json
132 {
133 "start": 12.5,
134 "end": 15.3,
135 "text": "We should migrate to the new API by next quarter.",
136 "speaker": "Alice",
137 "confidence": 0.95
138 }
139 ```
140
141 ---
142
143 ## Content Extraction Models
144
145 ### ActionItem
146
147 An action item extracted from transcript or diagram content.
148
149 | Field | Type | Default | Description |
150 |---|---|---|---|
151 | `action` | `str` | *required* | The action to be taken |
152 | `assignee` | `Optional[str]` | `None` | Person responsible for the action |
153 | `deadline` | `Optional[str]` | `None` | Deadline or timeframe |
154 | `priority` | `Optional[str]` | `None` | Priority level (e.g., "high", "medium", "low") |
155 | `context` | `Optional[str]` | `None` | Additional context or notes |
156 | `source` | `Optional[str]` | `None` | Where this was found: `"transcript"`, `"diagram"`, or `"both"` |
157
158 ```json
159 {
160 "action": "Migrate authentication service to OAuth 2.0",
161 "assignee": "Bob",
162 "deadline": "Q2 2026",
163 "priority": "high",
164 "context": "at 245s",
165 "source": "transcript"
166 }
167 ```
168
169 ### KeyPoint
170
171 A key point extracted from content, optionally linked to diagrams.
172
173 | Field | Type | Default | Description |
174 |---|---|---|---|
175 | `point` | `str` | *required* | The key point text |
176 | `topic` | `Optional[str]` | `None` | Topic or category |
177 | `details` | `Optional[str]` | `None` | Supporting details |
178 | `timestamp` | `Optional[float]` | `None` | Timestamp in video (seconds) |
179 | `source` | `Optional[str]` | `None` | Where this was found |
180 | `related_diagrams` | `List[int]` | `[]` | Indices of related diagrams in the manifest |
181
182 ```json
183 {
184 "point": "Team decided to use FalkorDB for graph storage",
185 "topic": "Architecture",
186 "details": "Embedded database avoids infrastructure overhead for CLI use",
187 "timestamp": 342.0,
188 "source": "transcript",
189 "related_diagrams": [0, 2]
190 }
191 ```
192
193 ---
194
195 ## Diagram Models
196
197 ### DiagramResult
198
199 Result from diagram extraction and analysis. Contains structured data extracted from visual content, along with paths to output files.
200
201 | Field | Type | Default | Description |
202 |---|---|---|---|
203 | `frame_index` | `int` | *required* | Index of the source frame |
204 | `timestamp` | `Optional[float]` | `None` | Timestamp in video (seconds) |
205 | `diagram_type` | `DiagramType` | `unknown` | Type of diagram detected |
206 | `confidence` | `float` | `0.0` | Detection confidence (0.0 to 1.0) |
207 | `description` | `Optional[str]` | `None` | Detailed description of the diagram |
208 | `text_content` | `Optional[str]` | `None` | All visible text, preserving structure |
209 | `elements` | `List[str]` | `[]` | Identified elements or components |
210 | `relationships` | `List[str]` | `[]` | Identified relationships (e.g., `"A -> B: connects"`) |
211 | `mermaid` | `Optional[str]` | `None` | Mermaid syntax representation |
212 | `chart_data` | `Optional[Dict[str, Any]]` | `None` | Extractable chart data (`labels`, `values`, `chart_type`) |
213 | `image_path` | `Optional[str]` | `None` | Relative path to original frame image |
214 | `svg_path` | `Optional[str]` | `None` | Relative path to rendered SVG |
215 | `png_path` | `Optional[str]` | `None` | Relative path to rendered PNG |
216 | `mermaid_path` | `Optional[str]` | `None` | Relative path to mermaid source file |
217
218 ```json
219 {
220 "frame_index": 5,
221 "timestamp": 120.0,
222 "diagram_type": "architecture",
223 "confidence": 0.92,
224 "description": "Microservices architecture showing API gateway, auth service, and database layer",
225 "text_content": "API Gateway\nAuth Service\nUser DB\nPostgreSQL",
226 "elements": ["API Gateway", "Auth Service", "User DB", "PostgreSQL"],
227 "relationships": ["API Gateway -> Auth Service: authenticates", "Auth Service -> User DB: queries"],
228 "mermaid": "graph LR\n A[API Gateway] --> B[Auth Service]\n B --> C[User DB]",
229 "chart_data": null,
230 "image_path": "diagrams/diagram_0.jpg",
231 "svg_path": null,
232 "png_path": null,
233 "mermaid_path": "diagrams/diagram_0.mermaid"
234 }
235 ```
236
237 ### ScreenCapture
238
239 A screengrab fallback created when diagram extraction fails or confidence is too low for full analysis.
240
241 | Field | Type | Default | Description |
242 |---|---|---|---|
243 | `frame_index` | `int` | *required* | Index of the source frame |
244 | `timestamp` | `Optional[float]` | `None` | Timestamp in video (seconds) |
245 | `caption` | `Optional[str]` | `None` | Brief description of the content |
246 | `image_path` | `Optional[str]` | `None` | Relative path to screenshot image |
247 | `confidence` | `float` | `0.0` | Detection confidence that triggered fallback |
248
249 ```json
250 {
251 "frame_index": 8,
252 "timestamp": 195.0,
253 "caption": "Code editor showing a Python function definition",
254 "image_path": "captures/capture_0.jpg",
255 "confidence": 0.45
256 }
257 ```
258
259 ---
260
261 ## Knowledge Graph Models
262
263 ### Entity
264
265 An entity in the knowledge graph, representing a person, concept, technology, or other named item extracted from content.
266
267 | Field | Type | Default | Description |
268 |---|---|---|---|
269 | `name` | `str` | *required* | Entity name |
270 | `type` | `str` | `"concept"` | Entity type: `"person"`, `"concept"`, `"technology"`, `"time"`, `"diagram"` |
271 | `descriptions` | `List[str]` | `[]` | Accumulated descriptions of this entity |
272 | `source` | `Optional[str]` | `None` | Source attribution: `"transcript"`, `"diagram"`, or `"both"` |
273 | `occurrences` | `List[Dict[str, Any]]` | `[]` | Occurrences with source, timestamp, and text context |
274
275 ```json
276 {
277 "name": "FalkorDB",
278 "type": "technology",
279 "descriptions": ["Embedded graph database", "Supports Cypher queries"],
280 "source": "both",
281 "occurrences": [
282 {"source": "transcript", "timestamp": 120.0, "text": "We chose FalkorDB for graph storage"},
283 {"source": "diagram", "text": "FalkorDB Lite"}
284 ]
285 }
286 ```
287
288 ### Relationship
289
290 A directed relationship between two entities in the knowledge graph.
291
292 | Field | Type | Default | Description |
293 |---|---|---|---|
294 | `source` | `str` | *required* | Source entity name |
295 | `target` | `str` | *required* | Target entity name |
296 | `type` | `str` | `"related_to"` | Relationship type (e.g., `"uses"`, `"manages"`, `"related_to"`) |
297 | `content_source` | `Optional[str]` | `None` | Content source identifier |
298 | `timestamp` | `Optional[float]` | `None` | Timestamp in seconds |
299
300 ```json
301 {
302 "source": "PlanOpticon",
303 "target": "FalkorDB",
304 "type": "uses",
305 "content_source": "transcript",
306 "timestamp": 125.0
307 }
308 ```
309
310 ### SourceRecord
311
312 A content source registered in the knowledge graph for provenance tracking.
313
314 | Field | Type | Default | Description |
315 |---|---|---|---|
316 | `source_id` | `str` | *required* | Unique identifier for this source |
317 | `source_type` | `str` | *required* | Source type: `"video"`, `"document"`, `"url"`, `"api"`, `"manual"` |
318 | `title` | `str` | *required* | Human-readable title |
319 | `path` | `Optional[str]` | `None` | Local file path |
320 | `url` | `Optional[str]` | `None` | URL if applicable |
321 | `mime_type` | `Optional[str]` | `None` | MIME type of the source |
322 | `ingested_at` | `str` | *auto* | ISO format ingestion timestamp (auto-generated) |
323 | `metadata` | `Dict[str, Any]` | `{}` | Additional source metadata |
324
325 ```json
326 {
327 "source_id": "vid_abc123",
328 "source_type": "video",
329 "title": "Sprint Planning Meeting - Jan 15",
330 "path": "/recordings/sprint-planning.mp4",
331 "url": null,
332 "mime_type": "video/mp4",
333 "ingested_at": "2026-01-15T10:30:00",
334 "metadata": {"duration": 3600, "resolution": "1920x1080"}
335 }
336 ```
337
338 ### KnowledgeGraphData
339
340 Serializable knowledge graph data containing all nodes, relationships, and source provenance.
341
342 | Field | Type | Default | Description |
343 |---|---|---|---|
344 | `nodes` | `List[Entity]` | `[]` | Graph nodes/entities |
345 | `relationships` | `List[Relationship]` | `[]` | Graph relationships |
346 | `sources` | `List[SourceRecord]` | `[]` | Content sources for provenance tracking |
347
348 ---
349
350 ## Planning Models
351
352 ### PlanningEntity
353
354 An entity classified for planning purposes, with priority and status tracking.
355
356 | Field | Type | Default | Description |
357 |---|---|---|---|
358 | `name` | `str` | *required* | Entity name |
359 | `planning_type` | `PlanningEntityType` | *required* | Planning classification |
360 | `description` | `str` | `""` | Detailed description |
361 | `priority` | `Optional[str]` | `None` | Priority: `"high"`, `"medium"`, `"low"` |
362 | `status` | `Optional[str]` | `None` | Status: `"identified"`, `"confirmed"`, `"resolved"` |
363 | `source_entities` | `List[str]` | `[]` | Names of source KG entities this was derived from |
364 | `metadata` | `Dict[str, Any]` | `{}` | Additional metadata |
365
366 ```json
367 {
368 "name": "Migrate to OAuth 2.0",
369 "planning_type": "task",
370 "description": "Replace custom auth with OAuth 2.0 across all services",
371 "priority": "high",
372 "status": "identified",
373 "source_entities": ["OAuth", "Authentication Service"],
374 "metadata": {}
375 }
376 ```
377
378 ---
379
380 ## Processing and Metadata Models
381
382 ### ProcessingStats
383
384 Statistics about a processing run, including model usage tracking.
385
386 | Field | Type | Default | Description |
387 |---|---|---|---|
388 | `start_time` | `Optional[str]` | `None` | ISO format start time |
389 | `end_time` | `Optional[str]` | `None` | ISO format end time |
390 | `duration_seconds` | `Optional[float]` | `None` | Total processing time |
391 | `frames_extracted` | `int` | `0` | Number of frames extracted from video |
392 | `people_frames_filtered` | `int` | `0` | Frames filtered out (contained people/webcam) |
393 | `diagrams_detected` | `int` | `0` | Number of diagrams detected |
394 | `screen_captures` | `int` | `0` | Number of screen captures saved |
395 | `transcript_duration_seconds` | `Optional[float]` | `None` | Duration of transcribed audio |
396 | `models_used` | `Dict[str, str]` | `{}` | Map of task to model used (e.g., `{"vision": "gpt-4o"}`) |
397
398 ### VideoMetadata
399
400 Metadata about the source video file.
401
402 | Field | Type | Default | Description |
403 |---|---|---|---|
404 | `title` | `str` | *required* | Video title |
405 | `source_path` | `Optional[str]` | `None` | Original video file path |
406 | `duration_seconds` | `Optional[float]` | `None` | Video duration in seconds |
407 | `resolution` | `Optional[str]` | `None` | Video resolution (e.g., `"1920x1080"`) |
408 | `processed_at` | `str` | *auto* | ISO format processing timestamp |
409
410 ---
411
412 ## Manifest Models
413
414 ### VideoManifest
415
416 The single source of truth for a video processing run. Contains all output paths, inline structured data, and processing statistics.
417
418 | Field | Type | Default | Description |
419 |---|---|---|---|
420 | `version` | `str` | `"1.0"` | Manifest schema version |
421 | `video` | `VideoMetadata` | *required* | Source video metadata |
422 | `stats` | `ProcessingStats` | *default* | Processing statistics |
423 | `transcript_json` | `Optional[str]` | `None` | Relative path to transcript JSON |
424 | `transcript_txt` | `Optional[str]` | `None` | Relative path to transcript text |
425 | `transcript_srt` | `Optional[str]` | `None` | Relative path to SRT subtitles |
426 | `analysis_md` | `Optional[str]` | `None` | Relative path to analysis Markdown |
427 | `analysis_html` | `Optional[str]` | `None` | Relative path to analysis HTML |
428 | `analysis_pdf` | `Optional[str]` | `None` | Relative path to analysis PDF |
429 | `knowledge_graph_json` | `Optional[str]` | `None` | Relative path to knowledge graph JSON |
430 | `knowledge_graph_db` | `Optional[str]` | `None` | Relative path to knowledge graph DB |
431 | `key_points_json` | `Optional[str]` | `None` | Relative path to key points JSON |
432 | `action_items_json` | `Optional[str]` | `None` | Relative path to action items JSON |
433 | `key_points` | `List[KeyPoint]` | `[]` | Inline key points data |
434 | `action_items` | `List[ActionItem]` | `[]` | Inline action items data |
435 | `diagrams` | `List[DiagramResult]` | `[]` | Inline diagram results |
436 | `screen_captures` | `List[ScreenCapture]` | `[]` | Inline screen captures |
437 | `frame_paths` | `List[str]` | `[]` | Relative paths to extracted frames |
438
439 ```python
440 from video_processor.models import VideoManifest, VideoMetadata
441
442 manifest = VideoManifest(
443 video=VideoMetadata(title="Sprint Planning"),
444 key_points=[...],
445 action_items=[...],
446 diagrams=[...],
447 )
448
449 # Serialize to JSON
450 manifest.model_dump_json(indent=2)
451
452 # Load from file
453 loaded = VideoManifest.model_validate_json(Path("manifest.json").read_text())
454 ```
455
456 ### BatchVideoEntry
457
458 Summary of a single video within a batch processing run.
459
460 | Field | Type | Default | Description |
461 |---|---|---|---|
462 | `video_name` | `str` | *required* | Video file name |
463 | `manifest_path` | `str` | *required* | Relative path to the video's manifest file |
464 | `status` | `str` | `"pending"` | Processing status: `"pending"`, `"completed"`, `"failed"` |
465 | `error` | `Optional[str]` | `None` | Error message if processing failed |
466 | `diagrams_count` | `int` | `0` | Number of diagrams detected |
467 | `action_items_count` | `int` | `0` | Number of action items extracted |
468 | `key_points_count` | `int` | `0` | Number of key points extracted |
469 | `duration_seconds` | `Optional[float]` | `None` | Processing duration |
470
471 ### BatchManifest
472
473 Manifest for a batch processing run across multiple videos.
474
475 | Field | Type | Default | Description |
476 |---|---|---|---|
477 | `version` | `str` | `"1.0"` | Manifest schema version |
478 | `title` | `str` | `"Batch Processing Results"` | Batch title |
479 | `processed_at` | `str` | *auto* | ISO format timestamp |
480 | `stats` | `ProcessingStats` | *default* | Aggregated processing statistics |
481 | `videos` | `List[BatchVideoEntry]` | `[]` | Per-video summaries |
482 | `total_videos` | `int` | `0` | Total number of videos in batch |
483 | `completed_videos` | `int` | `0` | Successfully processed videos |
484 | `failed_videos` | `int` | `0` | Videos that failed processing |
485 | `total_diagrams` | `int` | `0` | Total diagrams across all videos |
486 | `total_action_items` | `int` | `0` | Total action items across all videos |
487 | `total_key_points` | `int` | `0` | Total key points across all videos |
488 | `batch_summary_md` | `Optional[str]` | `None` | Relative path to batch summary Markdown |
489 | `merged_knowledge_graph_json` | `Optional[str]` | `None` | Relative path to merged KG JSON |
490 | `merged_knowledge_graph_db` | `Optional[str]` | `None` | Relative path to merged KG database |
491
492 ```python
493 from video_processor.models import BatchManifest
494
495 batch = BatchManifest(
496 title="Weekly Recordings",
497 total_videos=5,
498 completed_videos=4,
499 failed_videos=1,
500 )
501 ```
502
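The batch counters (`total_videos`, `completed_videos`, `failed_videos`, `total_diagrams`, …) mirror the per-video entries. A minimal sketch of that bookkeeping, using plain dicts as stand-ins for `BatchVideoEntry` (illustrative only, not the library's implementation):

```python
# Illustrative only: plain dicts stand in for BatchVideoEntry fields.
entries = [
    {"video_name": "a.mp4", "status": "completed", "diagrams_count": 2},
    {"video_name": "b.mp4", "status": "completed", "diagrams_count": 1},
    {"video_name": "c.mp4", "status": "failed", "diagrams_count": 0},
]

def summarize(entries: list[dict]) -> dict:
    """Derive BatchManifest-style aggregate counters from per-video entries."""
    return {
        "total_videos": len(entries),
        "completed_videos": sum(e["status"] == "completed" for e in entries),
        "failed_videos": sum(e["status"] == "failed" for e in entries),
        "total_diagrams": sum(e["diagrams_count"] for e in entries),
    }

print(summarize(entries))
# {'total_videos': 3, 'completed_videos': 2, 'failed_videos': 1, 'total_diagrams': 3}
```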
--- docs/api/providers.md
+++ docs/api/providers.md
@@ -3,5 +3,501 @@
33
::: video_processor.providers.base
44
55
::: video_processor.providers.manager
66
77
::: video_processor.providers.discovery
8
+
9
+---
10
+
11
+## Overview
12
+
13
+The provider system abstracts LLM API calls behind a unified interface. It supports multiple providers (OpenAI, Anthropic, Gemini, Ollama, and OpenAI-compatible services), automatic model discovery, capability-based routing, and usage tracking.
14
+
15
+**Key components:**
16
+
17
+- **`BaseProvider`** -- abstract interface that all providers implement
18
+- **`ProviderRegistry`** -- global registry mapping provider names to classes
19
+- **`ProviderManager`** -- high-level router that picks the best provider for each task
20
+- **`discover_available_models()`** -- scans all configured providers for available models
21
+
22
+---
23
+
24
+## BaseProvider (ABC)
25
+
26
+```python
27
+from video_processor.providers.base import BaseProvider
28
+```
29
+
30
+Abstract base class that all provider implementations must subclass. Defines the four core capabilities: chat, vision, audio transcription, and model listing.
31
+
32
+**Class attribute:**
33
+
34
+| Attribute | Type | Description |
35
+|---|---|---|
36
+| `provider_name` | `str` | Identifier for this provider (e.g., `"openai"`, `"anthropic"`) |
37
+
38
+### chat()
39
+
40
+```python
41
+def chat(
42
+ self,
43
+ messages: list[dict],
44
+ max_tokens: int = 4096,
45
+ temperature: float = 0.7,
46
+ model: Optional[str] = None,
47
+) -> str
48
+```
49
+
50
+Send a chat completion request.
51
+
52
+**Parameters:**
53
+
54
+| Parameter | Type | Default | Description |
55
+|---|---|---|---|
56
+| `messages` | `list[dict]` | *required* | OpenAI-format message list (`role`, `content`) |
57
+| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
58
+| `temperature` | `float` | `0.7` | Sampling temperature |
59
+| `model` | `Optional[str]` | `None` | Override model ID |
60
+
61
+**Returns:** `str` -- the assistant's text response.
62
+
63
+### analyze_image()
64
+
65
+```python
66
+def analyze_image(
67
+ self,
68
+ image_bytes: bytes,
69
+ prompt: str,
70
+ max_tokens: int = 4096,
71
+ model: Optional[str] = None,
72
+) -> str
73
+```
74
+
75
+Analyze an image with a text prompt using a vision-capable model.
76
+
77
+**Parameters:**
78
+
79
+| Parameter | Type | Default | Description |
80
+|---|---|---|---|
81
+| `image_bytes` | `bytes` | *required* | Raw image data (JPEG, PNG, etc.) |
82
+| `prompt` | `str` | *required* | Analysis instructions |
83
+| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
84
+| `model` | `Optional[str]` | `None` | Override model ID |
85
+
86
+**Returns:** `str` -- the assistant's analysis text.
87
+
88
+### transcribe_audio()
89
+
90
+```python
91
+def transcribe_audio(
92
+ self,
93
+ audio_path: str | Path,
94
+ language: Optional[str] = None,
95
+ model: Optional[str] = None,
96
+) -> dict
97
+```
98
+
99
+Transcribe an audio file.
100
+
101
+**Parameters:**
102
+
103
+| Parameter | Type | Default | Description |
104
+|---|---|---|---|
105
+| `audio_path` | `str \| Path` | *required* | Path to the audio file |
106
+| `language` | `Optional[str]` | `None` | Language hint (ISO 639-1 code) |
107
+| `model` | `Optional[str]` | `None` | Override model ID |
108
+
109
+**Returns:** `dict` -- transcription result with keys `text`, `segments`, `duration`, etc.
110
+
111
+### list_models()
112
+
113
+```python
114
+def list_models(self) -> list[ModelInfo]
115
+```
116
+
117
+Discover available models from this provider's API.
118
+
119
+**Returns:** `list[ModelInfo]` -- available models with capability metadata.
120
+
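The four-method contract can be pictured as an abstract class. The sketch below mirrors the documented signatures in self-contained form (it is an illustration of the interface, not the library's source; `EchoProvider` is a made-up toy subclass):

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional

class ProviderSketch(ABC):
    """Illustrative mirror of the documented BaseProvider contract."""

    provider_name: str = ""

    @abstractmethod
    def chat(self, messages: list[dict], max_tokens: int = 4096,
             temperature: float = 0.7, model: Optional[str] = None) -> str: ...

    @abstractmethod
    def analyze_image(self, image_bytes: bytes, prompt: str,
                      max_tokens: int = 4096, model: Optional[str] = None) -> str: ...

    @abstractmethod
    def transcribe_audio(self, audio_path: "str | Path",
                         language: Optional[str] = None,
                         model: Optional[str] = None) -> dict: ...

    @abstractmethod
    def list_models(self) -> list: ...

class EchoProvider(ProviderSketch):
    """Toy subclass showing what a concrete provider must implement."""

    provider_name = "echo"

    def chat(self, messages, max_tokens=4096, temperature=0.7, model=None):
        return messages[-1]["content"]  # echo the last user message

    def analyze_image(self, image_bytes, prompt, max_tokens=4096, model=None):
        return f"{len(image_bytes)} bytes: {prompt}"

    def transcribe_audio(self, audio_path, language=None, model=None):
        return {"text": "", "segments": [], "duration": 0.0}

    def list_models(self):
        return []
```

A real provider would make API calls inside each method; the shape of the class is the same.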
121
+---
122
+
123
+## ModelInfo
124
+
125
+```python
126
+from video_processor.providers.base import ModelInfo
127
+```
128
+
129
+Pydantic model describing an available model from a provider.
130
+
131
+| Field | Type | Default | Description |
132
+|---|---|---|---|
133
+| `id` | `str` | *required* | Model identifier (e.g., `"gpt-4o"`, `"claude-haiku-4-5-20251001"`) |
134
+| `provider` | `str` | *required* | Provider name (e.g., `"openai"`, `"anthropic"`, `"gemini"`) |
135
+| `display_name` | `str` | `""` | Human-readable display name |
136
+| `capabilities` | `List[str]` | `[]` | Model capabilities: `"chat"`, `"vision"`, `"audio"`, `"embedding"` |
137
+
138
+```json
139
+{
140
+ "id": "gpt-4o",
141
+ "provider": "openai",
142
+ "display_name": "GPT-4o",
143
+ "capabilities": ["chat", "vision"]
144
+}
145
+```
146
+
147
+---
148
+
149
+## ProviderRegistry
150
+
151
+```python
152
+from video_processor.providers.base import ProviderRegistry
153
+```
154
+
155
+Class-level registry for provider classes. Providers register themselves with metadata on import. This registry is used internally by `ProviderManager` but can also be used directly for introspection.
156
+
157
+### register()
158
+
159
+```python
160
+@classmethod
161
+def register(
162
+ cls,
163
+ name: str,
164
+ provider_class: type,
165
+ env_var: str = "",
166
+ model_prefixes: Optional[List[str]] = None,
167
+ default_models: Optional[Dict[str, str]] = None,
168
+) -> None
169
+```
170
+
171
+Register a provider class with its metadata. Called by each provider module at import time.
172
+
173
+**Parameters:**
174
+
175
+| Parameter | Type | Default | Description |
176
+|---|---|---|---|
177
+| `name` | `str` | *required* | Provider name (e.g., `"openai"`) |
178
+| `provider_class` | `type` | *required* | The provider class |
179
+| `env_var` | `str` | `""` | Environment variable for API key |
180
+| `model_prefixes` | `Optional[List[str]]` | `None` | Model ID prefixes for auto-detection (e.g., `["gpt-", "o1-"]`) |
181
+| `default_models` | `Optional[Dict[str, str]]` | `None` | Default models per capability (e.g., `{"chat": "gpt-4o", "vision": "gpt-4o"}`) |
182
+
183
+### get()
184
+
185
+```python
186
+@classmethod
187
+def get(cls, name: str) -> type
188
+```
189
+
190
+Return the provider class for a given name. Raises `ValueError` if the provider is not registered.
191
+
192
+### get_by_model()
193
+
194
+```python
195
+@classmethod
196
+def get_by_model(cls, model_id: str) -> Optional[str]
197
+```
198
+
199
+Return the provider name for a model ID based on prefix matching. Returns `None` if no match is found.
200
+
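The prefix matching behind `get_by_model()` can be sketched in a few lines. This is an illustrative reimplementation with a made-up prefix table, not the registry's actual data:

```python
from typing import Optional

# Illustrative prefix table; the real prefixes are supplied via
# ProviderRegistry.register(model_prefixes=...).
MODEL_PREFIXES = {
    "openai": ["gpt-", "o1-"],
    "anthropic": ["claude-"],
    "gemini": ["gemini-"],
}

def provider_for_model(model_id: str) -> Optional[str]:
    """Resolve a provider name by model-ID prefix, as get_by_model() does."""
    for provider, prefixes in MODEL_PREFIXES.items():
        if any(model_id.startswith(p) for p in prefixes):
            return provider
    return None

print(provider_for_model("gpt-4o"))         # openai
print(provider_for_model("mystery-model"))  # None
```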
201
+### get_default_models()
202
+
203
+```python
204
+@classmethod
205
+def get_default_models(cls, name: str) -> Dict[str, str]
206
+```
207
+
208
+Return the default models dict for a provider, mapping capability names to model IDs.
209
+
210
+### available()
211
+
212
+```python
213
+@classmethod
214
+def available(cls) -> List[str]
215
+```
216
+
217
+Return names of providers whose required environment variable is set (or providers with no env var requirement, like Ollama).
218
+
219
+### all_registered()
220
+
221
+```python
222
+@classmethod
223
+def all_registered(cls) -> Dict[str, Dict]
224
+```
225
+
226
+Return all registered providers and their metadata dictionaries.
227
+
228
+---
229
+
230
+## OpenAICompatibleProvider
231
+
232
+```python
233
+from video_processor.providers.base import OpenAICompatibleProvider
234
+```
235
+
236
+Base class for providers using OpenAI-compatible APIs (Together, Fireworks, Cerebras, xAI, Azure). Implements `chat()`, `analyze_image()`, and `list_models()` using the OpenAI client library. `transcribe_audio()` raises `NotImplementedError` by default.
237
+
238
+**Constructor:**
239
+
240
+```python
241
+def __init__(self, api_key: Optional[str] = None, base_url: Optional[str] = None)
242
+```
243
+
244
+| Parameter | Type | Default | Description |
245
+|---|---|---|---|
246
+| `api_key` | `Optional[str]` | `None` | API key (falls back to `self.env_var` environment variable) |
247
+| `base_url` | `Optional[str]` | `None` | API base URL (falls back to `self.base_url` class attribute) |
248
+
249
+**Subclass attributes to override:**
250
+
251
+| Attribute | Description |
252
+|---|---|
253
+| `provider_name` | Provider identifier string |
254
+| `base_url` | Default API base URL |
255
+| `env_var` | Environment variable name for the API key |
256
+
257
+**Usage tracking:** After each `chat()` or `analyze_image()` call, the provider stores token counts in `self._last_usage` as `{"input_tokens": int, "output_tokens": int}`. This is consumed by `ProviderManager._track()`.
258
+
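The tracking mechanism amounts to folding each call's `_last_usage` counters into running totals. A minimal sketch of the idea (hypothetical shape; the real `UsageTracker` API may differ):

```python
# Running totals, in the spirit of ProviderManager._track().
totals = {"input_tokens": 0, "output_tokens": 0}

def track(last_usage: dict) -> None:
    """Fold one call's _last_usage counters into the running totals."""
    for key in totals:
        totals[key] += last_usage.get(key, 0)

track({"input_tokens": 120, "output_tokens": 45})
track({"input_tokens": 80, "output_tokens": 30})
print(totals)  # {'input_tokens': 200, 'output_tokens': 75}
```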
259
+---
260
+
261
+## ProviderManager
262
+
263
+```python
264
+from video_processor.providers.manager import ProviderManager
265
+```
266
+
267
+High-level router that selects the best available provider and model for each API call. Supports explicit model selection, forced provider, or automatic selection based on discovered capabilities.
268
+
269
+### Constructor
270
+
271
+```python
272
+def __init__(
273
+ self,
274
+ vision_model: Optional[str] = None,
275
+ chat_model: Optional[str] = None,
276
+ transcription_model: Optional[str] = None,
277
+ provider: Optional[str] = None,
278
+ auto: bool = True,
279
+)
280
+```
281
+
282
+| Parameter | Type | Default | Description |
283
+|---|---|---|---|
284
+| `vision_model` | `Optional[str]` | `None` | Override model for vision tasks (e.g., `"gpt-4o"`) |
285
+| `chat_model` | `Optional[str]` | `None` | Override model for chat/LLM tasks |
286
+| `transcription_model` | `Optional[str]` | `None` | Override model for transcription |
287
+| `provider` | `Optional[str]` | `None` | Force all tasks to a single provider |
288
+| `auto` | `bool` | `True` | If `True` and no model specified, pick the best available |
289
+
290
+**Attributes:**
291
+
292
+| Attribute | Type | Description |
293
+|---|---|---|
294
+| `usage` | `UsageTracker` | Tracks token counts and API costs across all calls |
295
+
296
+### Auto-selection preferences
297
+
298
+When `auto=True` and no explicit model is set, providers are tried in this order:
299
+
300
+**Vision:** Gemini (`gemini-2.5-flash`) > OpenAI (`gpt-4o-mini`) > Anthropic (`claude-haiku-4-5-20251001`)
301
+
302
+**Chat:** Anthropic (`claude-haiku-4-5-20251001`) > OpenAI (`gpt-4o-mini`) > Gemini (`gemini-2.5-flash`)
303
+
304
+**Transcription:** OpenAI (`whisper-1`) > Gemini (`gemini-2.5-flash`)
305
+
306
+If no API-key-based provider is available, Ollama is tried as a fallback.
307
+
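The selection logic above amounts to walking an ordered preference list and taking the first configured provider. A sketch of that idea, using the documented preference order (illustrative, not the manager's actual code):

```python
# Preference lists mirroring the documented auto-selection order.
PREFERENCES = {
    "vision": [("gemini", "gemini-2.5-flash"),
               ("openai", "gpt-4o-mini"),
               ("anthropic", "claude-haiku-4-5-20251001")],
    "chat": [("anthropic", "claude-haiku-4-5-20251001"),
             ("openai", "gpt-4o-mini"),
             ("gemini", "gemini-2.5-flash")],
}

def pick(capability: str, available: set[str]) -> tuple[str, str]:
    """Return the first preferred provider that is configured; Ollama last."""
    for provider, model in PREFERENCES[capability]:
        if provider in available:
            return provider, model
    return "ollama", ""  # fall back to a local Ollama server

print(pick("chat", {"openai", "gemini"}))  # ('openai', 'gpt-4o-mini')
print(pick("vision", set()))               # ('ollama', '')
```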
308
+### chat()
309
+
310
+```python
311
+def chat(
312
+ self,
313
+ messages: list[dict],
314
+ max_tokens: int = 4096,
315
+ temperature: float = 0.7,
316
+) -> str
317
+```
318
+
319
+Send a chat completion to the best available provider. Automatically resolves which provider and model to use.
320
+
321
+**Parameters:**
322
+
323
+| Parameter | Type | Default | Description |
324
+|---|---|---|---|
325
+| `messages` | `list[dict]` | *required* | OpenAI-format messages |
326
+| `max_tokens` | `int` | `4096` | Maximum response tokens |
327
+| `temperature` | `float` | `0.7` | Sampling temperature |
328
+
329
+**Returns:** `str` -- assistant response text.
330
+
331
+**Raises:** `RuntimeError` if no provider is available for the `chat` capability.
332
+
333
+### analyze_image()
334
+
335
+```python
336
+def analyze_image(
337
+ self,
338
+ image_bytes: bytes,
339
+ prompt: str,
340
+ max_tokens: int = 4096,
341
+) -> str
342
+```
343
+
344
+Analyze an image using the best available vision provider.
345
+
346
+**Returns:** `str` -- analysis text.
347
+
348
+**Raises:** `RuntimeError` if no provider is available for the `vision` capability.
349
+
350
+### transcribe_audio()
351
+
352
+```python
353
+def transcribe_audio(
354
+ self,
355
+ audio_path: str | Path,
356
+ language: Optional[str] = None,
357
+ speaker_hints: Optional[list[str]] = None,
358
+) -> dict
359
+```
360
+
361
+Transcribe audio. Prefers local Whisper (no file size limits, no API costs) when available, falling back to API-based transcription.
362
+
363
+**Parameters:**
364
+
365
+| Parameter | Type | Default | Description |
366
+|---|---|---|---|
367
+| `audio_path` | `str \| Path` | *required* | Path to the audio file |
368
+| `language` | `Optional[str]` | `None` | Language hint |
369
+| `speaker_hints` | `Optional[list[str]]` | `None` | Speaker names for better recognition |
370
+
371
+**Returns:** `dict` -- transcription result with `text`, `segments`, `duration`.
372
+
373
+**Local Whisper:** If `transcription_model` is unset or starts with `"whisper-local"`, the manager tries local Whisper first. Use `"whisper-local:large"` to specify a model size.
374
+
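The `"whisper-local[:size]"` convention can be parsed with a simple split. A sketch of that parsing, assuming a `"base"` default size when none is given (the actual default is not documented here):

```python
from typing import Optional

def whisper_size(transcription_model: Optional[str]) -> Optional[str]:
    """Illustrative parse of the "whisper-local[:size]" convention."""
    if transcription_model is None:
        return "base"  # assumed default size -- not documented here
    if transcription_model.startswith("whisper-local"):
        _, _, size = transcription_model.partition(":")
        return size or "base"
    return None  # not a local-Whisper model; use API transcription

print(whisper_size("whisper-local:large"))  # large
print(whisper_size("whisper-1"))            # None
```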
375
+### get_models_used()
376
+
377
+```python
378
+def get_models_used(self) -> dict[str, str]
379
+```
380
+
381
+Return a dict mapping capability to `"provider/model"` string for tracking purposes.
382
+
383
+```python
384
+pm = ProviderManager()
385
+print(pm.get_models_used())
386
+# {"vision": "gemini/gemini-2.5-flash", "chat": "anthropic/claude-haiku-4-5-20251001", ...}
387
+```
388
+
389
+### Usage examples
390
+
391
+```python
392
+from video_processor.providers.manager import ProviderManager
393
+
394
+# Auto-select best providers
395
+pm = ProviderManager()
396
+
397
+# Force everything through one provider
398
+pm = ProviderManager(provider="openai")
399
+
400
+# Explicit model selection
401
+pm = ProviderManager(
402
+ vision_model="gpt-4o",
403
+ chat_model="claude-haiku-4-5-20251001",
404
+ transcription_model="whisper-local:large",
405
+)
406
+
407
+# Chat completion
408
+response = pm.chat([
409
+ {"role": "user", "content": "Summarize this meeting transcript..."}
410
+])
411
+
412
+# Image analysis
413
+with open("diagram.png", "rb") as f:
414
+ analysis = pm.analyze_image(f.read(), "Describe this architecture diagram")
415
+
416
+# Transcription with speaker hints
417
+result = pm.transcribe_audio(
418
+ "meeting.mp3",
419
+ language="en",
420
+ speaker_hints=["Alice", "Bob", "Charlie"],
421
+)
422
+
423
+# Check usage
424
+print(pm.usage.summary())
425
+```
426
+
427
+---
428
+
429
+## discover_available_models()
430
+
431
+```python
432
+from video_processor.providers.discovery import discover_available_models
433
+```
434
+
435
+```python
436
+def discover_available_models(
437
+ api_keys: Optional[dict[str, str]] = None,
438
+ force_refresh: bool = False,
439
+) -> list[ModelInfo]
440
+```
441
+
442
+Discover available models from all configured providers. For each provider with a valid API key, calls `list_models()` and returns a unified, sorted list.
443
+
444
+**Parameters:**
445
+
446
+| Parameter | Type | Default | Description |
447
+|---|---|---|---|
448
+| `api_keys` | `Optional[dict[str, str]]` | `None` | Override API keys (defaults to environment variables) |
449
+| `force_refresh` | `bool` | `False` | Force re-discovery, ignoring the session cache |
450
+
451
+**Returns:** `list[ModelInfo]` -- all discovered models, sorted by provider then model ID.
452
+
453
+**Caching:** Results are cached for the session. Use `force_refresh=True` or `clear_discovery_cache()` to refresh.
454
+
455
+```python
456
+from video_processor.providers.discovery import (
457
+ discover_available_models,
458
+ clear_discovery_cache,
459
+)
460
+
461
+# Discover models using environment variables
462
+models = discover_available_models()
463
+for m in models:
464
+ print(f"{m.provider}/{m.id} - {m.capabilities}")
465
+
466
+# Force refresh
467
+models = discover_available_models(force_refresh=True)
468
+
469
+# Override API keys
470
+models = discover_available_models(api_keys={
471
+ "openai": "sk-...",
472
+ "anthropic": "sk-ant-...",
473
+})
474
+
475
+# Clear cache
476
+clear_discovery_cache()
477
+```
478
+
479
+### clear_discovery_cache()
480
+
481
+```python
482
+def clear_discovery_cache() -> None
483
+```
484
+
485
+Clear the cached model list, forcing the next `discover_available_models()` call to re-query providers.
486
+
487
+---
488
+
489
+## Built-in Providers
490
+
491
+The following providers are registered automatically when the provider system initializes:
492
+
493
+| Provider | Environment Variable | Capabilities | Default Chat Model |
494
+|---|---|---|---|
495
+| `openai` | `OPENAI_API_KEY` | chat, vision, audio | `gpt-4o-mini` |
496
+| `anthropic` | `ANTHROPIC_API_KEY` | chat, vision | `claude-haiku-4-5-20251001` |
497
+| `gemini` | `GEMINI_API_KEY` | chat, vision, audio | `gemini-2.5-flash` |
498
+| `ollama` | *(none -- checks server)* | chat, vision | *(depends on installed models)* |
499
+| `together` | `TOGETHER_API_KEY` | chat | *(varies)* |
500
+| `fireworks` | `FIREWORKS_API_KEY` | chat | *(varies)* |
501
+| `cerebras` | `CEREBRAS_API_KEY` | chat | *(varies)* |
502
+| `xai` | `XAI_API_KEY` | chat | *(varies)* |
503
+| `azure` | `AZURE_OPENAI_API_KEY` | chat, vision | *(varies)* |
8504
9505
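Availability follows from the table above: a provider is usable when its API-key variable is set (Ollama, which needs no key, is checked against its server instead). A quick way to see which key-based providers are configured — a sketch, not the registry's code:

```python
import os

# Env vars from the table above (key-based providers only; Ollama omitted).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "together": "TOGETHER_API_KEY",
}

def configured_providers(env=os.environ) -> list[str]:
    """Names of providers whose API-key variable is set and non-empty."""
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]

print(configured_providers({"OPENAI_API_KEY": "sk-test"}))  # ['openai']
```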
ADDED docs/api/sources.md
496 | `anthropic` | `ANTHROPIC_API_KEY` | chat, vision | `claude-haiku-4-5-20251001` |
497 | `gemini` | `GEMINI_API_KEY` | chat, vision, audio | `gemini-2.5-flash` |
498 | `ollama` | *(none -- checks server)* | chat, vision | *(depends on installed models)* |
499 | `together` | `TOGETHER_API_KEY` | chat | *(varies)* |
500 | `fireworks` | `FIREWORKS_API_KEY` | chat | *(varies)* |
501 | `cerebras` | `CEREBRAS_API_KEY` | chat | *(varies)* |
502 | `xai` | `XAI_API_KEY` | chat | *(varies)* |
503 | `azure` | `AZURE_OPENAI_API_KEY` | chat, vision | *(varies)* |
504
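As a quick way to see which of these providers will register in your environment, you can check the documented environment variables directly. This is a minimal sketch using only the table above; `ollama` is omitted because it probes a local server rather than reading an API key:

```python
import os

# Environment variables from the Built-in Providers table above.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "together": "TOGETHER_API_KEY",
    "fireworks": "FIREWORKS_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
    "xai": "XAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
}


def configured_providers() -> list[str]:
    """Return the providers whose API key is set in the environment."""
    return [name for name, var in PROVIDER_ENV_VARS.items() if os.environ.get(var)]


print(configured_providers())
```

Providers missing from this list will simply be skipped during discovery; they do not cause errors.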
ADDED docs/api/sources.md
--- a/docs/api/sources.md
+++ b/docs/api/sources.md
@@ -0,0 +1,281 @@
# Sources API Reference

::: video_processor.sources.base

---

## Overview

The sources module provides a unified interface for fetching content from cloud services, local applications, and the web. All sources implement the `BaseSource` abstract class, providing consistent `authenticate()`, `list_videos()`, and `download()` methods.

Sources are lazy-loaded to avoid pulling in optional dependencies at import time. You can import any source directly from `video_processor.sources` and the correct module will be loaded on demand.

---

## BaseSource (ABC)

```python
from video_processor.sources import BaseSource
```

Abstract base class that all source integrations implement. Defines the standard three-step workflow: authenticate, list, download.

### authenticate()

```python
@abstractmethod
def authenticate(self) -> bool
```

Authenticate with the cloud provider or service. Uses the auth strategy defined for the source (OAuth, API key, local access, etc.).

**Returns:** `bool` -- `True` on successful authentication, `False` on failure.

### list_videos()

```python
@abstractmethod
def list_videos(
    self,
    folder_id: Optional[str] = None,
    folder_path: Optional[str] = None,
    patterns: Optional[List[str]] = None,
) -> List[SourceFile]
```

List available video files (or other content, depending on the source).

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `folder_id` | `Optional[str]` | `None` | Provider-specific folder/container identifier |
| `folder_path` | `Optional[str]` | `None` | Path within the source (e.g., folder name) |
| `patterns` | `Optional[List[str]]` | `None` | File name glob patterns to filter results |

**Returns:** `List[SourceFile]` -- available files matching the criteria.

### download()

```python
@abstractmethod
def download(
    self,
    file: SourceFile,
    destination: Path,
) -> Path
```

Download a single file to a local path.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `file` | `SourceFile` | File descriptor from `list_videos()` |
| `destination` | `Path` | Local destination path |

**Returns:** `Path` -- the local path where the file was saved.

### download_all()

```python
def download_all(
    self,
    files: List[SourceFile],
    destination_dir: Path,
) -> List[Path]
```

Download multiple files to a directory, preserving subfolder structure from `SourceFile.path`. This is a concrete method provided by the base class.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `files` | `List[SourceFile]` | Files to download |
| `destination_dir` | `Path` | Base directory for downloads (created if needed) |

**Returns:** `List[Path]` -- local paths of successfully downloaded files. Failed downloads are logged and skipped.

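The behavior described above -- subfolder preservation plus skip-on-failure -- can be sketched roughly as follows. This is a simplified illustration of the documented contract, not the library's actual implementation:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def download_all(self, files, destination_dir: Path) -> list[Path]:
    """Sketch of the concrete base-class method: download each file,
    recreating the subfolder structure from SourceFile.path."""
    downloaded: list[Path] = []
    for f in files:
        # Use the source-relative path when available, else just the name.
        relative = Path(f.path) if f.path else Path(f.name)
        target = destination_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            downloaded.append(self.download(f, target))
        except Exception as exc:  # failed downloads are logged and skipped
            logger.warning("Failed to download %s: %s", f.name, exc)
    return downloaded
```

Note that a failed `download()` does not abort the loop; the returned list contains only the files that succeeded.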
---

## SourceFile

```python
from video_processor.sources import SourceFile
```

Pydantic model describing a file available in a cloud source.

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | File name |
| `id` | `str` | *required* | Provider-specific file identifier |
| `size_bytes` | `Optional[int]` | `None` | File size in bytes |
| `mime_type` | `Optional[str]` | `None` | MIME type (e.g., `"video/mp4"`) |
| `modified_at` | `Optional[str]` | `None` | Last modified timestamp |
| `path` | `Optional[str]` | `None` | Path within the source folder (used for subfolder structure in `download_all`) |

```json
{
  "name": "sprint-review-2026-03-01.mp4",
  "id": "abc123def456",
  "size_bytes": 524288000,
  "mime_type": "video/mp4",
  "modified_at": "2026-03-01T14:30:00Z",
  "path": "recordings/march/sprint-review-2026-03-01.mp4"
}
```

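In Python terms, the field table corresponds to a Pydantic model along these lines. This is an equivalent sketch for illustration (e.g. for building test fixtures); import the real class from `video_processor.sources`:

```python
from typing import Optional

from pydantic import BaseModel


class SourceFile(BaseModel):
    """Field-for-field sketch of the documented model."""
    name: str
    id: str
    size_bytes: Optional[int] = None
    mime_type: Optional[str] = None
    modified_at: Optional[str] = None
    path: Optional[str] = None


# Only the two required fields are needed; the rest default to None.
f = SourceFile(name="demo.mp4", id="abc123")
```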
---

## Lazy Loading Pattern

All sources are lazy-loaded via `__getattr__` in the package `__init__.py`. This means importing `video_processor.sources` does not pull in any external dependencies (e.g., `google-auth`, `msal`, `notion-client`). The actual module is loaded only when you access the class.

```python
# This import is instant -- no dependencies loaded
from video_processor.sources import ZoomSource

# The zoom_source module (and its dependencies) are loaded here
source = ZoomSource()
```

---

## Available Sources

### Cloud Recordings

Sources for fetching recorded meetings from video conferencing platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Zoom | `ZoomSource` | OAuth / Server-to-Server | List and download Zoom cloud recordings |
| Google Meet | `MeetRecordingSource` | OAuth (Google) | List and download Google Meet recordings from Drive |
| Microsoft Teams | `TeamsRecordingSource` | OAuth (Microsoft) | List and download Teams meeting recordings |

### Cloud Storage and Workspace

Sources for accessing files stored in cloud platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Google Drive | `GoogleDriveSource` | OAuth (Google) | Files from Google Drive |
| Google Workspace | `GWSSource` | OAuth (Google) | Google Docs, Sheets, Slides |
| Microsoft 365 | `M365Source` | OAuth (Microsoft) | OneDrive, SharePoint files |
| Notion | `NotionSource` | OAuth / API key | Notion pages and databases |
| GitHub | `GitHubSource` | OAuth / API token | Repository files, issues, discussions |
| Dropbox | `DropboxSource` | OAuth / access token | *(via auth config)* |

### Notes Applications

Sources for local and cloud-based note-taking apps.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Apple Notes | `AppleNotesSource` | Local (macOS) | Notes from Apple Notes.app |
| Obsidian | `ObsidianSource` | Local filesystem | Markdown files from Obsidian vaults |
| Logseq | `LogseqSource` | Local filesystem | Pages from Logseq graphs |
| OneNote | `OneNoteSource` | OAuth (Microsoft) | Microsoft OneNote notebooks |
| Google Keep | `GoogleKeepSource` | OAuth (Google) | Google Keep notes |

### Web and Content

Sources for fetching content from the web.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| YouTube | `YouTubeSource` | API key / OAuth | YouTube video metadata and transcripts |
| Web | `WebSource` | None | General web page content extraction |
| RSS | `RSSSource` | None | RSS/Atom feed entries |
| Podcast | `PodcastSource` | None | Podcast episodes from RSS feeds |
| arXiv | `ArxivSource` | None | Academic papers from arXiv |
| Hacker News | `HackerNewsSource` | None | Hacker News posts and comments |
| Reddit | `RedditSource` | API credentials | Reddit posts and comments |
| Twitter/X | `TwitterSource` | API credentials | Tweets and threads |

---

## Auth Integration

Most sources use PlanOpticon's unified auth system (see [Auth API](auth.md)). The typical pattern within a source implementation:

```python
from video_processor.auth import get_auth_manager

class MySource(BaseSource):
    def __init__(self):
        self._token = None

    def authenticate(self) -> bool:
        manager = get_auth_manager("my_service")
        if manager:
            token = manager.get_token()
            if token:
                self._token = token
                return True
        return False

    def list_videos(self, **kwargs) -> list[SourceFile]:
        if not self._token:
            raise RuntimeError("Not authenticated. Call authenticate() first.")
        # Use self._token to call the API
        ...
```

---

## Usage Examples

### Listing and downloading Zoom recordings

```python
from pathlib import Path
from video_processor.sources import ZoomSource

source = ZoomSource()
if source.authenticate():
    recordings = source.list_videos()
    for rec in recordings:
        print(f"{rec.name} ({rec.size_bytes} bytes)")

    # Download all to a local directory
    paths = source.download_all(recordings, Path("./downloads"))
```

### Fetching from multiple sources

```python
from pathlib import Path
from video_processor.sources import GoogleDriveSource, NotionSource

# Google Drive
gdrive = GoogleDriveSource()
if gdrive.authenticate():
    files = gdrive.list_videos(
        folder_path="Meeting Recordings",
        patterns=["*.mp4", "*.webm"],
    )
    gdrive.download_all(files, Path("./drive-downloads"))

# Notion
notion = NotionSource()
if notion.authenticate():
    pages = notion.list_videos()  # Lists Notion pages
    for page in pages:
        print(f"Page: {page.name}")
```

### YouTube content

```python
from video_processor.sources import YouTubeSource

yt = YouTubeSource()
if yt.authenticate():
    videos = yt.list_videos(folder_path="https://youtube.com/playlist?list=...")
    for v in videos:
        print(f"{v.name} - {v.id}")
```
--- docs/architecture/pipeline.md
+++ docs/architecture/pipeline.md
@@ -1,8 +1,14 @@
11
# Processing Pipeline
2
+
3
+PlanOpticon has four main pipelines: **video analysis**, **document ingestion**, **source connector**, and **export**. Each pipeline can operate independently, and they connect through the shared knowledge graph.
4
+
5
+---
26
37
## Single video pipeline
8
+
9
+The core video analysis pipeline processes a single video file through eight sequential steps with checkpoint/resume support.
410
511
```mermaid
612
sequenceDiagram
713
participant CLI
814
participant Pipeline
@@ -9,49 +15,321 @@
915
participant FrameExtractor
1016
participant AudioExtractor
1117
participant Provider
1218
participant DiagramAnalyzer
1319
participant KnowledgeGraph
20
+ participant Exporter
1421
1522
CLI->>Pipeline: process_single_video()
23
+
24
+ Note over Pipeline: Step 1: Extract frames
1625
Pipeline->>FrameExtractor: extract_frames()
1726
Note over FrameExtractor: Change detection + periodic capture (every 30s)
27
+ FrameExtractor-->>Pipeline: frame_paths[]
28
+
29
+ Note over Pipeline: Step 2: Filter people frames
1830
Pipeline->>Pipeline: filter_people_frames()
1931
Note over Pipeline: OpenCV face detection removes webcam/people frames
32
+
33
+ Note over Pipeline: Step 3: Extract + transcribe audio
2034
Pipeline->>AudioExtractor: extract_audio()
2135
Pipeline->>Provider: transcribe_audio()
36
+ Note over Provider: Supports speaker hints via --speakers flag
37
+
38
+ Note over Pipeline: Step 4: Analyze visuals
2239
Pipeline->>DiagramAnalyzer: process_frames()
23
-
24
- loop Each frame
40
+ loop Each frame (up to 10 standard / 20 comprehensive)
2541
DiagramAnalyzer->>Provider: classify (vision)
2642
alt High confidence diagram
2743
DiagramAnalyzer->>Provider: full analysis
44
+ Note over Provider: Extract description, text, mermaid, chart data
2845
else Medium confidence
2946
DiagramAnalyzer-->>Pipeline: screengrab fallback
3047
end
3148
end
3249
50
+ Note over Pipeline: Step 5: Build knowledge graph
51
+ Pipeline->>KnowledgeGraph: register_source()
3352
Pipeline->>KnowledgeGraph: process_transcript()
3453
Pipeline->>KnowledgeGraph: process_diagrams()
54
+ Note over KnowledgeGraph: Writes knowledge_graph.db (SQLite) + .json
55
+
56
+ Note over Pipeline: Step 6: Extract key points + action items
3557
Pipeline->>Provider: extract key points
3658
Pipeline->>Provider: extract action items
37
- Pipeline->>Pipeline: generate reports
38
- Pipeline->>Pipeline: export formats
59
+
60
+ Note over Pipeline: Step 7: Generate report
61
+ Pipeline->>Pipeline: generate markdown report
62
+ Note over Pipeline: Includes mermaid diagrams, tables, cross-references
63
+
64
+ Note over Pipeline: Step 8: Export formats
65
+ Pipeline->>Exporter: export_all_formats()
66
+ Note over Exporter: HTML report, PDF, SVG/PNG renderings, chart reproductions
67
+
3968
Pipeline-->>CLI: VideoManifest
4069
```
70
+
71
+### Pipeline steps in detail
72
+
73
+| Step | Name | Checkpointable | Description |
74
+|------|------|----------------|-------------|
75
+| 1 | Extract frames | Yes | Change detection + periodic capture. Skipped if `frames/frame_*.jpg` exist on disk. |
76
+| 2 | Filter people frames | No | Inline with step 1. OpenCV face detection removes webcam frames. |
77
+| 3 | Extract + transcribe audio | Yes | Skipped if `transcript/transcript.json` exists. Speaker hints passed if `--speakers` provided. |
78
+| 4 | Analyze visuals | Yes | Skipped if `diagrams/` is populated. Evenly samples frames (not just first N). |
79
+| 5 | Build knowledge graph | Yes | Skipped if `results/knowledge_graph.db` exists. Registers source, processes transcript and diagrams. |
80
+| 6 | Extract key points + actions | Yes | Skipped if `results/key_points.json` and `results/action_items.json` exist. |
81
+| 7 | Generate report | Yes | Skipped if `results/analysis.md` exists. |
82
+| 8 | Export formats | No | Always runs. Renders mermaid to SVG/PNG, reproduces charts, generates HTML/PDF. |
83
+
84
+---
4185
4286
## Batch pipeline
4387
44
-The batch command wraps the single-video pipeline:
88
+The batch pipeline wraps the single-video pipeline and adds cross-video knowledge graph merging.
89
+
90
+```mermaid
91
+flowchart TD
92
+ A[Scan input directory] --> B[Match video files by pattern]
93
+ B --> C{For each video}
94
+ C --> D[process_single_video]
95
+ D --> E{Success?}
96
+ E -->|Yes| F[Collect manifest + KG]
97
+ E -->|No| G[Log error, continue]
98
+ F --> H[Next video]
99
+ G --> H
100
+ H --> C
101
+ C -->|All done| I[Merge knowledge graphs]
102
+ I --> J[Fuzzy matching + conflict resolution]
103
+ J --> K[Generate batch summary]
104
+ K --> L[Write batch manifest]
105
+ L --> M[batch_manifest.json + batch_summary.md + merged KG]
106
+```
107
+
108
+### Knowledge graph merge strategy
109
+
110
+During batch merging, `KnowledgeGraph.merge()` applies:
111
+
112
+1. **Case-insensitive exact matching** for entity names
113
+2. **Fuzzy matching** via `SequenceMatcher` (threshold >= 0.85) for near-duplicates
114
+3. **Type conflict resolution** using a specificity ranking (e.g., `technology` > `concept`)
115
+4. **Description union** across all sources
116
+5. **Relationship deduplication** by (source, target, type) tuple
117
+
118
+---
119
+
120
+## Document ingestion pipeline
121
+
122
+The document ingestion pipeline processes files (Markdown, plaintext, PDF) into knowledge graphs without video analysis.
123
+
124
+```mermaid
125
+flowchart TD
126
+ A[Input: file or directory] --> B{File or directory?}
127
+ B -->|File| C[get_processor by extension]
128
+ B -->|Directory| D[Glob for supported extensions]
129
+ D --> E{Recursive?}
130
+ E -->|Yes| F[rglob all files]
131
+ E -->|No| G[glob top-level only]
132
+ F --> H[For each file]
133
+ G --> H
134
+ H --> C
135
+ C --> I[DocumentProcessor.process]
136
+ I --> J[DocumentChunk list]
137
+ J --> K[Register source in KG]
138
+ K --> L[Add chunks as content]
139
+ L --> M[KG extracts entities + relationships]
140
+ M --> N[knowledge_graph.db]
141
+```
142
+
143
+### Supported document types
144
+
145
+| Extension | Processor | Notes |
146
+|-----------|-----------|-------|
147
+| `.md` | `MarkdownProcessor` | Splits by headings into sections |
148
+| `.txt` | `PlaintextProcessor` | Splits into fixed-size chunks |
149
+| `.pdf` | `PdfProcessor` | Requires `pymupdf` or `pdfplumber`. Falls back gracefully between libraries. |
150
+
151
+### Adding documents to an existing graph
152
+
153
+The `--db-path` flag lets you ingest documents into an existing knowledge graph:
154
+
155
+```bash
156
+planopticon ingest spec.md --db-path existing.db
157
+planopticon ingest ./docs/ -o ./output --recursive
158
+```
159
+
160
+---
161
+
162
+## Source connector pipeline
163
+
164
+Source connectors fetch content from cloud services, note-taking apps, and web sources. Each source implements the `BaseSource` ABC with three methods: `authenticate()`, `list_videos()`, and `download()`.
165
+
166
+```mermaid
167
+flowchart TD
168
+ A[Source command] --> B[Authenticate with provider]
169
+ B --> C{Auth success?}
170
+ C -->|No| D[Error: check credentials]
171
+ C -->|Yes| E[List files in folder]
172
+ E --> F[Filter by pattern / type]
173
+ F --> G[Download to local path]
174
+ G --> H{Analyze or ingest?}
175
+ H -->|Video| I[process_single_video / batch]
176
+ H -->|Document| J[ingest_file / ingest_directory]
177
+ I --> K[Knowledge graph]
178
+ J --> K
179
+```
180
+
181
+### Available sources
182
+
183
+PlanOpticon includes connectors for:
184
+
185
+| Category | Sources |
186
+|----------|---------|
187
+| Cloud storage | Google Drive, S3, Dropbox |
188
+| Meeting recordings | Zoom, Google Meet, Microsoft Teams |
189
+| Productivity suites | Google Workspace (Docs/Sheets/Slides), Microsoft 365 (SharePoint/OneDrive/OneNote) |
190
+| Note-taking apps | Obsidian, Logseq, Apple Notes, Google Keep, Notion |
191
+| Web sources | YouTube, Web (URL), RSS, Podcasts |
192
+| Developer platforms | GitHub, arXiv |
193
+| Social media | Reddit, Twitter/X, Hacker News |
194
+
195
+Each source authenticates via environment variables (API keys, OAuth tokens) specific to the provider.
196
+
197
+---
198
+
199
+## Planning agent pipeline
200
+
201
+The planning agent consumes a knowledge graph and uses registered skills to generate planning artifacts.
202
+
203
+```mermaid
204
+flowchart TD
205
+ A[Knowledge graph] --> B[Load into AgentContext]
206
+ B --> C[GraphQueryEngine]
207
+ C --> D[Taxonomy classification]
208
+ D --> E[Agent orchestrator]
209
+ E --> F{Select skill}
210
+ F --> G[ProjectPlan skill]
211
+ F --> H[PRD skill]
212
+ F --> I[Roadmap skill]
213
+ F --> J[TaskBreakdown skill]
214
+ F --> K[DocGenerator skill]
215
+ F --> L[WikiGenerator skill]
216
+ F --> M[NotesExport skill]
217
+ F --> N[ArtifactExport skill]
218
+ F --> O[GitHubIntegration skill]
219
+ F --> P[RequirementsChat skill]
220
+ G --> Q[Artifact output]
221
+ H --> Q
222
+ I --> Q
223
+ J --> Q
224
+ K --> Q
225
+ L --> Q
226
+ M --> Q
227
+ N --> Q
228
+ O --> Q
229
+ P --> Q
230
+ Q --> R[Write to disk / push to service]
231
+```
232
+
233
+### Skill execution flow
234
+
235
+1. The `AgentContext` is populated with the knowledge graph, query engine, provider manager, and any planning entities from taxonomy classification
236
+2. Each `Skill` checks `can_execute()` against the context (requires at minimum a knowledge graph and provider manager)
237
+3. The skill's `execute()` method generates an `Artifact` with a name, content, type, and format
238
+4. Artifacts are collected and can be exported to disk or pushed to external services (GitHub issues, wiki pages, etc.)
239
+
+---
+
+## Export pipeline
+
+The export pipeline converts knowledge graphs and analysis artifacts into various output formats.
+
+```mermaid
+flowchart TD
+    A[knowledge_graph.db] --> B{Export command}
+    B --> C[export markdown]
+    B --> D[export obsidian]
+    B --> E[export notion]
+    B --> F[export exchange]
+    B --> G[wiki generate]
+    B --> H[kg convert]
+    C --> I[7 document types + entity briefs + CSV]
+    D --> J[Obsidian vault with frontmatter + wiki-links]
+    E --> K[Notion-compatible markdown + CSV database]
+    F --> L[PlanOpticonExchange JSON payload]
+    G --> M[GitHub wiki pages + sidebar + home]
+    H --> N[Convert between .db / .json / .graphml / .csv]
+```
+
+All export commands accept a `knowledge_graph.db` (or `.json`) path as input. No API key is required for template-based exports (markdown, obsidian, notion, wiki, exchange, convert). Only the planning agent skills that generate new content require a provider.
+
+---
+
+## How pipelines connect
+
+```mermaid
+flowchart LR
+    V[Video files] --> VP[Video Pipeline]
+    D[Documents] --> DI[Document Ingestion]
+    S[Cloud Sources] --> SC[Source Connectors]
+    SC --> V
+    SC --> D
+    VP --> KG[(knowledge_graph.db)]
+    DI --> KG
+    KG --> QE[Query Engine]
+    KG --> EP[Export Pipeline]
+    KG --> PA[Planning Agent]
+    PA --> AR[Artifacts]
+    AR --> EP
+```
+
+All pipelines converge on the knowledge graph as the central data store. The knowledge graph is the shared interface between ingestion (video or document), querying, exporting, and planning.

-1. Scan input directory for matching video files
-2. For each video: `process_single_video()` with error handling
-3. Merge knowledge graphs across all completed videos
-4. Generate batch summary with aggregated stats
-5. Write batch manifest
+---

## Error handling

-- Individual video failures don't stop the batch
-- Failed videos are logged with error details in the manifest
-- Diagram analysis failures fall back to screengrabs
-- LLM extraction failures return empty results gracefully
+Error handling follows consistent patterns across all pipelines:
+
+| Scenario | Behavior |
+|----------|----------|
+| Video fails in batch | Batch continues. Failed video recorded in manifest with error details. |
+| Diagram analysis fails | Falls back to screengrab (captioned screenshot). |
+| LLM extraction fails | Returns empty results gracefully. Key points and action items will be empty arrays. |
+| Document processor not found | Raises `ValueError` with list of supported extensions. |
+| Source authentication fails | Returns `False` from `authenticate()`. CLI prints error message. |
+| Checkpoint file found | Step is skipped entirely and results are loaded from disk. |
+| Progress callback fails | Warning logged. Pipeline continues without progress updates. |
+
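
The "LLM extraction fails" row boils down to a parse-or-empty pattern. A minimal sketch of that behavior (illustrative only; the actual extraction code may differ):

```python
import json
import logging

logger = logging.getLogger(__name__)

def parse_llm_list(raw: str) -> list:
    """Parse an LLM response expected to be a JSON array.

    Returns an empty list instead of raising, so a failed extraction
    yields empty key points / action items rather than aborting the run.
    """
    try:
        result = json.loads(raw)
        return result if isinstance(result, list) else []
    except (json.JSONDecodeError, TypeError):
        logger.warning("LLM extraction returned unparseable output")
        return []
```

Malformed JSON and well-formed-but-wrong-shape responses both degrade to `[]`, matching the table's "empty arrays" behavior.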
+---
+
+## Progress callback system
+
+The pipeline supports a `ProgressCallback` protocol for real-time progress tracking. This is used by the CLI's progress bars and can be implemented by external integrations (web UIs, CI systems, etc.).
+
+```python
+from video_processor.models import ProgressCallback
+
+class MyCallback:
+    def on_step_start(self, step: str, index: int, total: int) -> None:
+        print(f"Starting step {index}/{total}: {step}")
+
+    def on_step_complete(self, step: str, index: int, total: int) -> None:
+        print(f"Completed step {index}/{total}: {step}")
+
+    def on_progress(self, step: str, percent: float, message: str = "") -> None:
+        print(f"  {step}: {percent:.0%} {message}")
+```
+
+Pass the callback to `process_single_video()`:
+
+```python
+from video_processor.pipeline import process_single_video
+
+manifest = process_single_video(
+    input_path="recording.mp4",
+    output_dir="./output",
+    progress_callback=MyCallback(),
+)
+```
+
+The callback methods are called within a try/except wrapper, so a failing callback never interrupts the pipeline. If a callback method raises an exception, a warning is logged and processing continues.

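
That try/except wrapper can be sketched as a small dispatch helper (an assumed shape, not the project's actual implementation):

```python
import logging

logger = logging.getLogger(__name__)

def safe_notify(callback, method: str, *args, **kwargs) -> None:
    """Invoke a callback method, logging (not raising) on failure."""
    if callback is None:
        return
    try:
        getattr(callback, method)(*args, **kwargs)
    except Exception as exc:  # callbacks must never kill the run
        logger.warning("Progress callback %s failed: %s", method, exc)

class BadCallback:
    def on_step_start(self, step, index, total):
        raise RuntimeError("boom")

# Logs a warning instead of propagating the RuntimeError
safe_notify(BadCallback(), "on_step_start", "frames", 1, 8)
```

Swallowing callback errors is a deliberate trade-off: progress reporting is best-effort, while the pipeline's own work must complete.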
--- docs/contributing.md
+++ docs/contributing.md
@@ -10,54 +10,485 @@
pip install -e ".[dev]"
```

## Running tests

+PlanOpticon has 822+ tests covering providers, pipeline stages, document processors, knowledge graph operations, exporters, skills, and CLI commands.
+
```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=video_processor --cov-report=html

# Run a specific test file
pytest tests/test_models.py -v
+
+# Run tests matching a keyword
+pytest tests/ -k "test_knowledge_graph" -v
+
+# Run only fast tests (skip slow integration tests)
+pytest tests/ -m "not slow" -v
+```
+
+### Test conventions
+
+- All tests live in the `tests/` directory, mirroring the `video_processor/` package structure
+- Test files are named `test_<module>.py`
+- Use `pytest` as the test runner -- do not use `unittest.TestCase` unless necessary for specific setup/teardown patterns
+- Mock external API calls. Never make real API calls in tests. Use `unittest.mock.patch` or `pytest-mock` fixtures to mock provider responses.
+- Use `tmp_path` (pytest fixture) for any tests that write files to disk
+- Fixtures shared across test files go in `conftest.py`
+- For testing CLI commands, use `click.testing.CliRunner`
+- For testing provider implementations, mock at the HTTP client level (e.g., patch `requests.post` or the provider's SDK client)
+
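
The `tmp_path` convention in the list above keeps file-writing tests out of the repository. A hypothetical test showing the pattern (pytest injects a fresh `pathlib.Path` per test):

```python
from pathlib import Path

def test_writes_manifest(tmp_path: Path) -> None:
    # tmp_path is a unique, empty directory created by pytest for this test
    manifest = tmp_path / "manifest.json"
    manifest.write_text('{"status": "ok"}')
    assert manifest.read_text() == '{"status": "ok"}'
```

Because the directory is unique per test, parallel runs and reruns never collide, and pytest cleans old directories up automatically.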
+### Mocking patterns
+
+```python
+# Mocking a provider's chat method
+from unittest.mock import MagicMock, patch
+
+def test_key_point_extraction():
+    pm = MagicMock()
+    pm.chat.return_value = '["Point 1", "Point 2"]'
+    result = extract_key_points(pm, "transcript text")
+    assert len(result) == 2
+
+# Mocking an external API at the HTTP level
+@patch("requests.post")
+def test_provider_chat(mock_post):
+    mock_post.return_value.json.return_value = {
+        "choices": [{"message": {"content": "response"}}]
+    }
+    provider = OpenAIProvider(api_key="test")
+    result = provider.chat([{"role": "user", "content": "hello"}])
+    assert result == "response"
```

## Code style

We use:

-- **Ruff** for linting
-- **Black** for formatting (100 char line length)
-- **isort** for import sorting
+- **Ruff** for both linting and formatting (100 char line length)
- **mypy** for type checking
+
+Ruff handles all linting (error, warning, pyflakes, and import sorting rules) and formatting in a single tool. There is no need to run Black or isort separately.

```bash
+# Lint
ruff check video_processor/
-black video_processor/
-isort video_processor/
+
+# Format
+ruff format video_processor/
+
+# Auto-fix lint issues
+ruff check video_processor/ --fix
+
+# Type check
mypy video_processor/ --ignore-missing-imports
```

+### Ruff configuration
+
+The project's `pyproject.toml` configures ruff as follows:
+
+```toml
+[tool.ruff]
+line-length = 100
+target-version = "py310"
+
+[tool.ruff.lint]
+select = ["E", "F", "W", "I"]
+```
+
+The `I` rule set covers import sorting (equivalent to isort), so imports are automatically organized by ruff.
+
## Project structure

-See [Architecture Overview](architecture/overview.md) for the module structure.
+```
+PlanOpticon/
+├── video_processor/
+│   ├── cli/                     # Click CLI commands
+│   │   └── commands.py
+│   ├── providers/               # LLM/API provider implementations
+│   │   ├── base.py              # BaseProvider, ProviderRegistry
+│   │   ├── manager.py           # ProviderManager
+│   │   ├── discovery.py         # Auto-discovery of available providers
+│   │   ├── openai_provider.py
+│   │   ├── anthropic_provider.py
+│   │   ├── gemini_provider.py
+│   │   └── ...                  # 15+ provider implementations
+│   ├── sources/                 # Cloud and web source connectors
+│   │   ├── base.py              # BaseSource, SourceFile
+│   │   ├── google_drive.py
+│   │   ├── zoom_source.py
+│   │   └── ...                  # 20+ source implementations
+│   ├── processors/              # Document processors
+│   │   ├── base.py              # DocumentProcessor, registry
+│   │   ├── ingest.py            # File/directory ingestion
+│   │   ├── markdown_processor.py
+│   │   ├── pdf_processor.py
+│   │   └── __init__.py          # Auto-registration of built-in processors
+│   ├── integrators/             # Knowledge graph and analysis
+│   │   ├── knowledge_graph.py   # KnowledgeGraph class
+│   │   ├── graph_store.py       # SQLite graph storage
+│   │   ├── graph_query.py       # GraphQueryEngine
+│   │   ├── graph_discovery.py   # Auto-find knowledge_graph.db
+│   │   └── taxonomy.py          # Planning taxonomy classifier
+│   ├── agent/                   # Planning agent
+│   │   ├── orchestrator.py      # Agent orchestration
+│   │   └── skills/              # Skill implementations
+│   │       ├── base.py          # Skill ABC, registry, Artifact
+│   │       ├── project_plan.py
+│   │       ├── prd.py
+│   │       ├── roadmap.py
+│   │       ├── task_breakdown.py
+│   │       ├── doc_generator.py
+│   │       ├── wiki_generator.py
+│   │       ├── notes_export.py
+│   │       ├── artifact_export.py
+│   │       ├── github_integration.py
+│   │       ├── requirements_chat.py
+│   │       ├── cli_adapter.py
+│   │       └── __init__.py      # Auto-registration of skills
+│   ├── exporters/               # Output format exporters
+│   │   ├── __init__.py
+│   │   └── markdown.py          # Template-based markdown generation
+│   ├── utils/                   # Shared utilities
+│   │   ├── export.py            # Multi-format export orchestration
+│   │   ├── rendering.py         # Mermaid/chart rendering
+│   │   ├── prompt_templates.py
+│   │   ├── callbacks.py         # Progress callback helpers
+│   │   └── ...
+│   ├── exchange.py              # PlanOpticonExchange format
+│   ├── pipeline.py              # Main video processing pipeline
+│   ├── models.py                # Pydantic data models
+│   └── output_structure.py      # Output directory helpers
+├── tests/                       # 822+ tests
+├── knowledge-base/              # Local-first graph tools
+│   ├── viewer.html              # Self-contained D3.js graph viewer
+│   └── query.py                 # Python query script (NetworkX)
+├── docs/                        # MkDocs documentation
+└── pyproject.toml               # Project configuration
+```
+
+See [Architecture Overview](architecture/overview.md) for a more detailed breakdown of module responsibilities.

## Adding a new provider

+Providers self-register via `ProviderRegistry.register()` at module level. When the provider module is imported, it registers itself automatically.
+
1. Create `video_processor/providers/your_provider.py`
2. Extend `BaseProvider` from `video_processor/providers/base.py`
-3. Implement `chat()`, `analyze_image()`, `transcribe_audio()`, `list_models()`
-4. Register in `video_processor/providers/discovery.py`
-5. Add tests in `tests/test_providers.py`
+3. Implement the four required methods: `chat()`, `analyze_image()`, `transcribe_audio()`, `list_models()`
+4. Call `ProviderRegistry.register()` at module level
+5. Add the import to `video_processor/providers/manager.py` in the lazy-import block
+6. Add tests in `tests/test_providers.py`
+
+### Example provider skeleton
+
+```python
+"""Your provider implementation."""
+
+from video_processor.providers.base import BaseProvider, ModelInfo, ProviderRegistry
+
+
+class YourProvider(BaseProvider):
+    provider_name = "yourprovider"
+
+    def __init__(self, api_key: str | None = None):
+        import os
+        self.api_key = api_key or os.environ.get("YOUR_API_KEY", "")
+
+    def chat(self, messages, max_tokens=4096, temperature=0.7, model=None):
+        # Implement chat completion
+        ...
+
+    def analyze_image(self, image_bytes, prompt, max_tokens=4096, model=None):
+        # Implement image analysis
+        ...
+
+    def transcribe_audio(self, audio_path, language=None, model=None):
+        # Implement audio transcription (or raise NotImplementedError)
+        ...
+
+    def list_models(self):
+        return [ModelInfo(id="your-model", provider="yourprovider", capabilities=["chat"])]
+
+
+# Self-registration at import time
+ProviderRegistry.register(
+    "yourprovider",
+    YourProvider,
+    env_var="YOUR_API_KEY",
+    model_prefixes=["your-"],
+    default_models={"chat": "your-model"},
+)
+```
+
+### OpenAI-compatible providers
+
+For providers that use the OpenAI API format, extend `OpenAICompatibleProvider` instead of `BaseProvider`. This provides default implementations of `chat()`, `analyze_image()`, and `list_models()` -- you only need to configure the base URL and model mappings.
+
+```python
+from video_processor.providers.base import OpenAICompatibleProvider, ProviderRegistry
+
+class YourProvider(OpenAICompatibleProvider):
+    provider_name = "yourprovider"
+    base_url = "https://api.yourprovider.com/v1"
+    env_var = "YOUR_API_KEY"
+
+ProviderRegistry.register("yourprovider", YourProvider, env_var="YOUR_API_KEY")
+```

## Adding a new cloud source

+Source connectors implement the `BaseSource` ABC from `video_processor/sources/base.py`. Authentication is handled per-source, typically via environment variables.
+
1. Create `video_processor/sources/your_source.py`
-2. Implement auth flow and file listing/downloading
-3. Add CLI integration in `video_processor/cli/commands.py`
-4. Add tests and docs
+2. Extend `BaseSource`
+3. Implement `authenticate()`, `list_videos()`, and `download()`
+4. Add the class to the lazy-import map in `video_processor/sources/__init__.py`
+5. Add CLI commands in `video_processor/cli/commands.py` if needed
+6. Add tests and documentation
+
+### Example source skeleton
+
+```python
+"""Your source integration."""
+
+import logging
+import os
+from pathlib import Path
+from typing import List, Optional
+
+from video_processor.sources.base import BaseSource, SourceFile
+
+logger = logging.getLogger(__name__)
+
+
+class YourSource(BaseSource):
+    def __init__(self, api_key: Optional[str] = None):
+        self.api_key = api_key or os.environ.get("YOUR_SOURCE_KEY", "")
+
+    def authenticate(self) -> bool:
+        """Validate credentials. Return True on success."""
+        if not self.api_key:
+            logger.error("API key not set. Set YOUR_SOURCE_KEY env var.")
+            return False
+        # Make a test API call to verify credentials
+        ...
+        return True
+
+    def list_videos(
+        self,
+        folder_id: Optional[str] = None,
+        folder_path: Optional[str] = None,
+        patterns: Optional[List[str]] = None,
+    ) -> List[SourceFile]:
+        """List available video files."""
+        ...
+
+    def download(self, file: SourceFile, destination: Path) -> Path:
+        """Download a single file. Return the local path."""
+        destination.parent.mkdir(parents=True, exist_ok=True)
+        # Download file content to destination
+        ...
+        return destination
+```
+
+### Registering in `__init__.py`
+
+Add your source to the `__all__` list and the `_lazy_map` dictionary in `video_processor/sources/__init__.py`:
+
+```python
+__all__ = [
+    ...
+    "YourSource",
+]
+
+_lazy_map = {
+    ...
+    "YourSource": "video_processor.sources.your_source",
+}
+```
+
+## Adding a new skill
+
+Agent skills extend the `Skill` ABC from `video_processor/agent/skills/base.py` and self-register via `register_skill()`.
+
+1. Create `video_processor/agent/skills/your_skill.py`
+2. Extend `Skill` and set `name` and `description` class attributes
+3. Implement `execute()` to return an `Artifact`
+4. Optionally override `can_execute()` for custom precondition checks
+5. Call `register_skill()` at module level
+6. Add the import to `video_processor/agent/skills/__init__.py`
+7. Add tests
+
+### Example skill skeleton
+
+```python
+"""Your custom skill."""
+
+from video_processor.agent.skills.base import AgentContext, Artifact, Skill, register_skill
+
+
+class YourSkill(Skill):
+    name = "your_skill"
+    description = "Generates a custom artifact from the knowledge graph."
+
+    def execute(self, context: AgentContext, **kwargs) -> Artifact:
+        """Generate the artifact."""
+        kg_data = context.knowledge_graph.to_dict()
+        # Build content from knowledge graph data
+        content = f"# Your Artifact\n\n{len(kg_data.get('entities', []))} entities found."
+        return Artifact(
+            name="your_artifact",
+            content=content,
+            artifact_type="document",
+            format="markdown",
+        )
+
+    def can_execute(self, context: AgentContext) -> bool:
+        """Check prerequisites (default requires KG + provider)."""
+        return context.knowledge_graph is not None
+
+
+# Self-registration at import time
+register_skill(YourSkill())
+```
+
361
+### Registering in `__init__.py`
362
+
363
+Add the import to `video_processor/agent/skills/__init__.py` so the skill is loaded (and self-registered) when the skills package is imported:
364
+
365
+```python
366
+from video_processor.agent.skills import (
367
+ ...
368
+ your_skill, # noqa: F401
369
+)
370
+```
371
+
372
+## Adding a new document processor
373
+
374
+Document processors extend the `DocumentProcessor` ABC from `video_processor/processors/base.py` and are registered via `register_processor()`.
375
+
376
+1. Create `video_processor/processors/your_processor.py`
377
+2. Extend `DocumentProcessor`
378
+3. Set `supported_extensions` class attribute
379
+4. Implement `process()` (returns `List[DocumentChunk]`) and `can_process()`
380
+5. Call `register_processor()` at module level
381
+6. Add the import to `video_processor/processors/__init__.py`
382
+7. Add tests
383
+
384
+### Example processor skeleton
385
+
386
+```python
387
+"""Your document processor."""
388
+
389
+from pathlib import Path
390
+from typing import List
391
+
392
+from video_processor.processors.base import (
393
+ DocumentChunk,
394
+ DocumentProcessor,
395
+ register_processor,
396
+)
397
+
398
+
399
+class YourProcessor(DocumentProcessor):
400
+ supported_extensions = [".xyz", ".abc"]
401
+
402
+ def can_process(self, path: Path) -> bool:
403
+ return path.suffix.lower() in self.supported_extensions
404
+
405
+ def process(self, path: Path) -> List[DocumentChunk]:
406
+ text = path.read_text()
407
+ # Split into chunks as appropriate for your format
408
+ return [
409
+ DocumentChunk(
410
+ text=text,
411
+ source_file=str(path),
412
+ chunk_index=0,
413
+ metadata={"format": "xyz"},
414
+ )
415
+ ]
416
+
417
+
418
+# Self-registration at import time
419
+register_processor([".xyz", ".abc"], YourProcessor)
420
+```
421
+
422
+### Registering in `__init__.py`
423
+
424
+Add the import to `video_processor/processors/__init__.py`:
425
+
426
+```python
427
+from video_processor.processors import (
428
+ markdown_processor, # noqa: F401, E402
429
+ pdf_processor, # noqa: F401, E402
430
+ your_processor, # noqa: F401, E402
431
+)
432
+```
433
+
434
+## Adding a new exporter
435
+
436
+Exporters live in `video_processor/exporters/` and are typically called from CLI commands. There is no strict ABC for exporters -- they are plain functions that accept knowledge graph data and an output directory.
437
+
438
+1. Create `video_processor/exporters/your_exporter.py`
439
+2. Implement one or more export functions that accept KG data (as a dict) and an output path
440
+3. Add CLI integration in `video_processor/cli/commands.py` under the `export` group
441
+4. Add tests
442
+
443
+### Example exporter skeleton
444
+
445
+```python
446
+"""Your exporter."""
447
+
448
+import json
449
+from pathlib import Path
450
+from typing import List
451
+
452
+
453
+def export_your_format(kg_data: dict, output_dir: Path) -> List[Path]:
454
+ """Export knowledge graph data in your format.
455
+
456
+ Args:
457
+ kg_data: Knowledge graph as a dict (from KnowledgeGraph.to_dict()).
458
+ output_dir: Directory to write output files.
459
+
460
+ Returns:
461
+ List of created file paths.
462
+ """
463
+ output_dir.mkdir(parents=True, exist_ok=True)
464
+ created = []
465
+
466
+ output_file = output_dir / "export.xyz"
467
+ output_file.write_text(json.dumps(kg_data, indent=2))
468
+ created.append(output_file)
469
+
470
+ return created
471
+```
472
+
473
+### Adding the CLI command
474
+
475
+Add a subcommand under the `export` group in `video_processor/cli/commands.py`:
476
+
477
+```python
478
+@export.command("your-format")
479
+@click.argument("db_path", type=click.Path(exists=True))
480
+@click.option("-o", "--output", type=click.Path(), default=None)
481
+def export_your_format_cmd(db_path, output):
482
+ """Export knowledge graph in your format."""
483
+ from video_processor.exporters.your_exporter import export_your_format
484
+ from video_processor.integrators.knowledge_graph import KnowledgeGraph
485
+
486
+ kg = KnowledgeGraph(db_path=Path(db_path))
487
+ out_dir = Path(output) if output else Path.cwd() / "your-export"
488
+ created = export_your_format(kg.to_dict(), out_dir)
489
+ click.echo(f"Exported {len(created)} files to {out_dir}/")
490
+```
60491
61492
## License
62493
63
-MIT License — Copyright (c) 2025 CONFLICT LLC. All rights reserved.
494
+MIT License -- Copyright (c) 2026 CONFLICT LLC. All rights reserved.
64495
65496
ADDED docs/faq.md
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -0,0 +1,301 @@
+# FAQ & Troubleshooting
+
+## Frequently Asked Questions
+
+### Do I need an API key?
+
+You need at least one of:
+
+- **Cloud API key**: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY`
+- **Local Ollama**: Install [Ollama](https://ollama.com), pull a model, and run `ollama serve`
+
+Some features work without any AI provider:
+
+- `planopticon query stats` — direct knowledge graph queries
+- `planopticon query "entities --type person"` — structured entity lookups
+- `planopticon export markdown` — document generation from existing KG (7 document types, no LLM)
+- `planopticon kg inspect` — knowledge graph statistics
+- `planopticon kg convert` — format conversion
+
+### How much does it cost?
+
+PlanOpticon defaults to cheap models to minimize costs:
+
+| Task | Default model | Approximate cost |
+|------|--------------|-----------------|
+| Chat/analysis | Claude Haiku / GPT-4o-mini | ~$0.25-0.50 per 1M tokens |
+| Vision (diagrams) | Gemini Flash / GPT-4o-mini | ~$0.10-0.50 per 1M tokens |
+| Transcription | Local Whisper (free) / Whisper-1 | $0.006/minute |
+
+A typical 1-hour meeting costs roughly $0.05-0.15 to process with default models. Use `--provider ollama` for zero cost.
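A rough back-of-envelope check shows how that band arises; every figure below is an illustrative assumption, not a measured PlanOpticon number:

```python
# Assumed figures: ~9,000 spoken words per hour, ~1.33 tokens per word,
# three analysis passes over the transcript, and Haiku-class pricing
# of $0.50 per 1M input tokens.
words_per_hour = 9_000
tokens = int(words_per_hour * 4 / 3)          # ~12,000 transcript tokens
passes = 3                                    # e.g. summary, entities, actions
price_per_million = 0.50                      # dollars per 1M input tokens

input_cost = tokens * passes * price_per_million / 1_000_000
print(f"~${input_cost:.3f} for input tokens")  # ~$0.018
```

Output tokens, vision calls on extracted frames, and (if used) API transcription add the remainder, which is how a full run lands in the quoted $0.05-0.15 range.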
+
+### Can I run fully offline?
+
+Yes. Install Ollama and local Whisper:
+
+```bash
+ollama pull llama3.2
+ollama pull llava
+pip install planopticon[gpu]
+planopticon analyze -i video.mp4 -o ./output --provider ollama
+```
+
+No data leaves your machine.
+
+### What video formats are supported?
+
+Any format FFmpeg can decode:
+
+- MP4, MKV, AVI, MOV, WebM, FLV, WMV, M4V
+- Container formats with common codecs (H.264, H.265, VP8, VP9, AV1)
+
+### What document formats can I ingest?
+
+- **PDF** — text extraction via pymupdf or pdfplumber
+- **Markdown** — parsed with heading-based chunking
+- **Plain text** — paragraph-based chunking with overlap
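Paragraph-based chunking with overlap can be sketched as greedy packing; the size limit and one-paragraph overlap below are illustrative assumptions, not PlanOpticon's actual parameters:

```python
def chunk_paragraphs(text: str, max_chars: int = 400, overlap: int = 1) -> list[str]:
    """Pack paragraphs into chunks up to max_chars; each new chunk
    re-includes the last `overlap` paragraphs for context."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        # Flush when adding this paragraph would exceed the budget
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:]  # carry trailing context forward
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The overlap means adjacent chunks share a paragraph, so entity mentions that straddle a boundary still appear intact in at least one chunk.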
+
+### How does the knowledge graph work?
+
+PlanOpticon extracts entities (people, technologies, concepts, decisions) and relationships from your content. These are stored in a SQLite database (`knowledge_graph.db`) with zero external dependencies. Entities are automatically classified using a planning taxonomy (goals, requirements, risks, tasks, milestones).
+
+When you process multiple sources, entities are merged using fuzzy name matching (0.85 threshold) with type conflict resolution and provenance tracking.
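The stated 0.85 `SequenceMatcher` threshold can be reproduced with the standard library, which is handy for predicting whether two entity names will merge (a sketch; any name normalization PlanOpticon applies beyond lowercasing is not shown):

```python
from difflib import SequenceMatcher

def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """True when two entity names are similar enough to merge."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(names_match("PostgreSQL", "Postgres"))  # True  (ratio ~0.89: merged)
print(names_match("PostgreSQL", "MySQL"))     # False (ratio 0.40: kept separate)
```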
+
+### Can I use PlanOpticon with my existing Obsidian vault?
+
+Yes, in both directions:
+
+```bash
+# Ingest an Obsidian vault into PlanOpticon
+planopticon ingest ~/Obsidian/MyVault --output ./kb --recursive
+
+# Export PlanOpticon knowledge to an Obsidian vault
+planopticon export obsidian --input ./kb --output ~/Obsidian/PlanOpticon
+```
+
+The Obsidian export produces proper YAML frontmatter, wiki-links (`[[Entity Name]]`), and tag pages.
+
+### How do I add my own AI provider?
+
+Create a provider module, extend `BaseProvider`, and register it:
+
+```python
+from video_processor.providers.base import BaseProvider, ProviderRegistry
+
+class MyProvider(BaseProvider):
+    provider_name = "myprovider"
+
+    def chat(self, messages, max_tokens=4096, temperature=0.7, model=None):
+        # Your implementation
+        ...
+
+ProviderRegistry.register(
+    name="myprovider",
+    provider_class=MyProvider,
+    env_var="MY_PROVIDER_API_KEY",
+    model_prefixes=["my-"],
+    default_models={"chat": "my-model-v1", "vision": "", "audio": ""},
+)
+```
+
+See the [Contributing guide](contributing.md) for details.
+
+---
+
+## Troubleshooting
+
+### Authentication errors
+
+#### "No auth method available for zoom"
+
+You need to set credentials before authenticating:
+
+```bash
+export ZOOM_CLIENT_ID="your-client-id"
+export ZOOM_CLIENT_SECRET="your-client-secret"
+planopticon auth zoom
+```
+
+The error message tells you which environment variables to set. Each service requires different credentials — see the [Authentication guide](guide/authentication.md).
+
+#### "Token expired" or "401 Unauthorized"
+
+Your saved token has expired and auto-refresh failed. Re-authenticate:
+
+```bash
+planopticon auth google  # or whatever service
+```
+
+To clear a stale token:
+
+```bash
+planopticon auth google --logout
+planopticon auth google
+```
+
+Tokens are stored in `~/.planopticon/{service}_token.json`.
+
+#### OAuth redirect errors
+
+If the browser-based OAuth flow fails, check:
+
+1. Your client ID and secret are correct
+2. The redirect URI in your OAuth app matches PlanOpticon's default (`urn:ietf:wg:oauth:2.0:oob`)
+3. The OAuth app has the required scopes enabled
+
+### Provider errors
+
+#### "ANTHROPIC_API_KEY not set"
+
+Set at least one provider's API key:
+
+```bash
+export OPENAI_API_KEY="sk-..."
+# or
+export ANTHROPIC_API_KEY="sk-ant-..."
+# or
+export GEMINI_API_KEY="AI..."
+```
+
+Or use a `.env` file in your project directory.
+
+#### "Unexpected role system" (Anthropic)
+
+This was a bug in older versions where system messages were passed in the messages array instead of as a top-level parameter. Update to v0.4.0 or later.
+
+#### "Model not found" or "Invalid model"
+
+Check available models:
+
+```bash
+planopticon list-models
+```
+
+Common model name issues:
+
+- Anthropic: use `claude-haiku-4-5-20251001`, not `claude-haiku`
+- OpenAI: use `gpt-4o-mini`, not `gpt4o-mini`
+
+#### Rate limiting / 429 errors
+
+PlanOpticon doesn't currently implement automatic retry. If you hit rate limits:
+
+1. Use a different provider: `--provider gemini`
+2. Use cheaper/faster models: `--chat-model gpt-4o-mini`
+3. Reduce processing depth: `--depth basic`
+4. Use Ollama for zero rate limits: `--provider ollama`
+
+### Processing errors
+
+#### "FFmpeg not found"
+
+Install FFmpeg:
+
+```bash
+# macOS
+brew install ffmpeg
+
+# Ubuntu/Debian
+sudo apt-get install ffmpeg libsndfile1
+
+# Windows
+# Download from https://ffmpeg.org/download.html and add to PATH
+```
+
+#### "Audio extraction failed: no audio track found"
+
+The video file has no audio track. PlanOpticon will skip transcription and continue with frame analysis only.
+
+#### "Frame extraction memory error"
+
+For very long videos, frame extraction can use significant memory. Use the `--max-memory-mb` safety valve:
+
+```bash
+planopticon analyze -i long-video.mp4 -o ./output --max-memory-mb 2048
+```
+
+Or reduce the sampling rate:
+
+```bash
+planopticon analyze -i long-video.mp4 -o ./output --sampling-rate 0.25
+```
+
+#### Batch processing — one video fails
+
+Individual video failures don't stop the batch. Failed videos are logged in the batch manifest with error details. Check `batch_manifest.json` for the specific error.
+
+### Knowledge graph issues
+
+#### "No knowledge graph loaded" in companion
+
+The companion auto-discovers knowledge graphs by looking for `knowledge_graph.db` or `knowledge_graph.json` in the current directory and parent directories. Either:
+
+1. `cd` to the directory containing your knowledge graph
+2. Specify the path explicitly: `planopticon companion --kb ./path/to/kb`
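The auto-discovery walk amounts to an upward search from the working directory; a simplified sketch of the behavior described above (the companion's actual search order may differ):

```python
from pathlib import Path
from typing import Optional

def find_knowledge_graph(start: Path) -> Optional[Path]:
    """Return the first knowledge_graph.db/.json found walking upward."""
    for directory in (start, *start.parents):
        for name in ("knowledge_graph.db", "knowledge_graph.json"):
            candidate = directory / name
            if candidate.is_file():
                return candidate  # nearest match wins
    return None
```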
234
+
#### Empty or sparse knowledge graph

Common causes:

1. **Too few entities extracted**: Try `--depth comprehensive` for deeper analysis
2. **Short or low-quality transcript**: Check `transcript/transcript.txt` — poor audio produces poor transcription
3. **Wrong provider**: Some models extract entities better than others. Try `--provider openai --chat-model gpt-4o` for higher quality

#### Duplicate entities after merge

The fuzzy matching threshold is 0.85 (SequenceMatcher ratio). If you're getting duplicates, the names are too different for automatic matching. You can manually inspect and merge:

```bash
planopticon kg inspect ./knowledge_graph.db
planopticon query "entities --name python"
```

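You can check how close two names score under the same metric with the stdlib `SequenceMatcher` (PlanOpticon's merge may also normalize names first; this sketch only lowercases before comparing):

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Ratio in [0, 1]; entities merge when the score reaches the 0.85 threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()
```

For example, "Kubernetes" vs "kubernetes" scores 1.0 after lowercasing, while an abbreviation like "K8s" vs "Kubernetes" falls well below 0.85, so those remain separate entities.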
### Companion / REPL issues

#### Chat gives generic advice instead of project-specific answers

The companion needs both a knowledge graph and an LLM provider. Check:

```
planopticon> /status
```

If it says "KG: not loaded" or "Provider: none", fix those first:

```
planopticon> /provider openai
planopticon> /model gpt-4o-mini
```

#### Companion is slow

The companion makes LLM API calls for chat messages. To speed things up:

1. Use a faster model: `/model gpt-4o-mini` or `/model claude-haiku-4-5-20251001`
2. Use direct queries instead of chat: `/entities`, `/search`, `/neighbors` don't need an LLM
3. Use Ollama locally for lower latency: `/provider ollama`

### Export issues

#### Obsidian export has broken links

Make sure your Obsidian vault has wiki-links enabled (Settings > Files & Links > Use [[Wikilinks]]). PlanOpticon exports use wiki-link syntax by default.

#### PDF export fails

PDF export requires the `pdf` extra:

```bash
pip install planopticon[pdf]
```

This installs WeasyPrint, which has system dependencies. On macOS:

```bash
brew install pango
```

On Ubuntu:

```bash
sudo apt-get install libpango1.0-dev
```
--- docs/getting-started/configuration.md
+++ docs/getting-started/configuration.md
@@ -1,45 +1,150 @@
# Configuration

-## Environment variables
+## Example `.env` file
+
+Create a `.env` file in your project directory. PlanOpticon loads it automatically.
+
+```bash
+# =============================================================================
+# PlanOpticon Configuration
+# =============================================================================
+# Copy this file to .env and fill in the values you need.
+# You only need ONE AI provider — PlanOpticon auto-detects which are available.
+
+# --- AI Providers (set at least one) ----------------------------------------
+
+# OpenAI — get your key at https://platform.openai.com/api-keys
+OPENAI_API_KEY=sk-...
+
+# Anthropic — get your key at https://console.anthropic.com/settings/keys
+ANTHROPIC_API_KEY=sk-ant-...
+
+# Google Gemini — get your key at https://aistudio.google.com/apikey
+GEMINI_API_KEY=AI...
+
+# Azure OpenAI — from your Azure portal deployment
+# AZURE_OPENAI_API_KEY=...
+# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
+
+# Together AI — https://api.together.xyz/settings/api-keys
+# TOGETHER_API_KEY=...
+
+# Fireworks AI — https://fireworks.ai/account/api-keys
+# FIREWORKS_API_KEY=...
+
+# Cerebras — https://cloud.cerebras.ai/
+# CEREBRAS_API_KEY=...
+
+# xAI (Grok) — https://console.x.ai/
+# XAI_API_KEY=...
+
+# Ollama (local, no key needed) — just run: ollama serve
+# OLLAMA_HOST=http://localhost:11434
+
+# --- Google (Drive, Docs, Sheets, Meet, YouTube) ----------------------------
+# Option A: OAuth (interactive, recommended for personal use)
+# Create credentials at https://console.cloud.google.com/apis/credentials
+# 1. Create an OAuth 2.0 Client ID (Desktop application)
+# 2. Enable these APIs: Google Drive API, Google Docs API
+GOOGLE_CLIENT_ID=123456789-abc.apps.googleusercontent.com
+GOOGLE_CLIENT_SECRET=GOCSPX-...
+
+# Option B: Service Account (automated/server-side)
+# GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
+
+# --- Zoom (recordings) ------------------------------------------------------
+# Create an OAuth app at https://marketplace.zoom.us/develop/create
+# App type: "General App" with OAuth
+# Scopes: cloud_recording:read:list_user_recordings, cloud_recording:read:recording
+ZOOM_CLIENT_ID=...
+ZOOM_CLIENT_SECRET=...
+# For Server-to-Server (no browser needed):
+# ZOOM_ACCOUNT_ID=...
+
+# --- Microsoft 365 (OneDrive, SharePoint, Teams) ----------------------------
+# Register an app at https://portal.azure.com/#view/Microsoft_AAD_RegisteredApps
+# API permissions: OnlineMeetings.Read, Files.Read (delegated)
+MICROSOFT_CLIENT_ID=...
+MICROSOFT_CLIENT_SECRET=...
+
+# --- Notion ------------------------------------------------------------------
+# Option A: OAuth (create integration at https://www.notion.so/my-integrations)
+# NOTION_CLIENT_ID=...
+# NOTION_CLIENT_SECRET=...
+
+# Option B: API key (simpler, from the same integrations page)
+NOTION_API_KEY=secret_...
+
+# --- GitHub ------------------------------------------------------------------
+# Option A: Personal Access Token (simplest)
+# Create at https://github.com/settings/tokens — needs 'repo' scope
+GITHUB_TOKEN=ghp_...
+
+# Option B: OAuth App (for CI/automation)
+# GITHUB_CLIENT_ID=...
+# GITHUB_CLIENT_SECRET=...
+
+# --- Dropbox -----------------------------------------------------------------
+# Create an app at https://www.dropbox.com/developers/apps
+# DROPBOX_APP_KEY=...
+# DROPBOX_APP_SECRET=...
+# Or use a long-lived access token:
+# DROPBOX_ACCESS_TOKEN=...
+
+# --- General -----------------------------------------------------------------
+# CACHE_DIR=~/.cache/planopticon
+```
+
+## Environment variables reference

### AI providers

-| Variable | Description |
-|----------|-------------|
-| `OPENAI_API_KEY` | OpenAI API key |
-| `ANTHROPIC_API_KEY` | Anthropic API key |
-| `GEMINI_API_KEY` | Google Gemini API key |
-| `AZURE_OPENAI_API_KEY` | Azure OpenAI API key |
-| `AZURE_OPENAI_ENDPOINT` | Azure OpenAI endpoint URL |
-| `TOGETHER_API_KEY` | Together AI API key |
-| `FIREWORKS_API_KEY` | Fireworks AI API key |
-| `CEREBRAS_API_KEY` | Cerebras API key |
-| `XAI_API_KEY` | xAI (Grok) API key |
-| `OLLAMA_HOST` | Ollama server URL (default: `http://localhost:11434`) |
+| Variable | Required | Where to get it |
+|----------|----------|----------------|
+| `OPENAI_API_KEY` | At least one provider | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |
+| `ANTHROPIC_API_KEY` | At least one provider | [console.anthropic.com](https://console.anthropic.com/settings/keys) |
+| `GEMINI_API_KEY` | At least one provider | [aistudio.google.com/apikey](https://aistudio.google.com/apikey) |
+| `AZURE_OPENAI_API_KEY` | Optional | Azure portal > your OpenAI resource |
+| `AZURE_OPENAI_ENDPOINT` | With Azure | Azure portal > your OpenAI resource |
+| `TOGETHER_API_KEY` | Optional | [api.together.xyz](https://api.together.xyz/settings/api-keys) |
+| `FIREWORKS_API_KEY` | Optional | [fireworks.ai](https://fireworks.ai/account/api-keys) |
+| `CEREBRAS_API_KEY` | Optional | [cloud.cerebras.ai](https://cloud.cerebras.ai/) |
+| `XAI_API_KEY` | Optional | [console.x.ai](https://console.x.ai/) |
+| `OLLAMA_HOST` | Optional | Default: `http://localhost:11434` |

### Cloud services

-| Variable | Description |
-|----------|-------------|
-| `GOOGLE_APPLICATION_CREDENTIALS` | Path to Google service account JSON (for server-side Drive access) |
-| `ZOOM_CLIENT_ID` | Zoom OAuth app client ID |
-| `ZOOM_CLIENT_SECRET` | Zoom OAuth app client secret |
-| `NOTION_API_KEY` | Notion integration token |
-| `GITHUB_TOKEN` | GitHub personal access token |
-| `MICROSOFT_CLIENT_ID` | Azure AD app client ID (for Microsoft 365) |
-| `MICROSOFT_CLIENT_SECRET` | Azure AD app client secret |
+| Variable | Service | Auth method |
+|----------|---------|-------------|
+| `GOOGLE_CLIENT_ID` | Google (Drive, Docs, Meet) | OAuth |
+| `GOOGLE_CLIENT_SECRET` | Google | OAuth |
+| `GOOGLE_APPLICATION_CREDENTIALS` | Google | Service account |
+| `ZOOM_CLIENT_ID` | Zoom | OAuth |
+| `ZOOM_CLIENT_SECRET` | Zoom | OAuth |
+| `ZOOM_ACCOUNT_ID` | Zoom | Server-to-Server |
+| `MICROSOFT_CLIENT_ID` | Microsoft 365 | OAuth |
+| `MICROSOFT_CLIENT_SECRET` | Microsoft 365 | OAuth |
+| `NOTION_CLIENT_ID` | Notion | OAuth |
+| `NOTION_CLIENT_SECRET` | Notion | OAuth |
+| `NOTION_API_KEY` | Notion | API key |
+| `GITHUB_CLIENT_ID` | GitHub | OAuth |
+| `GITHUB_CLIENT_SECRET` | GitHub | OAuth |
+| `GITHUB_TOKEN` | GitHub | API key |
+| `DROPBOX_APP_KEY` | Dropbox | OAuth |
+| `DROPBOX_APP_SECRET` | Dropbox | OAuth |
+| `DROPBOX_ACCESS_TOKEN` | Dropbox | API key |

### General

| Variable | Description |
|----------|-------------|
| `CACHE_DIR` | Directory for API response caching |

## Authentication

-Most cloud services use OAuth via the `planopticon auth` command. Run it once per service to store credentials locally:
+PlanOpticon uses OAuth for cloud services. Run `planopticon auth` once per service — tokens are saved locally and refreshed automatically.

```bash
planopticon auth google     # Google Drive, Docs, Meet, YouTube
planopticon auth dropbox    # Dropbox
planopticon auth zoom       # Zoom recordings
@@ -46,13 +151,24 @@
planopticon auth notion     # Notion pages
planopticon auth github     # GitHub repos and wikis
planopticon auth microsoft  # OneDrive, SharePoint, Teams
```

-Credentials are stored in `~/.config/planopticon/`. Use `planopticon auth SERVICE --logout` to remove them.
+Credentials are stored in `~/.planopticon/`. Use `planopticon auth SERVICE --logout` to remove them.
+
+### What each service needs
+
+| Service | Minimum setup | Full OAuth setup |
+|---------|--------------|-----------------|
+| Google | `GOOGLE_CLIENT_ID` + `GOOGLE_CLIENT_SECRET` | Create OAuth credentials in [Google Cloud Console](https://console.cloud.google.com/apis/credentials) |
+| Zoom | `ZOOM_CLIENT_ID` + `ZOOM_CLIENT_SECRET` | Create a General App at [marketplace.zoom.us](https://marketplace.zoom.us/develop/create) |
+| Microsoft | `MICROSOFT_CLIENT_ID` + `MICROSOFT_CLIENT_SECRET` | Register app in [Azure AD](https://portal.azure.com/#view/Microsoft_AAD_RegisteredApps) |
+| Notion | `NOTION_API_KEY` (simplest) | Create integration at [notion.so/my-integrations](https://www.notion.so/my-integrations) |
+| GitHub | `GITHUB_TOKEN` (simplest) | Create token at [github.com/settings/tokens](https://github.com/settings/tokens) |
+| Dropbox | `DROPBOX_APP_KEY` + `DROPBOX_APP_SECRET` | Create app at [dropbox.com/developers](https://www.dropbox.com/developers/apps) |

-For Zoom and Microsoft 365, you also need to set the client ID and secret environment variables before running `planopticon auth`.
+For detailed OAuth app creation walkthroughs, see the [Authentication guide](../guide/authentication.md).

## Provider routing

PlanOpticon auto-discovers available models and routes each task to the cheapest capable option:
ADDED docs/guide/authentication.md
--- a/docs/guide/authentication.md
+++ b/docs/guide/authentication.md
@@ -0,0 +1,525 @@
+# Authentication
+
+PlanOpticon uses a unified authentication system to connect with cloud services for fetching recordings, documents, and other content. The system is **OAuth-first**: it prefers OAuth 2.0 flows for security and token management, but falls back to API keys when OAuth is not configured.
+
+## Auth strategy overview
+
+PlanOpticon supports six cloud services out of the box: Google, Dropbox, Zoom, Notion, GitHub, and Microsoft. Each service uses the same authentication chain, implemented through the `OAuthManager` class. You configure credentials once (via environment variables or directly), and PlanOpticon handles token acquisition, storage, refresh, and fallback automatically.
+
+All authentication state is managed through the `planopticon auth` CLI command, the `/auth` companion REPL command, or programmatically via the Python API.
+
+## The auth chain
+
+When you authenticate with a service, PlanOpticon tries the following methods in order. It stops at the first one that succeeds:
+
+1. **Saved token** -- Checks `~/.planopticon/{service}_token.json` for a previously saved token. If the token has not expired, it is used immediately. If it has expired but a refresh token is available, PlanOpticon attempts an automatic token refresh.
+
+2. **Client Credentials grant** (Server-to-Server) -- If an `account_id` is configured (e.g., `ZOOM_ACCOUNT_ID`), PlanOpticon attempts a client credentials grant. This is a non-interactive flow suitable for automated pipelines and server-side integrations. No browser is required.
+
+3. **OAuth 2.0 Authorization Code with PKCE** (interactive) -- If a client ID is configured and OAuth endpoints are available, PlanOpticon initiates an interactive OAuth PKCE flow. It opens a browser to the service's authorization page, waits for you to paste the authorization code, and exchanges it for tokens. The tokens are saved for future use.
+
+4. **API key fallback** -- If no OAuth method succeeds, PlanOpticon checks for a service-specific API key environment variable (e.g., `GITHUB_TOKEN`, `NOTION_API_KEY`). This is the simplest setup but may have reduced capabilities compared to OAuth.
+
+If none of the four methods succeed, PlanOpticon returns an error with hints about which environment variables to set.
+
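The fallback order above can be sketched as a simple loop. This is an illustrative sketch only; the helper names here are hypothetical stand-ins, not PlanOpticon's internal API:

```python
# Illustrative sketch of the four-step fallback chain described above.
# Helper names are hypothetical; the real logic lives in OAuthManager.
import os
from typing import Callable, Optional


def try_saved_token(service: str) -> Optional[str]:
    return None  # stub: would read ~/.planopticon/{service}_token.json


def try_client_credentials(service: str) -> Optional[str]:
    return None  # stub: would run a client credentials grant if account_id is set


def try_oauth_pkce(service: str) -> Optional[str]:
    return None  # stub: would open a browser and run the PKCE flow


def try_api_key(service: str) -> Optional[str]:
    # e.g. GITHUB_TOKEN for service "github" (naming varies per service)
    return os.environ.get(f"{service.upper()}_TOKEN")


def authenticate(service: str) -> str:
    chain = [try_saved_token, try_client_credentials, try_oauth_pkce, try_api_key]
    for method in chain:
        token = method(service)
        if token:  # first success wins
            return token
    raise RuntimeError(f"No auth method available for {service}")
```

The point of the ordering is cost: cached tokens are free, non-interactive grants need no human, and the interactive browser flow runs only when nothing cheaper works.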
+## Token storage
+
+Tokens are persisted as JSON files in `~/.planopticon/`:
+
+```
+~/.planopticon/
+    google_token.json
+    dropbox_token.json
+    zoom_token.json
+    notion_token.json
+    github_token.json
+    microsoft_token.json
+```
+
+Each token file contains:
+
+| Field | Description |
+|-------|-------------|
+| `access_token` | The current access token |
+| `refresh_token` | Refresh token for automatic renewal (if provided by the service) |
+| `expires_at` | Unix timestamp when the token expires (with a 60-second safety margin) |
+| `client_id` | The client ID used for this token (for refresh) |
+| `client_secret` | The client secret used (for refresh) |
+
+The `~/.planopticon/` directory is created automatically on first use. Token files are overwritten on each successful authentication or refresh.
+
+To remove a saved token, use `planopticon auth <service> --logout` or delete the file directly.
+
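The expiry check implied by the `expires_at` field can be sketched as follows. This is a minimal sketch assuming only the JSON layout in the table above, not PlanOpticon's actual loader:

```python
# Minimal sketch of validating a saved token file, assuming the JSON layout
# described above and a 60-second safety margin on expires_at.
import json
import time
from pathlib import Path
from typing import Optional

SAFETY_MARGIN = 60  # seconds


def load_valid_token(path: Path) -> Optional[str]:
    """Return the saved access token if it is still valid, else None."""
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    # Treat the token as expired SAFETY_MARGIN seconds early so a request
    # issued right at the boundary does not fail with a 401.
    if data.get("expires_at", 0) - SAFETY_MARGIN > time.time():
        return data["access_token"]
    return None  # expired or near expiry: caller should refresh or re-auth
```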
+## Supported services
+
+### Google
+
+Google authentication provides access to Google Drive and Google Docs for fetching documents, recordings, and other content.
+
+**Scopes requested:**
+
+- `https://www.googleapis.com/auth/drive.readonly`
+- `https://www.googleapis.com/auth/documents.readonly`
+
+**Environment variables:**
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `GOOGLE_CLIENT_ID` | For OAuth | OAuth 2.0 Client ID from Google Cloud Console |
+| `GOOGLE_CLIENT_SECRET` | For OAuth | OAuth 2.0 Client Secret |
+| `GOOGLE_API_KEY` | Fallback | API key (limited access, no user-specific data) |
+
+**OAuth app setup:**
+
+1. Go to the [Google Cloud Console](https://console.cloud.google.com/).
+2. Create a project (or select an existing one).
+3. Navigate to **APIs & Services > Credentials**.
+4. Click **Create Credentials > OAuth client ID**.
+5. Choose **Desktop app** as the application type.
+6. Copy the Client ID and Client Secret.
+7. Under **APIs & Services > Library**, enable the **Google Drive API** and **Google Docs API**.
+8. Set the environment variables:
+
+```bash
+export GOOGLE_CLIENT_ID="your-client-id.apps.googleusercontent.com"
+export GOOGLE_CLIENT_SECRET="your-client-secret"
+```
+
+**Service account fallback:** For automated pipelines, you can use a Google service account instead of OAuth. Generate a service account key JSON file from the Google Cloud Console and set `GOOGLE_APPLICATION_CREDENTIALS` to point to it. The PlanOpticon Google Workspace connector (`planopticon gws`) uses the `gws` CLI, which has its own auth flow via `gws auth login`.
+
+### Dropbox
+
+Dropbox authentication provides access to files stored in Dropbox.
+
+**Environment variables:**
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `DROPBOX_APP_KEY` | For OAuth | App key from the Dropbox App Console |
+| `DROPBOX_APP_SECRET` | For OAuth | App secret |
+| `DROPBOX_ACCESS_TOKEN` | Fallback | Long-lived access token (for quick setup) |
+
+**OAuth app setup:**
+
+1. Go to the [Dropbox App Console](https://www.dropbox.com/developers/apps).
+2. Click **Create App**.
+3. Choose **Scoped access** and **Full Dropbox** (or **App folder** for restricted access).
+4. Copy the App key and App secret from the **Settings** tab.
+5. Set the environment variables:
+
+```bash
+export DROPBOX_APP_KEY="your-app-key"
+export DROPBOX_APP_SECRET="your-app-secret"
+```
+
+**Access token shortcut:** For quick testing, you can generate an access token directly from the app's Settings page in the Dropbox App Console and set it as `DROPBOX_ACCESS_TOKEN`. This bypasses OAuth entirely but the token may have a limited lifetime.
+
+### Zoom
+
+Zoom authentication provides access to cloud recordings, meeting metadata, and transcripts.
+
+**Environment variables:**
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `ZOOM_CLIENT_ID` | For OAuth | OAuth client ID from the Zoom Marketplace |
+| `ZOOM_CLIENT_SECRET` | For OAuth | OAuth client secret |
+| `ZOOM_ACCOUNT_ID` | For S2S | Account ID for Server-to-Server OAuth |
+
+**Server-to-Server (recommended for automation):**
+
+When `ZOOM_ACCOUNT_ID` is set alongside `ZOOM_CLIENT_ID` and `ZOOM_CLIENT_SECRET`, PlanOpticon uses the client credentials grant (Server-to-Server OAuth). This is non-interactive and ideal for CI/CD pipelines and scheduled jobs.
+
+1. Go to the [Zoom Marketplace](https://marketplace.zoom.us/).
+2. Click **Develop > Build App**.
+3. Choose **Server-to-Server OAuth**.
+4. Copy the Account ID, Client ID, and Client Secret.
+5. Add the required scopes: `recording:read:admin` (or `recording:read`).
+6. Set the environment variables:
+
+```bash
+export ZOOM_CLIENT_ID="your-client-id"
+export ZOOM_CLIENT_SECRET="your-client-secret"
+export ZOOM_ACCOUNT_ID="your-account-id"
+```
+
+**User-level OAuth PKCE:**
+
+If `ZOOM_ACCOUNT_ID` is not set, PlanOpticon falls back to the interactive OAuth PKCE flow. This opens a browser window for the user to authorize access.
+
+1. In the Zoom Marketplace, create a **General App** (or **OAuth** app).
+2. Set the redirect URI to `urn:ietf:wg:oauth:2.0:oob` (out-of-band).
+3. Copy the Client ID and Client Secret.
+
+### Notion
+
+Notion authentication provides access to pages, databases, and content in your Notion workspace.
+
+**Environment variables:**
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `NOTION_CLIENT_ID` | For OAuth | OAuth client ID from the Notion Integrations page |
+| `NOTION_CLIENT_SECRET` | For OAuth | OAuth client secret |
+| `NOTION_API_KEY` | Fallback | Internal integration token |
+
+**OAuth app setup:**
+
+1. Go to [My Integrations](https://www.notion.so/my-integrations) in Notion.
+2. Click **New integration**.
+3. Select **Public integration** (required for OAuth).
+4. Copy the OAuth Client ID and Client Secret.
+5. Set the redirect URI.
+6. Set the environment variables:
+
+```bash
+export NOTION_CLIENT_ID="your-client-id"
+export NOTION_CLIENT_SECRET="your-client-secret"
+```
+
+**Internal integration (API key fallback):**
+
+For simpler setups, create an **Internal integration** from the Notion Integrations page. Copy the integration token and set it as `NOTION_API_KEY`. You must also share the relevant Notion pages/databases with the integration.
+
+```bash
+export NOTION_API_KEY="ntn_your-integration-token"
+```
+
+### GitHub
+
+GitHub authentication provides access to repositories, issues, and organization data.
+
+**Scopes requested:**
+
+- `repo`
+- `read:org`
+
+**Environment variables:**
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `GITHUB_CLIENT_ID` | For OAuth | OAuth App client ID |
+| `GITHUB_CLIENT_SECRET` | For OAuth | OAuth App client secret |
+| `GITHUB_TOKEN` | Fallback | Personal access token (classic or fine-grained) |
+
+**OAuth app setup:**
+
+1. Go to **GitHub > Settings > Developer Settings > OAuth Apps**.
+2. Click **New OAuth App**.
+3. Set the Authorization callback URL to `urn:ietf:wg:oauth:2.0:oob`.
+4. Copy the Client ID and generate a Client Secret.
+5. Set the environment variables:
+
+```bash
+export GITHUB_CLIENT_ID="your-client-id"
+export GITHUB_CLIENT_SECRET="your-client-secret"
+```
+
+**Personal access token (recommended for most users):**
+
+The simplest approach is to create a Personal Access Token:
+
+1. Go to **GitHub > Settings > Developer Settings > Personal Access Tokens**.
+2. Generate a token with `repo` and `read:org` scopes.
+3. Set it as `GITHUB_TOKEN`:
+
+```bash
+export GITHUB_TOKEN="ghp_your-token"
+```
+
+### Microsoft
+
+Microsoft authentication provides access to Microsoft 365 resources via the Microsoft Graph API, including OneDrive, SharePoint, and Teams recordings.
+
+**Scopes requested:**
+
+- `https://graph.microsoft.com/OnlineMeetings.Read`
+- `https://graph.microsoft.com/Files.Read`
+
+**Environment variables:**
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `MICROSOFT_CLIENT_ID` | For OAuth | Application (client) ID from Azure AD |
+| `MICROSOFT_CLIENT_SECRET` | For OAuth | Client secret from Azure AD |
+
+**Azure AD app registration:**
+
+1. Go to the [Azure Portal](https://portal.azure.com/).
+2. Navigate to **Azure Active Directory > App registrations**.
+3. Click **New registration**.
+4. Name the application (e.g., "PlanOpticon").
+5. Under **Supported account types**, select the appropriate option for your organization.
+6. Set the redirect URI to `urn:ietf:wg:oauth:2.0:oob` with platform **Mobile and desktop applications**.
+7. After registration, go to **Certificates & secrets** and create a new client secret.
+8. Under **API permissions**, add:
+    - `OnlineMeetings.Read`
+    - `Files.Read`
+9. Grant admin consent if required by your organization.
+10. Set the environment variables:
+
+```bash
+export MICROSOFT_CLIENT_ID="your-application-id"
+export MICROSOFT_CLIENT_SECRET="your-client-secret"
+```
+
+**Microsoft 365 CLI:** The `planopticon m365` commands use the `@pnp/cli-microsoft365` npm package, which has its own authentication flow via `m365 login`. This is separate from the OAuth flow described above.
+
+## CLI usage
+
+### `planopticon auth`
+
+Authenticate with a cloud service or manage saved tokens.
+
+```
+planopticon auth SERVICE [--logout]
+```
+
+**Arguments:**
+
+| Argument | Description |
+|----------|-------------|
+| `SERVICE` | One of: `google`, `dropbox`, `zoom`, `notion`, `github`, `microsoft` |
+
+**Options:**
+
+| Option | Description |
+|--------|-------------|
+| `--logout` | Clear the saved token for the specified service |
+
+**Examples:**
+
+```bash
+# Authenticate with Google (triggers OAuth flow or uses saved token)
+planopticon auth google
+
+# Authenticate with Zoom
+planopticon auth zoom
+
+# Clear saved GitHub token
+planopticon auth github --logout
+```
+
+On success, the command prints the authentication method used:
+
+```
+Google authentication successful (oauth_pkce).
+```
+
+or
+
+```
+Github authentication successful (api_key).
+```
+
+### Companion REPL `/auth`
+
+Inside the interactive companion REPL (`planopticon -C` or `planopticon -I`), you can authenticate with services using the `/auth` command:
+
+```
+/auth SERVICE
+```
+
+Without arguments, `/auth` lists all available services:
+
+```
+> /auth
+Usage: /auth SERVICE
+Available: dropbox, github, google, microsoft, notion, zoom
+```
+
+With a service name, it runs the same auth chain as the CLI command:
+
+```
+> /auth github
+Github authentication successful (api_key).
+```
+
+## Environment variables reference
+
+The following table summarizes all environment variables used by the authentication system:
+
+| Service | OAuth Client ID | OAuth Client Secret | API Key / Token | Account ID |
+|---------|----------------|--------------------|--------------------|------------|
+| Google | `GOOGLE_CLIENT_ID` | `GOOGLE_CLIENT_SECRET` | `GOOGLE_API_KEY` | -- |
+| Dropbox | `DROPBOX_APP_KEY` | `DROPBOX_APP_SECRET` | `DROPBOX_ACCESS_TOKEN` | -- |
+| Zoom | `ZOOM_CLIENT_ID` | `ZOOM_CLIENT_SECRET` | -- | `ZOOM_ACCOUNT_ID` |
+| Notion | `NOTION_CLIENT_ID` | `NOTION_CLIENT_SECRET` | `NOTION_API_KEY` | -- |
+| GitHub | `GITHUB_CLIENT_ID` | `GITHUB_CLIENT_SECRET` | `GITHUB_TOKEN` | -- |
+| Microsoft | `MICROSOFT_CLIENT_ID` | `MICROSOFT_CLIENT_SECRET` | -- | -- |
+
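As a quick sanity check, a small script can report which of these variables are visible in the current shell. This is a hypothetical helper, using only the variable names from the table above:

```python
# Hypothetical helper: report which auth-related environment variables are
# set, using the variable names from the reference table above.
import os

SERVICE_VARS = {
    "google": ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_API_KEY"),
    "dropbox": ("DROPBOX_APP_KEY", "DROPBOX_APP_SECRET", "DROPBOX_ACCESS_TOKEN"),
    "zoom": ("ZOOM_CLIENT_ID", "ZOOM_CLIENT_SECRET", "ZOOM_ACCOUNT_ID"),
    "notion": ("NOTION_CLIENT_ID", "NOTION_CLIENT_SECRET", "NOTION_API_KEY"),
    "github": ("GITHUB_CLIENT_ID", "GITHUB_CLIENT_SECRET", "GITHUB_TOKEN"),
    "microsoft": ("MICROSOFT_CLIENT_ID", "MICROSOFT_CLIENT_SECRET"),
}


def configured_vars(service: str) -> list:
    """Names of the service's credential variables that are set and non-empty."""
    return [v for v in SERVICE_VARS[service] if os.environ.get(v)]


if __name__ == "__main__":
    for service in SERVICE_VARS:
        found = configured_vars(service) or ["(none)"]
        print(f"{service:10s} {', '.join(found)}")
```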
+## Python API
+
+### AuthConfig
+
+The `AuthConfig` dataclass defines the authentication configuration for a service. It holds OAuth endpoints, credential references, scopes, and token storage paths.
+
+```python
+from video_processor.auth import AuthConfig
+
+config = AuthConfig(
+    service="myservice",
+    oauth_authorize_url="https://example.com/oauth/authorize",
+    oauth_token_url="https://example.com/oauth/token",
+    client_id_env="MYSERVICE_CLIENT_ID",
+    client_secret_env="MYSERVICE_CLIENT_SECRET",
+    api_key_env="MYSERVICE_API_KEY",
+    scopes=["read", "write"],
+)
+```
+
+**Key fields:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `service` | `str` | Service identifier (used for token filename) |
+| `oauth_authorize_url` | `Optional[str]` | OAuth authorization endpoint |
+| `oauth_token_url` | `Optional[str]` | OAuth token endpoint |
+| `client_id` / `client_id_env` | `Optional[str]` | Client ID value or env var name |
+| `client_secret` / `client_secret_env` | `Optional[str]` | Client secret value or env var name |
+| `api_key_env` | `Optional[str]` | Environment variable for API key fallback |
+| `scopes` | `List[str]` | OAuth scopes to request |
+| `redirect_uri` | `str` | Redirect URI (default: `urn:ietf:wg:oauth:2.0:oob`) |
+| `account_id` / `account_id_env` | `Optional[str]` | Account ID for client credentials grant |
+| `token_path` | `Optional[Path]` | Override token storage path |
+
+**Resolved properties:**
+
+- `resolved_client_id` -- Returns the client ID from the direct value or environment variable.
+- `resolved_client_secret` -- Returns the client secret from the direct value or environment variable.
+- `resolved_api_key` -- Returns the API key from the environment variable.
+- `resolved_account_id` -- Returns the account ID from the direct value or environment variable.
+- `resolved_token_path` -- Returns the token file path (default: `~/.planopticon/{service}_token.json`).
+- `supports_oauth` -- Returns `True` if both OAuth endpoints are configured.
+
+### OAuthManager
+
+The `OAuthManager` class manages the full authentication lifecycle for a service.
+
+```python
+from video_processor.auth import OAuthManager, AuthConfig
+
+config = AuthConfig(
+    service="notion",
+    oauth_authorize_url="https://api.notion.com/v1/oauth/authorize",
+    oauth_token_url="https://api.notion.com/v1/oauth/token",
+    client_id_env="NOTION_CLIENT_ID",
+    client_secret_env="NOTION_CLIENT_SECRET",
+    api_key_env="NOTION_API_KEY",
+    scopes=["read_content"],
+)
+manager = OAuthManager(config)
+
+# Full auth chain -- returns AuthResult
+result = manager.authenticate()
+if result.success:
+    print(f"Authenticated via {result.method}")
+    print(f"Token: {result.access_token[:20]}...")
+
+# Convenience method -- returns just the token string or None
+token = manager.get_token()
+
+# Clear saved token (logout)
+manager.clear_token()
+```
+
+**AuthResult fields:**
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `success` | `bool` | Whether authentication succeeded |
+| `access_token` | `Optional[str]` | The access token (if successful) |
+| `method` | `Optional[str]` | One of: `saved_token`, `oauth_pkce`, `client_credentials`, `api_key` |
+| `expires_at` | `Optional[float]` | Token expiry as a Unix timestamp |
+| `refresh_token` | `Optional[str]` | Refresh token (if provided) |
+| `error` | `Optional[str]` | Error message (if unsuccessful) |
+
+### Pre-built configs
+
+PlanOpticon ships with pre-built `AuthConfig` instances for all six supported services. Access them via convenience functions:
+
+```python
+from video_processor.auth import get_auth_config, get_auth_manager
+
+# Get just the config
+config = get_auth_config("zoom")
+
+# Get a ready-to-use manager
+manager = get_auth_manager("github")
+token = manager.get_token()
+```
+
+### Building custom connectors
+
+To add authentication for a new service, create an `AuthConfig` with the service's OAuth endpoints and credential environment variables:
+
+```python
+from video_processor.auth import AuthConfig, OAuthManager
+
+config = AuthConfig(
+    service="slack",
+    oauth_authorize_url="https://slack.com/oauth/v2/authorize",
+    oauth_token_url="https://slack.com/api/oauth.v2.access",
+    client_id_env="SLACK_CLIENT_ID",
+    client_secret_env="SLACK_CLIENT_SECRET",
+    api_key_env="SLACK_BOT_TOKEN",
+    scopes=["channels:read", "channels:history"],
+)
+
+manager = OAuthManager(config)
+result = manager.authenticate()
+```
+
+The token will be saved to `~/.planopticon/slack_token.json` and automatically refreshed on subsequent calls.
+
+## Troubleshooting
+
+### "No auth method available for {service}"
+
+This means none of the four auth methods succeeded. Check that:
+
+- The required environment variables are set and non-empty.
+- For OAuth: both the client ID and client secret (or app key/secret) are set.
+- For API key fallback: the correct environment variable is set.
+
+The error message includes hints about which variables to set.
+
+### Token refresh fails
+
+If automatic token refresh fails, PlanOpticon falls back to the next auth method in the chain. Common causes:
+
+- The refresh token has been revoked (e.g., you changed your password or revoked app access).
+- The OAuth app's client secret has changed.
+- The service requires re-authorization after a certain period.
+
+To resolve, clear the token and re-authenticate:
+
+```bash
+planopticon auth google --logout
+planopticon auth google
+```
+
+### OAuth PKCE flow does not open a browser
+
+If the browser does not open automatically, PlanOpticon prints the authorization URL to the terminal. Copy and paste it into your browser manually. After authorizing, paste the authorization code back into the terminal prompt.
+
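For background on what the PKCE flow exchanges, the standard S256 verifier/challenge pair (RFC 7636) is generated like this. PlanOpticon does this internally, so you never need to run it yourself:

```python
# Standard PKCE S256 code_verifier / code_challenge generation (RFC 7636).
# Shown for background only; PlanOpticon generates these internally.
import base64
import hashlib
import secrets


def make_pkce_pair():
    # 32 random bytes -> 43-char URL-safe verifier (padding stripped per spec)
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    # challenge = BASE64URL(SHA256(verifier)), again without padding
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge
```

The verifier stays on your machine; only the challenge is sent in the authorization URL, which is why pasting that URL into another browser (as described above) is safe.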
+### "requests not installed"
+
+The OAuth flows require the `requests` library. It is included as a dependency of PlanOpticon, but if you installed PlanOpticon in a minimal environment, install it manually:
+
+```bash
+pip install requests
+```
+
+### Permission denied on token file
+
+PlanOpticon needs write access to `~/.planopticon/`. If the directory or token files have restrictive permissions, adjust them:
+
+```bash
+chmod 700 ~/.planopticon
+chmod 600 ~/.planopticon/*_token.json
+```
+
+### Microsoft authentication uses the `/common` tenant
+
+The default Microsoft OAuth configuration uses the `common` tenant endpoint (`login.microsoftonline.com/common/...`), which supports both personal Microsoft accounts and Azure AD organizational accounts. If your organization requires a specific tenant, you can create a custom `AuthConfig` with the tenant-specific URLs.
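A tenant-specific configuration might look like the following sketch. It assumes the `AuthConfig`/`OAuthManager` API documented in this guide; the tenant ID placeholder is yours to fill in:

```python
# Sketch: tenant-specific Microsoft endpoints instead of /common.
# Assumes the AuthConfig/OAuthManager API documented above; replace the
# TENANT placeholder with your Azure AD directory (tenant) ID.
from video_processor.auth import AuthConfig, OAuthManager

TENANT = "your-tenant-id"

config = AuthConfig(
    service="microsoft",
    oauth_authorize_url=f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/authorize",
    oauth_token_url=f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token",
    client_id_env="MICROSOFT_CLIENT_ID",
    client_secret_env="MICROSOFT_CLIENT_SECRET",
    scopes=[
        "https://graph.microsoft.com/OnlineMeetings.Read",
        "https://graph.microsoft.com/Files.Read",
    ],
)
manager = OAuthManager(config)
```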
--- a/docs/guide/authentication.md
+++ b/docs/guide/authentication.md
@@ -0,0 +1,525 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/docs/guide/authentication.md
+++ b/docs/guide/authentication.md
@@ -0,0 +1,525 @@
1 # Authentication
2
3 PlanOpticon uses a unified authentication system to connect with cloud services for fetching recordings, documents, and other content. The system is **OAuth-first**: it prefers OAuth 2.0 flows for security and token management, but falls back to API keys when OAuth is not configured.
4
5 ## Auth strategy overview
6
7 PlanOpticon supports six cloud services out of the box: Google, Dropbox, Zoom, Notion, GitHub, and Microsoft. Each service uses the same authentication chain, implemented through the `OAuthManager` class. You configure credentials once (via environment variables or directly), and PlanOpticon handles token acquisition, storage, refresh, and fallback automatically.
8
9 All authentication state is managed through the `planopticon auth` CLI command, the `/auth` companion REPL command, or programmatically via the Python API.
10
11 ## The auth chain
12
13 When you authenticate with a service, PlanOpticon tries the following methods in order. It stops at the first one that succeeds:
14
15 1. **Saved token** -- Checks `~/.planopticon/{service}_token.json` for a previously saved token. If the token has not expired, it is used immediately. If it has expired but a refresh token is available, PlanOpticon attempts an automatic token refresh.
16
17 2. **Client Credentials grant** (Server-to-Server) -- If an `account_id` is configured (e.g., `ZOOM_ACCOUNT_ID`), PlanOpticon attempts a client credentials grant. This is a non-interactive flow suitable for automated pipelines and server-side integrations. No browser is required.
18
19 3. **OAuth 2.0 Authorization Code with PKCE** (interactive) -- If a client ID is configured and OAuth endpoints are available, PlanOpticon initiates an interactive OAuth PKCE flow. It opens a browser to the service's authorization page, waits for you to paste the authorization code, and exchanges it for tokens. The tokens are saved for future use.
20
21 4. **API key fallback** -- If no OAuth method succeeds, PlanOpticon checks for a service-specific API key environment variable (e.g., `GITHUB_TOKEN`, `NOTION_API_KEY`). This is the simplest setup but may have reduced capabilities compared to OAuth.
22
23 If none of the four methods succeed, PlanOpticon returns an error with hints about which environment variables to set.
24
25 ## Token storage
26
27 Tokens are persisted as JSON files in `~/.planopticon/`:
28
29 ```
30 ~/.planopticon/
31 google_token.json
32 dropbox_token.json
33 zoom_token.json
34 notion_token.json
35 github_token.json
36 microsoft_token.json
37 ```
38
39 Each token file contains:
40
41 | Field | Description |
42 |-------|-------------|
43 | `access_token` | The current access token |
44 | `refresh_token` | Refresh token for automatic renewal (if provided by the service) |
45 | `expires_at` | Unix timestamp when the token expires (with a 60-second safety margin) |
46 | `client_id` | The client ID used for this token (for refresh) |
47 | `client_secret` | The client secret used (for refresh) |
48
49 The `~/.planopticon/` directory is created automatically on first use. Token files are overwritten on each successful authentication or refresh.
50
51 To remove a saved token, use `planopticon auth <service> --logout` or delete the file directly.
52
53 ## Supported services
54
55 ### Google
56
57 Google authentication provides access to Google Drive and Google Docs for fetching documents, recordings, and other content.
58
59 **Scopes requested:**
60
61 - `https://www.googleapis.com/auth/drive.readonly`
62 - `https://www.googleapis.com/auth/documents.readonly`
63
64 **Environment variables:**
65
66 | Variable | Required | Description |
67 |----------|----------|-------------|
68 | `GOOGLE_CLIENT_ID` | For OAuth | OAuth 2.0 Client ID from Google Cloud Console |
69 | `GOOGLE_CLIENT_SECRET` | For OAuth | OAuth 2.0 Client Secret |
70 | `GOOGLE_API_KEY` | Fallback | API key (limited access, no user-specific data) |
71
72 **OAuth app setup:**
73
74 1. Go to the [Google Cloud Console](https://console.cloud.google.com/).
75 2. Create a project (or select an existing one).
76 3. Navigate to **APIs & Services > Credentials**.
77 4. Click **Create Credentials > OAuth client ID**.
78 5. Choose **Desktop app** as the application type.
79 6. Copy the Client ID and Client Secret.
80 7. Under **APIs & Services > Library**, enable the **Google Drive API** and **Google Docs API**.
81 8. Set the environment variables:
82
83 ```bash
84 export GOOGLE_CLIENT_ID="your-client-id.apps.googleusercontent.com"
85 export GOOGLE_CLIENT_SECRET="your-client-secret"
86 ```
87
88 **Service account fallback:** For automated pipelines, you can use a Google service account instead of OAuth. Generate a service account key JSON file from the Google Cloud Console and set `GOOGLE_APPLICATION_CREDENTIALS` to point to it. The PlanOpticon Google Workspace connector (`planopticon gws`) uses the `gws` CLI which has its own auth flow via `gws auth login`.
89
90 ### Dropbox
91
92 Dropbox authentication provides access to files stored in Dropbox.
93
94 **Environment variables:**
95
96 | Variable | Required | Description |
97 |----------|----------|-------------|
98 | `DROPBOX_APP_KEY` | For OAuth | App key from the Dropbox App Console |
99 | `DROPBOX_APP_SECRET` | For OAuth | App secret |
100 | `DROPBOX_ACCESS_TOKEN` | Fallback | Long-lived access token (for quick setup) |
101
102 **OAuth app setup:**
103
104 1. Go to the [Dropbox App Console](https://www.dropbox.com/developers/apps).
105 2. Click **Create App**.
106 3. Choose **Scoped access** and **Full Dropbox** (or **App folder** for restricted access).
107 4. Copy the App key and App secret from the **Settings** tab.
108 5. Set the environment variables:
109
110 ```bash
111 export DROPBOX_APP_KEY="your-app-key"
112 export DROPBOX_APP_SECRET="your-app-secret"
113 ```
114
115 **Access token shortcut:** For quick testing, you can generate an access token directly from the app's Settings page in the Dropbox App Console and set it as `DROPBOX_ACCESS_TOKEN`. This bypasses OAuth entirely but the token may have a limited lifetime.
116
117 ### Zoom
118
119 Zoom authentication provides access to cloud recordings, meeting metadata, and transcripts.
120
121 **Environment variables:**
122
123 | Variable | Required | Description |
124 |----------|----------|-------------|
125 | `ZOOM_CLIENT_ID` | For OAuth | OAuth client ID from the Zoom Marketplace |
126 | `ZOOM_CLIENT_SECRET` | For OAuth | OAuth client secret |
127 | `ZOOM_ACCOUNT_ID` | For S2S | Account ID for Server-to-Server OAuth |
128
129 **Server-to-Server (recommended for automation):**
130
131 When `ZOOM_ACCOUNT_ID` is set alongside `ZOOM_CLIENT_ID` and `ZOOM_CLIENT_SECRET`, PlanOpticon uses the client credentials grant (Server-to-Server OAuth). This is non-interactive and ideal for CI/CD pipelines and scheduled jobs.
132
133 1. Go to the [Zoom Marketplace](https://marketplace.zoom.us/).
134 2. Click **Develop > Build App**.
135 3. Choose **Server-to-Server OAuth**.
136 4. Copy the Account ID, Client ID, and Client Secret.
137 5. Add the required scopes: `recording:read:admin` (or `recording:read`).
138 6. Set the environment variables:
139
140 ```bash
141 export ZOOM_CLIENT_ID="your-client-id"
142 export ZOOM_CLIENT_SECRET="your-client-secret"
143 export ZOOM_ACCOUNT_ID="your-account-id"
144 ```
145
146 **User-level OAuth PKCE:**
147
148 If `ZOOM_ACCOUNT_ID` is not set, PlanOpticon falls back to the interactive OAuth PKCE flow. This opens a browser window for the user to authorize access.
149
150 1. In the Zoom Marketplace, create a **General App** (or **OAuth** app).
151 2. Set the redirect URI to `urn:ietf:wg:oauth:2.0:oob` (out-of-band).
152 3. Copy the Client ID and Client Secret.
153
154 ### Notion
155
156 Notion authentication provides access to pages, databases, and content in your Notion workspace.
157
158 **Environment variables:**
159
160 | Variable | Required | Description |
161 |----------|----------|-------------|
162 | `NOTION_CLIENT_ID` | For OAuth | OAuth client ID from the Notion Integrations page |
163 | `NOTION_CLIENT_SECRET` | For OAuth | OAuth client secret |
164 | `NOTION_API_KEY` | Fallback | Internal integration token |
165
166 **OAuth app setup:**
167
168 1. Go to [My Integrations](https://www.notion.so/my-integrations) in Notion.
169 2. Click **New integration**.
170 3. Select **Public integration** (required for OAuth).
171 4. Copy the OAuth Client ID and Client Secret.
172 5. Set the redirect URI (PlanOpticon's default is the out-of-band URI `urn:ietf:wg:oauth:2.0:oob`).
173 6. Set the environment variables:
174
175 ```bash
176 export NOTION_CLIENT_ID="your-client-id"
177 export NOTION_CLIENT_SECRET="your-client-secret"
178 ```
179
180 **Internal integration (API key fallback):**
181
182 For simpler setups, create an **Internal integration** from the Notion Integrations page. Copy the integration token and set it as `NOTION_API_KEY`. You must also share the relevant Notion pages/databases with the integration.
183
184 ```bash
185 export NOTION_API_KEY="ntn_your-integration-token"
186 ```
187
188 ### GitHub
189
190 GitHub authentication provides access to repositories, issues, and organization data.
191
192 **Scopes requested:**
193
194 - `repo`
195 - `read:org`
196
197 **Environment variables:**
198
199 | Variable | Required | Description |
200 |----------|----------|-------------|
201 | `GITHUB_CLIENT_ID` | For OAuth | OAuth App client ID |
202 | `GITHUB_CLIENT_SECRET` | For OAuth | OAuth App client secret |
203 | `GITHUB_TOKEN` | Fallback | Personal access token (classic or fine-grained) |
204
205 **OAuth app setup:**
206
207 1. Go to **GitHub > Settings > Developer Settings > OAuth Apps**.
208 2. Click **New OAuth App**.
209 3. Set the Authorization callback URL to `urn:ietf:wg:oauth:2.0:oob`.
210 4. Copy the Client ID and generate a Client Secret.
211 5. Set the environment variables:
212
213 ```bash
214 export GITHUB_CLIENT_ID="your-client-id"
215 export GITHUB_CLIENT_SECRET="your-client-secret"
216 ```
217
218 **Personal access token (recommended for most users):**
219
220 The simplest approach is to create a personal access token:
221
222 1. Go to **GitHub > Settings > Developer Settings > Personal Access Tokens**.
223 2. Generate a token with `repo` and `read:org` scopes.
224 3. Set it as `GITHUB_TOKEN`:
225
226 ```bash
227 export GITHUB_TOKEN="ghp_your-token"
228 ```
229
230 ### Microsoft
231
232 Microsoft authentication provides access to Microsoft 365 resources via the Microsoft Graph API, including OneDrive, SharePoint, and Teams recordings.
233
234 **Scopes requested:**
235
236 - `https://graph.microsoft.com/OnlineMeetings.Read`
237 - `https://graph.microsoft.com/Files.Read`
238
239 **Environment variables:**
240
241 | Variable | Required | Description |
242 |----------|----------|-------------|
243 | `MICROSOFT_CLIENT_ID` | For OAuth | Application (client) ID from Azure AD |
244 | `MICROSOFT_CLIENT_SECRET` | For OAuth | Client secret from Azure AD |
245
246 **Azure AD app registration:**
247
248 1. Go to the [Azure Portal](https://portal.azure.com/).
249 2. Navigate to **Azure Active Directory > App registrations**.
250 3. Click **New registration**.
251 4. Name the application (e.g., "PlanOpticon").
252 5. Under **Supported account types**, select the appropriate option for your organization.
253 6. Set the redirect URI to `urn:ietf:wg:oauth:2.0:oob` with platform **Mobile and desktop applications**.
254 7. After registration, go to **Certificates & secrets** and create a new client secret.
255 8. Under **API permissions**, add:
256 - `OnlineMeetings.Read`
257 - `Files.Read`
258 9. Grant admin consent if required by your organization.
259 10. Set the environment variables:
260
261 ```bash
262 export MICROSOFT_CLIENT_ID="your-application-id"
263 export MICROSOFT_CLIENT_SECRET="your-client-secret"
264 ```
265
266 **Microsoft 365 CLI:** The `planopticon m365` commands use the `@pnp/cli-microsoft365` npm package, which has its own authentication flow via `m365 login`. This is separate from the OAuth flow described above.
267
268 ## CLI usage
269
270 ### `planopticon auth`
271
272 Authenticate with a cloud service or manage saved tokens.
273
274 ```
275 planopticon auth SERVICE [--logout]
276 ```
277
278 **Arguments:**
279
280 | Argument | Description |
281 |----------|-------------|
282 | `SERVICE` | One of: `google`, `dropbox`, `zoom`, `notion`, `github`, `microsoft` |
283
284 **Options:**
285
286 | Option | Description |
287 |--------|-------------|
288 | `--logout` | Clear the saved token for the specified service |
289
290 **Examples:**
291
292 ```bash
293 # Authenticate with Google (triggers OAuth flow or uses saved token)
294 planopticon auth google
295
296 # Authenticate with Zoom
297 planopticon auth zoom
298
299 # Clear saved GitHub token
300 planopticon auth github --logout
301 ```
302
303 On success, the command prints the authentication method used:
304
305 ```
306 Google authentication successful (oauth_pkce).
307 ```
308
309 or
310
311 ```
312 Github authentication successful (api_key).
313 ```
314
315 ### Companion REPL `/auth`
316
317 Inside the interactive companion REPL (`planopticon -C` or `planopticon -I`), you can authenticate with services using the `/auth` command:
318
319 ```
320 /auth SERVICE
321 ```
322
323 Without arguments, `/auth` lists all available services:
324
325 ```
326 > /auth
327 Usage: /auth SERVICE
328 Available: dropbox, github, google, microsoft, notion, zoom
329 ```
330
331 With a service name, it runs the same auth chain as the CLI command:
332
333 ```
334 > /auth github
335 Github authentication successful (api_key).
336 ```
337
338 ## Environment variables reference
339
340 The following table summarizes all environment variables used by the authentication system:
341
342 | Service | OAuth Client ID | OAuth Client Secret | API Key / Token | Account ID |
343 |---------|----------------|--------------------|--------------------|------------|
344 | Google | `GOOGLE_CLIENT_ID` | `GOOGLE_CLIENT_SECRET` | `GOOGLE_API_KEY` | -- |
345 | Dropbox | `DROPBOX_APP_KEY` | `DROPBOX_APP_SECRET` | `DROPBOX_ACCESS_TOKEN` | -- |
346 | Zoom | `ZOOM_CLIENT_ID` | `ZOOM_CLIENT_SECRET` | -- | `ZOOM_ACCOUNT_ID` |
347 | Notion | `NOTION_CLIENT_ID` | `NOTION_CLIENT_SECRET` | `NOTION_API_KEY` | -- |
348 | GitHub | `GITHUB_CLIENT_ID` | `GITHUB_CLIENT_SECRET` | `GITHUB_TOKEN` | -- |
349 | Microsoft | `MICROSOFT_CLIENT_ID` | `MICROSOFT_CLIENT_SECRET` | -- | -- |
350
351 ## Python API
352
353 ### AuthConfig
354
355 The `AuthConfig` dataclass defines the authentication configuration for a service. It holds OAuth endpoints, credential references, scopes, and token storage paths.
356
357 ```python
358 from video_processor.auth import AuthConfig
359
360 config = AuthConfig(
361 service="myservice",
362 oauth_authorize_url="https://example.com/oauth/authorize",
363 oauth_token_url="https://example.com/oauth/token",
364 client_id_env="MYSERVICE_CLIENT_ID",
365 client_secret_env="MYSERVICE_CLIENT_SECRET",
366 api_key_env="MYSERVICE_API_KEY",
367 scopes=["read", "write"],
368 )
369 ```
370
371 **Key fields:**
372
373 | Field | Type | Description |
374 |-------|------|-------------|
375 | `service` | `str` | Service identifier (used for token filename) |
376 | `oauth_authorize_url` | `Optional[str]` | OAuth authorization endpoint |
377 | `oauth_token_url` | `Optional[str]` | OAuth token endpoint |
378 | `client_id` / `client_id_env` | `Optional[str]` | Client ID value or env var name |
379 | `client_secret` / `client_secret_env` | `Optional[str]` | Client secret value or env var name |
380 | `api_key_env` | `Optional[str]` | Environment variable for API key fallback |
381 | `scopes` | `List[str]` | OAuth scopes to request |
382 | `redirect_uri` | `str` | Redirect URI (default: `urn:ietf:wg:oauth:2.0:oob`) |
383 | `account_id` / `account_id_env` | `Optional[str]` | Account ID for client credentials grant |
384 | `token_path` | `Optional[Path]` | Override token storage path |
385
386 **Resolved properties:**
387
388 - `resolved_client_id` -- Returns the client ID from the direct value or environment variable.
389 - `resolved_client_secret` -- Returns the client secret from the direct value or environment variable.
390 - `resolved_api_key` -- Returns the API key from the environment variable.
391 - `resolved_account_id` -- Returns the account ID from the direct value or environment variable.
392 - `resolved_token_path` -- Returns the token file path (default: `~/.planopticon/{service}_token.json`).
393 - `supports_oauth` -- Returns `True` if both OAuth endpoints are configured.
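
The resolution behavior can be sketched with a minimal stand-in. `MiniAuthConfig` is a hypothetical class, not the real one; it only illustrates the documented rule that a directly supplied value wins over the named environment variable:

```python
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the documented resolution rule:
# a directly supplied value wins; otherwise the named env var is read.
@dataclass
class MiniAuthConfig:
    client_id: Optional[str] = None
    client_id_env: Optional[str] = None

    @property
    def resolved_client_id(self) -> Optional[str]:
        if self.client_id:
            return self.client_id
        if self.client_id_env:
            return os.environ.get(self.client_id_env)
        return None

os.environ["MYSERVICE_CLIENT_ID"] = "abc123"
print(MiniAuthConfig(client_id_env="MYSERVICE_CLIENT_ID").resolved_client_id)  # abc123
print(MiniAuthConfig(client_id="direct-value").resolved_client_id)  # direct-value
```

The real `AuthConfig` applies the same pattern to the client secret, API key, and account ID.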
394
395 ### OAuthManager
396
397 The `OAuthManager` class manages the full authentication lifecycle for a service.
398
399 ```python
400 from video_processor.auth import OAuthManager, AuthConfig
401
402 config = AuthConfig(
403 service="notion",
404 oauth_authorize_url="https://api.notion.com/v1/oauth/authorize",
405 oauth_token_url="https://api.notion.com/v1/oauth/token",
406 client_id_env="NOTION_CLIENT_ID",
407 client_secret_env="NOTION_CLIENT_SECRET",
408 api_key_env="NOTION_API_KEY",
409 scopes=["read_content"],
410 )
411 manager = OAuthManager(config)
412
413 # Full auth chain -- returns AuthResult
414 result = manager.authenticate()
415 if result.success:
416 print(f"Authenticated via {result.method}")
417 print(f"Token: {result.access_token[:20]}...")
418
419 # Convenience method -- returns just the token string or None
420 token = manager.get_token()
421
422 # Clear saved token (logout)
423 manager.clear_token()
424 ```
425
426 **AuthResult fields:**
427
428 | Field | Type | Description |
429 |-------|------|-------------|
430 | `success` | `bool` | Whether authentication succeeded |
431 | `access_token` | `Optional[str]` | The access token (if successful) |
432 | `method` | `Optional[str]` | One of: `saved_token`, `oauth_pkce`, `client_credentials`, `api_key` |
433 | `expires_at` | `Optional[float]` | Token expiry as a Unix timestamp |
434 | `refresh_token` | `Optional[str]` | Refresh token (if provided) |
435 | `error` | `Optional[str]` | Error message (if unsuccessful) |
436
437 ### Pre-built configs
438
439 PlanOpticon ships with pre-built `AuthConfig` instances for all six supported services. Access them via convenience functions:
440
441 ```python
442 from video_processor.auth import get_auth_config, get_auth_manager
443
444 # Get just the config
445 config = get_auth_config("zoom")
446
447 # Get a ready-to-use manager
448 manager = get_auth_manager("github")
449 token = manager.get_token()
450 ```
451
452 ### Building custom connectors
453
454 To add authentication for a new service, create an `AuthConfig` with the service's OAuth endpoints and credential environment variables:
455
456 ```python
457 from video_processor.auth import AuthConfig, OAuthManager
458
459 config = AuthConfig(
460 service="slack",
461 oauth_authorize_url="https://slack.com/oauth/v2/authorize",
462 oauth_token_url="https://slack.com/api/oauth.v2.access",
463 client_id_env="SLACK_CLIENT_ID",
464 client_secret_env="SLACK_CLIENT_SECRET",
465 api_key_env="SLACK_BOT_TOKEN",
466 scopes=["channels:read", "channels:history"],
467 )
468
469 manager = OAuthManager(config)
470 result = manager.authenticate()
471 ```
472
473 The token will be saved to `~/.planopticon/slack_token.json` and automatically refreshed on subsequent calls.
474
475 ## Troubleshooting
476
477 ### "No auth method available for {service}"
478
479 This means none of the four auth methods succeeded. Check that:
480
481 - The required environment variables are set and non-empty.
482 - For OAuth: both the client ID and client secret (or app key/secret) are set.
483 - For API key fallback: the correct environment variable is set.
484
485 The error message includes hints about which variables to set.
486
487 ### Token refresh fails
488
489 If automatic token refresh fails, PlanOpticon falls back to the next auth method in the chain. Common causes:
490
491 - The refresh token has been revoked (e.g., you changed your password or revoked app access).
492 - The OAuth app's client secret has changed.
493 - The service requires re-authorization after a certain period.
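
The fallback behavior can be illustrated with a small sketch. `run_auth_chain` is a hypothetical helper, not PlanOpticon's actual code; the method names mirror the `AuthResult.method` values documented above:

```python
# Hypothetical illustration of trying each auth method in order and
# reporting which one succeeded, as the chain described above does.
def run_auth_chain(methods):
    for name, attempt in methods:
        token = attempt()
        if token is not None:
            return name, token
    return None, None

chain = [
    ("saved_token", lambda: None),         # refresh failed (token revoked)
    ("oauth_pkce", lambda: None),          # browser flow unavailable (e.g. CI)
    ("client_credentials", lambda: None),  # no account ID configured
    ("api_key", lambda: "ghp_example"),    # env-var fallback succeeds
]
print(run_auth_chain(chain))  # ('api_key', 'ghp_example')
```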
494
495 To resolve, clear the token and re-authenticate:
496
497 ```bash
498 planopticon auth google --logout
499 planopticon auth google
500 ```
501
502 ### OAuth PKCE flow does not open a browser
503
504 If the browser does not open automatically, PlanOpticon prints the authorization URL to the terminal. Copy and paste it into your browser manually. After authorizing, paste the authorization code back into the terminal prompt.
505
506 ### "requests not installed"
507
508 The OAuth flows require the `requests` library. It is included as a dependency of PlanOpticon, but if you installed PlanOpticon in a minimal environment, install it manually:
509
510 ```bash
511 pip install requests
512 ```
513
514 ### Permission denied on token file
515
516 PlanOpticon needs write access to `~/.planopticon/`. If the directory or token files have restrictive permissions, adjust them:
517
518 ```bash
519 chmod 700 ~/.planopticon
520 chmod 600 ~/.planopticon/*_token.json
521 ```
522
523 ### Microsoft authentication uses the `/common` tenant
524
525 The default Microsoft OAuth configuration uses the `common` tenant endpoint (`login.microsoftonline.com/common/...`), which supports both personal Microsoft accounts and Azure AD organizational accounts. If your organization requires a specific tenant, you can create a custom `AuthConfig` with the tenant-specific URLs.
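
A tenant-specific config could look like the following sketch. The tenant value is a placeholder, and the `/oauth2/v2.0` URLs are Microsoft's standard identity-platform endpoints; the field names follow the `AuthConfig` API documented above:

```python
from video_processor.auth import AuthConfig, OAuthManager

TENANT = "contoso.onmicrosoft.com"  # placeholder -- use your tenant ID or domain

config = AuthConfig(
    service="microsoft",
    oauth_authorize_url=f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/authorize",
    oauth_token_url=f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token",
    client_id_env="MICROSOFT_CLIENT_ID",
    client_secret_env="MICROSOFT_CLIENT_SECRET",
    scopes=[
        "https://graph.microsoft.com/OnlineMeetings.Read",
        "https://graph.microsoft.com/Files.Read",
    ],
)
manager = OAuthManager(config)  # then manager.authenticate() as usual
```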
+120 -10
--- docs/guide/batch.md
+++ docs/guide/batch.md
@@ -10,11 +10,11 @@
 
 Batch mode:
 
 1. Scans the input directory for video files matching the pattern
 2. Processes each video through the full single-video pipeline
-3. Merges knowledge graphs across all videos (case-insensitive entity dedup)
+3. Merges knowledge graphs across all videos with fuzzy matching and conflict resolution
 4. Generates a batch summary with aggregated stats and action items
 5. Writes a batch manifest linking to per-video results
 
 ## File patterns
 
@@ -30,32 +30,58 @@
 
 ```
 output/
 ├── batch_manifest.json    # Batch-level manifest
 ├── batch_summary.md       # Aggregated summary
-├── knowledge_graph.json   # Merged KG across all videos
+├── knowledge_graph.db     # Merged KG across all videos (SQLite, primary)
+├── knowledge_graph.json   # Merged KG across all videos (JSON export)
 └── videos/
     ├── meeting-01/
     │   ├── manifest.json
     │   ├── transcript/
     │   ├── diagrams/
+    │   ├── captures/
     │   └── results/
+    │       ├── analysis.md
+    │       ├── analysis.html
+    │       ├── knowledge_graph.db
+    │       ├── knowledge_graph.json
+    │       ├── key_points.json
+    │       └── action_items.json
     └── meeting-02/
         ├── manifest.json
         └── ...
 ```
 
 ## Knowledge graph merging
 
-When the same entity appears across multiple videos, PlanOpticon merges them:
-
-- Case-insensitive name matching
-- Descriptions are unioned
-- Occurrences are concatenated with source tracking
-- Relationships are deduplicated
-
-The merged knowledge graph is saved at the batch root and included in the batch summary as a mermaid diagram.
+When the same entity appears across multiple videos, PlanOpticon merges them using a multi-strategy approach:
+
+### Entity deduplication
+
+- **Case-insensitive exact matching** -- `"kubernetes"` and `"Kubernetes"` are recognized as the same entity
+- **Fuzzy name matching** -- Uses `SequenceMatcher` with a threshold of 0.85 to unify near-duplicate entities (e.g., `"K8s"` and `"k8s cluster"` may be matched depending on context)
+- **Descriptions are unioned** -- All unique descriptions from each video are combined
+- **Occurrences are concatenated with source tracking** -- Each occurrence retains its source video reference
+
+### Relationship deduplication
+
+- Relationships are deduplicated by (source, target, type) tuple
+- Descriptions from duplicate relationships are merged
+
+### Type conflict resolution
+
+When the same entity appears with different types across videos, PlanOpticon uses a specificity ranking to resolve the conflict. More specific types are preferred over general ones:
+
+- `technology` > `concept`
+- `person` > `concept`
+- `organization` > `concept`
+- And so on through the full type hierarchy
+
+This ensures that an entity initially classified as a generic `concept` in one video gets upgraded to `technology` if it is identified more specifically in another.
+
+The merged knowledge graph is saved at the batch root in both SQLite (`knowledge_graph.db`) and JSON (`knowledge_graph.json`) formats, and is included in the batch summary as a Mermaid diagram.
 
 ## Error handling
 
 If a video fails to process, the batch continues. Failed videos are recorded in the batch manifest with error details:
 
@@ -64,5 +90,89 @@
   "video_name": "corrupted-file",
   "status": "failed",
   "error": "Audio extraction failed: no audio track found"
 }
 ```
+
+The batch manifest tracks completion status:
+
+```json
+{
+  "title": "Sprint Reviews",
+  "total_videos": 5,
+  "completed_videos": 4,
+  "failed_videos": 1,
+  "total_diagrams": 12,
+  "total_action_items": 23,
+  "total_key_points": 45,
+  "videos": [...],
+  "batch_summary_md": "batch_summary.md",
+  "merged_knowledge_graph_json": "knowledge_graph.json",
+  "merged_knowledge_graph_db": "knowledge_graph.db"
+}
+```
+
+## Using batch results
+
+### Query the merged knowledge graph
+
+After batch processing completes, the merged knowledge graph at the batch root contains entities and relationships from all successfully processed videos. You can query it just like a single-video knowledge graph:
+
+```bash
+# Show stats for the merged graph
+planopticon query --db output/knowledge_graph.db
+
+# List all people mentioned across all videos
+planopticon query --db output/knowledge_graph.db "entities --type person"
+
+# See what connects to an entity across all videos
+planopticon query --db output/knowledge_graph.db "neighbors Alice"
+
+# Ask natural language questions about the combined content
+planopticon query --db output/knowledge_graph.db "What technologies were discussed across all meetings?"
+
+# Interactive REPL for exploration
+planopticon query --db output/knowledge_graph.db -I
+```
+
+### Export merged results
+
+All export commands work with the merged knowledge graph:
+
+```bash
+# Generate documents from merged KG
+planopticon export markdown output/knowledge_graph.db -o ./docs
+
+# Export as Obsidian vault
+planopticon export obsidian output/knowledge_graph.db -o ./vault
+
+# Generate a project-wide exchange file
+planopticon export exchange output/knowledge_graph.db --name "Sprint Reviews Q4"
+
+# Generate a GitHub wiki
+planopticon wiki generate output/knowledge_graph.db -o ./wiki
+```
+
+### Classify for planning
+
+Run taxonomy classification on the merged graph to categorize entities across all videos:
+
+```bash
+planopticon kg classify output/knowledge_graph.db
+```
+
+### Use with the planning agent
+
+The planning agent can consume the merged knowledge graph for cross-video analysis and planning:
+
+```bash
+planopticon agent --db output/knowledge_graph.db
+```
+
+### Incremental batch processing
+
+If you add new videos to the recordings directory, you can re-run the batch command. Videos that have already been processed (with output directories present) will be detected via checkpoint/resume within each video's pipeline, making incremental processing efficient.
+
+```bash
+# Add new recordings to the folder, then re-run
+planopticon batch -i ./recordings -o ./output --title "Sprint Reviews"
+```

ADDED docs/guide/companion.md
ADDED docs/guide/document-ingestion.md
ADDED docs/guide/export.md
ADDED docs/guide/knowledge-graphs.md
--- a/docs/guide/companion.md
+++ b/docs/guide/companion.md
@@ -0,0 +1,531 @@
+# Interactive Companion REPL
+
+The PlanOpticon Companion is an interactive Read-Eval-Print Loop (REPL) that provides a conversational interface to PlanOpticon's full feature set. It combines workspace awareness, knowledge graph querying, LLM-powered chat, and planning agent skills into a single session.
+
+Use the Companion when you want to explore a knowledge graph interactively, ask natural-language questions about extracted content, generate planning artifacts on the fly, or switch between providers and models without restarting.
+
+---
+
+## Launching the Companion
+
+There are three equivalent ways to start the Companion.
+
+### As a subcommand
+
+```bash
+planopticon companion
+```
+
+### With the `--chat` / `-C` flag
+
+```bash
+planopticon --chat
+planopticon -C
+```
+
+These flags launch the Companion directly from the top-level CLI, without invoking a subcommand.
+
+### With options
+
+The `companion` subcommand accepts options for specifying knowledge base paths, LLM provider, and model:
+
+```bash
+# Point at a specific knowledge base
+planopticon companion --kb ./results
+
+# Use a specific provider
+planopticon companion -p anthropic
+
+# Use a specific model
+planopticon companion --chat-model gpt-4o
+
+# Combine options
+planopticon companion --kb ./results -p openai --chat-model gpt-4o
+```
+
+| Option | Description |
+|---|---|
+| `--kb PATH` | Path to a knowledge graph file or directory (repeatable) |
+| `-p, --provider NAME` | LLM provider: `auto`, `openai`, `anthropic`, `gemini`, `ollama`, `azure`, `together`, `fireworks`, `cerebras`, `xai` |
+| `--chat-model NAME` | Override the default chat model for the selected provider |
+
+---
+
+## Auto-discovery
+
+On startup, the Companion automatically scans the workspace for relevant files:
+
+**Knowledge graphs.** The Companion uses `find_nearest_graph()` to locate the closest `knowledge_graph.db` or `knowledge_graph.json` file. It searches the current directory, common output subdirectories (`results/`, `output/`, `knowledge-base/`), recursively downward (up to 4 levels), and upward through parent directories. SQLite `.db` files are preferred over `.json` files.
+
+**Videos.** The current directory is scanned for files with `.mp4`, `.mkv`, and `.webm` extensions.
+
+**Documents.** The current directory is scanned for files with `.md`, `.pdf`, and `.docx` extensions.
+
+**LLM provider.** If `--provider` is set to `auto` (the default), the Companion attempts to initialize a provider using any available API key in the environment (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, etc.).
+
+All discovered context is displayed in the welcome banner:
+
+```
+  PlanOpticon Companion
+  Interactive planning REPL
+
+  Knowledge graph: knowledge_graph.db (42 entities, 87 relationships)
+  Videos: meeting-2024-01-15.mp4, sprint-review.mp4
+  Docs: requirements.md, architecture.pdf
+  LLM provider: openai (model: gpt-4o)
+
+  Type /help for commands, or ask a question.
+```
+
+If no knowledge graph is found, the banner shows "No knowledge graph loaded." Commands that require a KG will return an appropriate message rather than failing silently.
+
+---
+
+## Slash Commands
+
+The Companion supports 18 slash commands. Type `/help` at the prompt to see the full list.
+
+### /help
+
+Display all available commands with brief descriptions.
+
+```
+planopticon> /help
+Available commands:
+  /help                 Show this help
+  /status               Workspace status
+  /skills               List available skills
+  /entities [--type T]  List KG entities
+  /search TERM          Search entities by name
+  /neighbors ENTITY     Show entity relationships
+  /export FORMAT        Export KG (markdown, obsidian, notion, csv)
+  /analyze PATH         Analyze a video/doc
+  /ingest PATH          Ingest a file into the KG
+  /auth SERVICE         Authenticate with a cloud service
+  /provider [NAME]      List or switch LLM provider
+  /model [NAME]         Show or switch chat model
+  /run SKILL            Run a skill by name
+  /plan                 Run project_plan skill
+  /prd                  Run PRD skill
+  /tasks                Run task_breakdown skill
+  /quit, /exit          Exit companion
+
+Any other input is sent to the chat agent (requires LLM).
+```
+
+### /status
+
+Show a summary of the current workspace state: loaded knowledge graph (with entity and relationship counts, broken down by entity type), number of discovered videos and documents, and whether an LLM provider is active.
+
+```
+planopticon> /status
+Workspace status:
+  KG: /home/user/project/results/knowledge_graph.db (42 entities, 87 relationships)
+    technology: 15
+    person: 12
+    concept: 10
+    organization: 5
+  Videos: 2 found
+  Docs: 3 found
+  Provider: active
+```
+
+### /skills
+
+List all registered planning agent skills with their names and descriptions. These are the skills that can be invoked via `/run`.
+
+```
+planopticon> /skills
+Available skills:
+  project_plan: Generate a structured project plan from knowledge graph
+  prd: Generate a product requirements document (PRD) / feature spec
+  roadmap: Generate a product/project roadmap
+  task_breakdown: Break down goals into tasks with dependencies
+  github_issues: Generate GitHub issues from task breakdown
+  requirements_chat: Interactive requirements gathering via guided questions
+  doc_generator: Generate technical documentation, ADRs, or meeting notes
147
+ artifact_export: Export artifacts in agent-ready formats
148
+ cli_adapter: Push artifacts to external tools via their CLIs
149
+ notes_export: Export knowledge graph as structured notes (Obsidian, Notion)
150
+ wiki_generator: Generate a GitHub wiki from knowledge graph and artifacts
151
+```
152
+
153
+### /entities [--type TYPE]
154
+
155
+List entities from the loaded knowledge graph. Optionally filter by entity type.
156
+
157
+```
158
+planopticon> /entities
159
+Found 42 entities
160
+ [technology] Python -- General-purpose programming language
161
+ [person] Alice -- Lead engineer on the project
162
+ [concept] Microservices -- Architectural pattern discussed
163
+ ...
164
+
165
+planopticon> /entities --type person
166
+Found 12 entities
167
+ [person] Alice -- Lead engineer on the project
168
+ [person] Bob -- Product manager
169
+ ...
170
+```
171
+
172
+!!! note
173
+ This command requires a loaded knowledge graph. If none is loaded, it returns "No knowledge graph loaded."
174
+
175
+### /search TERM
176
+
177
+Search entities by name substring (case-insensitive).
178
+
179
+```
180
+planopticon> /search python
181
+Found 3 entities
182
+ [technology] Python -- General-purpose programming language
183
+ [technology] Python Flask -- Web framework for Python
184
+ [concept] Python packaging -- Discussion of pip and packaging tools
185
+```
186
+
187
+### /neighbors ENTITY
188
+
189
+Show all entities and relationships connected to a given entity. This performs a breadth-first traversal (depth 1) from the named entity.
190
+
191
+```
192
+planopticon> /neighbors Alice
193
+Found 4 entities and 5 relationships
194
+ [person] Alice -- Lead engineer on the project
195
+ [technology] Python -- General-purpose programming language
196
+ [organization] Acme Corp -- Employer
197
+ [concept] Authentication -- Auth system design
198
+ Alice --[works_with]--> Python
199
+ Alice --[employed_by]--> Acme Corp
200
+ Alice --[proposed]--> Authentication
201
+ Bob --[collaborates_with]--> Alice
202
+ Authentication --[discussed_by]--> Alice
203
+```
204
+
205
+### /export FORMAT
206
+
207
+Request an export of the knowledge graph. Supported formats: `markdown`, `obsidian`, `notion`, `csv`. This command prints the equivalent CLI command to run.
208
+
209
+```
210
+planopticon> /export obsidian
211
+Export 'obsidian' requested. Use the CLI command:
212
+ planopticon export obsidian /home/user/project/results/knowledge_graph.db
213
+```
214
+
215
+### /analyze PATH
216
+
217
+Request analysis of a video or document file. Validates the file exists and prints the equivalent CLI command.
218
+
219
+```
220
+planopticon> /analyze meeting.mp4
221
+Analyze requested for meeting.mp4. Use the CLI:
222
+ planopticon analyze -i /home/user/project/meeting.mp4
223
+```
224
+
225
+### /ingest PATH
226
+
227
+Request ingestion of a file into the knowledge graph. Validates the file exists and prints the equivalent CLI command.
228
+
229
+```
230
+planopticon> /ingest notes.md
231
+Ingest requested for notes.md. Use the CLI:
232
+ planopticon ingest /home/user/project/notes.md
233
+```
234
+
235
+### /auth [SERVICE]
236
+
237
+Authenticate with a cloud service. When called without arguments, lists all available services. When called with a service name, triggers the authentication flow.
238
+
239
+```
240
+planopticon> /auth
241
+Usage: /auth SERVICE
242
+Available: dropbox, github, google, microsoft, notion, zoom
243
+
244
+planopticon> /auth zoom
245
+Zoom authenticated (oauth)
246
+```
247
+
248
+### /provider [NAME]
249
+
250
+List available LLM providers and their status, or switch to a different provider.
251
+
252
+When called without arguments (or with `list`), shows all known providers with their availability status:
253
+
254
+- **ready** -- API key found in environment
255
+- **local** -- runs locally (Ollama)
256
+- **no key** -- no API key configured
257
+
258
+The currently active provider is marked.
259
+
260
+```
261
+planopticon> /provider
262
+Available providers:
263
+ openai: ready (active)
264
+ anthropic: ready
265
+ gemini: no key
266
+ ollama: local
267
+ azure: no key
268
+ together: no key
269
+ fireworks: no key
270
+ cerebras: no key
271
+ xai: no key
272
+
273
+Current: openai
274
+```
275
+
276
+To switch providers at runtime:
277
+
278
+```
279
+planopticon> /provider anthropic
280
+Switched to provider: anthropic
281
+```
282
+
283
+Switching the provider reinitialises the provider manager and the planning agent. The chat model is reset to the provider's default. If initialisation fails, an error message is shown.
284
+
285
+### /model [NAME]
286
+
287
+Show the current chat model, or switch to a different one.
288
+
289
+```
290
+planopticon> /model
291
+Current model: default
292
+Usage: /model MODEL_NAME
293
+
294
+planopticon> /model claude-sonnet-4-20250514
295
+Switched to model: claude-sonnet-4-20250514
296
+```
297
+
298
+Switching the model reinitialises both the provider manager and the planning agent.
299
+
300
+### /run SKILL
301
+
302
+Run any registered skill by name. The skill receives the current agent context (knowledge graph, query engine, provider, and any previously generated artifacts) and returns an artifact.
303
+
304
+```
305
+planopticon> /run roadmap
306
+--- Roadmap (roadmap) ---
307
+# Roadmap
308
+
309
+## Vision & Strategy
310
+...
311
+```
312
+
313
+If the skill cannot execute (missing KG or provider), an error message is returned. Use `/skills` to see all available skill names.
314
+
315
+### /plan
316
+
317
+Shortcut for `/run project_plan`. Generates a structured project plan from the loaded knowledge graph.
318
+
319
+```
320
+planopticon> /plan
321
+--- Project Plan (project_plan) ---
322
+# Project Plan
323
+
324
+## Executive Summary
325
+...
326
+```
327
+
328
+### /prd
329
+
330
+Shortcut for `/run prd`. Generates a product requirements document.
331
+
332
+```
333
+planopticon> /prd
334
+--- Product Requirements Document (prd) ---
335
+# Product Requirements Document
336
+
337
+## Problem Statement
338
+...
339
+```
340
+
341
+### /tasks
342
+
343
+Shortcut for `/run task_breakdown`. Breaks goals and features into tasks with dependencies, priorities, and effort estimates. The output is JSON.
344
+
345
+```
346
+planopticon> /tasks
347
+--- Task Breakdown (task_list) ---
348
+[
349
+ {
350
+ "id": "T1",
351
+ "title": "Set up authentication service",
352
+ "description": "Implement OAuth2 flow with JWT tokens",
353
+ "depends_on": [],
354
+ "priority": "high",
355
+ "estimate": "1w",
356
+ "assignee_role": "backend engineer"
357
+ },
358
+ ...
359
+]
360
+```
361
+
362
+### /quit and /exit
363
+
364
+Exit the Companion REPL.
365
+
366
+```
367
+planopticon> /quit
368
+Bye.
369
+```
370
+
371
+---
372
+
373
+## Exiting the Companion
374
+
375
+In addition to `/quit` and `/exit`, you can exit by:
376
+
377
+- Typing `quit`, `exit`, `bye`, or `q` as bare words (without the `/` prefix)
378
+- Pressing `Ctrl+C` or `Ctrl+D`
379
+
380
+All of these end the session with a "Bye." message.
381
+
382
+---
383
+
384
+## Chat Mode
385
+
386
+Any input that does not start with `/` and is not a bare exit word is sent to the chat agent as a natural-language message. This requires a configured LLM provider.
387
+
388
+```
389
+planopticon> What technologies were discussed in the meeting?
390
+Based on the knowledge graph, the following technologies were discussed:
391
+
392
+1. **Python** -- mentioned in the context of backend development
393
+2. **React** -- proposed for the frontend redesign
394
+3. **PostgreSQL** -- discussed as the primary database
395
+...
396
+```
397
+
398
+The chat agent maintains conversation history across the session. It has full awareness of:
399
+
400
+- The loaded knowledge graph (entity and relationship counts, types)
401
+- Any artifacts generated during the session (via `/plan`, `/prd`, `/tasks`, `/run`)
402
+- All available slash commands (which it may suggest when relevant)
403
+- The full PlanOpticon CLI command set
404
+
405
+If no LLM provider is configured, chat mode returns an error with instructions:
406
+
407
+```
408
+planopticon> What was discussed?
409
+Chat requires an LLM provider. Set one of:
410
+ OPENAI_API_KEY
411
+ ANTHROPIC_API_KEY
412
+ GEMINI_API_KEY
413
+Or pass --provider / --chat-model.
414
+```
415
+
416
+---
417
+
418
+## Runtime Provider and Model Switching
419
+
420
+One of the Companion's key features is the ability to switch LLM providers and models without restarting the session. This is useful for:
421
+
422
+- Comparing outputs across different models
423
+- Falling back to a local model (Ollama) when API keys expire
424
+- Using a cheaper model for exploratory queries and a more capable one for artifact generation
425
+
426
+When you switch providers or models via `/provider` or `/model`, the Companion:
427
+
428
+1. Updates the internal provider name and/or model name
429
+2. Reinitialises the `ProviderManager`
430
+3. Reinitialises the `PlanningAgent` with a fresh `AgentContext` that retains the loaded knowledge graph and query engine
431
+
432
+Conversation history is preserved across provider switches.
433
+
434
+---
435
+
436
+## Example Session
437
+
438
+The following walkthrough shows a typical Companion session, from launch through exploration to artifact generation.
439
+
440
+```bash
441
+$ planopticon companion --kb ./results
442
+```
443
+
444
+```
445
+ PlanOpticon Companion
446
+ Interactive planning REPL
447
+
448
+ Knowledge graph: knowledge_graph.db (58 entities, 124 relationships)
449
+ Videos: sprint-review-2024-03.mp4
450
+ Docs: architecture.md, requirements.pdf
451
+ LLM provider: openai (model: default)
452
+
453
+ Type /help for commands, or ask a question.
454
+
455
+planopticon> /status
456
+Workspace status:
457
+ KG: /home/user/project/results/knowledge_graph.db (58 entities, 124 relationships)
458
+ technology: 20
459
+ person: 15
460
+ concept: 13
461
+ organization: 8
462
+ time: 2
463
+ Videos: 1 found
464
+ Docs: 2 found
465
+ Provider: active
466
+
467
+planopticon> /entities --type person
468
+Found 15 entities
469
+ [person] Alice -- Lead architect
470
+ [person] Bob -- Product manager
471
+ [person] Carol -- Frontend lead
472
+ ...
473
+
474
+planopticon> /neighbors Alice
475
+Found 6 entities and 8 relationships
476
+ [person] Alice -- Lead architect
477
+ [technology] Kubernetes -- Container orchestration platform
478
+ [concept] Microservices -- Proposed architecture pattern
479
+ ...
480
+ Alice --[proposed]--> Microservices
481
+ Alice --[expert_in]--> Kubernetes
482
+ ...
483
+
484
+planopticon> What were the main decisions made in the sprint review?
485
+Based on the knowledge graph, the sprint review covered several key decisions:
486
+
487
+1. **Adopt microservices architecture** -- Alice proposed and the team agreed
488
+ to move from the monolith to a microservices pattern.
489
+2. **Use Kubernetes for orchestration** -- Selected over Docker Swarm.
490
+3. **Prioritize authentication module** -- Bob identified this as the highest
491
+ priority for the next sprint.
492
+
493
+planopticon> /provider anthropic
494
+Switched to provider: anthropic
495
+
496
+planopticon> /model claude-sonnet-4-20250514
497
+Switched to model: claude-sonnet-4-20250514
498
+
499
+planopticon> /plan
500
+--- Project Plan (project_plan) ---
501
+# Project Plan
502
+
503
+## Executive Summary
504
+This project plan outlines the migration from a monolithic architecture
505
+to a microservices-based system, as discussed in the sprint review...
506
+
507
+## Goals & Objectives
508
+...
509
+
510
+planopticon> /tasks
511
+--- Task Breakdown (task_list) ---
512
+[
513
+ {
514
+ "id": "T1",
515
+ "title": "Design service boundaries",
516
+ "description": "Define microservice boundaries based on domain analysis",
517
+ "depends_on": [],
518
+ "priority": "high",
519
+ "estimate": "3d",
520
+ "assignee_role": "architect"
521
+ },
522
+ ...
523
+]
524
+
525
+planopticon> /export obsidian
526
+Export 'obsidian' requested. Use the CLI command:
527
+ planopticon export obsidian /home/user/project/results/knowledge_graph.db
528
+
529
+planopticon> quit
530
+Bye.
531
+```
--- a/docs/guide/companion.md
+++ b/docs/guide/companion.md
@@ -0,0 +1,531 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/docs/guide/companion.md
+++ b/docs/guide/companion.md
@@ -0,0 +1,531 @@
# Interactive Companion REPL

The PlanOpticon Companion is an interactive Read-Eval-Print Loop (REPL) that provides a conversational interface to PlanOpticon's full feature set. It combines workspace awareness, knowledge graph querying, LLM-powered chat, and planning agent skills into a single session.

Use the Companion when you want to explore a knowledge graph interactively, ask natural-language questions about extracted content, generate planning artifacts on the fly, or switch between providers and models without restarting.

---

## Launching the Companion

The Companion can be started in a few equivalent ways.

### As a subcommand

```bash
planopticon companion
```

### With the `--chat` / `-C` flag

```bash
planopticon --chat
planopticon -C
```

These flags launch the Companion directly from the top-level CLI, without invoking a subcommand.

### With options

The `companion` subcommand accepts options for specifying knowledge base paths, LLM provider, and model:

```bash
# Point at a specific knowledge base
planopticon companion --kb ./results

# Use a specific provider
planopticon companion -p anthropic

# Use a specific model
planopticon companion --chat-model gpt-4o

# Combine options
planopticon companion --kb ./results -p openai --chat-model gpt-4o
```

| Option | Description |
|---|---|
| `--kb PATH` | Path to a knowledge graph file or directory (repeatable) |
| `-p, --provider NAME` | LLM provider: `auto`, `openai`, `anthropic`, `gemini`, `ollama`, `azure`, `together`, `fireworks`, `cerebras`, `xai` |
| `--chat-model NAME` | Override the default chat model for the selected provider |

---

## Auto-discovery

On startup, the Companion automatically scans the workspace for relevant files:

**Knowledge graphs.** The Companion uses `find_nearest_graph()` to locate the closest `knowledge_graph.db` or `knowledge_graph.json` file. It searches the current directory, common output subdirectories (`results/`, `output/`, `knowledge-base/`), recursively downward (up to 4 levels), and upward through parent directories. SQLite `.db` files are preferred over `.json` files.
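
The search order described above can be sketched in a few lines. This is an illustrative reimplementation, not the real `find_nearest_graph()`; the exact tie-breaking rules inside PlanOpticon may differ.

```python
from pathlib import Path
import tempfile

GRAPH_NAMES = ("knowledge_graph.db", "knowledge_graph.json")
COMMON_DIRS = ("results", "output", "knowledge-base")

def find_nearest_graph(start, max_depth=4):
    """Locate the closest knowledge graph file, preferring SQLite over JSON."""
    start = Path(start).resolve()
    candidates = []
    # 1. Current directory and common output subdirectories.
    for d in (start, *(start / sub for sub in COMMON_DIRS)):
        candidates += [d / n for n in GRAPH_NAMES if (d / n).is_file()]
    # 2. Recursive downward scan, capped at max_depth directory levels.
    if not candidates:
        for p in start.rglob("knowledge_graph.*"):
            if p.name in GRAPH_NAMES and len(p.relative_to(start).parts) - 1 <= max_depth:
                candidates.append(p)
    # 3. Walk upward through parent directories.
    for parent in start.parents:
        if candidates:
            break
        candidates += [parent / n for n in GRAPH_NAMES if (parent / n).is_file()]
    # SQLite .db beats .json when both are present.
    return min(candidates, key=lambda p: p.suffix != ".db") if candidates else None

# Demo: a .db and a .json side by side -- the .db wins.
workspace = Path(tempfile.mkdtemp())
(workspace / "results").mkdir()
(workspace / "results" / "knowledge_graph.json").write_text("{}")
(workspace / "results" / "knowledge_graph.db").touch()
nearest = find_nearest_graph(workspace)
```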

**Videos.** The current directory is scanned for files with `.mp4`, `.mkv`, and `.webm` extensions.

**Documents.** The current directory is scanned for files with `.md`, `.pdf`, and `.docx` extensions.

**LLM provider.** If `--provider` is set to `auto` (the default), the Companion attempts to initialise a provider using any available API key in the environment (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, etc.).

All discovered context is displayed in the welcome banner:

```
  PlanOpticon Companion
  Interactive planning REPL

  Knowledge graph: knowledge_graph.db (42 entities, 87 relationships)
  Videos: meeting-2024-01-15.mp4, sprint-review.mp4
  Docs: requirements.md, architecture.pdf
  LLM provider: openai (model: gpt-4o)

  Type /help for commands, or ask a question.
```

If no knowledge graph is found, the banner shows "No knowledge graph loaded." Commands that require a KG will return an appropriate message rather than failing silently.

---

## Slash Commands

The Companion supports 18 slash commands. Type `/help` at the prompt to see the full list.

### /help

Display all available commands with brief descriptions.

```
planopticon> /help
Available commands:
  /help                  Show this help
  /status                Workspace status
  /skills                List available skills
  /entities [--type T]   List KG entities
  /search TERM           Search entities by name
  /neighbors ENTITY      Show entity relationships
  /export FORMAT         Export KG (markdown, obsidian, notion, csv)
  /analyze PATH          Analyze a video/doc
  /ingest PATH           Ingest a file into the KG
  /auth SERVICE          Authenticate with a cloud service
  /provider [NAME]       List or switch LLM provider
  /model [NAME]          Show or switch chat model
  /run SKILL             Run a skill by name
  /plan                  Run project_plan skill
  /prd                   Run PRD skill
  /tasks                 Run task_breakdown skill
  /quit, /exit           Exit companion

Any other input is sent to the chat agent (requires LLM).
```

### /status

Show a summary of the current workspace state: loaded knowledge graph (with entity and relationship counts, broken down by entity type), number of discovered videos and documents, and whether an LLM provider is active.

```
planopticon> /status
Workspace status:
  KG: /home/user/project/results/knowledge_graph.db (42 entities, 87 relationships)
    technology: 15
    person: 12
    concept: 10
    organization: 5
  Videos: 2 found
  Docs: 3 found
  Provider: active
```

### /skills

List all registered planning agent skills with their names and descriptions. These are the skills that can be invoked via `/run`.

```
planopticon> /skills
Available skills:
  project_plan: Generate a structured project plan from knowledge graph
  prd: Generate a product requirements document (PRD) / feature spec
  roadmap: Generate a product/project roadmap
  task_breakdown: Break down goals into tasks with dependencies
  github_issues: Generate GitHub issues from task breakdown
  requirements_chat: Interactive requirements gathering via guided questions
  doc_generator: Generate technical documentation, ADRs, or meeting notes
  artifact_export: Export artifacts in agent-ready formats
  cli_adapter: Push artifacts to external tools via their CLIs
  notes_export: Export knowledge graph as structured notes (Obsidian, Notion)
  wiki_generator: Generate a GitHub wiki from knowledge graph and artifacts
```

### /entities [--type TYPE]

List entities from the loaded knowledge graph. Optionally filter by entity type.

```
planopticon> /entities
Found 42 entities
  [technology] Python -- General-purpose programming language
  [person] Alice -- Lead engineer on the project
  [concept] Microservices -- Architectural pattern discussed
  ...

planopticon> /entities --type person
Found 12 entities
  [person] Alice -- Lead engineer on the project
  [person] Bob -- Product manager
  ...
```

!!! note
    This command requires a loaded knowledge graph. If none is loaded, it returns "No knowledge graph loaded."

### /search TERM

Search entities by name substring (case-insensitive).

```
planopticon> /search python
Found 3 entities
  [technology] Python -- General-purpose programming language
  [technology] Python Flask -- Web framework for Python
  [concept] Python packaging -- Discussion of pip and packaging tools
```
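
The matching behaviour described above amounts to a case-insensitive substring test over entity names. A minimal sketch, using simplified dicts as a stand-in for PlanOpticon's real entity model:

```python
entities = [
    {"type": "technology", "name": "Python", "description": "General-purpose programming language"},
    {"type": "technology", "name": "Python Flask", "description": "Web framework for Python"},
    {"type": "concept", "name": "Python packaging", "description": "Discussion of pip and packaging tools"},
    {"type": "person", "name": "Alice", "description": "Lead engineer on the project"},
]

def search_entities(entities, term):
    """Return entities whose name contains `term`, ignoring case."""
    needle = term.lower()
    return [e for e in entities if needle in e["name"].lower()]

# Case of the query does not matter: "PYTHON" matches all three
# Python-related entities, but not "Alice".
matches = search_entities(entities, "PYTHON")
```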

### /neighbors ENTITY

Show all entities and relationships connected to a given entity. This performs a breadth-first traversal (depth 1) from the named entity.

```
planopticon> /neighbors Alice
Found 4 entities and 5 relationships
  [person] Alice -- Lead engineer on the project
  [technology] Python -- General-purpose programming language
  [organization] Acme Corp -- Employer
  [concept] Authentication -- Auth system design
  Alice --[works_with]--> Python
  Alice --[employed_by]--> Acme Corp
  Alice --[proposed]--> Authentication
  Bob --[collaborates_with]--> Alice
  Authentication --[discussed_by]--> Alice
```
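
A depth-1 traversal like this collects every relationship that touches the named entity, in either direction, plus the entities on the other end. The sketch below uses plain tuples as a simplified edge model; PlanOpticon's internal representation is richer.

```python
relationships = [
    ("Alice", "works_with", "Python"),
    ("Alice", "employed_by", "Acme Corp"),
    ("Alice", "proposed", "Authentication"),
    ("Bob", "collaborates_with", "Alice"),
    ("Authentication", "discussed_by", "Alice"),
    ("Bob", "reports_to", "Carol"),  # does not touch Alice
]

def neighbors(relationships, entity):
    """Return (connected entities, touching edges) for a depth-1 traversal."""
    edges = [r for r in relationships if entity in (r[0], r[2])]
    nodes = set()
    for src, _, dst in edges:
        nodes.update((src, dst))
    nodes.discard(entity)  # report only the neighbours themselves
    return nodes, edges

nodes, edges = neighbors(relationships, "Alice")
```

Note that `Bob --[collaborates_with]--> Alice` is included even though Alice is the target, not the source: direction is preserved in the edge list but ignored when deciding what counts as a neighbour.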

### /export FORMAT

Request an export of the knowledge graph. Supported formats: `markdown`, `obsidian`, `notion`, `csv`. This command prints the equivalent CLI command to run.

```
planopticon> /export obsidian
Export 'obsidian' requested. Use the CLI command:
  planopticon export obsidian /home/user/project/results/knowledge_graph.db
```

### /analyze PATH

Request analysis of a video or document file. Validates the file exists and prints the equivalent CLI command.

```
planopticon> /analyze meeting.mp4
Analyze requested for meeting.mp4. Use the CLI:
  planopticon analyze -i /home/user/project/meeting.mp4
```

### /ingest PATH

Request ingestion of a file into the knowledge graph. Validates the file exists and prints the equivalent CLI command.

```
planopticon> /ingest notes.md
Ingest requested for notes.md. Use the CLI:
  planopticon ingest /home/user/project/notes.md
```

### /auth [SERVICE]

Authenticate with a cloud service. When called without arguments, lists all available services. When called with a service name, triggers the authentication flow.

```
planopticon> /auth
Usage: /auth SERVICE
Available: dropbox, github, google, microsoft, notion, zoom

planopticon> /auth zoom
Zoom authenticated (oauth)
```

### /provider [NAME]

List available LLM providers and their status, or switch to a different provider.

When called without arguments (or with `list`), shows all known providers with their availability status:

- **ready** -- API key found in environment
- **local** -- runs locally (Ollama)
- **no key** -- no API key configured

The currently active provider is marked.

```
planopticon> /provider
Available providers:
  openai: ready (active)
  anthropic: ready
  gemini: no key
  ollama: local
  azure: no key
  together: no key
  fireworks: no key
  cerebras: no key
  xai: no key

Current: openai
```

To switch providers at runtime:

```
planopticon> /provider anthropic
Switched to provider: anthropic
```

Switching the provider reinitialises the provider manager and the planning agent. The chat model is reset to the provider's default. If initialisation fails, an error message is shown.

### /model [NAME]

Show the current chat model, or switch to a different one.

```
planopticon> /model
Current model: default
Usage: /model MODEL_NAME

planopticon> /model claude-sonnet-4-20250514
Switched to model: claude-sonnet-4-20250514
```

Switching the model reinitialises both the provider manager and the planning agent.

### /run SKILL

Run any registered skill by name. The skill receives the current agent context (knowledge graph, query engine, provider, and any previously generated artifacts) and returns an artifact.

```
planopticon> /run roadmap
--- Roadmap (roadmap) ---
# Roadmap

## Vision & Strategy
...
```

If the skill cannot execute (missing KG or provider), an error message is returned. Use `/skills` to see all available skill names.

### /plan

Shortcut for `/run project_plan`. Generates a structured project plan from the loaded knowledge graph.

```
planopticon> /plan
--- Project Plan (project_plan) ---
# Project Plan

## Executive Summary
...
```

### /prd

Shortcut for `/run prd`. Generates a product requirements document.

```
planopticon> /prd
--- Product Requirements Document (prd) ---
# Product Requirements Document

## Problem Statement
...
```

### /tasks

Shortcut for `/run task_breakdown`. Breaks goals and features into tasks with dependencies, priorities, and effort estimates. The output is JSON.

```
planopticon> /tasks
--- Task Breakdown (task_list) ---
[
  {
    "id": "T1",
    "title": "Set up authentication service",
    "description": "Implement OAuth2 flow with JWT tokens",
    "depends_on": [],
    "priority": "high",
    "estimate": "1w",
    "assignee_role": "backend engineer"
  },
  ...
]
```

### /quit and /exit

Exit the Companion REPL.

```
planopticon> /quit
Bye.
```

---

## Exiting the Companion

In addition to `/quit` and `/exit`, you can exit by:

- Typing `quit`, `exit`, `bye`, or `q` as bare words (without the `/` prefix)
- Pressing `Ctrl+C` or `Ctrl+D`

All of these end the session with a "Bye." message.
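
Putting the routing rules together -- bare exit words end the session, `/`-prefixed input is a command, and everything else goes to the chat agent -- the dispatch logic can be sketched as below. This is illustrative; the real dispatcher is internal to PlanOpticon, and case-insensitive matching of the exit words is an assumption of the sketch.

```python
EXIT_WORDS = {"quit", "exit", "bye", "q"}

def route(line):
    """Classify a line of REPL input."""
    text = line.strip()
    if not text:
        return "noop"     # empty input: just re-prompt
    if text.lower() in EXIT_WORDS:
        return "exit"     # bare word, no `/` prefix needed
    if text.startswith("/"):
        return "command"  # dispatched to the slash-command table
    return "chat"         # natural language -> chat agent
```

Note that `/quit` routes to "command" (it is a slash command), while the bare word `quit` routes straight to "exit".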

---

## Chat Mode

Any input that does not start with `/` and is not a bare exit word is sent to the chat agent as a natural-language message. This requires a configured LLM provider.

```
planopticon> What technologies were discussed in the meeting?
Based on the knowledge graph, the following technologies were discussed:

1. **Python** -- mentioned in the context of backend development
2. **React** -- proposed for the frontend redesign
3. **PostgreSQL** -- discussed as the primary database
...
```

The chat agent maintains conversation history across the session. It has full awareness of:

- The loaded knowledge graph (entity and relationship counts, types)
- Any artifacts generated during the session (via `/plan`, `/prd`, `/tasks`, `/run`)
- All available slash commands (which it may suggest when relevant)
- The full PlanOpticon CLI command set

If no LLM provider is configured, chat mode returns an error with instructions:

```
planopticon> What was discussed?
Chat requires an LLM provider. Set one of:
  OPENAI_API_KEY
  ANTHROPIC_API_KEY
  GEMINI_API_KEY
Or pass --provider / --chat-model.
```

---

## Runtime Provider and Model Switching

One of the Companion's key features is the ability to switch LLM providers and models without restarting the session. This is useful for:

- Comparing outputs across different models
- Falling back to a local model (Ollama) when API keys expire
- Using a cheaper model for exploratory queries and a more capable one for artifact generation

When you switch providers or models via `/provider` or `/model`, the Companion:

1. Updates the internal provider name and/or model name
2. Reinitialises the `ProviderManager`
3. Reinitialises the `PlanningAgent` with a fresh `AgentContext` that retains the loaded knowledge graph and query engine

Conversation history is preserved across provider switches.
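
The switch sequence can be sketched with a small state object. The class and attribute names below are illustrative, not PlanOpticon's real API; the point is that the knowledge graph and conversation history survive the rebuild, while the provider manager and agent are recreated.

```python
from dataclasses import dataclass, field

@dataclass
class CompanionState:
    provider: str = "openai"
    model: str = "default"
    kg_path: str = ""
    history: list = field(default_factory=list)
    agent: tuple = ()

    def _reinit(self):
        # Rebuild the provider manager / planning agent for the new
        # settings; kg_path and history are deliberately untouched.
        self.agent = (self.provider, self.model)

    def switch_provider(self, name):
        self.provider = name
        self.model = "default"  # model resets to the provider's default
        self._reinit()

    def switch_model(self, name):
        self.model = name
        self._reinit()

state = CompanionState(kg_path="results/knowledge_graph.db", history=["hello"])
state.switch_provider("anthropic")
state.switch_model("claude-sonnet-4-20250514")
```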

---

## Example Session

The following walkthrough shows a typical Companion session, from launch through exploration to artifact generation.

```bash
$ planopticon companion --kb ./results
```

```
  PlanOpticon Companion
  Interactive planning REPL

  Knowledge graph: knowledge_graph.db (58 entities, 124 relationships)
  Videos: sprint-review-2024-03.mp4
  Docs: architecture.md, requirements.pdf
  LLM provider: openai (model: default)

  Type /help for commands, or ask a question.

planopticon> /status
Workspace status:
  KG: /home/user/project/results/knowledge_graph.db (58 entities, 124 relationships)
    technology: 20
    person: 15
    concept: 13
    organization: 8
    time: 2
  Videos: 1 found
  Docs: 2 found
  Provider: active

planopticon> /entities --type person
Found 15 entities
  [person] Alice -- Lead architect
  [person] Bob -- Product manager
  [person] Carol -- Frontend lead
  ...

planopticon> /neighbors Alice
Found 6 entities and 8 relationships
  [person] Alice -- Lead architect
  [technology] Kubernetes -- Container orchestration platform
  [concept] Microservices -- Proposed architecture pattern
  ...
  Alice --[proposed]--> Microservices
  Alice --[expert_in]--> Kubernetes
  ...

planopticon> What were the main decisions made in the sprint review?
Based on the knowledge graph, the sprint review covered several key decisions:

1. **Adopt microservices architecture** -- Alice proposed and the team agreed
   to move from the monolith to a microservices pattern.
2. **Use Kubernetes for orchestration** -- Selected over Docker Swarm.
3. **Prioritize authentication module** -- Bob identified this as the highest
   priority for the next sprint.

planopticon> /provider anthropic
Switched to provider: anthropic

planopticon> /model claude-sonnet-4-20250514
Switched to model: claude-sonnet-4-20250514

planopticon> /plan
--- Project Plan (project_plan) ---
# Project Plan

## Executive Summary
This project plan outlines the migration from a monolithic architecture
to a microservices-based system, as discussed in the sprint review...

## Goals & Objectives
...

planopticon> /tasks
--- Task Breakdown (task_list) ---
[
  {
    "id": "T1",
    "title": "Design service boundaries",
    "description": "Define microservice boundaries based on domain analysis",
    "depends_on": [],
    "priority": "high",
    "estimate": "3d",
    "assignee_role": "architect"
  },
  ...
]

planopticon> /export obsidian
Export 'obsidian' requested. Use the CLI command:
  planopticon export obsidian /home/user/project/results/knowledge_graph.db

planopticon> quit
Bye.
```
--- a/docs/guide/document-ingestion.md
+++ b/docs/guide/document-ingestion.md
@@ -0,0 +1,434 @@
# Document Ingestion

Document ingestion lets you process files -- PDFs, Markdown, and plaintext -- into a knowledge graph. PlanOpticon extracts text from documents, chunks it into manageable pieces, runs LLM-powered entity and relationship extraction, and stores the results in a FalkorDB knowledge graph. This is the same knowledge graph format produced by video analysis, so you can combine video and document insights in a single graph.

## Supported formats

| Extension | Processor | Description |
|-----------|-----------|-------------|
| `.pdf` | `PdfProcessor` | Extracts text page by page using pymupdf or pdfplumber |
| `.md`, `.markdown` | `MarkdownProcessor` | Splits on headings into sections |
| `.txt`, `.text`, `.log`, `.csv` | `PlaintextProcessor` | Splits on paragraph boundaries |

Additional formats can be added by implementing the `DocumentProcessor` base class and registering it (see [Extending with custom processors](#extending-with-custom-processors) below).

## CLI usage

### `planopticon ingest`

```
planopticon ingest INPUT_PATH [OPTIONS]
```

**Arguments:**

| Argument | Description |
|----------|-------------|
| `INPUT_PATH` | Path to a file or directory to ingest (must exist) |

**Options:**

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--output` | `-o` | Current directory | Output directory for the knowledge graph |
| `--db-path` | | None | Path to an existing `knowledge_graph.db` to merge into |
| `--recursive / --no-recursive` | `-r` | `--recursive` | Recurse into subdirectories (directory ingestion only) |
| `--provider` | `-p` | `auto` | LLM provider for entity extraction (`openai`, `anthropic`, `gemini`, `ollama`, `azure`, `together`, `fireworks`, `cerebras`, `xai`) |
| `--chat-model` | | None | Override the model used for LLM entity extraction |

### Single file ingestion

Process a single document and create a new knowledge graph:

```bash
planopticon ingest spec.md
```

This creates `knowledge_graph.db` and `knowledge_graph.json` in the current directory.

Specify an output directory:

```bash
planopticon ingest report.pdf -o ./results
```

This creates `./results/knowledge_graph.db` and `./results/knowledge_graph.json`.

### Directory ingestion

Process all supported files in a directory:

```bash
planopticon ingest ./docs/
```

By default, this recurses into subdirectories. To process only the top-level directory:

```bash
planopticon ingest ./docs/ --no-recursive
```

PlanOpticon automatically filters for supported file extensions. Unsupported files are silently skipped.

### Merging into an existing knowledge graph

To add document content to an existing knowledge graph (e.g., one created from video analysis), use `--db-path`:

```bash
# First, analyze a video
planopticon analyze meeting.mp4 -o ./results

# Then, ingest supplementary documents into the same graph
planopticon ingest ./meeting-notes/ --db-path ./results/knowledge_graph.db
```

The ingested entities and relationships are merged with the existing graph. Duplicate entities are consolidated automatically by the knowledge graph engine.

### Choosing an LLM provider

Entity and relationship extraction requires an LLM. By default, PlanOpticon auto-detects available providers based on your environment variables. You can override this:

```bash
# Use Anthropic for extraction
planopticon ingest docs/ -p anthropic

# Use a specific model
planopticon ingest docs/ -p openai --chat-model gpt-4o

# Use a local Ollama model
planopticon ingest docs/ -p ollama --chat-model llama3
```

### Output

After ingestion, PlanOpticon prints a summary:

```
Knowledge graph: ./knowledge_graph.db
  spec.md: 12 chunks
  architecture.md: 8 chunks
  requirements.txt: 3 chunks

Ingestion complete:
  Files processed: 3
  Total chunks: 23
  Entities extracted: 47
  Relationships: 31
  Knowledge graph: ./knowledge_graph.db
```

Both `.db` (SQLite/FalkorDB) and `.json` formats are saved automatically.

## How each processor works

### PDF processor

The `PdfProcessor` extracts text from PDF files on a per-page basis. It tries two extraction libraries in order:

1. **pymupdf** (preferred) -- Fast, reliable text extraction. Install with `pip install pymupdf`.
2. **pdfplumber** (fallback) -- Alternative extractor. Install with `pip install pdfplumber`.

If neither library is installed, the processor raises an `ImportError` with installation instructions.

Each page becomes a separate `DocumentChunk` with:

- `text`: The extracted text content of the page
- `page`: The 1-based page number
- `metadata.extraction_method`: Which library was used (`pymupdf` or `pdfplumber`)

To install PDF support:

```bash
pip install 'planopticon[pdf]'
# or
pip install pymupdf
# or
pip install pdfplumber
```

### Markdown processor

The `MarkdownProcessor` splits Markdown files on heading boundaries (lines starting with `#` through `######`). Each heading and its content until the next heading becomes a separate chunk.

**Splitting behavior:**

- If the file contains headings, each heading section becomes a chunk. The `section` field records the heading text.
- Content before the first heading is captured as a `(preamble)` chunk.
- If the file contains no headings, it falls back to paragraph-based chunking (same as plaintext).

For example, a file with this structure:

```markdown
Some intro text.

# Architecture

The system uses a microservices architecture...

## Components

There are three main components...

# Deployment

Deployment is handled via...
```

produces four chunks: `(preamble)`, `Architecture`, `Components`, and `Deployment`.

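The heading-based split can be sketched in a few lines. This is a simplified model of the behavior described above, not the actual `MarkdownProcessor` source -- the paragraph-chunking fallback for heading-free files is omitted, and the function name is illustrative:

```python
import re
from typing import List, Tuple

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")

def split_on_headings(text: str) -> List[Tuple[str, str]]:
    """Split markdown into (section_title, body) pairs.

    Content before the first heading is labelled "(preamble)".
    """
    sections: List[Tuple[str, str]] = []
    title = "(preamble)"
    lines: List[str] = []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Flush the section accumulated so far, if it has any content.
            if any(l.strip() for l in lines):
                sections.append((title, "\n".join(lines).strip()))
            title = match.group(2).strip()
            lines = []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        sections.append((title, "\n".join(lines).strip()))
    return sections

doc = """Some intro text.

# Architecture

The system uses a microservices architecture.

## Components

There are three main components.
"""
print([title for title, _ in split_on_headings(doc)])
# ['(preamble)', 'Architecture', 'Components']
```

Note that `##` subsections become their own chunks rather than being nested under their parent heading, matching the flat chunk list shown above.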
### Plaintext processor

The `PlaintextProcessor` handles `.txt`, `.text`, `.log`, and `.csv` files. It splits text on paragraph boundaries (double newlines) and groups paragraphs into chunks with a configurable maximum size.

**Chunking parameters:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_chunk_size` | 2000 characters | Maximum size of each chunk |
| `overlap` | 200 characters | Number of characters from the end of one chunk to repeat at the start of the next |

The overlap ensures that entities and context spanning a paragraph boundary are not lost. Chunks are created by accumulating paragraphs until the next paragraph would exceed `max_chunk_size`, at which point the current chunk is flushed and a new one begins.

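The accumulate-and-flush strategy can be sketched like this. Treat it as an illustration of the parameters above rather than the exact `PlaintextProcessor` code -- details such as whitespace handling may differ:

```python
from typing import List

def chunk_paragraphs(text: str, max_chunk_size: int = 2000,
                     overlap: int = 200) -> List[str]:
    """Group paragraphs into chunks of at most max_chunk_size characters,
    carrying the tail of each chunk into the next for context."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chunk_size:
            # Flush the full chunk, then seed the next one with its tail.
            chunks.append(current)
            current = current[-overlap:] + "\n\n" + para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single oversized paragraph still lands in one chunk under this sketch; only the boundaries between paragraphs are eligible split points.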
## The ingestion pipeline

Document ingestion follows this pipeline:

```
File on disk
     |
     v
Processor selection (by file extension)
     |
     v
Text extraction (PDF pages / Markdown sections / plaintext paragraphs)
     |
     v
DocumentChunk objects (text + metadata)
     |
     v
Source registration (provenance tracking in the KG)
     |
     v
KG content addition (LLM entity/relationship extraction per chunk)
     |
     v
Knowledge graph storage (.db + .json)
```

### Step 1: Processor selection

PlanOpticon maintains a registry of processors keyed by file extension. When you call `ingest_file()`, it looks up the appropriate processor using `get_processor(path)`. If no processor is registered for the file extension, a `ValueError` is raised.

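The registry described above can be modeled as a plain dict from extension to processor class. This is an illustrative sketch, not the actual `video_processor.processors.base` implementation:

```python
from pathlib import Path
from typing import Dict, Iterable, Optional, Type

# Illustrative registry; the real module keeps equivalent state internally.
_REGISTRY: Dict[str, Type] = {}

def register_processor(extensions: Iterable[str], processor_cls: Type) -> None:
    """Map each extension (case-insensitive) to a processor class."""
    for ext in extensions:
        _REGISTRY[ext.lower()] = processor_cls

def get_processor(path: Path) -> Optional[object]:
    """Instantiate the processor registered for the file's extension, if any."""
    cls = _REGISTRY.get(path.suffix.lower())
    return cls() if cls is not None else None

class DummyProcessor:
    """Stand-in processor used only to demonstrate registration."""
    pass

register_processor([".xyz"], DummyProcessor)
```

Under this model, `ingest_file()` simply calls `get_processor()` and raises `ValueError` when the lookup comes back empty.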
### Step 2: Text extraction

The selected processor reads the file and produces a list of `DocumentChunk` objects. Each chunk contains:

| Field | Type | Description |
|-------|------|-------------|
| `text` | `str` | The extracted text content |
| `source_file` | `str` | Path to the source file |
| `chunk_index` | `int` | Sequential index of this chunk within the file |
| `page` | `Optional[int]` | Page number (PDF only, 1-based) |
| `section` | `Optional[str]` | Section heading (Markdown only) |
| `metadata` | `Dict[str, Any]` | Additional metadata (e.g., extraction method) |

### Step 3: Source registration

Each ingested file is registered as a source in the knowledge graph with provenance metadata:

- `source_id`: A SHA-256 hash of the absolute file path (first 12 characters), unless you provide a custom ID
- `source_type`: Always `"document"`
- `title`: The file stem (filename without extension)
- `path`: The file path
- `mime_type`: Detected MIME type
- `ingested_at`: ISO-8601 timestamp
- `metadata`: Chunk count and file extension

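The default `source_id` derivation can be reproduced in a few lines -- a sketch assuming the hash is taken over the UTF-8 encoding of the resolved absolute path; the exact path normalization PlanOpticon applies is not shown here:

```python
import hashlib
from pathlib import Path

def default_source_id(path: Path) -> str:
    """First 12 hex characters of the SHA-256 of the absolute path."""
    absolute = str(path.resolve())
    return hashlib.sha256(absolute.encode("utf-8")).hexdigest()[:12]

source_id = default_source_id(Path("docs/spec.md"))
print(source_id)  # e.g. '3f1a9c0b2e7d' -- stable for the same absolute path
```

Because the ID is derived from the path, re-ingesting the same file maps to the same source record rather than creating a duplicate.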
### Step 4: Entity and relationship extraction

Each chunk's text is passed to `knowledge_graph.add_content()`, which uses the configured LLM provider to extract entities and relationships. The content source is tagged with the document name and either the page number or section name:

- `document:report.pdf:page:3`
- `document:spec.md:section:Architecture`
- `document:notes.txt` (no page or section)

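The tag format follows directly from the examples above and can be expressed as a small helper (the function name is illustrative):

```python
from typing import Optional

def content_source_tag(filename: str, page: Optional[int] = None,
                       section: Optional[str] = None) -> str:
    """Build the provenance tag attached to each extracted chunk."""
    tag = f"document:{filename}"
    if page is not None:
        tag += f":page:{page}"      # PDF chunks carry a page number
    elif section is not None:
        tag += f":section:{section}"  # Markdown chunks carry a heading
    return tag

print(content_source_tag("report.pdf", page=3))
# document:report.pdf:page:3
```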
### Step 5: Storage

The knowledge graph is saved in both `.db` (SQLite-backed FalkorDB) and `.json` formats.

## Combining with video analysis

A common workflow is to analyze a video recording and then ingest related documents into the same knowledge graph:

```bash
# Step 1: Analyze the meeting recording
planopticon analyze meeting-recording.mp4 -o ./project-kg

# Step 2: Ingest the meeting agenda
planopticon ingest agenda.md --db-path ./project-kg/knowledge_graph.db

# Step 3: Ingest the project spec
planopticon ingest project-spec.pdf --db-path ./project-kg/knowledge_graph.db

# Step 4: Ingest a whole docs folder
planopticon ingest ./reference-docs/ --db-path ./project-kg/knowledge_graph.db

# Step 5: Query the combined graph
planopticon query --db-path ./project-kg/knowledge_graph.db
```

The resulting knowledge graph contains entities and relationships from all sources -- video transcripts, meeting agendas, specs, and reference documents -- with full provenance tracking so you can trace any entity back to its source.

## Python API

### Ingesting a single file

```python
from pathlib import Path
from video_processor.integrators.knowledge_graph import KnowledgeGraph
from video_processor.processors.ingest import ingest_file

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
chunk_count = ingest_file(Path("document.pdf"), kg)
print(f"Processed {chunk_count} chunks")

kg.save(Path("knowledge_graph.db"))
```

### Ingesting a directory

```python
from pathlib import Path
from video_processor.integrators.knowledge_graph import KnowledgeGraph
from video_processor.processors.ingest import ingest_directory

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
results = ingest_directory(
    Path("./docs"),
    kg,
    recursive=True,
    extensions=[".md", ".pdf"],  # Optional: filter by extension
)

for filepath, chunks in results.items():
    print(f"  {filepath}: {chunks} chunks")

kg.save(Path("knowledge_graph.db"))
```

### Listing supported extensions

```python
from video_processor.processors.base import list_supported_extensions

extensions = list_supported_extensions()
print(extensions)
# ['.csv', '.log', '.markdown', '.md', '.pdf', '.text', '.txt']
```

### Working with processors directly

```python
from pathlib import Path
from video_processor.processors.base import get_processor

processor = get_processor(Path("report.pdf"))
if processor:
    chunks = processor.process(Path("report.pdf"))
    for chunk in chunks:
        print(f"Page {chunk.page}: {chunk.text[:100]}...")
```

## Extending with custom processors

To add support for a new file format, implement the `DocumentProcessor` abstract class and register it:

```python
from pathlib import Path
from typing import List
from video_processor.processors.base import (
    DocumentChunk,
    DocumentProcessor,
    register_processor,
)


class HtmlProcessor(DocumentProcessor):
    supported_extensions = [".html", ".htm"]

    def can_process(self, path: Path) -> bool:
        return path.suffix.lower() in self.supported_extensions

    def process(self, path: Path) -> List[DocumentChunk]:
        from bs4 import BeautifulSoup

        soup = BeautifulSoup(path.read_text(), "html.parser")
        text = soup.get_text(separator="\n")
        return [
            DocumentChunk(
                text=text,
                source_file=str(path),
                chunk_index=0,
            )
        ]


register_processor(HtmlProcessor.supported_extensions, HtmlProcessor)
```

After registration, `planopticon ingest` will automatically handle `.html` and `.htm` files.

## Companion REPL

Inside the interactive companion REPL, you can ingest files using the `/ingest` command:

```
> /ingest ./meeting-notes.md
Ingested meeting-notes.md: 5 chunks
```

This adds content to the currently loaded knowledge graph.

## Common workflows

### Build a project knowledge base from scratch

```bash
# Ingest all project docs
planopticon ingest ./project-docs/ -o ./knowledge-base

# Query what was captured
planopticon query --db-path ./knowledge-base/knowledge_graph.db

# Export as an Obsidian vault
planopticon export obsidian ./knowledge-base/knowledge_graph.db -o ./vault
```

### Incrementally build a knowledge graph

```bash
# Start with initial docs
planopticon ingest ./sprint-1-docs/ -o ./kg

# Add more docs over time
planopticon ingest ./sprint-2-docs/ --db-path ./kg/knowledge_graph.db
planopticon ingest ./sprint-3-docs/ --db-path ./kg/knowledge_graph.db

# The graph grows with each ingestion
planopticon query --db-path ./kg/knowledge_graph.db stats
```

### Ingest from Google Workspace or Microsoft 365

PlanOpticon provides integrated commands that fetch cloud documents and ingest them in one step:

```bash
# Google Workspace
planopticon gws ingest --folder-id FOLDER_ID -o ./results

# Microsoft 365 / SharePoint
planopticon m365 ingest --web-url https://contoso.sharepoint.com/sites/proj \
    --folder-url /sites/proj/Shared\ Documents
```

These commands handle authentication, document download, text extraction, and knowledge graph creation automatically.
ADDED docs/guide/export.md

# Export

PlanOpticon provides multiple ways to export knowledge graph data into formats suitable for documentation, note-taking, collaboration, and interchange. All export commands work offline from a `knowledge_graph.db` file -- no API key is needed for template-based exports.

## Overview of export options

| Format | Command | API Key | Description |
|--------|---------|---------|-------------|
| Markdown documents | `planopticon export markdown` | No | 7 document types: summary, meeting notes, glossary, and more |
| Obsidian vault | `planopticon export obsidian` | No | YAML frontmatter, `[[wiki-links]]`, tag pages, Map of Content |
| Notion-compatible | `planopticon export notion` | No | Callout blocks, CSV database for bulk import |
| PlanOpticonExchange JSON | `planopticon export exchange` | No | Canonical interchange format for merging and sharing |
| GitHub wiki | `planopticon wiki generate` | No | Home, Sidebar, entity pages, type indexes |
| GitHub wiki push | `planopticon wiki push` | Git auth | Push generated wiki to a GitHub repo |

## Markdown document generator

The markdown exporter produces structured documents from knowledge graph data using pure template-based generation. No LLM calls are made -- the output is deterministic and based entirely on the entities and relationships in the graph.

### CLI usage

```
planopticon export markdown DB_PATH [OPTIONS]
```

**Arguments:**

| Argument | Description |
|----------|-------------|
| `DB_PATH` | Path to a `knowledge_graph.db` file |

**Options:**

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--output` | `-o` | `./export` | Output directory |
| `--type` | | `all` | Document types to generate (repeatable). Choices: `summary`, `meeting-notes`, `glossary`, `relationship-map`, `status-report`, `entity-index`, `csv`, `all` |

**Examples:**

```bash
# Generate all document types
planopticon export markdown knowledge_graph.db

# Generate only summary and glossary
planopticon export markdown kg.db -o ./docs --type summary --type glossary

# Generate meeting notes and CSV
planopticon export markdown kg.db --type meeting-notes --type csv
```

### Document types

#### summary (Executive Summary)

A high-level overview of the knowledge graph. Contains:

- Total entity and relationship counts
- Entity breakdown by type (table with counts and example names)
- Key entities ranked by number of connections (top 10)
- Relationship type breakdown with counts

This is useful for getting a quick overview of what a knowledge base contains.

#### meeting-notes (Meeting Notes)

Formats knowledge graph data as structured meeting notes. Organizes entities into planning-relevant categories:

- **Discussion Topics**: Entities of type `concept`, `technology`, or `topic` with their descriptions
- **Participants**: Entities of type `person`
- **Decisions & Constraints**: Entities of type `decision` or `constraint`
- **Action Items**: Entities of type `goal`, `feature`, or `milestone`, shown as checkboxes. If an entity has an `assigned_to` or `owned_by` relationship, the owner is shown as `@name`
- **Open Questions / Loose Ends**: Entities with one or fewer relationships (excluding people), indicating topics that may need follow-up

Includes a generation timestamp.

#### glossary (Glossary)

An alphabetically sorted dictionary of all entities in the knowledge graph. Each entry shows:

- Entity name (bold)
- Entity type (italic, in parentheses)
- First description

Format:

```
**Entity Name** *(type)*
: Description text here.
```

#### relationship-map (Relationship Map)

A comprehensive view of all relationships in the graph, organized by relationship type. Each type gets its own section with a table of source-target pairs.

Also includes a **Mermaid diagram** of the top 20 most-connected entities, rendered as a `graph LR` flowchart with labeled edges. This diagram can be rendered natively in GitHub, GitLab, Obsidian, and many other Markdown viewers.

#### status-report (Status Report)

A project-oriented status report that highlights planning entities:

- **Overview**: Counts of entities, relationships, features, milestones, requirements, and risks/constraints
- **Milestones**: Entities of type `milestone` with descriptions
- **Features**: Table of entities of type `feature` with descriptions (truncated to 60 characters)
- **Risks & Constraints**: Entities of type `risk` or `constraint`

Includes a generation timestamp.

#### entity-index (Entity Index)

A master index of all entities grouped by type. Each type section lists entities alphabetically with their first description. Shows total entity count and number of types.

#### csv (CSV Export)

A CSV file suitable for spreadsheet import. Columns:

| Column | Description |
|--------|-------------|
| Name | Entity name |
| Type | Entity type |
| Description | First description |
| Related To | Semicolon-separated list of entities this entity has outgoing relationships to |
| Source | First occurrence source |
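
Once generated, the CSV can be post-processed with standard tooling. A minimal sketch using only the Python standard library; the column names match the table above, while the sample rows are hypothetical:

```python
import csv
import io

# Hypothetical sample rows matching the documented columns
sample = """Name,Type,Description,Related To,Source
Python,technology,A high-level programming language,FastAPI; Django,meeting.mp4
Alice,person,Backend lead,,meeting.mp4
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Split the semicolon-separated "Related To" column into a proper list
for row in rows:
    row["Related To"] = [t.strip() for t in row["Related To"].split(";") if t.strip()]

tech = [r["Name"] for r in rows if r["Type"] == "technology"]
print(tech)                   # ['Python']
print(rows[0]["Related To"])  # ['FastAPI', 'Django']
```

In practice you would pass the real `export/csv.csv` path to `open()` instead of the in-memory sample.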

### Entity briefs

In addition to the selected document types, the `generate_all()` function automatically creates individual entity brief pages in an `entities/` subdirectory. Each brief contains:

- Entity name and type
- Summary (all descriptions)
- Outgoing relationships (table of target entities and relationship types)
- Incoming relationships (table of source entities and relationship types)
- Source occurrences with timestamps and context text

## Obsidian vault export

The Obsidian exporter creates a complete vault structure with YAML frontmatter, `[[wiki-links]]` for entity cross-references, and Obsidian-compatible metadata.

### CLI usage

```
planopticon export obsidian DB_PATH [OPTIONS]
```

**Options:**

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--output` | `-o` | `./obsidian-vault` | Output vault directory |

**Example:**

```bash
planopticon export obsidian knowledge_graph.db -o ./my-vault
```

### Generated structure

```
my-vault/
  _Index.md           # Map of Content (MOC)
  Tag - Person.md     # One tag page per entity type
  Tag - Technology.md
  Tag - Concept.md
  Alice.md            # Individual entity notes
  Python.md
  Microservices.md
  ...
```

### Entity notes

Each entity gets a dedicated note with:

**YAML frontmatter:**

```yaml
---
type: technology
tags:
  - technology
aliases:
  - Python 3
  - CPython
date: 2026-03-07
---
```

The frontmatter includes:

- `type`: The entity type
- `tags`: Entity type as a tag (for Obsidian tag-based filtering)
- `aliases`: Any known aliases for the entity (if available)
- `date`: The export date

**Body content:**

- `# Entity Name` heading
- Description paragraphs
- `## Relationships` section with `[[wiki-links]]` to related entities:
  ```
  - **uses**: [[FastAPI]]
  - **depends_on**: [[PostgreSQL]]
  ```
- `## Referenced by` section with incoming relationships:
  ```
  - **implements** from [[Backend Service]]
  ```

### Index note (Map of Content)

The `_Index.md` file serves as a Map of Content (MOC), listing all entities grouped by type with `[[wiki-links]]`:

```markdown
---
type: index
tags:
  - MOC
date: 2026-03-07
---

# Index

**47** entities | **31** relationships

## Concept

- [[Microservices]]
- [[REST API]]

## Person

- [[Alice]]
- [[Bob]]
```

### Tag pages

One tag page is created per entity type (e.g., `Tag - Person.md`, `Tag - Technology.md`). Each page has frontmatter tagging it with the entity type and lists all entities of that type with descriptions.

## Notion-compatible markdown export

The Notion exporter creates Markdown files with Notion-style callout blocks and a CSV database file for bulk import into Notion.

### CLI usage

```
planopticon export notion DB_PATH [OPTIONS]
```

**Options:**

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--output` | `-o` | `./notion-export` | Output directory |

**Example:**

```bash
planopticon export notion knowledge_graph.db -o ./notion-export
```

### Generated structure

```
notion-export/
  Overview.md            # Knowledge graph overview page
  entities_database.csv  # CSV for Notion database import
  Alice.md               # Individual entity pages
  Python.md
  ...
```

### Entity pages

Each entity page uses Notion-style callout syntax for metadata:

```markdown
# Python

> :computer: **Type:** technology

## Description

A high-level programming language...

> :memo: **Properties**
> - **version:** 3.11
> - **paradigm:** multi-paradigm

## Relationships

| Target | Relationship |
|--------|-------------|
| FastAPI | uses |
| Django | framework_for |

## Referenced by

| Source | Relationship |
|--------|-------------|
| Backend Service | implements |
```

### CSV database

The `entities_database.csv` file contains all entities in a format suitable for Notion's CSV database import:

| Column | Description |
|--------|-------------|
| Name | Entity name |
| Type | Entity type |
| Description | First two descriptions, semicolon-separated |
| Related To | Comma-separated list of outgoing relationship targets |

### Overview page

The `Overview.md` page provides a summary with entity counts and a grouped listing of all entities by type.

## GitHub wiki generator

The wiki generator creates a complete set of GitHub wiki pages from a knowledge graph, including navigation (Home page and Sidebar) and cross-linked entity pages.

### CLI usage

**Generate wiki pages locally:**

```
planopticon wiki generate DB_PATH [OPTIONS]
```

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--output` | `-o` | `./wiki` | Output directory for wiki pages |
| `--title` | | `Knowledge Base` | Wiki title (shown on Home page) |

**Push wiki pages to GitHub:**

```
planopticon wiki push WIKI_DIR REPO [OPTIONS]
```

| Argument | Description |
|----------|-------------|
| `WIKI_DIR` | Path to the directory containing generated wiki `.md` files |
| `REPO` | GitHub repository in `owner/repo` format |

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--message` | `-m` | `Update wiki` | Git commit message |

**Examples:**

```bash
# Generate wiki pages
planopticon wiki generate knowledge_graph.db -o ./wiki

# Generate with a custom title
planopticon wiki generate kg.db -o ./wiki --title "Project Wiki"

# Push to GitHub
planopticon wiki push ./wiki ConflictHQ/PlanOpticon

# Push with a custom commit message
planopticon wiki push ./wiki owner/repo -m "Add entity pages"
```

### Generated pages

The wiki generator creates the following pages:

| Page | Description |
|------|-------------|
| `Home.md` | Main wiki page with entity counts, type links, and artifact links |
| `_Sidebar.md` | Navigation sidebar with links to Home, entity type indexes, and artifacts |
| `{Type}.md` | One index page per entity type with a table of entities and descriptions |
| `{Entity}.md` | Individual entity pages with type, descriptions, relationships, and sources |

### Entity pages

Each entity page contains:

- Entity name as the top heading
- **Type** label
- **Descriptions** section (bullet list)
- **Relationships** table with wiki-style links to target entities
- **Referenced By** table with links to source entities
- **Sources** section listing occurrences with timestamps and context

All entity and type names are cross-linked using GitHub wiki-compatible links (`[Name](Sanitized-Name)`).
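
The exact sanitization is internal to PlanOpticon, but the link convention can be illustrated with a hypothetical helper (`wiki_link` below is an assumption for illustration, not the shipped implementation): whitespace in page names becomes hyphens, and other punctuation is dropped.

```python
import re

def wiki_link(name: str) -> str:
    """Hypothetical sketch of a GitHub-wiki-style [Name](Sanitized-Name) link."""
    # Replace whitespace runs with single hyphens
    sanitized = re.sub(r"\s+", "-", name.strip())
    # Drop characters other than letters, digits, hyphens, and underscores
    sanitized = re.sub(r"[^A-Za-z0-9\-_]", "", sanitized)
    return f"[{name}]({sanitized})"

print(wiki_link("Backend Service"))  # [Backend Service](Backend-Service)
```

The display text keeps the original entity name, so only the link target is sanitized.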

### Push behavior

The `wiki push` command:

1. Clones the existing GitHub wiki repository (`https://github.com/{repo}.wiki.git`).
2. If the wiki does not exist yet, initializes a new Git repository.
3. Copies all `.md` files from the wiki directory into the clone.
4. Commits the changes.
5. Pushes to the remote (tries `master` first, then `main`).

This requires Git authentication with push access to the repository. The wiki must be enabled in the GitHub repository settings.

## PlanOpticonExchange JSON format

The PlanOpticonExchange is the canonical interchange format for PlanOpticon data. Every command produces it, and every export adapter can consume it. It provides a structured, versioned JSON representation of a complete knowledge graph with project metadata.

### CLI usage

```
planopticon export exchange DB_PATH [OPTIONS]
```

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--output` | `-o` | `./exchange.json` | Output JSON file path |
| `--name` | | `Untitled` | Project name for the exchange payload |
| `--description` | | (empty) | Project description |

**Examples:**

```bash
# Basic export
planopticon export exchange knowledge_graph.db

# With project metadata
planopticon export exchange kg.db -o exchange.json --name "My Project" --description "Sprint 3 analysis"
```

### Schema

The exchange format has the following top-level structure:

```json
{
  "version": "1.0",
  "project": {
    "name": "My Project",
    "description": "Sprint 3 analysis",
    "created_at": "2026-03-07T10:30:00.000000",
    "updated_at": "2026-03-07T10:30:00.000000",
    "tags": ["sprint-3", "backend"]
  },
  "entities": [
    {
      "name": "Python",
      "type": "technology",
      "descriptions": ["A high-level programming language"],
      "source": "transcript",
      "occurrences": [
        {
          "source": "meeting.mp4",
          "timestamp": "00:05:23",
          "text": "We should use Python for the backend"
        }
      ]
    }
  ],
  "relationships": [
    {
      "source": "Python",
      "target": "Backend Service",
      "type": "used_by",
      "content_source": "transcript:meeting.mp4",
      "timestamp": 323.0
    }
  ],
  "artifacts": [
    {
      "name": "Project Plan",
      "content": "# Project Plan\n\n...",
      "artifact_type": "project_plan",
      "format": "markdown",
      "metadata": {}
    }
  ],
  "sources": [
    {
      "source_id": "abc123",
      "source_type": "video",
      "title": "Sprint Planning Meeting",
      "path": "/recordings/meeting.mp4",
      "url": null,
      "mime_type": "video/mp4",
      "ingested_at": "2026-03-07T10:00:00.000000",
      "metadata": {}
    }
  ]
}
```

**Top-level fields:**

| Field | Type | Description |
|-------|------|-------------|
| `version` | `str` | Schema version (currently `"1.0"`) |
| `project` | `ProjectMeta` | Project-level metadata |
| `entities` | `List[Entity]` | Knowledge graph entities |
| `relationships` | `List[Relationship]` | Knowledge graph relationships |
| `artifacts` | `List[ArtifactMeta]` | Generated artifacts (plans, PRDs, etc.) |
| `sources` | `List[SourceRecord]` | Content source provenance records |

### Merging exchange files

The exchange format supports merging, with automatic deduplication:

- Entities are deduplicated by name
- Relationships are deduplicated by the tuple `(source, target, type)`
- Artifacts are deduplicated by name
- Sources are deduplicated by `source_id`

```python
from video_processor.exchange import PlanOpticonExchange

# Load two exchange files
ex1 = PlanOpticonExchange.from_file("sprint-1.json")
ex2 = PlanOpticonExchange.from_file("sprint-2.json")

# Merge ex2 into ex1
ex1.merge(ex2)

# Save the combined result
ex1.to_file("combined.json")
```

The `project.updated_at` timestamp is updated automatically on merge.
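
For intuition, the relationship deduplication rule can be sketched in plain Python. This is a standalone illustration of the documented `(source, target, type)` rule, not the library's implementation:

```python
def merge_relationships(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge two relationship lists, deduplicating by (source, target, type)."""
    seen = {(r["source"], r["target"], r["type"]) for r in existing}
    merged = list(existing)
    for rel in incoming:
        key = (rel["source"], rel["target"], rel["type"])
        if key not in seen:
            seen.add(key)
            merged.append(rel)
    return merged

a = [{"source": "Python", "target": "Backend Service", "type": "used_by"}]
b = [
    {"source": "Python", "target": "Backend Service", "type": "used_by"},  # duplicate, dropped
    {"source": "Alice", "target": "Backend Service", "type": "owns"},     # new, kept
]
merged = merge_relationships(a, b)
print(len(merged))  # 2
```

Entities, artifacts, and sources follow the same pattern with their respective keys (name, name, and `source_id`).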

### Python API

**Create from a knowledge graph:**

```python
from video_processor.exchange import PlanOpticonExchange
from video_processor.integrators.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph(db_path="knowledge_graph.db")
kg_data = kg.to_dict()

exchange = PlanOpticonExchange.from_knowledge_graph(
    kg_data,
    project_name="My Project",
    project_description="Analysis of sprint planning meetings",
    tags=["planning", "backend"],
)
```

**Save and load:**

```python
# Save to file
exchange.to_file("exchange.json")

# Load from file
loaded = PlanOpticonExchange.from_file("exchange.json")
```

**Get JSON Schema:**

```python
schema = PlanOpticonExchange.json_schema()
```

This returns the full JSON Schema for validation and documentation purposes.

## Python API for all exporters

### Markdown document generation

```python
from pathlib import Path
from video_processor.exporters.markdown import (
    generate_all,
    generate_executive_summary,
    generate_meeting_notes,
    generate_glossary,
    generate_relationship_map,
    generate_status_report,
    generate_entity_index,
    generate_csv_export,
    generate_entity_brief,
    DOCUMENT_TYPES,
)
from video_processor.integrators.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
kg_data = kg.to_dict()

# Generate all document types at once
created_files = generate_all(kg_data, Path("./export"))

# Generate specific document types
created_files = generate_all(
    kg_data,
    Path("./export"),
    doc_types=["summary", "glossary", "csv"],
)

# Generate individual documents (returns markdown string)
summary = generate_executive_summary(kg_data)
notes = generate_meeting_notes(kg_data, title="Sprint Planning")
glossary = generate_glossary(kg_data)
rel_map = generate_relationship_map(kg_data)
status = generate_status_report(kg_data, title="Q1 Status")
index = generate_entity_index(kg_data)
csv_text = generate_csv_export(kg_data)

# Generate a brief for a single entity
entity = kg_data["nodes"][0]
relationships = kg_data["relationships"]
brief = generate_entity_brief(entity, relationships)
```

### Obsidian export

```python
from pathlib import Path
from video_processor.agent.skills.notes_export import export_to_obsidian
from video_processor.integrators.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
kg_data = kg.to_dict()

created_files = export_to_obsidian(kg_data, Path("./obsidian-vault"))
print(f"Created {len(created_files)} files")
```

### Notion export

```python
from pathlib import Path
from video_processor.agent.skills.notes_export import export_to_notion_md
from video_processor.integrators.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
kg_data = kg.to_dict()

created_files = export_to_notion_md(kg_data, Path("./notion-export"))
```

### Wiki generation

```python
from pathlib import Path
from video_processor.agent.skills.wiki_generator import (
    generate_wiki,
    write_wiki,
    push_wiki,
)
from video_processor.integrators.knowledge_graph import KnowledgeGraph

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
kg_data = kg.to_dict()

# Generate pages as a dict of {filename: content}
pages = generate_wiki(kg_data, title="Project Wiki")

# Write to disk
written = write_wiki(pages, Path("./wiki"))

# Push to GitHub (requires git auth)
success = push_wiki(Path("./wiki"), "owner/repo", message="Update wiki")
```

## Companion REPL

Inside the interactive companion REPL, use the `/export` command:

```
> /export markdown
Export 'markdown' requested. Use the CLI command:
  planopticon export markdown ./knowledge_graph.db

> /export obsidian
Export 'obsidian' requested. Use the CLI command:
  planopticon export obsidian ./knowledge_graph.db
```

The REPL provides guidance on the CLI command to run; the actual export is performed via the CLI.

## Common workflows

### Analyze videos and export to Obsidian

```bash
# Analyze meeting recordings
planopticon analyze meeting-1.mp4 -o ./results
planopticon analyze meeting-2.mp4 --db-path ./results/knowledge_graph.db

# Ingest supplementary docs
planopticon ingest ./specs/ --db-path ./results/knowledge_graph.db

# Export to Obsidian vault
planopticon export obsidian ./results/knowledge_graph.db -o ~/Obsidian/ProjectVault

# Open in Obsidian and explore the graph view
```

### Generate project documentation

```bash
# Generate all markdown documents
planopticon export markdown knowledge_graph.db -o ./docs

# The output includes:
#   docs/summary.md           - Executive summary
#   docs/meeting-notes.md     - Meeting notes format
#   docs/glossary.md          - Entity glossary
#   docs/relationship-map.md  - Relationships + Mermaid diagram
#   docs/status-report.md     - Project status report
#   docs/entity-index.md      - Master entity index
#   docs/csv.csv              - Spreadsheet-ready CSV
#   docs/entities/            - Individual entity briefs
```

### Publish a GitHub wiki

```bash
# Generate wiki pages
planopticon wiki generate knowledge_graph.db -o ./wiki --title "Project Knowledge Base"

# Review locally, then push
planopticon wiki push ./wiki ConflictHQ/my-project -m "Initial wiki from meeting analysis"
```

### Share data between projects

```bash
# Export from project A
planopticon export exchange ./project-a/knowledge_graph.db \
  -o project-a.json --name "Project A"

# Export from project B
planopticon export exchange ./project-b/knowledge_graph.db \
  -o project-b.json --name "Project B"

# Merge in Python
python -c "
from video_processor.exchange import PlanOpticonExchange
a = PlanOpticonExchange.from_file('project-a.json')
b = PlanOpticonExchange.from_file('project-b.json')
a.merge(b)
a.to_file('combined.json')
print(f'Combined: {len(a.entities)} entities, {len(a.relationships)} relationships')
"
```

### Export for spreadsheet analysis

```bash
# Generate just the CSV
planopticon export markdown knowledge_graph.db --type csv -o ./export

# The file export/csv.csv can be opened in Excel, Google Sheets, etc.
```

Alternatively, the Notion export includes an `entities_database.csv` that can be imported into any spreadsheet tool or Notion database.
--- a/docs/guide/export.md
+++ b/docs/guide/export.md
@@ -0,0 +1,756 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/docs/guide/export.md
+++ b/docs/guide/export.md
@@ -0,0 +1,756 @@
1 # Export
2
3 PlanOpticon provides multiple ways to export knowledge graph data into formats suitable for documentation, note-taking, collaboration, and interchange. All export commands work offline from a `knowledge_graph.db` file -- no API key is needed for template-based exports.
4
5 ## Overview of export options
6
7 | Format | Command | API Key | Description |
8 |--------|---------|---------|-------------|
9 | Markdown documents | `planopticon export markdown` | No | 7 document types: summary, meeting notes, glossary, and more |
10 | Obsidian vault | `planopticon export obsidian` | No | YAML frontmatter, `[[wiki-links]]`, tag pages, Map of Content |
11 | Notion-compatible | `planopticon export notion` | No | Callout blocks, CSV database for bulk import |
12 | PlanOpticonExchange JSON | `planopticon export exchange` | No | Canonical interchange format for merging and sharing |
13 | GitHub wiki | `planopticon wiki generate` | No | Home, Sidebar, entity pages, type indexes |
14 | GitHub wiki push | `planopticon wiki push` | Git auth | Push generated wiki to a GitHub repo |
15
16 ## Markdown document generator
17
18 The markdown exporter produces structured documents from knowledge graph data using pure template-based generation. No LLM calls are made -- the output is deterministic and based entirely on the entities and relationships in the graph.
19
20 ### CLI usage
21
22 ```
23 planopticon export markdown DB_PATH [OPTIONS]
24 ```
25
26 **Arguments:**
27
28 | Argument | Description |
29 |----------|-------------|
30 | `DB_PATH` | Path to a `knowledge_graph.db` file |
31
32 **Options:**
33
34 | Option | Short | Default | Description |
35 |--------|-------|---------|-------------|
36 | `--output` | `-o` | `./export` | Output directory |
37 | `--type` | | `all` | Document types to generate (repeatable). Choices: `summary`, `meeting-notes`, `glossary`, `relationship-map`, `status-report`, `entity-index`, `csv`, `all` |
38
39 **Examples:**
40
41 ```bash
42 # Generate all document types
43 planopticon export markdown knowledge_graph.db
44
45 # Generate only summary and glossary
46 planopticon export markdown kg.db -o ./docs --type summary --type glossary
47
48 # Generate meeting notes and CSV
49 planopticon export markdown kg.db --type meeting-notes --type csv
50 ```
51
52 ### Document types
53
54 #### summary (Executive Summary)
55
56 A high-level overview of the knowledge graph. Contains:
57
58 - Total entity and relationship counts
59 - Entity breakdown by type (table with counts and example names)
60 - Key entities ranked by number of connections (top 10)
61 - Relationship type breakdown with counts
62
63 This is useful for getting a quick overview of what a knowledge base contains.
64
65 #### meeting-notes (Meeting Notes)
66
67 Formats knowledge graph data as structured meeting notes. Organizes entities into planning-relevant categories:
68
69 - **Discussion Topics**: Entities of type `concept`, `technology`, or `topic` with their descriptions
70 - **Participants**: Entities of type `person`
71 - **Decisions & Constraints**: Entities of type `decision` or `constraint`
72 - **Action Items**: Entities of type `goal`, `feature`, or `milestone`, shown as checkboxes. If an entity has an `assigned_to` or `owned_by` relationship, the owner is shown as `@name`
73 - **Open Questions / Loose Ends**: Entities with one or fewer relationships (excluding people), indicating topics that may need follow-up
74
75 Includes a generation timestamp.
76
77 #### glossary (Glossary)
78
79 An alphabetically sorted dictionary of all entities in the knowledge graph. Each entry shows:
80
81 - Entity name (bold)
82 - Entity type (italic, in parentheses)
83 - First description
84
85 Format:
86
87 ```
88 **Entity Name** *(type)*
89 : Description text here.
90 ```
91
92 #### relationship-map (Relationship Map)
93
94 A comprehensive view of all relationships in the graph, organized by relationship type. Each type gets its own section with a table of source-target pairs.
95
96 Also includes a **Mermaid diagram** of the top 20 most-connected entities, rendered as a `graph LR` flowchart with labeled edges. This diagram can be rendered natively in GitHub, GitLab, Obsidian, and many other Markdown viewers.
97
98 #### status-report (Status Report)
99
100 A project-oriented status report that highlights planning entities:
101
102 - **Overview**: Counts of entities, relationships, features, milestones, requirements, and risks/constraints
103 - **Milestones**: Entities of type `milestone` with descriptions
104 - **Features**: Table of entities of type `feature` with descriptions (truncated to 60 characters)
105 - **Risks & Constraints**: Entities of type `risk` or `constraint`
106
107 Includes a generation timestamp.
108
109 #### entity-index (Entity Index)
110
111 A master index of all entities grouped by type. Each type section lists entities alphabetically with their first description. Shows total entity count and number of types.
112
113 #### csv (CSV Export)
114
115 A CSV file suitable for spreadsheet import. Columns:
116
117 | Column | Description |
118 |--------|-------------|
119 | Name | Entity name |
120 | Type | Entity type |
121 | Description | First description |
122 | Related To | Semicolon-separated list of entities this entity has outgoing relationships to |
123 | Source | First occurrence source |
124
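Consuming the export downstream is straightforward with the standard `csv` module. The row values below are illustrative:

```python
import csv
import io

# An illustrative row in the exported column layout
sample = (
    "Name,Type,Description,Related To,Source\n"
    "Python,technology,A high-level programming language,FastAPI;Django,meeting.mp4\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))
# "Related To" is semicolon-separated, so split it for per-target processing
targets = rows[0]["Related To"].split(";")
print(targets)  # ['FastAPI', 'Django']
```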
125 ### Entity briefs
126
127 In addition to the selected document types, the `generate_all()` function automatically creates individual entity brief pages in an `entities/` subdirectory. Each brief contains:
128
129 - Entity name and type
130 - Summary (all descriptions)
131 - Outgoing relationships (table of target entities and relationship types)
132 - Incoming relationships (table of source entities and relationship types)
133 - Source occurrences with timestamps and context text
134
135 ## Obsidian vault export
136
137 The Obsidian exporter creates a complete vault structure with YAML frontmatter, `[[wiki-links]]` for entity cross-references, and Obsidian-compatible metadata.
138
139 ### CLI usage
140
141 ```
142 planopticon export obsidian DB_PATH [OPTIONS]
143 ```
144
145 **Options:**
146
147 | Option | Short | Default | Description |
148 |--------|-------|---------|-------------|
149 | `--output` | `-o` | `./obsidian-vault` | Output vault directory |
150
151 **Example:**
152
153 ```bash
154 planopticon export obsidian knowledge_graph.db -o ./my-vault
155 ```
156
157 ### Generated structure
158
159 ```
160 my-vault/
161 _Index.md # Map of Content (MOC)
162 Tag - Person.md # One tag page per entity type
163 Tag - Technology.md
164 Tag - Concept.md
165 Alice.md # Individual entity notes
166 Python.md
167 Microservices.md
168 ...
169 ```
170
171 ### Entity notes
172
173 Each entity gets a dedicated note with:
174
175 **YAML frontmatter:**
176
177 ```yaml
178 ---
179 type: technology
180 tags:
181 - technology
182 aliases:
183 - Python 3
184 - CPython
185 date: 2026-03-07
186 ---
187 ```
188
189 The frontmatter includes:
190
191 - `type`: The entity type
192 - `tags`: Entity type as a tag (for Obsidian tag-based filtering)
193 - `aliases`: Any known aliases for the entity (if available)
194 - `date`: The export date
195
196 **Body content:**
197
198 - `# Entity Name` heading
199 - Description paragraphs
200 - `## Relationships` section with `[[wiki-links]]` to related entities:
201 ```
202 - **uses**: [[FastAPI]]
203 - **depends_on**: [[PostgreSQL]]
204 ```
205 - `## Referenced by` section with incoming relationships:
206 ```
207 - **implements** from [[Backend Service]]
208 ```
209
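The link formatting itself can be sketched in a few lines (hypothetical helper names, not the library's API):

```python
def wikilink(name):
    """Wrap an entity name as an Obsidian wiki-link."""
    return f"[[{name}]]"

def outgoing_lines(relationships, entity):
    """Render '- **type**: [[Target]]' bullets for an entity's outgoing edges."""
    return [
        f"- **{r['type']}**: {wikilink(r['target'])}"
        for r in relationships
        if r["source"] == entity
    ]

rels = [{"source": "Python", "target": "FastAPI", "type": "uses"}]
print(outgoing_lines(rels, "Python"))  # ['- **uses**: [[FastAPI]]']
```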
210 ### Index note (Map of Content)
211
212 The `_Index.md` file serves as a Map of Content (MOC), listing all entities grouped by type with `[[wiki-links]]`:
213
214 ```markdown
215 ---
216 type: index
217 tags:
218 - MOC
219 date: 2026-03-07
220 ---
221
222 # Index
223
224 **47** entities | **31** relationships
225
226 ## Concept
227
228 - [[Microservices]]
229 - [[REST API]]
230
231 ## Person
232
233 - [[Alice]]
234 - [[Bob]]
235 ```
236
237 ### Tag pages
238
239 One tag page is created per entity type (e.g., `Tag - Person.md`, `Tag - Technology.md`). Each page has frontmatter tagging it with the entity type and lists all entities of that type with descriptions.
240
241 ## Notion-compatible markdown export
242
243 The Notion exporter creates Markdown files with Notion-style callout blocks and a CSV database file for bulk import into Notion.
244
245 ### CLI usage
246
247 ```
248 planopticon export notion DB_PATH [OPTIONS]
249 ```
250
251 **Options:**
252
253 | Option | Short | Default | Description |
254 |--------|-------|---------|-------------|
255 | `--output` | `-o` | `./notion-export` | Output directory |
256
257 **Example:**
258
259 ```bash
260 planopticon export notion knowledge_graph.db -o ./notion-export
261 ```
262
263 ### Generated structure
264
265 ```
266 notion-export/
267 Overview.md # Knowledge graph overview page
268 entities_database.csv # CSV for Notion database import
269 Alice.md # Individual entity pages
270 Python.md
271 ...
272 ```
273
274 ### Entity pages
275
276 Each entity page uses Notion-style callout syntax for metadata:
277
278 ```markdown
279 # Python
280
281 > :computer: **Type:** technology
282
283 ## Description
284
285 A high-level programming language...
286
287 > :memo: **Properties**
288 > - **version:** 3.11
289 > - **paradigm:** multi-paradigm
290
291 ## Relationships
292
293 | Target | Relationship |
294 |--------|-------------|
295 | FastAPI | uses |
296 | Django | framework_for |
297
298 ## Referenced by
299
300 | Source | Relationship |
301 |--------|-------------|
302 | Backend Service | implements |
303 ```
304
305 ### CSV database
306
307 The `entities_database.csv` file contains all entities in a format suitable for Notion's CSV database import:
308
309 | Column | Description |
310 |--------|-------------|
311 | Name | Entity name |
312 | Type | Entity type |
313 | Description | First two descriptions, semicolon-separated |
314 | Related To | Comma-separated list of outgoing relationship targets |
315
316 ### Overview page
317
318 The `Overview.md` page provides a summary with entity counts and a grouped listing of all entities by type.
319
320 ## GitHub wiki generator
321
322 The wiki generator creates a complete set of GitHub wiki pages from a knowledge graph, including navigation (Home page and Sidebar) and cross-linked entity pages.
323
324 ### CLI usage
325
326 **Generate wiki pages locally:**
327
328 ```
329 planopticon wiki generate DB_PATH [OPTIONS]
330 ```
331
332 | Option | Short | Default | Description |
333 |--------|-------|---------|-------------|
334 | `--output` | `-o` | `./wiki` | Output directory for wiki pages |
335 | `--title` | | `Knowledge Base` | Wiki title (shown on Home page) |
336
337 **Push wiki pages to GitHub:**
338
339 ```
340 planopticon wiki push WIKI_DIR REPO [OPTIONS]
341 ```
342
343 | Argument | Description |
344 |----------|-------------|
345 | `WIKI_DIR` | Path to the directory containing generated wiki `.md` files |
346 | `REPO` | GitHub repository in `owner/repo` format |
347
348 | Option | Short | Default | Description |
349 |--------|-------|---------|-------------|
350 | `--message` | `-m` | `Update wiki` | Git commit message |
351
352 **Examples:**
353
354 ```bash
355 # Generate wiki pages
356 planopticon wiki generate knowledge_graph.db -o ./wiki
357
358 # Generate with a custom title
359 planopticon wiki generate kg.db -o ./wiki --title "Project Wiki"
360
361 # Push to GitHub
362 planopticon wiki push ./wiki ConflictHQ/PlanOpticon
363
364 # Push with a custom commit message
365 planopticon wiki push ./wiki owner/repo -m "Add entity pages"
366 ```
367
368 ### Generated pages
369
370 The wiki generator creates the following pages:
371
372 | Page | Description |
373 |------|-------------|
374 | `Home.md` | Main wiki page with entity counts, type links, and artifact links |
375 | `_Sidebar.md` | Navigation sidebar with links to Home, entity type indexes, and artifacts |
376 | `{Type}.md` | One index page per entity type with a table of entities and descriptions |
377 | `{Entity}.md` | Individual entity pages with type, descriptions, relationships, and sources |
378
379 ### Entity pages
380
381 Each entity page contains:
382
383 - Entity name as the top heading
384 - **Type** label
385 - **Descriptions** section (bullet list)
386 - **Relationships** table with wiki-style links to target entities
387 - **Referenced By** table with links to source entities
388 - **Sources** section listing occurrences with timestamps and context
389
390 All entity and type names are cross-linked using GitHub wiki-compatible links (`[Name](Sanitized-Name)`).
391
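The sanitization can be pictured like this (a plausible rule set for illustration; the actual implementation may differ in which characters it strips):

```python
import re

def wiki_link(name):
    """Render a GitHub-wiki-style link: spaces become hyphens, unsafe chars drop."""
    slug = re.sub(r"[^A-Za-z0-9\-]", "", name.replace(" ", "-"))
    return f"[{name}]({slug})"

print(wiki_link("Backend Service"))  # [Backend Service](Backend-Service)
```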
392 ### Push behavior
393
394 The `wiki push` command:
395
396 1. Clones the existing GitHub wiki repository (`https://github.com/{repo}.wiki.git`).
397 2. If the wiki does not exist yet, initializes a new Git repository.
398 3. Copies all `.md` files from the wiki directory into the clone.
399 4. Commits the changes.
400 5. Pushes to the remote (tries `master` first, then `main`).
401
402 This requires Git authentication with push access to the repository. The wiki must be enabled in the GitHub repository settings.
403
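The sequence above corresponds roughly to this sketch (illustrative only; the helper names are hypothetical and the real `push_wiki` implementation may differ in details):

```python
import subprocess
from pathlib import Path

def wiki_remote_url(repo):
    # GitHub hosts wikis in a parallel "<owner>/<repo>.wiki.git" repository
    return f"https://github.com/{repo}.wiki.git"

def push_wiki_sketch(wiki_dir: Path, repo: str, message: str = "Update wiki"):
    url = wiki_remote_url(repo)
    clone = wiki_dir.parent / "_wiki_clone"
    # 1-2. Clone the wiki repo, or start a fresh repository if it does not exist yet
    if subprocess.run(["git", "clone", url, str(clone)]).returncode != 0:
        clone.mkdir(exist_ok=True)
        subprocess.run(["git", "init"], cwd=clone, check=True)
    # 3. Copy generated pages into the clone
    for page in wiki_dir.glob("*.md"):
        (clone / page.name).write_text(page.read_text())
    # 4. Commit the changes
    subprocess.run(["git", "add", "-A"], cwd=clone, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=clone, check=True)
    # 5. Push, trying master first and falling back to main
    for branch in ("master", "main"):
        if subprocess.run(["git", "push", url, f"HEAD:{branch}"], cwd=clone).returncode == 0:
            break
```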
404 ## PlanOpticonExchange JSON format
405
406 PlanOpticonExchange is the canonical interchange format for PlanOpticon data: every command produces it, and every export adapter can consume it. It provides a structured, versioned JSON representation of a complete knowledge graph with project metadata.
407
408 ### CLI usage
409
410 ```
411 planopticon export exchange DB_PATH [OPTIONS]
412 ```
413
414 | Option | Short | Default | Description |
415 |--------|-------|---------|-------------|
416 | `--output` | `-o` | `./exchange.json` | Output JSON file path |
417 | `--name` | | `Untitled` | Project name for the exchange payload |
418 | `--description` | | (empty) | Project description |
419
420 **Examples:**
421
422 ```bash
423 # Basic export
424 planopticon export exchange knowledge_graph.db
425
426 # With project metadata
427 planopticon export exchange kg.db -o exchange.json --name "My Project" --description "Sprint 3 analysis"
428 ```
429
430 ### Schema
431
432 The exchange format has the following top-level structure:
433
434 ```json
435 {
436 "version": "1.0",
437 "project": {
438 "name": "My Project",
439 "description": "Sprint 3 analysis",
440 "created_at": "2026-03-07T10:30:00.000000",
441 "updated_at": "2026-03-07T10:30:00.000000",
442 "tags": ["sprint-3", "backend"]
443 },
444 "entities": [
445 {
446 "name": "Python",
447 "type": "technology",
448 "descriptions": ["A high-level programming language"],
449 "source": "transcript",
450 "occurrences": [
451 {
452 "source": "meeting.mp4",
453 "timestamp": "00:05:23",
454 "text": "We should use Python for the backend"
455 }
456 ]
457 }
458 ],
459 "relationships": [
460 {
461 "source": "Python",
462 "target": "Backend Service",
463 "type": "used_by",
464 "content_source": "transcript:meeting.mp4",
465 "timestamp": 323.0
466 }
467 ],
468 "artifacts": [
469 {
470 "name": "Project Plan",
471 "content": "# Project Plan\n\n...",
472 "artifact_type": "project_plan",
473 "format": "markdown",
474 "metadata": {}
475 }
476 ],
477 "sources": [
478 {
479 "source_id": "abc123",
480 "source_type": "video",
481 "title": "Sprint Planning Meeting",
482 "path": "/recordings/meeting.mp4",
483 "url": null,
484 "mime_type": "video/mp4",
485 "ingested_at": "2026-03-07T10:00:00.000000",
486 "metadata": {}
487 }
488 ]
489 }
490 ```
491
492 **Top-level fields:**
493
494 | Field | Type | Description |
495 |-------|------|-------------|
496 | `version` | `str` | Schema version (currently `"1.0"`) |
497 | `project` | `ProjectMeta` | Project-level metadata |
498 | `entities` | `List[Entity]` | Knowledge graph entities |
499 | `relationships` | `List[Relationship]` | Knowledge graph relationships |
500 | `artifacts` | `List[ArtifactMeta]` | Generated artifacts (plans, PRDs, etc.) |
501 | `sources` | `List[SourceRecord]` | Content source provenance records |
502
503 ### Merging exchange files
504
505 The exchange format supports merging, with automatic deduplication:
506
507 - Entities are deduplicated by name
508 - Relationships are deduplicated by the tuple `(source, target, type)`
509 - Artifacts are deduplicated by name
510 - Sources are deduplicated by `source_id`
511
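Conceptually, each of the four lists merges the same way; a small sketch of the rule (hypothetical helper, not the library's internals):

```python
def merge_unique(ours, theirs, key):
    """Append items from `theirs` whose key is not already present in `ours`."""
    seen = {key(item) for item in ours}
    for item in theirs:
        if key(item) not in seen:
            ours.append(item)
            seen.add(key(item))
    return ours

a = [{"name": "Python"}, {"name": "FastAPI"}]
b = [{"name": "Python"}, {"name": "Django"}]
merged = merge_unique(a, b, key=lambda e: e["name"])
print([e["name"] for e in merged])  # ['Python', 'FastAPI', 'Django']
# Relationships would use key=lambda r: (r["source"], r["target"], r["type"])
```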
512 ```python
513 from video_processor.exchange import PlanOpticonExchange
514
515 # Load two exchange files
516 ex1 = PlanOpticonExchange.from_file("sprint-1.json")
517 ex2 = PlanOpticonExchange.from_file("sprint-2.json")
518
519 # Merge ex2 into ex1
520 ex1.merge(ex2)
521
522 # Save the combined result
523 ex1.to_file("combined.json")
524 ```
525
526 The `project.updated_at` timestamp is updated automatically on merge.
527
528 ### Python API
529
530 **Create from a knowledge graph:**
531
532 ```python
533 from video_processor.exchange import PlanOpticonExchange
534 from video_processor.integrators.knowledge_graph import KnowledgeGraph
535
536 kg = KnowledgeGraph(db_path="knowledge_graph.db")
537 kg_data = kg.to_dict()
538
539 exchange = PlanOpticonExchange.from_knowledge_graph(
540 kg_data,
541 project_name="My Project",
542 project_description="Analysis of sprint planning meetings",
543 tags=["planning", "backend"],
544 )
545 ```
546
547 **Save and load:**
548
549 ```python
550 # Save to file
551 exchange.to_file("exchange.json")
552
553 # Load from file
554 loaded = PlanOpticonExchange.from_file("exchange.json")
555 ```
556
557 **Get JSON Schema:**
558
559 ```python
560 schema = PlanOpticonExchange.json_schema()
561 ```
562
563 This returns the full JSON Schema for validation and documentation purposes.
564
565 ## Python API for all exporters
566
567 ### Markdown document generation
568
569 ```python
570 from pathlib import Path
571 from video_processor.exporters.markdown import (
572 generate_all,
573 generate_executive_summary,
574 generate_meeting_notes,
575 generate_glossary,
576 generate_relationship_map,
577 generate_status_report,
578 generate_entity_index,
579 generate_csv_export,
580 generate_entity_brief,
581 DOCUMENT_TYPES,
582 )
583 from video_processor.integrators.knowledge_graph import KnowledgeGraph
584
585 kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
586 kg_data = kg.to_dict()
587
588 # Generate all document types at once
589 created_files = generate_all(kg_data, Path("./export"))
590
591 # Generate specific document types
592 created_files = generate_all(
593 kg_data,
594 Path("./export"),
595 doc_types=["summary", "glossary", "csv"],
596 )
597
598 # Generate individual documents (returns markdown string)
599 summary = generate_executive_summary(kg_data)
600 notes = generate_meeting_notes(kg_data, title="Sprint Planning")
601 glossary = generate_glossary(kg_data)
602 rel_map = generate_relationship_map(kg_data)
603 status = generate_status_report(kg_data, title="Q1 Status")
604 index = generate_entity_index(kg_data)
605 csv_text = generate_csv_export(kg_data)
606
607 # Generate a brief for a single entity
608 entity = kg_data["nodes"][0]
609 relationships = kg_data["relationships"]
610 brief = generate_entity_brief(entity, relationships)
611 ```
612
613 ### Obsidian export
614
615 ```python
616 from pathlib import Path
617 from video_processor.agent.skills.notes_export import export_to_obsidian
618 from video_processor.integrators.knowledge_graph import KnowledgeGraph
619
620 kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
621 kg_data = kg.to_dict()
622
623 created_files = export_to_obsidian(kg_data, Path("./obsidian-vault"))
624 print(f"Created {len(created_files)} files")
625 ```
626
627 ### Notion export
628
629 ```python
630 from pathlib import Path
631 from video_processor.agent.skills.notes_export import export_to_notion_md
632 from video_processor.integrators.knowledge_graph import KnowledgeGraph
633
634 kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
635 kg_data = kg.to_dict()
636
637 created_files = export_to_notion_md(kg_data, Path("./notion-export"))
638 ```
639
640 ### Wiki generation
641
642 ```python
643 from pathlib import Path
644 from video_processor.agent.skills.wiki_generator import (
645 generate_wiki,
646 write_wiki,
647 push_wiki,
648 )
649 from video_processor.integrators.knowledge_graph import KnowledgeGraph
650
651 kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
652 kg_data = kg.to_dict()
653
654 # Generate pages as a dict of {filename: content}
655 pages = generate_wiki(kg_data, title="Project Wiki")
656
657 # Write to disk
658 written = write_wiki(pages, Path("./wiki"))
659
660 # Push to GitHub (requires git auth)
661 success = push_wiki(Path("./wiki"), "owner/repo", message="Update wiki")
662 ```
663
664 ## Companion REPL
665
666 Inside the interactive companion REPL, use the `/export` command:
667
668 ```
669 > /export markdown
670 Export 'markdown' requested. Use the CLI command:
671 planopticon export markdown ./knowledge_graph.db
672
673 > /export obsidian
674 Export 'obsidian' requested. Use the CLI command:
675 planopticon export obsidian ./knowledge_graph.db
676 ```
677
678 The REPL provides guidance on the CLI command to run; actual export is performed via the CLI.
679
680 ## Common workflows
681
682 ### Analyze videos and export to Obsidian
683
684 ```bash
685 # Analyze meeting recordings
686 planopticon analyze meeting-1.mp4 -o ./results
687 planopticon analyze meeting-2.mp4 --db-path ./results/knowledge_graph.db
688
689 # Ingest supplementary docs
690 planopticon ingest ./specs/ --db-path ./results/knowledge_graph.db
691
692 # Export to Obsidian vault
693 planopticon export obsidian ./results/knowledge_graph.db -o ~/Obsidian/ProjectVault
694
695 # Open in Obsidian and explore the graph view
696 ```
697
698 ### Generate project documentation
699
700 ```bash
701 # Generate all markdown documents
702 planopticon export markdown knowledge_graph.db -o ./docs
703
704 # The output includes:
705 # docs/summary.md - Executive summary
706 # docs/meeting-notes.md - Meeting notes format
707 # docs/glossary.md - Entity glossary
708 # docs/relationship-map.md - Relationships + Mermaid diagram
709 # docs/status-report.md - Project status report
710 # docs/entity-index.md - Master entity index
711 # docs/csv.csv - Spreadsheet-ready CSV
712 # docs/entities/ - Individual entity briefs
713 ```
714
715 ### Publish a GitHub wiki
716
717 ```bash
718 # Generate wiki pages
719 planopticon wiki generate knowledge_graph.db -o ./wiki --title "Project Knowledge Base"
720
721 # Review locally, then push
722 planopticon wiki push ./wiki ConflictHQ/my-project -m "Initial wiki from meeting analysis"
723 ```
724
725 ### Share data between projects
726
727 ```bash
728 # Export from project A
729 planopticon export exchange ./project-a/knowledge_graph.db \
730 -o project-a.json --name "Project A"
731
732 # Export from project B
733 planopticon export exchange ./project-b/knowledge_graph.db \
734 -o project-b.json --name "Project B"
735
736 # Merge in Python
737 python -c "
738 from video_processor.exchange import PlanOpticonExchange
739 a = PlanOpticonExchange.from_file('project-a.json')
740 b = PlanOpticonExchange.from_file('project-b.json')
741 a.merge(b)
742 a.to_file('combined.json')
743 print(f'Combined: {len(a.entities)} entities, {len(a.relationships)} relationships')
744 "
745 ```
746
747 ### Export for spreadsheet analysis
748
749 ```bash
750 # Generate just the CSV
751 planopticon export markdown knowledge_graph.db --type csv -o ./export
752
753 # The file export/csv.csv can be opened in Excel, Google Sheets, etc.
754 ```
755
756 Alternatively, the Notion export includes an `entities_database.csv` that can be imported into any spreadsheet tool or Notion database.
--- a/docs/guide/knowledge-graphs.md
+++ b/docs/guide/knowledge-graphs.md
@@ -0,0 +1,650 @@
1
+# Knowledge Graphs
2
+
3
+PlanOpticon builds structured knowledge graphs from video analyses, document ingestion, and other content sources. A knowledge graph captures **entities** (people, technologies, concepts, organizations) and the **relationships** between them, providing a queryable representation of everything discussed or presented in your source material.
4
+
5
+---
6
+
7
+## Storage
8
+
9
+Knowledge graphs are stored as SQLite databases (`knowledge_graph.db`) using Python's built-in `sqlite3` module. This means:
10
+
11
+- **Zero external dependencies.** No database server to install or manage.
12
+- **Single-file portability.** Copy the `.db` file to share a knowledge graph.
13
+- **WAL mode.** SQLite Write-Ahead Logging is enabled for concurrent read performance.
14
+- **JSON fallback.** Knowledge graphs can also be saved as `knowledge_graph.json` for interoperability, though SQLite is preferred for performance and querying.
15
+
16
+### Database Schema
17
+
18
+The SQLite store uses the following tables:
19
+
20
+| Table | Purpose |
21
+|---|---|
22
+| `entities` | Core entity records with name, type, descriptions, source, and arbitrary properties |
23
+| `occurrences` | Where and when each entity was mentioned (source, timestamp, text snippet) |
24
+| `relationships` | Directed edges between entities with type, content source, timestamp, and properties |
25
+| `sources` | Registered content sources with provenance metadata (source type, title, path, URL, MIME type, ingestion timestamp) |
26
+| `source_locations` | Links between sources and specific entities/relationships, with location details (timestamp, page, section, line range, text snippet) |
27
+
28
+All entity lookups are case-insensitive (indexed on `name_lower`). Entities and relationships are indexed on their source and target fields for efficient traversal.
29
+
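Because the store is plain SQLite, you can inspect it with the standard `sqlite3` module. The schema below is deliberately simplified for illustration; the real tables carry more columns:

```python
import sqlite3

# In practice, connect to your knowledge_graph.db path instead of :memory:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (name TEXT, type TEXT)")  # simplified schema
conn.executemany(
    "INSERT INTO entities VALUES (?, ?)",
    [("Python", "technology"), ("FastAPI", "technology"), ("Alice", "person")],
)
# Entity type breakdown, as the stats query would report it
counts = dict(conn.execute("SELECT type, COUNT(*) FROM entities GROUP BY type"))
print(counts)  # {'technology': 2, 'person': 1} (key order may vary)
```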
30
+### Storage Backends
31
+
32
+PlanOpticon supports two storage backends, selected automatically:
33
+
34
+| Backend | When Used | Persistence |
35
+|---|---|---|
36
+| `SQLiteStore` | When a `db_path` is provided | Persistent on disk |
37
+| `InMemoryStore` | When no path is given, or as fallback | In-memory only |
38
+
39
+Both backends implement the same `GraphStore` abstract interface, so all query and manipulation code works identically regardless of backend.
40
+
41
+```python
42
+from video_processor.integrators.graph_store import create_store
43
+
44
+# Persistent SQLite store
45
+store = create_store("/path/to/knowledge_graph.db")
46
+
47
+# In-memory store (for temporary operations)
48
+store = create_store()
49
+```
50
+
51
+---
52
+
53
+## Entity Types
54
+
55
+Entities extracted from content are assigned one of the following base types:
56
+
57
+| Type | Description | Specificity Rank |
58
+|---|---|---|
59
+| `person` | People mentioned or participating | 3 (highest) |
60
+| `technology` | Tools, languages, frameworks, platforms | 3 |
61
+| `organization` | Companies, teams, departments | 2 |
62
+| `time` | Dates, deadlines, time references | 1 |
63
+| `diagram` | Visual diagrams extracted from video frames | 1 |
64
+| `concept` | General concepts, topics, ideas (default) | 0 (lowest) |
65
+
66
+The specificity rank is used during merge operations: when two entities are matched as duplicates, the more specific type wins (e.g., `technology` overrides `concept`).
67
+
68
+### Planning Taxonomy
69
+
70
+Beyond the base entity types, PlanOpticon includes a planning taxonomy for classifying entities into project-planning categories. The `TaxonomyClassifier` maps extracted entities into these types:
71
+
72
+| Planning Type | Keywords Matched |
73
+|---|---|
74
+| `goal` | goal, objective, aim, target outcome |
75
+| `requirement` | must, should, requirement, need, required |
76
+| `constraint` | constraint, limitation, restrict, cannot, must not |
77
+| `decision` | decided, decision, chose, selected, agreed |
78
+| `risk` | risk, concern, worry, danger, threat |
79
+| `assumption` | assume, assumption, expecting, presume |
80
+| `dependency` | depends, dependency, relies on, prerequisite, blocked |
81
+| `milestone` | milestone, deadline, deliverable, release, launch |
82
+| `task` | task, todo, action item, work item, implement |
83
+| `feature` | feature, capability, functionality |
84
+
85
+Classification works in two stages:
86
+
87
+1. **Heuristic classification.** Entity descriptions are scanned for the keywords listed above. First match wins.
88
+2. **LLM refinement.** If an LLM provider is available, entities are sent to the LLM for more nuanced classification with priority assignment (`high`, `medium`, `low`). LLM results override heuristic results on conflicts.
89
+
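Stage 1 reduces to a first-match keyword scan. A sketch with an abbreviated keyword table (the real classifier covers all ten planning types):

```python
PLANNING_KEYWORDS = {
    "goal": ["goal", "objective", "aim", "target outcome"],
    "risk": ["risk", "concern", "worry", "danger", "threat"],
    "milestone": ["milestone", "deadline", "deliverable", "release", "launch"],
}

def classify_heuristic(description):
    """Return the first planning type whose keywords appear in the text."""
    text = description.lower()
    for planning_type, keywords in PLANNING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return planning_type
    return None

print(classify_heuristic("Ship the beta release by June"))  # milestone
```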
90
+Classified entities are used by planning agent skills (project_plan, prd, roadmap, task_breakdown) to produce targeted, context-aware artifacts.
91
+
92
+---
93
+
94
+## Relationship Types
95
+
96
+Relationships are directed edges between entities. The `type` field is a free-text string determined by the LLM during extraction. Common relationship types include:
97
+
98
+- `related_to` (default)
99
+- `works_with`
100
+- `uses`
101
+- `depends_on`
102
+- `proposed`
103
+- `discussed_by`
104
+- `employed_by`
105
+- `collaborates_with`
106
+- `expert_in`
107
+
108
+### Typed Relationships
109
+
110
+The `add_typed_relationship()` method creates edges with custom labels and optional properties, enabling richer graph semantics:
111
+
112
+```python
113
+store.add_typed_relationship(
114
+ source="Authentication Service",
115
+ target="PostgreSQL",
116
+ edge_label="USES_SYSTEM",
117
+ properties={"purpose": "user credential storage", "version": "15"},
118
+)
119
+```
120
+
121
+### Relationship Checks
122
+
123
+You can check whether a relationship exists between two entities:
124
+
125
+```python
126
+# Check for any relationship
127
+store.has_relationship("Alice", "Kubernetes")
128
+
129
+# Check for a specific relationship type
130
+store.has_relationship("Alice", "Kubernetes", edge_label="expert_in")
131
+```
132
+
133
+---
134
+
135
+## Building a Knowledge Graph
136
+
137
+### From Video Analysis
138
+
139
+The primary path for building a knowledge graph is through video analysis. When you run `planopticon analyze`, the pipeline extracts entities and relationships from:
140
+
141
+- **Transcript segments** -- batched in groups of 10 for efficient API usage, with speaker identification
142
+- **Diagram content** -- text extracted from visual diagrams detected in video frames
143
+
144
+```bash
145
+planopticon analyze -i meeting.mp4 -o results/
146
+# Creates results/knowledge_graph.db
147
+```
148
+
149
+### From Document Ingestion
150
+
151
+Documents (Markdown, PDF, DOCX) can be ingested directly into a knowledge graph:
152
+
153
+```bash
154
+# Ingest a single file
155
+planopticon ingest -i requirements.pdf -o results/
156
+
157
+# Ingest a directory recursively
158
+planopticon ingest -i docs/ -o results/ --recursive
159
+
160
+# Ingest into an existing knowledge graph
161
+planopticon ingest -i notes.md --db results/knowledge_graph.db
162
+```
163
+
164
+### From Batch Processing
165
+
166
+Multiple videos can be processed in batch mode, with all results merged into a single knowledge graph:
167
+
168
+```bash
169
+planopticon batch -i videos/ -o results/
170
+```
171
+
172
+### Programmatic Construction
173
+
174
+```python
175
+from video_processor.integrators.knowledge_graph import KnowledgeGraph
176
+
177
+# Create a new knowledge graph with LLM extraction
178
+from video_processor.providers.manager import ProviderManager
179
+pm = ProviderManager()
180
+kg = KnowledgeGraph(provider_manager=pm, db_path="knowledge_graph.db")
181
+
182
+# Add content (entities and relationships are extracted by LLM)
183
+kg.add_content(
184
+ text="Alice proposed using Kubernetes for container orchestration.",
185
+ source="meeting_notes",
186
+ timestamp=120.5,
187
+)
188
+
189
+# Process a full transcript
190
+kg.process_transcript(transcript_data, batch_size=10)
191
+
192
+# Process diagram results
193
+kg.process_diagrams(diagram_results)
194
+
195
+# Save
196
+kg.save("knowledge_graph.db")
197
+```
198
+
199
+---
200
+
201
+## Merge and Deduplication
202
+
203
+When combining knowledge graphs from multiple sources, PlanOpticon performs intelligent merge with deduplication.
204
+
205
+### Fuzzy Name Matching
206
+
207
+Entity names are compared using Python's `SequenceMatcher` with a threshold of 0.85. This means "Kubernetes" and "kubernetes" are matched exactly (case-insensitive), while "React.js" and "ReactJS" may be matched as duplicates if their similarity ratio meets the threshold.
208
+
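The comparison is small enough to sketch as a standalone check (a hypothetical simplified version of what the merge uses):

```python
from difflib import SequenceMatcher

def is_duplicate(a, b, threshold=0.85):
    """Case-insensitive similarity check used as the merge criterion."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(is_duplicate("Kubernetes", "kubernetes"))  # True (ratio 1.0 after lowering)
print(is_duplicate("React.js", "ReactJS"))       # True (ratio ~0.93)
print(is_duplicate("Python", "PostgreSQL"))      # False
```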
209
+### Type Conflict Resolution
210
+
211
+When two entities match but have different types, the more specific type wins based on the specificity ranking:
212
+
213
+| Scenario | Result |
214
+|---|---|
215
+| `concept` vs `technology` | `technology` wins (rank 3 > rank 0) |
216
+| `person` vs `concept` | `person` wins (rank 3 > rank 0) |
217
+| `organization` vs `concept` | `organization` wins (rank 2 > rank 0) |
218
+| `person` vs `technology` | Keeps whichever was first (equal rank) |
219
+
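The resolution rule reduces to a rank comparison, using the specificity ranks from the Entity Types table above (sketch, not the library's code):

```python
SPECIFICITY = {
    "person": 3, "technology": 3,
    "organization": 2,
    "time": 1, "diagram": 1,
    "concept": 0,
}

def resolve_type(existing, incoming):
    """Keep the more specific type; on a tie, keep whichever came first."""
    if SPECIFICITY.get(incoming, 0) > SPECIFICITY.get(existing, 0):
        return incoming
    return existing

print(resolve_type("concept", "technology"))  # technology
print(resolve_type("person", "technology"))   # person (equal rank keeps the first)
```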
220
+### Provenance Tracking
221
+
222
+Merged entities receive a `merged_from:<original_name>` description entry, preserving the audit trail of which entities were unified.
223
+
224
+### Programmatic Merge
225
+
226
+```python
227
+from video_processor.integrators.knowledge_graph import KnowledgeGraph
228
+
229
+# Load two knowledge graphs
230
+kg1 = KnowledgeGraph(db_path="project_a.db")
231
+kg2 = KnowledgeGraph(db_path="project_b.db")
232
+
233
+# Merge kg2 into kg1
234
+kg1.merge(kg2)
235
+
236
+# Save the merged result
237
+kg1.save("merged.db")
238
+```
239
+
240
+The merge operation also copies all registered sources and occurrences, so provenance information is preserved across merges.
241
+
242
+---
243
+
244
+## Querying
245
+
246
+PlanOpticon provides two query modes: direct mode (no LLM required) and agentic mode (LLM-powered natural language).
247
+
248
+### Direct Mode
249
+
250
+Direct mode queries are fast, deterministic, and require no API key. They are the right choice for structured lookups.
251
+
252
+#### Stats
253
+
254
+Return entity count, relationship count, and entity type breakdown:
255
+
256
+```bash
257
+planopticon query
258
+```
259
+
260
+```python
261
+engine.stats()
262
+# QueryResult with data: {
263
+# "entity_count": 42,
264
+# "relationship_count": 87,
265
+# "entity_types": {"technology": 15, "person": 12, ...}
266
+# }
267
+```
268
+
269
+#### Entities
270
+
271
+Filter entities by name substring and/or type:
272
+
273
+```bash
274
+planopticon query "entities --type technology"
275
+planopticon query "entities --name python"
276
+```
277
+
278
+```python
279
+engine.entities(entity_type="technology")
280
+engine.entities(name="python")
281
+engine.entities(name="auth", entity_type="concept", limit=10)
282
+```
283
+
284
+All filtering is case-insensitive. Results are capped at 50 by default (configurable via `limit`).
285
+
286
+#### Neighbors
287
+
288
+Get an entity and all directly connected nodes and relationships:
289
+
290
+```bash
291
+planopticon query "neighbors Alice"
292
+```
293
+
294
+```python
295
+engine.neighbors("Alice", depth=1)
296
+```
297
+
298
+The `depth` parameter controls how many hops to traverse (default 1). The result includes both entity objects and relationship objects.
299
+
300
+#### Relationships
301
+
302
+Filter relationships by source, target, and/or type:
303
+
304
+```bash
305
+planopticon query "relationships --source Alice"
306
+```
307
+
308
+```python
309
+engine.relationships(source="Alice")
310
+engine.relationships(target="Kubernetes", rel_type="uses")
311
+```
312
+
313
+#### Sources
314
+
315
+List all registered content sources:
316
+
317
+```python
318
+engine.sources()
319
+```
320
+
321
+#### Provenance
322
+
323
+Get all source locations for a specific entity, showing exactly where it was mentioned:
324
+
325
+```python
326
+engine.provenance("Kubernetes")
327
+# Returns source locations with timestamps, pages, sections, and text snippets
328
+```
329
+
330
+#### Raw SQL
331
+
332
+Execute arbitrary SQL against the SQLite backend (SQLite stores only):
333
+
334
+```python
335
+engine.sql("SELECT name, type FROM entities WHERE type = 'technology' ORDER BY name")
336
+```
337
+
338
+### Agentic Mode
339
+
340
+Agentic mode accepts natural-language questions and uses the LLM to plan and execute queries. It requires a configured LLM provider.
341
+
342
+```bash
343
+planopticon query "What technologies were discussed?"
344
+planopticon query "Who are the key people mentioned?"
345
+planopticon query "What depends on the authentication service?"
346
+```
347
+
348
+The agentic query pipeline:
349
+
350
+1. **Plan.** The LLM receives graph stats and available actions (entities, relationships, neighbors, stats). It selects exactly one action and its parameters.
351
+2. **Execute.** The chosen action is run through the direct-mode engine.
352
+3. **Synthesize.** The LLM receives the raw query results and the original question, then produces a concise natural-language answer.
353
+
354
+This design ensures the LLM never generates arbitrary code -- it only selects from a fixed set of known query actions.
355
+
356
+```bash
357
+# Requires an API key
358
+planopticon query "What technologies were discussed?" -p openai
359
+
360
+# Use the interactive REPL for multiple queries
361
+planopticon query -I
362
+```
363
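The plan, execute, synthesize steps can be sketched as a small dispatch loop. All names here (`llm.plan`, `llm.synthesize`, the stub engine) are illustrative assumptions, not the actual `GraphQueryEngine` internals:

```python
# Sketch of the plan -> execute -> synthesize loop. The guard means the
# LLM can only pick a known action; it never supplies code to be run.
ALLOWED_ACTIONS = {"stats", "entities", "relationships", "neighbors"}

def answer(question, llm, engine):
    plan = llm.plan(question, actions=sorted(ALLOWED_ACTIONS))   # 1. Plan
    action = plan["action"]
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    result = getattr(engine, action)(**plan.get("params", {}))   # 2. Execute
    return llm.synthesize(question, result)                      # 3. Synthesize
```

The key design property is visible in the guard: the LLM's output is data (an action name plus parameters), never executable code.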
+
364
+---
365
+
366
+## Graph Query Engine Python API
367
+
368
+The `GraphQueryEngine` class provides the programmatic interface for all query operations.
369
+
370
+### Initialization
371
+
372
+```python
373
+from video_processor.integrators.graph_query import GraphQueryEngine
374
+from video_processor.integrators.graph_discovery import find_nearest_graph
375
+
376
+# From a .db file
377
+path = find_nearest_graph()
378
+engine = GraphQueryEngine.from_db_path(path)
379
+
380
+# From a .json file
381
+engine = GraphQueryEngine.from_json_path("knowledge_graph.json")
382
+
383
+# With an LLM provider for agentic mode
384
+from video_processor.providers.manager import ProviderManager
385
+pm = ProviderManager()
386
+engine = GraphQueryEngine.from_db_path(path, provider_manager=pm)
387
+```
388
+
389
+### QueryResult
390
+
391
+All query methods return a `QueryResult` dataclass with multiple output formats:
392
+
393
+```python
394
+result = engine.stats()
395
+
396
+# Human-readable text
397
+print(result.to_text())
398
+
399
+# JSON string
400
+print(result.to_json())
401
+
402
+# Mermaid diagram (for graph results)
403
+result = engine.neighbors("Alice")
404
+print(result.to_mermaid())
405
+```
406
+
407
+The `QueryResult` contains:
408
+
409
+| Field | Type | Description |
410
+|---|---|---|
411
+| `data` | Any | The raw result data (dict, list, or scalar) |
412
+| `query_type` | str | `"filter"` for direct mode, `"agentic"` for LLM mode, `"sql"` for raw SQL |
413
+| `raw_query` | str | String representation of the executed query |
414
+| `explanation` | str | Human-readable explanation or LLM-synthesized answer |
415
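A minimal sketch of the dataclass implied by this table (field names taken from above; the real class and its rendering methods may differ):

```python
import json
from dataclasses import dataclass
from typing import Any

@dataclass
class QueryResult:
    data: Any          # raw result data (dict, list, or scalar)
    query_type: str    # "filter", "agentic", or "sql"
    raw_query: str     # string form of the executed query
    explanation: str   # human-readable explanation or LLM answer

    def to_json(self) -> str:
        # Serialize all four fields, matching the JSON output format.
        return json.dumps(self.__dict__, indent=2)
```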
+
416
+---
417
+
418
+## The Self-Contained HTML Viewer
419
+
420
+PlanOpticon includes a zero-dependency HTML knowledge graph viewer at `knowledge-base/viewer.html`. This file is fully self-contained -- it inlines D3.js and requires no build step, no server, and no internet connection.
421
+
422
+To use it, open `viewer.html` in a browser. It will load and visualize a `knowledge_graph.json` file (place it in the same directory, or use the file picker in the viewer).
423
+
424
+The viewer provides:
425
+
426
+- Interactive force-directed graph layout
427
+- Zoom and pan navigation
428
+- Entity nodes colored by type
429
+- Relationship edges with labels
430
+- Click-to-focus on individual entities
431
+- Entity detail panel showing descriptions and connections
432
+
433
+This covers most day-to-day graph exploration needs with zero infrastructure.
434
+
435
+---
436
+
437
+## KG Management Commands
438
+
439
+The `planopticon kg` command group provides utilities for managing knowledge graph files.
440
+
441
+### kg convert
442
+
443
+Convert a knowledge graph between SQLite and JSON formats:
444
+
445
+```bash
446
+# SQLite to JSON
447
+planopticon kg convert results/knowledge_graph.db output.json
448
+
449
+# JSON to SQLite
450
+planopticon kg convert knowledge_graph.json knowledge_graph.db
451
+```
452
+
453
+The output format is inferred from the destination file extension. Source and destination must be different formats.
454
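The extension check amounts to something like the following (a hypothetical helper; the CLI's actual inference code may differ):

```python
from pathlib import Path

def infer_format(path: str) -> str:
    # Map the destination extension to a store format; reject anything else.
    formats = {".db": "sqlite", ".json": "json"}
    ext = Path(path).suffix.lower()
    if ext not in formats:
        raise ValueError(f"unsupported extension: {ext}")
    return formats[ext]
```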
+
455
+### kg sync
456
+
457
+Synchronize a `.db` and `.json` knowledge graph, updating the stale one:
458
+
459
+```bash
460
+# Auto-detect which is newer and sync
461
+planopticon kg sync results/knowledge_graph.db
462
+
463
+# Explicit JSON path
464
+planopticon kg sync knowledge_graph.db knowledge_graph.json
465
+
466
+# Force a specific direction
467
+planopticon kg sync knowledge_graph.db knowledge_graph.json --direction db-to-json
468
+planopticon kg sync knowledge_graph.db knowledge_graph.json --direction json-to-db
469
+```
470
+
471
+If `JSON_PATH` is omitted, the `.json` path is derived from the `.db` path (same name, different extension). In `auto` mode (the default), the newer file is used as the source.
472
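In `auto` mode the decision reduces to a modification-time comparison, roughly like this (a hypothetical helper; the real command may apply extra checks):

```python
import os

def pick_direction(db_path: str, json_path: str) -> str:
    # The newer file becomes the source; this sketch breaks ties
    # in favor of the .db file.
    if os.path.getmtime(db_path) >= os.path.getmtime(json_path):
        return "db-to-json"
    return "json-to-db"
```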
+
473
+### kg inspect
474
+
475
+Show summary statistics for a knowledge graph file:
476
+
477
+```bash
478
+planopticon kg inspect results/knowledge_graph.db
479
+```
480
+
481
+Output:
482
+
483
+```
484
+File: results/knowledge_graph.db
485
+Store: sqlite
486
+Entities: 42
487
+Relationships: 87
488
+Entity types:
489
+ technology: 15
490
+ person: 12
491
+ concept: 10
492
+ organization: 5
493
+```
494
+
495
+Works with both `.db` and `.json` files.
496
+
497
+### kg classify
498
+
499
+Classify knowledge graph entities into planning taxonomy types:
500
+
501
+```bash
502
+# Heuristic + LLM classification
503
+planopticon kg classify results/knowledge_graph.db
504
+
505
+# Heuristic only (no API key needed)
506
+planopticon kg classify results/knowledge_graph.db -p none
507
+
508
+# JSON output
509
+planopticon kg classify results/knowledge_graph.db --format json
510
+```
511
+
512
+Text output groups entities by planning type:
513
+
514
+```
515
+GOALS (3)
516
+ - Improve system reliability [high]
517
+ Must achieve 99.9% uptime
518
+ - Reduce deployment time [medium]
519
+ Automate the deployment pipeline
520
+
521
+RISKS (2)
522
+ - Data migration complexity [high]
523
+ Legacy schema incompatibilities
524
+ ...
525
+
526
+TASKS (5)
527
+ - Implement OAuth2 flow
528
+ Set up authentication service
529
+ ...
530
+```
531
+
532
+JSON output returns an array of `PlanningEntity` objects with `name`, `planning_type`, `priority`, `description`, and `source_entities` fields.
533
+
534
+### kg from-exchange
535
+
536
+Import a PlanOpticonExchange JSON file into a knowledge graph database:
537
+
538
+```bash
539
+# Import to default location (./knowledge_graph.db)
540
+planopticon kg from-exchange exchange.json
541
+
542
+# Import to a specific path
543
+planopticon kg from-exchange exchange.json -o project.db
544
+```
545
+
546
+PlanOpticonExchange is a standardized interchange format bundling entities, relationships, and source records.
547
+
548
+---
549
+
550
+## Output Formats
551
+
552
+Query results can be output in three formats:
553
+
554
+### Text (default)
555
+
556
+Human-readable format with entity types in brackets, relationship arrows, and indented details:
557
+
558
+```
559
+Found 15 entities
560
+ [technology] Python -- General-purpose programming language
561
+ [person] Alice -- Lead engineer on the project
562
+ [concept] Microservices -- Architectural pattern discussed
563
+```
564
+
565
+### JSON
566
+
567
+Full structured output including query metadata:
568
+
569
+```bash
570
+planopticon query --format json stats
571
+```
572
+
573
+```json
574
+{
575
+ "query_type": "filter",
576
+ "raw_query": "stats()",
577
+ "explanation": "Knowledge graph statistics",
578
+ "data": {
579
+ "entity_count": 42,
580
+ "relationship_count": 87,
581
+ "entity_types": {
582
+ "technology": 15,
583
+ "person": 12
584
+ }
585
+ }
586
+}
587
+```
588
+
589
+### Mermaid
590
+
591
+Graph results rendered as Mermaid diagram syntax, ready for embedding in markdown:
592
+
593
+```bash
594
+planopticon query --format mermaid "neighbors Alice"
595
+```
596
+
597
+```
598
+graph LR
599
+ Alice["Alice"]:::person
600
+ Python["Python"]:::technology
601
+ Kubernetes["Kubernetes"]:::technology
602
+ Alice -- "expert_in" --> Kubernetes
603
+ Alice -- "works_with" --> Python
604
+ classDef person fill:#f9d5e5,stroke:#333
605
+ classDef concept fill:#eeeeee,stroke:#333
606
+ classDef technology fill:#d5e5f9,stroke:#333
607
+ classDef organization fill:#f9e5d5,stroke:#333
608
+```
609
+
610
+The `KnowledgeGraph.generate_mermaid()` method also produces full-graph Mermaid diagrams, capped at the top 30 most-connected nodes by default.
611
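Producing the node and edge lines shown above is straightforward string assembly. A simplified sketch (it omits the identifier escaping a real generator needs for entity names containing spaces):

```python
def to_mermaid(entities, relationships):
    # Emit "graph LR" syntax: one node line per entity (styled by type),
    # one labeled edge line per relationship.
    lines = ["graph LR"]
    for e in entities:
        lines.append(f'    {e["name"]}["{e["name"]}"]:::{e["type"]}')
    for r in relationships:
        lines.append(f'    {r["source"]} -- "{r["type"]}" --> {r["target"]}')
    return "\n".join(lines)
```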
+
612
+---
613
+
614
+## Auto-Discovery
615
+
616
+PlanOpticon automatically locates knowledge graph files using the `find_nearest_graph()` function. The search order is:
617
+
618
+1. **Current directory** -- check for `knowledge_graph.db` and `knowledge_graph.json`
619
+2. **Common subdirectories** -- `results/`, `output/`, `knowledge-base/`
620
+3. **Recursive downward walk** -- up to 4 levels deep, skipping hidden directories
621
+4. **Parent directory walk** -- upward through the directory tree, checking each level and its common subdirectories
622
+
623
+Within each search phase, `.db` files are preferred over `.json` files. Results are sorted by proximity (closest first).
624
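The preference rules can be expressed as a sort key, roughly (an illustrative sketch, not the library's actual ordering code):

```python
from pathlib import Path

def discovery_rank(path: str):
    # Closer paths (fewer components) sort first; ".db" beats ".json"
    # at equal depth.
    p = Path(path)
    return (len(p.parts), 0 if p.suffix == ".db" else 1)

found = ["results/knowledge_graph.json", "results/knowledge_graph.db",
         "knowledge_graph.db"]
print(sorted(found, key=discovery_rank))
# -> ['knowledge_graph.db', 'results/knowledge_graph.db',
#     'results/knowledge_graph.json']
```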
+
625
+```python
626
+from video_processor.integrators.graph_discovery import (
627
+ find_nearest_graph,
628
+ find_knowledge_graphs,
629
+ describe_graph,
630
+)
631
+
632
+# Find the single closest knowledge graph
633
+path = find_nearest_graph()
634
+
635
+# Find all knowledge graphs, sorted by proximity
636
+paths = find_knowledge_graphs()
637
+
638
+# Find graphs starting from a specific directory
639
+paths = find_knowledge_graphs(start_dir="/path/to/project")
640
+
641
+# Disable upward walking
642
+paths = find_knowledge_graphs(walk_up=False)
643
+
644
+# Get summary stats without loading the full graph
645
+info = describe_graph(path)
646
+# {"entity_count": 42, "relationship_count": 87,
647
+# "entity_types": {...}, "store_type": "sqlite"}
648
+```
649
+
650
+Auto-discovery is used by the Companion REPL, the `planopticon query` command, and the planning agent when no explicit `--kb` path is provided.
--- docs/guide/output-formats.md
+++ docs/guide/output-formats.md
@@ -1,47 +1,329 @@
 # Output Formats
 
-PlanOpticon produces multiple output formats from each analysis run.
+PlanOpticon produces a wide range of output formats from video analysis, document ingestion, batch processing, knowledge graph export, and agent skills. This page is the comprehensive reference for every format the tool can emit.
+
+---
 
 ## Transcripts
 
+Video analysis always produces transcripts in three formats, stored in the `transcript/` subdirectory of the output folder.
+
 | Format | File | Description |
 |--------|------|-------------|
-| JSON | `transcript/transcript.json` | Full transcript with segments, timestamps, speakers |
-| Text | `transcript/transcript.txt` | Plain text transcript |
-| SRT | `transcript/transcript.srt` | Subtitle format with timestamps |
+| JSON | `transcript/transcript.json` | Full transcript with segments, timestamps, speaker labels, and confidence scores. Each segment includes `start`, `end`, `text`, and optional `speaker` fields. |
+| Text | `transcript/transcript.txt` | Plain text transcript with no metadata. Suitable for feeding into other tools or reading directly. |
+| SRT | `transcript/transcript.srt` | SubRip subtitle format with sequential numbering and `HH:MM:SS,mmm` timestamps. Can be loaded into video players or subtitle editors. |
+
+### Transcript JSON structure
+
+```json
+{
+  "segments": [
+    {
+      "start": 0.0,
+      "end": 4.5,
+      "text": "Welcome to the sprint review.",
+      "speaker": "Alice"
+    }
+  ],
+  "text": "Welcome to the sprint review. ...",
+  "language": "en"
+}
+```
+
+When the `--speakers` flag is provided (e.g., `--speakers "Alice,Bob,Carol"`), speaker diarization hints are passed to the transcription provider and speaker labels appear in the JSON segments.
+
+---
 
 ## Reports
 
+Analysis reports are generated from the combined transcript, diagrams, key points, action items, and knowledge graph. They live in the `results/` subdirectory.
+
 | Format | File | Description |
 |--------|------|-------------|
-| Markdown | `results/analysis.md` | Structured report with diagrams |
-| HTML | `results/analysis.html` | Self-contained HTML with mermaid.js |
-| PDF | `results/analysis.pdf` | Print-ready PDF (requires `planopticon[pdf]`) |
+| Markdown | `results/analysis.md` | Structured report with embedded Mermaid diagram blocks, tables, and cross-references. Works in any Markdown renderer. |
+| HTML | `results/analysis.html` | Self-contained HTML page with inline CSS, embedded SVG diagrams, and a bundled mermaid.js script for rendering any unrendered Mermaid blocks. No external dependencies required to view. |
+| PDF | `results/analysis.pdf` | Print-ready PDF. Requires the `planopticon[pdf]` extra (`pip install planopticon[pdf]`). Generated from the HTML report. |
+
+---
 
 ## Diagrams
 
-Each detected diagram produces:
+Each visual element detected during frame analysis produces up to five output files in the `diagrams/` subdirectory. The index `N` is zero-based.
 
 | Format | File | Description |
 |--------|------|-------------|
-| JPEG | `diagrams/diagram_N.jpg` | Original frame |
-| Mermaid | `diagrams/diagram_N.mermaid` | Mermaid source code |
-| SVG | `diagrams/diagram_N.svg` | Vector rendering |
-| PNG | `diagrams/diagram_N.png` | Raster rendering |
-| JSON | `diagrams/diagram_N.json` | Structured analysis data |
+| JPEG | `diagrams/diagram_N.jpg` | Original video frame captured at the point of detection. |
+| Mermaid | `diagrams/diagram_N.mermaid` | Mermaid source code reconstructed from the diagram by the vision model. Supports flowcharts, sequence diagrams, architecture diagrams, and more. |
+| SVG | `diagrams/diagram_N.svg` | Vector rendering of the Mermaid source, produced by the Mermaid CLI or built-in renderer. |
+| PNG | `diagrams/diagram_N.png` | Raster rendering of the Mermaid source at high resolution. |
+| JSON | `diagrams/diagram_N.json` | Structured analysis data including diagram type, description, extracted text, chart data (if applicable), and confidence score. |
+
+Frames that score as medium confidence are saved as captioned screenshots in the `captures/` subdirectory instead, with a `capture_N.jpg` and `capture_N.json` pair.
+
+---
+
+## Structured Data
 
-## Structured data
+Core analysis artifacts are stored as JSON files in the `results/` subdirectory.
 
 | Format | File | Description |
 |--------|------|-------------|
-| JSON | `results/knowledge_graph.json` | Entities and relationships |
-| JSON | `results/key_points.json` | Extracted key points |
-| JSON | `results/action_items.json` | Action items with assignees |
-| JSON | `manifest.json` | Complete run manifest |
+| SQLite | `results/knowledge_graph.db` | Primary knowledge graph database. SQLite-based, queryable with `planopticon query`. Contains entities, relationships, source provenance, and metadata. This is the preferred format for querying and merging. |
+| JSON | `results/knowledge_graph.json` | JSON export of the knowledge graph. Contains `entities` and `relationships` arrays. Automatically kept in sync with the `.db` file. Used as a fallback when SQLite is not available. |
+| JSON | `results/key_points.json` | Array of extracted key points, each with `text`, `category`, and `confidence` fields. |
+| JSON | `results/action_items.json` | Array of action items, each with `text`, `assignee`, `due_date`, `priority`, and `status` fields. |
+| JSON | `manifest.json` | Complete run manifest. The single source of truth for the analysis run. Contains video metadata, processing stats, file paths to all outputs, and inline key points, action items, diagram metadata, and screen captures. |
+
+### Knowledge graph JSON structure
+
+```json
+{
+  "entities": [
+    {
+      "name": "Kubernetes",
+      "type": "technology",
+      "descriptions": ["Container orchestration platform discussed in architecture review"],
+      "occurrences": [
+        {"source": "video:recording.mp4", "timestamp": "00:05:23"}
+      ]
+    }
+  ],
+  "relationships": [
+    {
+      "source": "Kubernetes",
+      "target": "Docker",
+      "type": "DEPENDS_ON",
+      "descriptions": ["Kubernetes uses Docker as container runtime"]
+    }
+  ]
+}
+```
+
+---
 
 ## Charts
 
-When chart data is extracted from diagrams (bar, line, pie, scatter), PlanOpticon reproduces them:
+When chart data is extracted from diagrams (bar charts, line charts, pie charts, scatter plots), PlanOpticon reproduces them as standalone image files.
+
+| Format | File | Description |
+|--------|------|-------------|
+| SVG | `diagrams/chart_N.svg` | Vector chart rendered via matplotlib. Suitable for embedding in documents or scaling to any size. |
+| PNG | `diagrams/chart_N.png` | Raster chart rendered via matplotlib at 150 DPI. |
+
+Reproduced charts are also embedded inline in the HTML and PDF reports.
+
+---
+
+## Knowledge Graph Exports
+
+Beyond the default `knowledge_graph.db` and `knowledge_graph.json` produced during analysis, PlanOpticon supports exporting knowledge graphs to several additional formats via the `planopticon export` and `planopticon kg convert` commands.
+
+| Format | Command / File | Description |
+|--------|---------------|-------------|
+| JSON | `knowledge_graph.json` | Default JSON export. Produced automatically alongside the `.db` file. |
+| SQLite | `knowledge_graph.db` | Primary database format. Can be converted to/from JSON with `planopticon kg convert`. |
+| GraphML | `output.graphml` | XML-based graph format via `planopticon kg convert kg.db output.graphml`. Compatible with Gephi, yEd, Cytoscape, and other graph visualization tools. |
+| CSV | `export/entities.csv`, `export/relationships.csv` | Tabular export via `planopticon export markdown kg.db --type csv`. Produces separate CSV files for entities and relationships. |
+| Mermaid | Inline in reports | Mermaid graph diagrams are embedded in Markdown and HTML reports. Also available programmatically via `GraphQueryEngine.to_mermaid()`. |
+
+### Converting between formats
+
+```bash
+# SQLite to JSON
+planopticon kg convert results/knowledge_graph.db output.json
+
+# JSON to SQLite
+planopticon kg convert knowledge_graph.json knowledge_graph.db
+
+# Sync both directions (updates the stale file)
+planopticon kg sync results/knowledge_graph.db
+planopticon kg sync knowledge_graph.db knowledge_graph.json --direction db-to-json
+```
+
+---
+
+## PlanOpticonExchange Format
+
+The PlanOpticonExchange format (`.json`) is a canonical interchange payload designed for sharing knowledge graphs between PlanOpticon instances, teams, or external systems.
+
+```bash
+planopticon export exchange knowledge_graph.db
+planopticon export exchange kg.db -o exchange.json --name "My Project"
+```
+
+The exchange payload includes:
+
+- **Schema version** for forward compatibility
+- **Project metadata** (name, description)
+- **Full entity and relationship data** with provenance
+- **Source tracking** for multi-source graphs
+- **Merge support** -- exchange files can be merged together, deduplicating entities by name
+
+### Exchange JSON structure
+
+```json
+{
+  "schema_version": "1.0",
+  "project": {
+    "name": "Sprint Reviews Q4",
+    "description": "Knowledge extracted from Q4 sprint review recordings"
+  },
+  "entities": [...],
+  "relationships": [...],
+  "sources": [...]
+}
+```
+
+---
+
+## Document Exports
+
+PlanOpticon can generate structured Markdown documents from any knowledge graph, with no API key required. These are pure template-based outputs derived from the graph data.
+
+### Markdown document types
+
+There are seven document types plus a CSV export, all generated via `planopticon export markdown`:
+
+| Type | File | Description |
+|------|------|-------------|
+| `summary` | `executive_summary.md` | High-level executive summary with entity counts, top relationships, and key themes. |
+| `meeting-notes` | `meeting_notes.md` | Structured meeting notes with attendees, topics discussed, decisions made, and action items. |
+| `glossary` | `glossary.md` | Alphabetical glossary of all entities with descriptions and types. |
+| `relationship-map` | `relationship_map.md` | Textual and Mermaid-based relationship map showing how entities connect. |
+| `status-report` | `status_report.md` | Status report format with progress indicators, risks, and next steps. |
+| `entity-index` | `entity_index.md` | Comprehensive index of all entities grouped by type, with links to individual briefs. |
+| `entity-brief` | `entities/<Name>.md` | One-pager brief for each entity, showing descriptions, relationships, and source references. |
+| `csv` | `entities.csv` | Tabular CSV export of entities and relationships. |
+
+```bash
+# Generate all document types
+planopticon export markdown knowledge_graph.db
+
+# Generate specific types
+planopticon export markdown kg.db -o ./docs --type summary --type glossary
+
+# Generate meeting notes and CSV
+planopticon export markdown kg.db --type meeting-notes --type csv
+```
+
+### Obsidian vault export
+
+Exports the knowledge graph as an Obsidian-compatible vault with YAML frontmatter, `[[wiki-links]]` between entities, and proper folder structure.
+
+```bash
+planopticon export obsidian knowledge_graph.db -o ./my-vault
+```
+
+The vault includes:
+
+- One note per entity with frontmatter (`type`, `aliases`, `tags`)
+- Wiki-links between related entities
+- A `_Index.md` file for navigation
+- Compatible with Obsidian graph view
+
+### Notion markdown export
+
+Exports as Notion-compatible Markdown with a CSV database file for import into Notion databases.
+
+```bash
+planopticon export notion knowledge_graph.db -o ./notion-export
+```
+
+### GitHub wiki export
+
+Generates a complete GitHub wiki with a sidebar, home page, and per-entity pages. Can be pushed directly to a GitHub wiki repository.
+
+```bash
+# Generate wiki pages
+planopticon wiki generate knowledge_graph.db -o ./wiki
+
+# Push to GitHub
+planopticon wiki push ./wiki ConflictHQ/PlanOpticon -m "Update wiki from KG"
+```
+
+---
+
+## Batch Outputs
+
+Batch processing produces additional files at the batch root directory, alongside per-video output folders.
+
+| Format | File | Description |
+|--------|------|-------------|
+| JSON | `batch_manifest.json` | Batch-level manifest with aggregate stats, per-video status (completed/failed), error details, and paths to all sub-outputs. |
+| Markdown | `batch_summary.md` | Aggregated summary report with combined key points, action items, entity counts, and a Mermaid diagram of the merged knowledge graph. |
+| SQLite | `knowledge_graph.db` | Merged knowledge graph combining entities and relationships across all successfully processed videos. Uses fuzzy matching and conflict resolution. |
+| JSON | `knowledge_graph.json` | JSON export of the merged knowledge graph. |
+
+---
+
+## Self-Contained HTML Viewer
+
+PlanOpticon ships with a self-contained interactive knowledge graph viewer at `knowledge-base/viewer.html` in the repository. This file:
+
+- Uses D3.js (bundled inline, no CDN dependency)
+- Renders an interactive force-directed graph visualization
+- Supports node filtering by entity type
+- Shows entity details and relationships on click
+- Can load any `knowledge_graph.json` file
+- Works offline with no server required -- just open in a browser
+- Covers approximately 80% of graph exploration needs with zero infrastructure
+
+---
+
+## Output Directory Structure
+
+A complete single-video analysis produces the following directory tree:
+
+```
+output/
+├── manifest.json              # Run manifest (source of truth)
+├── transcript/
+│   ├── transcript.json        # Full transcript with segments
+│   ├── transcript.txt         # Plain text
+│   └── transcript.srt         # Subtitles
+├── frames/
+│   ├── frame_0000.jpg         # Extracted video frames
+│   ├── frame_0001.jpg
+│   └── ...
+├── diagrams/
+│   ├── diagram_0.jpg          # Original frame
+│   ├── diagram_0.mermaid      # Mermaid source
+│   ├── diagram_0.svg          # Vector rendering
+│   ├── diagram_0.png          # Raster rendering
+│   ├── diagram_0.json         # Analysis data
+│   ├── chart_0.svg            # Reproduced chart (SVG)
+│   ├── chart_0.png            # Reproduced chart (PNG)
+│   └── ...
+├── captures/
+│   ├── capture_0.jpg          # Medium-confidence screenshots
+│   ├── capture_0.json         # Caption and metadata
+│   └── ...
+└── results/
+    ├── analysis.md            # Markdown report
+    ├── analysis.html          # HTML report
+    ├── analysis.pdf           # PDF report (if planopticon[pdf] installed)
+    ├── knowledge_graph.db     # Knowledge graph (SQLite, primary)
+    ├── knowledge_graph.json   # Knowledge graph (JSON export)
+    ├── key_points.json        # Extracted key points
+    └── action_items.json      # Action items
+```
+
+---
+
+## Controlling Output Format
+
+Use the `--output-format` flag with `planopticon analyze` to control how results are presented:
+
+| Value | Behavior |
+|-------|----------|
+| `default` | Writes all output files to disk and prints a usage summary to stdout. |
+| `json` | Writes all output files to disk and also emits the complete `VideoManifest` as structured JSON to stdout. Useful for piping into other tools or CI/CD pipelines. |
+
+```bash
+# Standard output (files + console summary)
+planopticon analyze -i video.mp4 -o ./output
 
-- SVG + PNG via matplotlib
-- Embedded in HTML/PDF reports
+# JSON manifest to stdout (for scripting)
+planopticon analyze -i video.mp4 -o ./output --output-format json
+```
 
 
ADDED docs/guide/planning-agent.md
--- docs/guide/output-formats.md
+++ docs/guide/output-formats.md
@@ -1,47 +1,329 @@
1 # Output Formats
2
3 PlanOpticon produces multiple output formats from each analysis run.
 
 
4
5 ## Transcripts
6
 
 
7 | Format | File | Description |
8 |--------|------|-------------|
9 | JSON | `transcript/transcript.json` | Full transcript with segments, timestamps, speakers |
10 | Text | `transcript/transcript.txt` | Plain text transcript |
11 | SRT | `transcript/transcript.srt` | Subtitle format with timestamps |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
12
13 ## Reports
14
 
 
15 | Format | File | Description |
16 |--------|------|-------------|
17 | Markdown | `results/analysis.md` | Structured report with diagrams |
18 | HTML | `results/analysis.html` | Self-contained HTML with mermaid.js |
19 | PDF | `results/analysis.pdf` | Print-ready PDF (requires `planopticon[pdf]`) |
 
 
20
21 ## Diagrams
22
23 Each detected diagram produces:
24
25 | Format | File | Description |
26 |--------|------|-------------|
27 | JPEG | `diagrams/diagram_N.jpg` | Original frame |
28 | Mermaid | `diagrams/diagram_N.mermaid` | Mermaid source code |
29 | SVG | `diagrams/diagram_N.svg` | Vector rendering |
30 | PNG | `diagrams/diagram_N.png` | Raster rendering |
31 | JSON | `diagrams/diagram_N.json` | Structured analysis data |
 
 
 
 
 
 
32
33 ## Structured data
34
35 | Format | File | Description |
36 |--------|------|-------------|
37 | JSON | `results/knowledge_graph.json` | Entities and relationships |
38 | JSON | `results/key_points.json` | Extracted key points |
39 | JSON | `results/action_items.json` | Action items with assignees |
40 | JSON | `manifest.json` | Complete run manifest |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
41
42 ## Charts
43
44 When chart data is extracted from diagrams (bar, line, pie, scatter), PlanOpticon reproduces them:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
45
46 - SVG + PNG via matplotlib
47 - Embedded in HTML/PDF reports
 
48
49 DDED docs/guide/planning-agent.md
--- docs/guide/output-formats.md
+++ docs/guide/output-formats.md
@@ -1,47 +1,329 @@
1 # Output Formats
2
3 PlanOpticon produces a wide range of output formats from video analysis, document ingestion, batch processing, knowledge graph export, and agent skills. This page is the comprehensive reference for every format the tool can emit.
4
5 ---
6
7 ## Transcripts
8
9 Video analysis always produces transcripts in three formats, stored in the `transcript/` subdirectory of the output folder.
10
11 | Format | File | Description |
12 |--------|------|-------------|
13 | JSON | `transcript/transcript.json` | Full transcript with segments, timestamps, speaker labels, and confidence scores. Each segment includes `start`, `end`, `text`, and optional `speaker` fields. |
14 | Text | `transcript/transcript.txt` | Plain text transcript with no metadata. Suitable for feeding into other tools or reading directly. |
15 | SRT | `transcript/transcript.srt` | SubRip subtitle format with sequential numbering and `HH:MM:SS,mmm` timestamps. Can be loaded into video players or subtitle editors. |
16
17 ### Transcript JSON structure
18
19 ```json
20 {
21 "segments": [
22 {
23 "start": 0.0,
24 "end": 4.5,
25 "text": "Welcome to the sprint review.",
26 "speaker": "Alice"
27 }
28 ],
29 "text": "Welcome to the sprint review. ...",
30 "language": "en"
31 }
32 ```
33
34 When the `--speakers` flag is provided (e.g., `--speakers "Alice,Bob,Carol"`), speaker diarization hints are passed to the transcription provider and speaker labels appear in the JSON segments.
35
36 ---
37
38 ## Reports
39
40 Analysis reports are generated from the combined transcript, diagrams, key points, action items, and knowledge graph. They live in the `results/` subdirectory.
41
42 | Format | File | Description |
43 |--------|------|-------------|
44 | Markdown | `results/analysis.md` | Structured report with embedded Mermaid diagram blocks, tables, and cross-references. Works in any Markdown renderer. |
45 | HTML | `results/analysis.html` | Self-contained HTML page with inline CSS, embedded SVG diagrams, and a bundled mermaid.js script for rendering any unrendered Mermaid blocks. No external dependencies required to view. |
46 | PDF | `results/analysis.pdf` | Print-ready PDF. Requires the `planopticon[pdf]` extra (`pip install planopticon[pdf]`). Generated from the HTML report. |
47
48 ---
49
50 ## Diagrams
51
52 Each visual element detected during frame analysis produces up to five output files in the `diagrams/` subdirectory. The index `N` is zero-based.
53
54 | Format | File | Description |
55 |--------|------|-------------|
56 | JPEG | `diagrams/diagram_N.jpg` | Original video frame captured at the point of detection. |
57 | Mermaid | `diagrams/diagram_N.mermaid` | Mermaid source code reconstructed from the diagram by the vision model. Supports flowcharts, sequence diagrams, architecture diagrams, and more. |
58 | SVG | `diagrams/diagram_N.svg` | Vector rendering of the Mermaid source, produced by the Mermaid CLI or built-in renderer. |
59 | PNG | `diagrams/diagram_N.png` | Raster rendering of the Mermaid source at high resolution. |
60 | JSON | `diagrams/diagram_N.json` | Structured analysis data including diagram type, description, extracted text, chart data (if applicable), and confidence score. |
61
62 Frames that score as medium confidence are saved as captioned screenshots in the `captures/` subdirectory instead, with a `capture_N.jpg` and `capture_N.json` pair.
63
64 ---
65
66 ## Structured Data
67
68 Core analysis artifacts are stored as JSON files in the `results/` subdirectory.
69
70 | Format | File | Description |
71 |--------|------|-------------|
72 | SQLite | `results/knowledge_graph.db` | Primary knowledge graph database. SQLite-based, queryable with `planopticon query`. Contains entities, relationships, source provenance, and metadata. This is the preferred format for querying and merging. |
73 | JSON | `results/knowledge_graph.json` | JSON export of the knowledge graph. Contains `entities` and `relationships` arrays. Automatically kept in sync with the `.db` file. Used as a fallback when SQLite is not available. |
74 | JSON | `results/key_points.json` | Array of extracted key points, each with `text`, `category`, and `confidence` fields. |
75 | JSON | `results/action_items.json` | Array of action items, each with `text`, `assignee`, `due_date`, `priority`, and `status` fields. |
76 | JSON | `manifest.json` | Complete run manifest. The single source of truth for the analysis run. Contains video metadata, processing stats, file paths to all outputs, and inline key points, action items, diagram metadata, and screen captures. |
77
78 ### Knowledge graph JSON structure
79
80 ```json
81 {
82 "entities": [
83 {
84 "name": "Kubernetes",
85 "type": "technology",
86 "descriptions": ["Container orchestration platform discussed in architecture review"],
87 "occurrences": [
88 {"source": "video:recording.mp4", "timestamp": "00:05:23"}
89 ]
90 }
91 ],
92 "relationships": [
93 {
94 "source": "Kubernetes",
95 "target": "Docker",
96 "type": "DEPENDS_ON",
97 "descriptions": ["Kubernetes uses Docker as container runtime"]
98 }
99 ]
100 }
101 ```
102
103 ---
104
105 ## Charts
106
107 When chart data is extracted from diagrams (bar charts, line charts, pie charts, scatter plots), PlanOpticon reproduces them as standalone image files.
108
109 | Format | File | Description |
110 |--------|------|-------------|
111 | SVG | `diagrams/chart_N.svg` | Vector chart rendered via matplotlib. Suitable for embedding in documents or scaling to any size. |
112 | PNG | `diagrams/chart_N.png` | Raster chart rendered via matplotlib at 150 DPI. |
113
114 Reproduced charts are also embedded inline in the HTML and PDF reports.
115
116 ---
117
118 ## Knowledge Graph Exports
119
120 Beyond the default `knowledge_graph.db` and `knowledge_graph.json` produced during analysis, PlanOpticon supports exporting knowledge graphs to several additional formats via the `planopticon export` and `planopticon kg convert` commands.
121
122 | Format | Command / File | Description |
123 |--------|---------------|-------------|
124 | JSON | `knowledge_graph.json` | Default JSON export. Produced automatically alongside the `.db` file. |
125 | SQLite | `knowledge_graph.db` | Primary database format. Can be converted to/from JSON with `planopticon kg convert`. |
126 | GraphML | `output.graphml` | XML-based graph format via `planopticon kg convert kg.db output.graphml`. Compatible with Gephi, yEd, Cytoscape, and other graph visualization tools. |
127 | CSV | `export/entities.csv`, `export/relationships.csv` | Tabular export via `planopticon export markdown kg.db --type csv`. Produces separate CSV files for entities and relationships. |
128 | Mermaid | Inline in reports | Mermaid graph diagrams are embedded in Markdown and HTML reports. Also available programmatically via `GraphQueryEngine.to_mermaid()`. |
129
130 ### Converting between formats
131
132 ```bash
133 # SQLite to JSON
134 planopticon kg convert results/knowledge_graph.db output.json
135
136 # JSON to SQLite
137 planopticon kg convert knowledge_graph.json knowledge_graph.db
138
139 # Sync both directions (updates the stale file)
140 planopticon kg sync results/knowledge_graph.db
141 planopticon kg sync knowledge_graph.db knowledge_graph.json --direction db-to-json
142 ```
143
144 ---
145
146 ## PlanOpticonExchange Format
147
148 The PlanOpticonExchange format (`.json`) is a canonical interchange payload designed for sharing knowledge graphs between PlanOpticon instances, teams, or external systems.
149
150 ```bash
151 planopticon export exchange knowledge_graph.db
152 planopticon export exchange kg.db -o exchange.json --name "My Project"
153 ```
154
155 The exchange payload includes:
156
157 - **Schema version** for forward compatibility
158 - **Project metadata** (name, description)
159 - **Full entity and relationship data** with provenance
160 - **Source tracking** for multi-source graphs
161 - **Merge support** -- exchange files can be merged together, deduplicating entities by name
162
163 ### Exchange JSON structure
164
165 ```json
166 {
167 "schema_version": "1.0",
168 "project": {
169 "name": "Sprint Reviews Q4",
170 "description": "Knowledge extracted from Q4 sprint review recordings"
171 },
172 "entities": [...],
173 "relationships": [...],
174 "sources": [...]
175 }
176 ```
177
178 ---
179
180 ## Document Exports
181
182 PlanOpticon can generate structured Markdown documents from any knowledge graph, with no API key required. These are pure template-based outputs derived from the graph data.
183
184 ### Markdown document types
185
186 There are seven document types plus a CSV export, all generated via `planopticon export markdown`:
187
188 | Type | File | Description |
189 |------|------|-------------|
190 | `summary` | `executive_summary.md` | High-level executive summary with entity counts, top relationships, and key themes. |
191 | `meeting-notes` | `meeting_notes.md` | Structured meeting notes with attendees, topics discussed, decisions made, and action items. |
192 | `glossary` | `glossary.md` | Alphabetical glossary of all entities with descriptions and types. |
193 | `relationship-map` | `relationship_map.md` | Textual and Mermaid-based relationship map showing how entities connect. |
194 | `status-report` | `status_report.md` | Status report format with progress indicators, risks, and next steps. |
195 | `entity-index` | `entity_index.md` | Comprehensive index of all entities grouped by type, with links to individual briefs. |
196 | `entity-brief` | `entities/<Name>.md` | One-pager brief for each entity, showing descriptions, relationships, and source references. |
197 | `csv` | `entities.csv` | Tabular CSV export of entities and relationships. |
198
199 ```bash
200 # Generate all document types
201 planopticon export markdown knowledge_graph.db
202
203 # Generate specific types
204 planopticon export markdown kg.db -o ./docs --type summary --type glossary
205
206 # Generate meeting notes and CSV
207 planopticon export markdown kg.db --type meeting-notes --type csv
208 ```
209
210 ### Obsidian vault export
211
212 Exports the knowledge graph as an Obsidian-compatible vault with YAML frontmatter, `[[wiki-links]]` between entities, and proper folder structure.
213
214 ```bash
215 planopticon export obsidian knowledge_graph.db -o ./my-vault
216 ```
217
218 The vault includes:
219
220 - One note per entity with frontmatter (`type`, `aliases`, `tags`)
221 - Wiki-links between related entities
222 - A `_Index.md` file for navigation
223 - Compatible with Obsidian graph view
224
225 ### Notion markdown export
226
227 Exports as Notion-compatible Markdown with a CSV database file for import into Notion databases.
228
229 ```bash
230 planopticon export notion knowledge_graph.db -o ./notion-export
231 ```
232
233 ### GitHub wiki export
234
235 Generates a complete GitHub wiki with a sidebar, home page, and per-entity pages. Can be pushed directly to a GitHub wiki repository.
236
237 ```bash
238 # Generate wiki pages
239 planopticon wiki generate knowledge_graph.db -o ./wiki
240
241 # Push to GitHub
242 planopticon wiki push ./wiki ConflictHQ/PlanOpticon -m "Update wiki from KG"
243 ```
244
245 ---
246
247 ## Batch Outputs
248
249 Batch processing produces additional files at the batch root directory, alongside per-video output folders.
250
251 | Format | File | Description |
252 |--------|------|-------------|
253 | JSON | `batch_manifest.json` | Batch-level manifest with aggregate stats, per-video status (completed/failed), error details, and paths to all sub-outputs. |
254 | Markdown | `batch_summary.md` | Aggregated summary report with combined key points, action items, entity counts, and a Mermaid diagram of the merged knowledge graph. |
255 | SQLite | `knowledge_graph.db` | Merged knowledge graph combining entities and relationships across all successfully processed videos. Uses fuzzy matching and conflict resolution. |
256 | JSON | `knowledge_graph.json` | JSON export of the merged knowledge graph. |
257
---

## Self-Contained HTML Viewer

PlanOpticon ships with a self-contained interactive knowledge graph viewer at `knowledge-base/viewer.html` in the repository. This file:

- Uses D3.js (bundled inline, no CDN dependency)
- Renders an interactive force-directed graph visualization
- Supports node filtering by entity type
- Shows entity details and relationships on click
- Can load any `knowledge_graph.json` file
- Works offline with no server required -- just open it in a browser
- Covers approximately 80% of graph exploration needs with zero infrastructure

---

## Output Directory Structure

A complete single-video analysis produces the following directory tree:

```
output/
├── manifest.json              # Run manifest (source of truth)
├── transcript/
│   ├── transcript.json        # Full transcript with segments
│   ├── transcript.txt         # Plain text
│   └── transcript.srt         # Subtitles
├── frames/
│   ├── frame_0000.jpg         # Extracted video frames
│   ├── frame_0001.jpg
│   └── ...
├── diagrams/
│   ├── diagram_0.jpg          # Original frame
│   ├── diagram_0.mermaid      # Mermaid source
│   ├── diagram_0.svg          # Vector rendering
│   ├── diagram_0.png          # Raster rendering
│   ├── diagram_0.json         # Analysis data
│   ├── chart_0.svg            # Reproduced chart (SVG)
│   ├── chart_0.png            # Reproduced chart (PNG)
│   └── ...
├── captures/
│   ├── capture_0.jpg          # Medium-confidence screenshots
│   ├── capture_0.json         # Caption and metadata
│   └── ...
└── results/
    ├── analysis.md            # Markdown report
    ├── analysis.html          # HTML report
    ├── analysis.pdf           # PDF report (if planopticon[pdf] installed)
    ├── knowledge_graph.db     # Knowledge graph (SQLite, primary)
    ├── knowledge_graph.json   # Knowledge graph (JSON export)
    ├── key_points.json        # Extracted key points
    └── action_items.json      # Action items
```

---

## Controlling Output Format

Use the `--output-format` flag with `planopticon analyze` to control how results are presented:

| Value | Behavior |
|-------|----------|
| `default` | Writes all output files to disk and prints a usage summary to stdout. |
| `json` | Writes all output files to disk and also emits the complete `VideoManifest` as structured JSON to stdout. Useful for piping into other tools or CI/CD pipelines. |

```bash
# Standard output (files + console summary)
planopticon analyze -i video.mp4 -o ./output

# JSON manifest to stdout (for scripting)
planopticon analyze -i video.mp4 -o ./output --output-format json
```

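The `json` mode is built for piping. A minimal downstream consumer might parse the manifest from stdout and inspect its fields; the helper below shows the shape of such a consumer, using a stand-in manifest string whose keys are hypothetical and not the real `VideoManifest` schema.

```python
import json

def summarize_manifest(raw: str) -> list[str]:
    """Return the sorted top-level keys of a JSON manifest emitted on stdout."""
    return sorted(json.loads(raw).keys())

# In a real pipeline you would pass sys.stdin.read() here; for illustration,
# a stand-in manifest (keys are hypothetical, not the actual schema):
print(summarize_manifest('{"source": "video.mp4", "results": {}}'))  # -> ['results', 'source']
```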
ADDED docs/guide/planning-agent.md
--- a/docs/guide/planning-agent.md
+++ b/docs/guide/planning-agent.md
@@ -0,0 +1,425 @@
# Planning Agent

The Planning Agent is PlanOpticon's AI-powered system for synthesizing knowledge graph content into structured planning artifacts. It takes extracted entities and relationships from video analyses, document ingestions, and other sources, then uses LLM reasoning to produce project plans, PRDs, roadmaps, task breakdowns, GitHub issues, and more.

---

## How It Works

The Planning Agent operates through a three-stage pipeline:

### 1. Context Assembly

The agent gathers context from all available sources:

- **Knowledge graph** -- entity counts, types, relationships, and planning entities from the loaded KG
- **Query engine** -- used to pull stats, entity lists, and relationship data for prompt construction
- **Provider manager** -- the configured LLM provider used for generation
- **Prior artifacts** -- any artifacts already generated in the session (skills can chain off each other)
- **Conversation history** -- accumulated chat messages when running in interactive mode

This context is bundled into an `AgentContext` dataclass that is shared across all skills.

### 2. Skill Selection

When the agent receives a user request, it determines which skills to run:

**LLM-driven planning (with provider).** The agent constructs a prompt that includes the knowledge base summary, all available skill names and descriptions, and the user's request. The LLM returns a JSON array of skill names to execute in order, along with any parameters. For example, given "Create a project plan and break it into tasks," the LLM might select `["project_plan", "task_breakdown"]`.

**Keyword fallback (without provider).** If no LLM provider is available, the agent falls back to simple keyword matching. It splits each skill name on underscores and checks whether any of those words appear in the user's request. For example, the request "generate a roadmap" would match the `roadmap` skill because "roadmap" appears in both the request and the skill name.

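The fallback rule can be sketched in a few lines. This is an illustrative reimplementation of the matching behavior as described, not the agent's actual code:

```python
def keyword_match(request: str, skill_names: list[str]) -> list[str]:
    """Select skills whose underscore-separated name parts appear in the request."""
    words = set(request.lower().split())
    return [
        name for name in skill_names
        if any(part in words for part in name.split("_"))
    ]

skills = ["project_plan", "roadmap", "task_breakdown", "github_issues"]
print(keyword_match("generate a roadmap", skills))       # -> ['roadmap']
print(keyword_match("plan the next milestone", skills))  # -> ['project_plan']
```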
### 3. Execution

Selected skills are executed sequentially. Each skill:

1. Checks `can_execute()` to verify the required context is available (by default, both a knowledge graph and an LLM provider must be present)
2. Pulls relevant data from the knowledge graph via the query engine
3. Constructs a detailed prompt for the LLM with extracted context
4. Calls the LLM and parses the response
5. Returns an `Artifact` object containing the generated content

Each artifact is appended to `context.artifacts`, making it available to subsequent skills. This enables chaining -- for example, `task_breakdown` can feed into `github_issues`.

---

## AgentContext

The `AgentContext` dataclass is the shared state object that connects all components of the planning agent system.

```python
@dataclass
class AgentContext:
    knowledge_graph: Any = None        # KnowledgeGraph instance
    query_engine: Any = None           # GraphQueryEngine instance
    provider_manager: Any = None       # ProviderManager instance
    planning_entities: List[Any] = field(default_factory=list)
    user_requirements: Dict[str, Any] = field(default_factory=dict)
    conversation_history: List[Dict[str, str]] = field(default_factory=list)
    artifacts: List[Artifact] = field(default_factory=list)
    config: Dict[str, Any] = field(default_factory=dict)
```

| Field | Purpose |
|---|---|
| `knowledge_graph` | The loaded `KnowledgeGraph` instance; provides access to entities, relationships, and graph operations |
| `query_engine` | A `GraphQueryEngine` for running structured queries (stats, entities, neighbors, relationships) |
| `provider_manager` | The `ProviderManager` that handles LLM API calls across providers |
| `planning_entities` | Entities classified into the planning taxonomy (goals, requirements, risks, etc.) |
| `user_requirements` | Structured requirements gathered from the `requirements_chat` skill |
| `conversation_history` | Accumulated chat messages for interactive sessions |
| `artifacts` | All artifacts generated during the session, enabling skill chaining |
| `config` | Arbitrary configuration overrides |

---

## Artifacts

Every skill returns an `Artifact` dataclass:

```python
@dataclass
class Artifact:
    name: str                  # Human-readable name (e.g., "Project Plan")
    content: str               # The generated content (markdown, JSON, etc.)
    artifact_type: str         # Type identifier: "project_plan", "prd", "roadmap", etc.
    format: str = "markdown"   # Content format: "markdown", "json", "mermaid"
    metadata: Dict[str, Any] = field(default_factory=dict)
```

Artifacts are the currency of the agent system. They can be:

- Displayed directly in the Companion REPL
- Exported to disk via the `artifact_export` skill
- Pushed to external tools via the `cli_adapter` skill
- Chained into other skills (e.g., task breakdown feeds into GitHub issues)

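To make the chaining concrete, here is a runnable sketch using minimal stand-ins for the two dataclasses above (full field sets omitted). The `latest_artifact` helper is invented for illustration; the point is that a downstream skill can locate an upstream result by scanning `context.artifacts` for a type:

```python
from dataclasses import dataclass, field

# Minimal stand-ins for the dataclasses shown above.
@dataclass
class Artifact:
    name: str
    content: str
    artifact_type: str
    format: str = "markdown"

@dataclass
class AgentContext:
    artifacts: list = field(default_factory=list)

def latest_artifact(context: AgentContext, artifact_type: str):
    """How a downstream skill can find an upstream result by type."""
    for artifact in reversed(context.artifacts):
        if artifact.artifact_type == artifact_type:
            return artifact
    return None

ctx = AgentContext()
ctx.artifacts.append(Artifact("Task Breakdown", '[{"id": "T1"}]', "task_list", "json"))
tasks = latest_artifact(ctx, "task_list")
print(tasks.name)  # -> Task Breakdown
```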
---

## Skills Reference

The agent ships with 11 built-in skills. Each skill is a class that extends `Skill` and self-registers at import time via `register_skill()`.

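The self-registration pattern looks roughly like the following; the `Skill` base class and `register_skill()` here are simplified stand-ins, not PlanOpticon's actual definitions:

```python
# Simplified sketch of the import-time registration pattern.
SKILL_REGISTRY: dict[str, "Skill"] = {}

def register_skill(skill: "Skill") -> None:
    SKILL_REGISTRY[skill.name] = skill

class Skill:
    name = "base"
    description = ""

    def can_execute(self, context) -> bool:
        # Real skills require a knowledge graph and an LLM provider by default.
        return True

    def execute(self, context):
        raise NotImplementedError

class ProjectPlanSkill(Skill):
    name = "project_plan"
    description = "Generate a structured project plan from knowledge graph."

# Module-level call: merely importing the module registers the skill.
register_skill(ProjectPlanSkill())

print(sorted(SKILL_REGISTRY))  # -> ['project_plan']
```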
### project_plan

**Description:** Generate a structured project plan from knowledge graph.

Pulls the full knowledge graph context (stats, entities, relationships, and planning entities grouped by type) and asks the LLM to produce a comprehensive project plan with:

1. Executive Summary
2. Goals and Objectives
3. Scope
4. Phases and Milestones
5. Resource Requirements
6. Risks and Mitigations
7. Success Criteria

**Artifact type:** `project_plan` | **Format:** markdown

### prd

**Description:** Generate a product requirements document (PRD) / feature spec.

Filters planning entities to those of type `requirement`, `feature`, and `constraint`, then asks the LLM to generate a PRD with:

1. Problem Statement
2. User Stories
3. Functional Requirements
4. Non-Functional Requirements
5. Acceptance Criteria
6. Out of Scope

If no pre-filtered entities match, the LLM derives requirements from the full knowledge graph context.

**Artifact type:** `prd` | **Format:** markdown

### roadmap

**Description:** Generate a product/project roadmap.

Focuses on planning entities of type `milestone`, `feature`, and `dependency`. Asks the LLM to produce a roadmap with:

1. Vision and Strategy
2. Phases (with timeline estimates)
3. Key Dependencies
4. A Mermaid Gantt chart summarizing the timeline

**Artifact type:** `roadmap` | **Format:** markdown

### task_breakdown

**Description:** Break down goals into tasks with dependencies.

Focuses on planning entities of type `goal`, `feature`, and `milestone`. Returns a JSON array of task objects, each containing:

| Field | Type | Description |
|---|---|---|
| `id` | string | Task identifier (e.g., "T1", "T2") |
| `title` | string | Short task title |
| `description` | string | Detailed description |
| `depends_on` | list | IDs of prerequisite tasks |
| `priority` | string | `high`, `medium`, or `low` |
| `estimate` | string | Effort estimate (e.g., "2d", "1w") |
| `assignee_role` | string | Role needed to perform the task |

**Artifact type:** `task_list` | **Format:** json

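A task object matching that schema might look like the sample below; all values are invented for illustration. The snippet validates the sample against the documented field names:

```python
import json

# Invented sample; field names follow the task_breakdown schema above.
sample = """
[
  {
    "id": "T1",
    "title": "Set up authentication service",
    "description": "Provision the auth service and wire up OAuth flows.",
    "depends_on": [],
    "priority": "high",
    "estimate": "1w",
    "assignee_role": "backend engineer"
  }
]
"""

expected_fields = {"id", "title", "description", "depends_on",
                   "priority", "estimate", "assignee_role"}
tasks = json.loads(sample)
assert all(set(task) == expected_fields for task in tasks)
print(tasks[0]["id"], tasks[0]["priority"])  # -> T1 high
```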
### github_issues

**Description:** Generate GitHub issues from task breakdown.

Converts tasks into GitHub-ready issue objects. If a `task_list` artifact exists in the context, it is used as input. Otherwise, minimal issues are generated from the planning entities directly.

Each issue includes a formatted body with description, priority, estimate, and dependencies, plus labels derived from the task priority.

The skill also provides a `push_to_github(issues_json, repo)` function that shells out to the `gh` CLI to create actual issues. This is used by the `cli_adapter` skill.

**Artifact type:** `issues` | **Format:** json

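Shelling out to `gh` for each issue might look like the sketch below. This is an assumed implementation, not PlanOpticon's actual function; only standard `gh issue create` flags are used, and the command is built as an argument list so titles with spaces need no quoting:

```python
import json
import subprocess

def push_to_github(issues_json: str, repo: str, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one `gh issue create` command per issue."""
    commands = []
    for issue in json.loads(issues_json):
        cmd = [
            "gh", "issue", "create",
            "--repo", repo,
            "--title", issue["title"],
            "--body", issue.get("body", ""),
        ]
        for label in issue.get("labels", []):
            cmd += ["--label", label]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires an authenticated gh CLI
    return commands

cmds = push_to_github(
    '[{"title": "Set up auth", "body": "...", "labels": ["priority:high"]}]',
    "ConflictHQ/PlanOpticon",
)
print(cmds[0][:3])  # -> ['gh', 'issue', 'create']
```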
### requirements_chat

**Description:** Interactive requirements gathering via guided questions.

Generates a structured requirements questionnaire based on the knowledge graph context. The questionnaire contains 8-12 targeted questions, each with:

| Field | Type | Description |
|---|---|---|
| `id` | string | Question identifier (e.g., "Q1") |
| `category` | string | `goals`, `constraints`, `priorities`, or `scope` |
| `question` | string | The question text |
| `context` | string | Why this question matters |

The skill also provides a `gather_requirements(context, answers)` method that takes the completed Q&A and synthesizes structured requirements (goals, constraints, priorities, scope).

**Artifact type:** `requirements` | **Format:** json

### doc_generator

**Description:** Generate technical documentation, ADRs, or meeting notes.

Supports three document types, selected via the `doc_type` parameter:

| `doc_type` | Output Structure |
|---|---|
| `technical_doc` (default) | Overview, Architecture, Components and Interfaces, Data Flow, Deployment and Configuration, API Reference |
| `adr` | Title, Status (Proposed), Context, Decision, Consequences, Alternatives Considered |
| `meeting_notes` | Meeting Summary, Key Discussion Points, Decisions Made, Action Items (with owners), Open Questions, Next Steps |

**Artifact type:** `document` | **Format:** markdown

### artifact_export

**Description:** Export artifacts in agent-ready formats.

Writes all artifacts accumulated in the context to a directory structure. Accepts an `output_dir` parameter (defaults to `plan/`). Each artifact is written to a file based on its type:

| Artifact Type | Filename |
|---|---|
| `project_plan` | `project_plan.md` |
| `prd` | `prd.md` |
| `roadmap` | `roadmap.md` |
| `task_list` | `tasks.json` |
| `issues` | `issues.json` |
| `requirements` | `requirements.json` |
| `document` | `docs/<name>.md` |

A `manifest.json` is written alongside, listing all exported files with their names, types, and formats.

**Artifact type:** `export_manifest` | **Format:** json

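The type-to-filename mapping lends itself to a simple dispatch table. A hedged sketch of how such an exporter could work -- not the actual implementation; the fallback filename for unknown types is invented:

```python
import json
from pathlib import Path

# Mapping mirrors the table above.
FILENAMES = {
    "project_plan": "project_plan.md",
    "prd": "prd.md",
    "roadmap": "roadmap.md",
    "task_list": "tasks.json",
    "issues": "issues.json",
    "requirements": "requirements.json",
}

def export_artifacts(artifacts: list[dict], output_dir: str = "plan") -> dict:
    """Write each artifact to its mapped filename and emit a manifest."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    entries = []
    for a in artifacts:
        if a["artifact_type"] == "document":
            # `document` artifacts go under docs/<name>.md.
            path = out / "docs" / f"{a['name']}.md"
            path.parent.mkdir(parents=True, exist_ok=True)
        else:
            # Fallback name for unknown types is invented for this sketch.
            path = out / FILENAMES.get(a["artifact_type"], f"{a['artifact_type']}.txt")
        path.write_text(a["content"])
        entries.append({"name": a["name"], "type": a["artifact_type"], "file": str(path)})
    manifest = {"artifact_count": len(entries), "output_dir": output_dir, "files": entries}
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```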
### cli_adapter

**Description:** Push artifacts to external tools via their CLIs.

Converts artifacts into CLI commands for external project management tools. Supported tools:

| Tool | CLI | Example Command |
|---|---|---|
| `github` | `gh` | `gh issue create --title "..." --body "..." --label "..."` |
| `jira` | `jira` | `jira issue create --summary "..." --description "..."` |
| `linear` | `linear` | `linear issue create --title "..." --description "..."` |

The skill checks whether the target CLI is available on the system and includes that status in the output. Commands are generated in dry-run mode by default.

**Artifact type:** `cli_commands` | **Format:** json

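Checking whether a target CLI exists can be done with `shutil.which`. A small sketch of that availability probe -- illustrative, not the skill's actual code:

```python
import shutil

def cli_status(tools: dict[str, str]) -> dict[str, bool]:
    """Map each tool name to whether its CLI binary is on PATH."""
    return {tool: shutil.which(cli) is not None for tool, cli in tools.items()}

status = cli_status({"github": "gh", "jira": "jira", "linear": "linear"})
print(status)  # e.g. {'github': True, 'jira': False, 'linear': False}
```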
### notes_export

**Description:** Export knowledge graph as structured notes (Obsidian, Notion).

Exports the entire knowledge graph as a collection of markdown files optimized for a specific note-taking platform. Accepts a `format` parameter:

**Obsidian format** creates:

- One `.md` file per entity with YAML frontmatter, tags, and `[[wiki-links]]`
- An `_Index.md` Map of Content grouping entities by type
- Tag pages for each entity type
- Artifact notes for any generated artifacts

**Notion format** creates:

- One `.md` file per entity with Notion-style callout blocks and relationship tables
- An `entities_database.csv` for bulk import into a Notion database
- An `Overview.md` page with stats and entity listings
- Artifact pages

**Artifact type:** `notes_export` | **Format:** markdown

### wiki_generator

**Description:** Generate a GitHub wiki from knowledge graph and artifacts.

Generates a complete GitHub wiki structure as a dictionary of page names to markdown content. Creates:

- **Home** page with entity type counts and links
- **_Sidebar** navigation with entity types and artifacts
- **Type index pages** with tables of entities per type
- **Individual entity pages** with descriptions, outgoing/incoming relationships, and source occurrences
- **Artifact pages** for any generated planning artifacts

The skill also provides standalone functions `write_wiki(pages, output_dir)` to write pages to disk and `push_wiki(wiki_dir, repo)` to push directly to a GitHub wiki repository.

**Artifact type:** `wiki` | **Format:** markdown

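Since the wiki is just a dict of page names to markdown, writing it to disk is a short loop. A sketch of what `write_wiki` could look like -- assumed behavior, not the real function:

```python
from pathlib import Path

def write_wiki(pages: dict[str, str], output_dir: str) -> list[str]:
    """Write each page as <name>.md under output_dir; return the paths written."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in pages.items():
        path = out / f"{name}.md"
        path.write_text(content)
        written.append(str(path))
    return written
```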
---

## CLI Usage

### One-shot execution

Run the agent with a request string. The agent selects and executes appropriate skills automatically.

```bash
# Generate a project plan
planopticon agent "Create a project plan" --kb ./results

# Generate a PRD
planopticon agent "Write a PRD for the authentication system" --kb ./results

# Break down into tasks
planopticon agent "Break this into tasks and estimate effort" --kb ./results
```

### Export artifacts to disk

Use `--export` to write generated artifacts to a directory:

```bash
planopticon agent "Create a full project plan with tasks" --kb ./results --export ./output
```

### Interactive mode

Use `-I` for a multi-turn session where you can issue multiple requests:

```bash
planopticon agent -I --kb ./results
```

In interactive mode, the agent supports:

- Free-text requests (executed via LLM skill selection)
- `/plan` -- shortcut to generate a project plan
- `/skills` -- list available skills
- `quit`, `exit`, `q` -- end the session

### Provider and model options

```bash
# Use a specific provider
planopticon agent "Create a roadmap" --kb ./results -p anthropic

# Use a specific model
planopticon agent "Generate a PRD" --kb ./results --chat-model gpt-4o
```

### Auto-discovery

If `--kb` is not specified, the agent uses `KBContext.auto_discover()` to find knowledge graphs in the workspace.

---

## Using Skills from the Companion REPL

The Companion REPL provides direct access to agent skills through slash commands. See the [Companion guide](companion.md) for full details.

| Companion Command | Skill Executed |
|---|---|
| `/plan` | `project_plan` |
| `/prd` | `prd` |
| `/tasks` | `task_breakdown` |
| `/run SKILL_NAME` | Any registered skill by name |

When executed from the Companion, skills use the same `AgentContext` that powers the chat mode. This means:

- The knowledge graph loaded at startup is automatically available
- The active LLM provider (set via `/provider` or `/model`) is used for generation
- Generated artifacts accumulate across the session, enabling chaining

---

## Example Workflows

### From video to project plan

```bash
# 1. Analyze a video
planopticon analyze -i sprint-review.mp4 -o results/

# 2. Launch the agent with the results
planopticon agent "Create a comprehensive project plan with tasks and a roadmap" \
    --kb results/ --export plan/

# 3. Review the generated artifacts
ls plan/
# project_plan.md  roadmap.md  tasks.json  manifest.json
```

### Interactive planning session

```bash
$ planopticon companion --kb ./results

planopticon> /status
Workspace status:
  KG: knowledge_graph.db (58 entities, 124 relationships)
  ...

planopticon> What are the main goals discussed?
Based on the knowledge graph, the main goals are...

planopticon> /plan
--- Project Plan (project_plan) ---
...

planopticon> /tasks
--- Task Breakdown (task_list) ---
...

planopticon> /run github_issues
--- GitHub Issues (issues) ---
[
  {"title": "Set up authentication service", ...},
  ...
]

planopticon> /run artifact_export
--- Export Manifest (export_manifest) ---
{
  "artifact_count": 3,
  "output_dir": "plan",
  "files": [...]
}
```

### Skill chaining

Skills that produce artifacts make them available to subsequent skills automatically:

1. `/tasks` generates a `task_list` artifact
2. `/run github_issues` detects the existing `task_list` artifact and converts its tasks into GitHub issues
3. `/run cli_adapter` takes the most recent artifact and generates `gh issue create` commands
4. `/run artifact_export` writes all accumulated artifacts to disk with a manifest

This chaining works both in the Companion REPL and in one-shot agent execution, since the `AgentContext.artifacts` list persists for the duration of the session.
--- a/docs/guide/planning-agent.md
+++ b/docs/guide/planning-agent.md
@@ -0,0 +1,425 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/docs/guide/planning-agent.md
+++ b/docs/guide/planning-agent.md
@@ -0,0 +1,425 @@
1 # Planning Agent
2
3 The Planning Agent is PlanOpticon's AI-powered system for synthesizing knowledge graph content into structured planning artifacts. It takes extracted entities and relationships from video analyses, document ingestions, and other sources, then uses LLM reasoning to produce project plans, PRDs, roadmaps, task breakdowns, GitHub issues, and more.
4
5 ---
6
7 ## How It Works
8
9 The Planning Agent operates through a three-stage pipeline:
10
11 ### 1. Context Assembly
12
13 The agent gathers context from all available sources:
14
15 - **Knowledge graph** -- entity counts, types, relationships, and planning entities from the loaded KG
16 - **Query engine** -- used to pull stats, entity lists, and relationship data for prompt construction
17 - **Provider manager** -- the configured LLM provider used for generation
18 - **Prior artifacts** -- any artifacts already generated in the session (skills can chain off each other)
19 - **Conversation history** -- accumulated chat messages when running in interactive mode
20
21 This context is bundled into an `AgentContext` dataclass that is shared across all skills.
22
23 ### 2. Skill Selection
24
25 When the agent receives a user request, it determines which skills to run:
26
27 **LLM-driven planning (with provider).** The agent constructs a prompt that includes the knowledge base summary, all available skill names and descriptions, and the user's request. The LLM returns a JSON array of skill names to execute in order, along with any parameters. For example, given "Create a project plan and break it into tasks," the LLM might select `["project_plan", "task_breakdown"]`.
28
29 **Keyword fallback (without provider).** If no LLM provider is available, the agent falls back to simple keyword matching. It splits each skill name on underscores and checks whether any of those words appear in the user's request. For example, the request "generate a roadmap" would match the `roadmap` skill because "roadmap" appears in both the request and the skill name.
30
31 ### 3. Execution
32
33 Selected skills are executed sequentially. Each skill:
34
35 1. Checks `can_execute()` to verify the required context is available (by default, both a knowledge graph and an LLM provider must be present)
36 2. Pulls relevant data from the knowledge graph via the query engine
37 3. Constructs a detailed prompt for the LLM with extracted context
38 4. Calls the LLM and parses the response
39 5. Returns an `Artifact` object containing the generated content
40
41 Each artifact is appended to `context.artifacts`, making it available to subsequent skills. This enables chaining -- for example, `task_breakdown` can feed into `github_issues`.
42
43 ---
44
45 ## AgentContext
46
47 The `AgentContext` dataclass is the shared state object that connects all components of the planning agent system.
48
49 ```python
50 @dataclass
51 class AgentContext:
52 knowledge_graph: Any = None # KnowledgeGraph instance
53 query_engine: Any = None # GraphQueryEngine instance
54 provider_manager: Any = None # ProviderManager instance
55 planning_entities: List[Any] = field(default_factory=list)
56 user_requirements: Dict[str, Any] = field(default_factory=dict)
57 conversation_history: List[Dict[str, str]] = field(default_factory=list)
58 artifacts: List[Artifact] = field(default_factory=list)
59 config: Dict[str, Any] = field(default_factory=dict)
60 ```
61
62 | Field | Purpose |
63 |---|---|
64 | `knowledge_graph` | The loaded `KnowledgeGraph` instance; provides access to entities, relationships, and graph operations |
65 | `query_engine` | A `GraphQueryEngine` for running structured queries (stats, entities, neighbors, relationships) |
66 | `provider_manager` | The `ProviderManager` that handles LLM API calls across providers |
67 | `planning_entities` | Entities classified into the planning taxonomy (goals, requirements, risks, etc.) |
68 | `user_requirements` | Structured requirements gathered from the `requirements_chat` skill |
69 | `conversation_history` | Accumulated chat messages for interactive sessions |
70 | `artifacts` | All artifacts generated during the session, enabling skill chaining |
71 | `config` | Arbitrary configuration overrides |
72
73 ---
74
75 ## Artifacts
76
77 Every skill returns an `Artifact` dataclass:
78
79 ```python
80 @dataclass
81 class Artifact:
82 name: str # Human-readable name (e.g., "Project Plan")
83 content: str # The generated content (markdown, JSON, etc.)
84 artifact_type: str # Type identifier: "project_plan", "prd", "roadmap", etc.
85 format: str = "markdown" # Content format: "markdown", "json", "mermaid"
86 metadata: Dict[str, Any] = field(default_factory=dict)
87 ```
88
89 Artifacts are the currency of the agent system. They can be:
90
91 - Displayed directly in the Companion REPL
92 - Exported to disk via the `artifact_export` skill
93 - Pushed to external tools via the `cli_adapter` skill
94 - Chained into other skills (e.g., task breakdown feeds into GitHub issues)
95
96 ---
97
98 ## Skills Reference
99
100 The agent ships with 11 built-in skills. Each skill is a class that extends `Skill` and self-registers at import time via `register_skill()`.
101
102 ### project_plan
103
104 **Description:** Generate a structured project plan from knowledge graph.
105
106 Pulls the full knowledge graph context (stats, entities, relationships, and planning entities grouped by type) and asks the LLM to produce a comprehensive project plan with:
107
108 1. Executive Summary
109 2. Goals and Objectives
110 3. Scope
111 4. Phases and Milestones
112 5. Resource Requirements
113 6. Risks and Mitigations
114 7. Success Criteria
115
116 **Artifact type:** `project_plan` | **Format:** markdown
117
118 ### prd
119
120 **Description:** Generate a product requirements document (PRD) / feature spec.
121
122 Filters planning entities to those of type `requirement`, `feature`, and `constraint`, then asks the LLM to generate a PRD with:
123
124 1. Problem Statement
125 2. User Stories
126 3. Functional Requirements
127 4. Non-Functional Requirements
128 5. Acceptance Criteria
129 6. Out of Scope
130
131 If no pre-filtered entities match, the LLM derives requirements from the full knowledge graph context.
132
133 **Artifact type:** `prd` | **Format:** markdown
134
135 ### roadmap
136
137 **Description:** Generate a product/project roadmap.
138
139 Focuses on planning entities of type `milestone`, `feature`, and `dependency`. Asks the LLM to produce a roadmap with:
140
141 1. Vision and Strategy
142 2. Phases (with timeline estimates)
143 3. Key Dependencies
144 4. A Mermaid Gantt chart summarizing the timeline
145
146 **Artifact type:** `roadmap` | **Format:** markdown
147
148 ### task_breakdown
149
150 **Description:** Break down goals into tasks with dependencies.
151
152 Focuses on planning entities of type `goal`, `feature`, and `milestone`. Returns a JSON array of task objects, each containing:
153
154 | Field | Type | Description |
155 |---|---|---|
156 | `id` | string | Task identifier (e.g., "T1", "T2") |
157 | `title` | string | Short task title |
158 | `description` | string | Detailed description |
159 | `depends_on` | list | IDs of prerequisite tasks |
160 | `priority` | string | `high`, `medium`, or `low` |
161 | `estimate` | string | Effort estimate (e.g., "2d", "1w") |
162 | `assignee_role` | string | Role needed to perform the task |
163
164 **Artifact type:** `task_list` | **Format:** json
165
166 ### github_issues
167
168 **Description:** Generate GitHub issues from task breakdown.
169
170 Converts tasks into GitHub-ready issue objects. If a `task_list` artifact exists in the context, it is used as input. Otherwise, minimal issues are generated from the planning entities directly.
171
172 Each issue includes a formatted body with description, priority, estimate, and dependencies, plus labels derived from the task priority.
173
174 The skill also provides a `push_to_github(issues_json, repo)` function that shells out to the `gh` CLI to create actual issues. This is used by the `cli_adapter` skill.
175
176 **Artifact type:** `issues` | **Format:** json
177
178 ### requirements_chat
179
180 **Description:** Interactive requirements gathering via guided questions.
181
182 Generates a structured requirements questionnaire based on the knowledge graph context. The questionnaire contains 8-12 targeted questions, each with:
183
184 | Field | Type | Description |
185 |---|---|---|
186 | `id` | string | Question identifier (e.g., "Q1") |
187 | `category` | string | `goals`, `constraints`, `priorities`, or `scope` |
188 | `question` | string | The question text |
189 | `context` | string | Why this question matters |
190
191 The skill also provides a `gather_requirements(context, answers)` method that takes the completed Q&A and synthesizes structured requirements (goals, constraints, priorities, scope).
192
193 **Artifact type:** `requirements` | **Format:** json
194
195 ### doc_generator
196
197 **Description:** Generate technical documentation, ADRs, or meeting notes.
198
199 Supports three document types, selected via the `doc_type` parameter:
200
201 | `doc_type` | Output Structure |
202 |---|---|
203 | `technical_doc` (default) | Overview, Architecture, Components and Interfaces, Data Flow, Deployment and Configuration, API Reference |
204 | `adr` | Title, Status (Proposed), Context, Decision, Consequences, Alternatives Considered |
205 | `meeting_notes` | Meeting Summary, Key Discussion Points, Decisions Made, Action Items (with owners), Open Questions, Next Steps |
206
207 **Artifact type:** `document` | **Format:** markdown
208
209 ### artifact_export
210
211 **Description:** Export artifacts in agent-ready formats.
212
213 Writes all artifacts accumulated in the context to a directory structure. Each artifact is written to a file based on its type:
214
215 | Artifact Type | Filename |
216 |---|---|
217 | `project_plan` | `project_plan.md` |
218 | `prd` | `prd.md` |
219 | `roadmap` | `roadmap.md` |
220 | `task_list` | `tasks.json` |
221 | `issues` | `issues.json` |
222 | `requirements` | `requirements.json` |
223 | `document` | `docs/<name>.md` |
224
225 A `manifest.json` is written alongside, listing all exported files with their names, types, and formats.
226
227 **Artifact type:** `export_manifest` | **Format:** json
228
229 Accepts an `output_dir` parameter (defaults to `plan/`).
230
231 ### cli_adapter
232
233 **Description:** Push artifacts to external tools via their CLIs.
234
235 Converts artifacts into CLI commands for external project management tools. Supported tools:
236
237 | Tool | CLI | Example Command |
238 |---|---|---|
239 | `github` | `gh` | `gh issue create --title "..." --body "..." --label "..."` |
240 | `jira` | `jira` | `jira issue create --summary "..." --description "..."` |
241 | `linear` | `linear` | `linear issue create --title "..." --description "..."` |
242
243 The skill checks whether the target CLI is available on the system and includes that status in the output. Commands are generated in dry-run mode by default.
244
245 **Artifact type:** `cli_commands` | **Format:** json
246
247 ### notes_export
248
249 **Description:** Export knowledge graph as structured notes (Obsidian, Notion).
250
251 Exports the entire knowledge graph as a collection of markdown files optimized for a specific note-taking platform. Accepts a `format` parameter:
252
253 **Obsidian format** creates:
254
255 - One `.md` file per entity with YAML frontmatter, tags, and `[[wiki-links]]`
256 - An `_Index.md` Map of Content grouping entities by type
257 - Tag pages for each entity type
258 - Artifact notes for any generated artifacts
259
260 **Notion format** creates:
261
262 - One `.md` file per entity with Notion-style callout blocks and relationship tables
263 - An `entities_database.csv` for bulk import into a Notion database
264 - An `Overview.md` page with stats and entity listings
265 - Artifact pages
266
267 **Artifact type:** `notes_export` | **Format:** markdown
268
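For a concrete picture, an entity note in the Obsidian export might look roughly like this (illustrative only; the frontmatter keys and tag names are assumptions):

```markdown
---
tags: [technology]
---

# Kubernetes

Container orchestration platform discussed in the source material.

## Related
- [[AuthenticationService]]
- [[PaymentSystem]]
```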
269 ### wiki_generator
270
271 **Description:** Generate a GitHub wiki from knowledge graph and artifacts.
272
273 Generates a complete GitHub wiki structure as a dictionary of page names to markdown content. Creates:
274
275 - **Home** page with entity type counts and links
276 - **_Sidebar** navigation with entity types and artifacts
277 - **Type index pages** with tables of entities per type
278 - **Individual entity pages** with descriptions, outgoing/incoming relationships, and source occurrences
279 - **Artifact pages** for any generated planning artifacts
280
281 The skill also provides standalone functions `write_wiki(pages, output_dir)` to write pages to disk and `push_wiki(wiki_dir, repo)` to push directly to a GitHub wiki repository.
282
283 **Artifact type:** `wiki` | **Format:** markdown
284
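`write_wiki` is described above only by its signature. As a rough mental model (not the actual implementation), it plausibly behaves like this self-contained sketch:

```python
from pathlib import Path

def write_wiki(pages: dict, output_dir: str) -> list:
    """Write each wiki page (name -> markdown content) to <output_dir>/<name>.md."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, markdown in pages.items():
        page_path = out / f"{name}.md"
        page_path.write_text(markdown, encoding="utf-8")
        written.append(str(page_path))
    return written

# A two-page wiki: Home plus the _Sidebar navigation page
pages = {"Home": "# Home\n", "_Sidebar": "- [[Home]]\n"}
written_files = write_wiki(pages, "wiki")
```

`push_wiki(wiki_dir, repo)` would then publish that directory, presumably by committing it to the GitHub wiki repository; consult the module source for the real behavior.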
285 ---
286
287 ## CLI Usage
288
289 ### One-shot execution
290
291 Run the agent with a request string. The agent selects and executes appropriate skills automatically.
292
293 ```bash
294 # Generate a project plan
295 planopticon agent "Create a project plan" --kb ./results
296
297 # Generate a PRD
298 planopticon agent "Write a PRD for the authentication system" --kb ./results
299
300 # Break down into tasks
301 planopticon agent "Break this into tasks and estimate effort" --kb ./results
302 ```
303
304 ### Export artifacts to disk
305
306 Use `--export` to write generated artifacts to a directory:
307
308 ```bash
309 planopticon agent "Create a full project plan with tasks" --kb ./results --export ./output
310 ```
311
312 ### Interactive mode
313
314 Use `-I` for a multi-turn session where you can issue multiple requests:
315
316 ```bash
317 planopticon agent -I --kb ./results
318 ```
319
320 In interactive mode, the agent supports:
321
322 - Free-text requests (executed via LLM skill selection)
323 - `/plan` -- shortcut to generate a project plan
324 - `/skills` -- list available skills
325 - `quit`, `exit`, `q` -- end the session
326
327 ### Provider and model options
328
329 ```bash
330 # Use a specific provider
331 planopticon agent "Create a roadmap" --kb ./results -p anthropic
332
333 # Use a specific model
334 planopticon agent "Generate a PRD" --kb ./results --chat-model gpt-4o
335 ```
336
337 ### Auto-discovery
338
339 If `--kb` is not specified, the agent uses `KBContext.auto_discover()` to find knowledge graphs in the workspace.
340
341 ---
342
343 ## Using Skills from the Companion REPL
344
345 The Companion REPL provides direct access to agent skills through slash commands. See the [Companion guide](companion.md) for full details.
346
347 | Companion Command | Skill Executed |
348 |---|---|
349 | `/plan` | `project_plan` |
350 | `/prd` | `prd` |
351 | `/tasks` | `task_breakdown` |
352 | `/run SKILL_NAME` | Any registered skill by name |
353
354 When executed from the Companion, skills use the same `AgentContext` that powers the chat mode. This means:
355
356 - The knowledge graph loaded at startup is automatically available
357 - The active LLM provider (set via `/provider` or `/model`) is used for generation
358 - Generated artifacts accumulate across the session, enabling chaining
359
360 ---
361
362 ## Example Workflows
363
364 ### From video to project plan
365
366 ```bash
367 # 1. Analyze a video
368 planopticon analyze -i sprint-review.mp4 -o results/
369
370 # 2. Launch the agent with the results
371 planopticon agent "Create a comprehensive project plan with tasks and a roadmap" \
372 --kb results/ --export plan/
373
374 # 3. Review the generated artifacts
375 ls plan/
376 # project_plan.md roadmap.md tasks.json manifest.json
377 ```
378
379 ### Interactive planning session
380
381 ```bash
382 $ planopticon companion --kb ./results
383
384 planopticon> /status
385 Workspace status:
386 KG: knowledge_graph.db (58 entities, 124 relationships)
387 ...
388
389 planopticon> What are the main goals discussed?
390 Based on the knowledge graph, the main goals are...
391
392 planopticon> /plan
393 --- Project Plan (project_plan) ---
394 ...
395
396 planopticon> /tasks
397 --- Task Breakdown (task_list) ---
398 ...
399
400 planopticon> /run github_issues
401 --- GitHub Issues (issues) ---
402 [
403 {"title": "Set up authentication service", ...},
404 ...
405 ]
406
407 planopticon> /run artifact_export
408 --- Export Manifest (export_manifest) ---
409 {
410 "artifact_count": 3,
411 "output_dir": "plan",
412 "files": [...]
413 }
414 ```
415
416 ### Skill chaining
417
418 Skills that produce artifacts make them available to subsequent skills automatically:
419
420 1. `/tasks` generates a `task_list` artifact
421 2. `/run github_issues` detects the existing `task_list` artifact and converts its tasks into GitHub issues
422 3. `/run cli_adapter` takes the most recent artifact and generates `gh issue create` commands
423 4. `/run artifact_export` writes all accumulated artifacts to disk with a manifest
424
425 This chaining works both in the Companion REPL and in one-shot agent execution, since the `AgentContext.artifacts` list persists for the duration of the session.
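As a toy model of that persistence (only `AgentContext.artifacts` comes from the docs above; every other name here is a hypothetical sketch, not the real API):

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    type: str     # e.g. "task_list", "issues", "cli_commands"
    format: str   # e.g. "json", "markdown"
    content: str

@dataclass
class AgentContext:
    artifacts: list = field(default_factory=list)

    def latest(self, artifact_type=None):
        """Most recent artifact, optionally filtered by type -- the lookup a
        downstream skill would use to find an existing task_list."""
        pool = [a for a in self.artifacts
                if artifact_type is None or a.type == artifact_type]
        return pool[-1] if pool else None

ctx = AgentContext()
ctx.artifacts.append(Artifact("task_list", "json", "[]"))   # step 1: /tasks
ctx.artifacts.append(Artifact("issues", "json", "[]"))      # step 2: /run github_issues
```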
--- docs/guide/single-video.md
+++ docs/guide/single-video.md
@@ -8,22 +8,28 @@
88
99
## What happens
1010
1111
The pipeline runs these steps in order:
1212
13
-1. **Frame extraction** — Samples frames using change detection for transitions plus periodic capture (every 30s) for slow-evolving content like document scrolling
14
-2. **People frame filtering** — OpenCV face detection automatically removes webcam/video conference frames, keeping only shared content (slides, documents, screen shares)
15
-3. **Audio extraction** — Extracts audio track to WAV
16
-4. **Transcription** — Sends audio to speech-to-text (Whisper or Gemini)
17
-5. **Diagram detection** — Vision model classifies each frame as diagram/chart/whiteboard/screenshot/none
18
-6. **Diagram analysis** — High-confidence diagrams get full extraction (description, text, mermaid, chart data)
19
-7. **Screengrab fallback** — Medium-confidence frames are saved as captioned screenshots
20
-8. **Knowledge graph** — Extracts entities and relationships from transcript + diagrams
21
-9. **Key points** — LLM extracts main points and topics
22
-10. **Action items** — LLM finds tasks, commitments, and follow-ups
23
-11. **Reports** — Generates markdown, HTML, and PDF
24
-12. **Export** — Renders mermaid diagrams to SVG/PNG, reproduces charts
13
+1. **Frame extraction** -- Samples frames using change detection for transitions plus periodic capture (every 30s) for slow-evolving content like document scrolling
14
+2. **People frame filtering** -- OpenCV face detection automatically removes webcam/video conference frames, keeping only shared content (slides, documents, screen shares)
15
+3. **Audio extraction** -- Extracts audio track to WAV
16
+4. **Transcription** -- Sends audio to speech-to-text (Whisper or Gemini). If `--speakers` is provided, speaker diarization hints are passed to the provider.
17
+5. **Diagram detection** -- Vision model classifies each frame as diagram/chart/whiteboard/screenshot/none
18
+6. **Diagram analysis** -- High-confidence diagrams get full extraction (description, text, mermaid, chart data)
19
+7. **Screengrab fallback** -- Medium-confidence frames are saved as captioned screenshots
20
+8. **Knowledge graph** -- Extracts entities and relationships from transcript + diagrams, stored in both `knowledge_graph.db` (SQLite, primary) and `knowledge_graph.json` (export)
21
+9. **Key points** -- LLM extracts main points and topics
22
+10. **Action items** -- LLM finds tasks, commitments, and follow-ups
23
+11. **Reports** -- Generates markdown, HTML, and PDF
24
+12. **Export** -- Renders mermaid diagrams to SVG/PNG, reproduces charts
25
+
26
+After analysis, you can optionally run planning taxonomy classification on the knowledge graph to categorize entities for use with the planning agent:
27
+
28
+```bash
29
+planopticon kg classify results/knowledge_graph.db
30
+```
2531
2632
## Processing depth
2733
2834
### `basic`
2935
- Transcription only
@@ -30,18 +36,111 @@
3036
- Key points and action items
3137
- No diagram extraction
3238
3339
### `standard` (default)
3440
- Everything in basic
35
-- Diagram extraction (up to 10 frames)
41
+- Diagram extraction (up to 10 frames, evenly sampled)
3642
- Knowledge graph
3743
- Full report generation
3844
3945
### `comprehensive`
4046
- Everything in standard
4147
- More frames analyzed (up to 20)
4248
- Deeper analysis
49
+
50
+## Command-line options
51
+
52
+### Provider and model selection
53
+
54
+```bash
55
+# Use a specific provider
56
+planopticon analyze -i video.mp4 -o ./output --provider anthropic
57
+
58
+# Override vision and chat models separately
59
+planopticon analyze -i video.mp4 -o ./output --vision-model gpt-4o --chat-model claude-sonnet-4-20250514
60
+```
61
+
62
+### Speaker diarization hints
63
+
64
+Use `--speakers` to provide speaker names as comma-separated hints. These are passed to the transcription provider to improve speaker identification in the transcript segments.
65
+
66
+```bash
67
+planopticon analyze -i video.mp4 -o ./output --speakers "Alice,Bob,Carol"
68
+```
69
+
70
+### Custom prompt templates
71
+
72
+Use `--templates-dir` to point to a directory of custom `.txt` prompt template files. These override the built-in prompts used for diagram analysis, key point extraction, action item extraction, and other LLM-driven steps.
73
+
74
+```bash
75
+planopticon analyze -i video.mp4 -o ./output --templates-dir ./my-prompts
76
+```
77
+
78
+Template files should be named to match the built-in template names (e.g., `key_points.txt`, `action_items.txt`). See the `video_processor/utils/prompt_templates.py` module for the full list of template names.
79
+
80
+### Output format
81
+
82
+Use `--output-format json` to emit the complete `VideoManifest` as structured JSON to stdout, in addition to writing all output files to disk. This is useful for scripting, CI/CD integration, or piping results into other tools.
83
+
84
+```bash
85
+# Standard output (files + console summary)
86
+planopticon analyze -i video.mp4 -o ./output
87
+
88
+# JSON manifest to stdout
89
+planopticon analyze -i video.mp4 -o ./output --output-format json
90
+```
91
+
92
+### Frame extraction tuning
93
+
94
+```bash
95
+# Adjust sampling rate (frames per second to consider)
96
+planopticon analyze -i video.mp4 -o ./output --sampling-rate 1.0
97
+
98
+# Adjust change detection threshold (lower = more sensitive)
99
+planopticon analyze -i video.mp4 -o ./output --change-threshold 0.10
100
+
101
+# Adjust periodic capture interval
102
+planopticon analyze -i video.mp4 -o ./output --periodic-capture 60
103
+
104
+# Enable GPU acceleration for frame extraction
105
+planopticon analyze -i video.mp4 -o ./output --use-gpu
106
+```
107
+
108
+## Output structure
109
+
110
+Every run produces a standardized directory structure:
111
+
112
+```
113
+output/
114
+├── manifest.json # Run manifest (source of truth)
115
+├── transcript/
116
+│ ├── transcript.json # Full transcript with segments + speakers
117
+│ ├── transcript.txt # Plain text
118
+│ └── transcript.srt # Subtitles
119
+├── frames/
120
+│ ├── frame_0000.jpg
121
+│ └── ...
122
+├── diagrams/
123
+│ ├── diagram_0.jpg # Original frame
124
+│ ├── diagram_0.mermaid # Mermaid source
125
+│ ├── diagram_0.svg # Vector rendering
126
+│ ├── diagram_0.png # Raster rendering
127
+│ ├── diagram_0.json # Analysis data
128
+│ └── ...
129
+├── captures/
130
+│ ├── capture_0.jpg # Medium-confidence screenshots
131
+│ ├── capture_0.json
132
+│ └── ...
133
+└── results/
134
+ ├── analysis.md # Markdown report
135
+ ├── analysis.html # HTML report
136
+ ├── analysis.pdf # PDF (if planopticon[pdf] installed)
137
+ ├── knowledge_graph.db # Knowledge graph (SQLite, primary)
138
+ ├── knowledge_graph.json # Knowledge graph (JSON export)
139
+ ├── key_points.json # Extracted key points
140
+ └── action_items.json # Action items
141
+```
43142
44143
## Output manifest
45144
46145
Every run produces a `manifest.json` that is the single source of truth:
47146
@@ -56,13 +155,90 @@
56155
"stats": {
57156
"duration_seconds": 45.2,
58157
"frames_extracted": 42,
59158
"people_frames_filtered": 11,
60159
"diagrams_detected": 3,
61
- "screen_captures": 5
160
+ "screen_captures": 5,
161
+ "models_used": {
162
+ "vision": "gpt-4o",
163
+ "chat": "gpt-4o"
164
+ }
62165
},
166
+ "transcript_json": "transcript/transcript.json",
167
+ "transcript_txt": "transcript/transcript.txt",
168
+ "transcript_srt": "transcript/transcript.srt",
169
+ "analysis_md": "results/analysis.md",
170
+ "knowledge_graph_json": "results/knowledge_graph.json",
171
+ "knowledge_graph_db": "results/knowledge_graph.db",
172
+ "key_points_json": "results/key_points.json",
173
+ "action_items_json": "results/action_items.json",
63174
"key_points": [...],
64175
"action_items": [...],
65176
"diagrams": [...],
66177
"screen_captures": [...]
67178
}
68179
```
180
+
181
+## Checkpoint and resume
182
+
183
+The pipeline supports checkpoint/resume. If a step's output files already exist on disk, that step is skipped on re-run. This means you can safely re-run an interrupted analysis and it will pick up where it left off:
184
+
185
+```bash
186
+# First run (interrupted at step 6)
187
+planopticon analyze -i video.mp4 -o ./output
188
+
189
+# Second run (resumes from step 6)
190
+planopticon analyze -i video.mp4 -o ./output
191
+```
192
+
193
+## Using results after analysis
194
+
195
+### Query the knowledge graph
196
+
197
+After analysis completes, you can query the knowledge graph directly:
198
+
199
+```bash
200
+# Show graph stats
201
+planopticon query --db results/knowledge_graph.db
202
+
203
+# List entities by type
204
+planopticon query --db results/knowledge_graph.db "entities --type technology"
205
+
206
+# Find neighbors of an entity
207
+planopticon query --db results/knowledge_graph.db "neighbors Kubernetes"
208
+
209
+# Ask natural language questions (requires API key)
210
+planopticon query --db results/knowledge_graph.db "What technologies were discussed?"
211
+```
212
+
213
+### Classify entities for planning
214
+
215
+Run taxonomy classification to categorize entities into planning types (goal, milestone, risk, dependency, etc.):
216
+
217
+```bash
218
+planopticon kg classify results/knowledge_graph.db
219
+planopticon kg classify results/knowledge_graph.db --format json
220
+```
221
+
222
+### Export to other formats
223
+
224
+```bash
225
+# Generate markdown documents
226
+planopticon export markdown results/knowledge_graph.db -o ./docs
227
+
228
+# Export as Obsidian vault
229
+planopticon export obsidian results/knowledge_graph.db -o ./vault
230
+
231
+# Export as PlanOpticonExchange
232
+planopticon export exchange results/knowledge_graph.db -o exchange.json
233
+
234
+# Generate GitHub wiki
235
+planopticon wiki generate results/knowledge_graph.db -o ./wiki
236
+```
237
+
238
+### Use with the planning agent
239
+
240
+The planning agent can consume the knowledge graph to generate project plans, PRDs, roadmaps, and other planning artifacts:
241
+
242
+```bash
243
+planopticon agent --db results/knowledge_graph.db
244
+```
69245
70246
ADDED docs/use-cases.md
--- a/docs/use-cases.md
+++ b/docs/use-cases.md
@@ -0,0 +1,342 @@
1
+# Use Cases
2
+
3
+PlanOpticon is built for anyone who needs to turn unstructured content -- recordings, documents, notes, web pages -- into structured, searchable, actionable knowledge. Here are the most common ways people use it.
4
+
5
+---
6
+
7
+## Meeting notes and follow-ups
8
+
9
+**Problem:** You have hours of meeting recordings but no time to rewatch them. Action items get lost, decisions are forgotten, and new team members have no way to catch up.
10
+
11
+**Solution:** Point PlanOpticon at your recordings and get structured transcripts, action items with assignees and deadlines, key decisions, and a knowledge graph linking people to topics.
12
+
13
+```bash
14
+# Analyze a single meeting recording
15
+planopticon analyze -i standup-2026-03-07.mp4 -o ./meetings/march-7
16
+
17
+# Process a month of recordings at once
18
+planopticon batch -i ./recordings/march -o ./meetings --title "March 2026 Meetings"
19
+
20
+# Query what was decided
21
+planopticon query "What decisions were made about the API redesign?"
22
+
23
+# Find all action items assigned to Alice
24
+planopticon query "relationships --source Alice"
25
+```
26
+
27
+**What you get:**
28
+
29
+- Full transcript with timestamps and speaker segments
30
+- Action items extracted with assignees, deadlines, and context
31
+- Key points and decisions highlighted
32
+- Knowledge graph connecting people, topics, technologies, and decisions
33
+- Markdown report you can share with the team
34
+
35
+**Next steps:** Export to your team's wiki or note system:
36
+
37
+```bash
38
+# Push to GitHub wiki
39
+planopticon wiki generate --input ./meetings --output ./wiki
40
+planopticon wiki push --input ./wiki --target "github://your-org/your-repo"
41
+
42
+# Export to Obsidian for personal knowledge management
43
+planopticon export obsidian --input ./meetings --output ~/Obsidian/Meetings
44
+```
45
+
46
+---
47
+
48
+## Research processing
49
+
50
+**Problem:** You're researching a topic across YouTube talks, arXiv papers, blog posts, and podcasts. Information is scattered and hard to cross-reference.
51
+
52
+**Solution:** Ingest everything into a single knowledge graph, then query across all sources.
53
+
54
+```bash
55
+# Ingest a YouTube conference talk
56
+planopticon ingest "https://youtube.com/watch?v=..." --output ./research
57
+
58
+# Ingest arXiv papers
59
+planopticon ingest "https://arxiv.org/abs/2401.12345" --output ./research
60
+
61
+# Ingest blog posts and documentation
62
+planopticon ingest "https://example.com/blog/post" --output ./research
63
+
64
+# Ingest local PDF papers
65
+planopticon ingest ./papers/ --output ./research --recursive
66
+
67
+# Now query across everything
68
+planopticon query "What approaches to vector search were discussed?"
69
+planopticon query "entities --type technology"
70
+planopticon query "neighbors TransformerArchitecture"
71
+```
72
+
73
+**What you get:**
74
+
75
+- A unified knowledge graph merging entities across all sources
76
+- Cross-references showing where the same concept appears in different sources
77
+- Searchable entity index by type (people, technologies, concepts, papers)
78
+- Relationship maps showing how ideas connect
79
+
**Go deeper with the companion:**

```bash
planopticon companion --kb ./research
```

```
planopticon> What are the main approaches to retrieval-augmented generation?
planopticon> /entities --type technology
planopticon> /neighbors RAG
planopticon> /export obsidian
```

---

## Knowledge gathering across platforms

**Problem:** Your team's knowledge is spread across Google Docs, Notion, Obsidian, GitHub wikis, and Apple Notes. There's no single place to search everything.

**Solution:** Pull from all sources into one knowledge graph.

```bash
# Authenticate with your platforms
planopticon auth google
planopticon auth notion
planopticon auth github

# Ingest from Google Workspace
planopticon gws ingest --folder-id abc123 --output ./kb --recursive

# Ingest from Notion
planopticon ingest --source notion --output ./kb

# Ingest from an Obsidian vault
planopticon ingest ~/Obsidian/WorkVault --output ./kb --recursive

# Ingest from GitHub wikis and READMEs
planopticon ingest "github://your-org/project-a" --output ./kb
planopticon ingest "github://your-org/project-b" --output ./kb

# Query the unified knowledge base
planopticon query stats
planopticon query "entities --type person"
planopticon query "What do we know about the authentication system?"
```

**What you get:**

- Merged knowledge graph with provenance tracking (you can see which source each entity came from)
- Deduplicated entities across platforms (same concept mentioned in Notion and Google Docs gets merged)
- Full-text search across all ingested content
- Relationship maps showing how concepts connect across your organization's documents
---

## Team onboarding

**Problem:** New team members spend weeks reading docs, watching recorded meetings, and asking questions to get up to speed.

**Solution:** Build a knowledge base from existing content and let new people explore it conversationally.

```bash
# Build the knowledge base from everything
planopticon batch -i ./recordings/onboarding -o ./kb --title "Team Onboarding"
planopticon ingest ./docs/ --output ./kb --recursive
planopticon ingest ./architecture-decisions/ --output ./kb --recursive

# New team member launches the companion
planopticon companion --kb ./kb
```

```
planopticon> What is the overall architecture of the system?
planopticon> Who are the key people on the team?
planopticon> /entities --type technology
planopticon> What was the rationale for choosing PostgreSQL over MongoDB?
planopticon> /neighbors AuthenticationService
planopticon> What are the main open issues or risks?
```

**What you get:**

- Interactive Q&A over the entire team knowledge base
- Entity exploration — browse people, technologies, services, decisions
- Relationship navigation — "show me everything connected to the payment system"
- No need to rewatch hours of recordings

---

## Data collection and synthesis

**Problem:** You need to collect and synthesize information from many sources — customer interviews, competitor analysis, market research — into a coherent picture.

**Solution:** Batch process recordings and documents, then use the planning agent to generate synthesis artifacts.

```bash
# Process customer interview recordings
planopticon batch -i ./interviews -o ./research --title "Customer Interviews Q1"

# Ingest competitor documentation
planopticon ingest ./competitor-analysis/ --output ./research --recursive

# Ingest market research PDFs
planopticon ingest ./market-reports/ --output ./research --recursive

# Use the planning agent to synthesize
planopticon agent --kb ./research --interactive
```

```
planopticon> Generate a summary of common customer pain points
planopticon> /plan
planopticon> /tasks
planopticon> /export markdown
```

**What you get:**

- Merged knowledge graph across all interviews and documents
- Cross-referenced entities showing which customers mentioned which features
- Agent-generated project plans, PRDs, and task breakdowns based on the data
- Exportable artifacts for sharing with stakeholders

---

## Content creation from video

**Problem:** You have video content (lectures, tutorials, webinars) that you want to turn into written documentation, blog posts, or course materials.

**Solution:** Extract structured knowledge and export it in your preferred format.

```bash
# Analyze the video
planopticon analyze -i webinar-recording.mp4 -o ./content

# Generate multiple document types (no LLM needed for these)
planopticon export markdown --input ./content --output ./docs

# Export to Obsidian for further editing
planopticon export obsidian --input ./content --output ~/Obsidian/Content
```

**What you get for each video:**

- Full transcript (JSON, plain text, SRT subtitles)
- Extracted diagrams reproduced as Mermaid/SVG/PNG
- Charts reproduced with data tables
- Knowledge graph of concepts and relationships
- 7 types of markdown documents: summary, meeting notes, glossary, relationship map, status report, entity index, CSV data
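The SRT output listed above is simply the transcript segments reformatted with SubRip timestamps. A minimal sketch of that conversion, assuming a hypothetical `(start_sec, end_sec, text)` segment shape rather than PlanOpticon's actual transcript model:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SubRip HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """Render (start_sec, end_sec, text) tuples as an SRT document."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

For example, `srt_timestamp(3661.5)` yields `01:01:01,500`, and each block carries its sequence number, time range, and caption text.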
---

## Decision tracking over time

**Problem:** Important decisions are made in meetings but never formally recorded. Months later, nobody remembers why a choice was made.

**Solution:** Process meeting recordings continuously and query the growing knowledge graph for decisions and their context.

```bash
# Process each week's recordings
planopticon batch -i ./recordings/week-12 -o ./decisions --title "Week 12"

# The knowledge graph grows over time — entities merge across weeks
planopticon query "entities --type goal"
planopticon query "entities --type risk"
planopticon query "entities --type milestone"

# Find decisions about a specific topic
planopticon query "What was decided about the database migration?"

# Track risks over time
planopticon query "relationships --type risk"
```

The planning taxonomy automatically classifies entities as goals, requirements, risks, tasks, and milestones — giving you a structured view of how the project evolves.
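One simple way to picture that classification step is keyword matching over entity descriptions. The sketch below is purely illustrative (the rules and fallback type are invented for the example; the real taxonomy is described in the Knowledge Graphs guide):

```python
# Hypothetical keyword rules mapping text cues to taxonomy types
TAXONOMY_RULES = {
    "goal": ("objective", "aim", "goal"),
    "risk": ("risk", "concern", "blocker", "threat"),
    "milestone": ("milestone", "deadline", "release", "launch"),
    "requirement": ("must", "required", "shall"),
    "task": ("todo", "task", "implement", "fix"),
}

def classify(description: str) -> str:
    """Assign a planning-taxonomy type based on keywords in the text."""
    text = description.lower()
    for entity_type, keywords in TAXONOMY_RULES.items():
        if any(kw in text for kw in keywords):
            return entity_type
    return "concept"  # fallback when no rule matches
```

So "Ship the v2 release by the end of Q2" would classify as a milestone, while a description with no planning cues falls back to a plain concept.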
---

## Zoom / Teams / Meet integration

**Problem:** Meeting recordings are sitting in Zoom/Teams/Meet cloud storage. You want to process them without manually downloading each one.

**Solution:** Authenticate once, list recordings, and process them directly.

```bash
# Authenticate with your meeting platform
planopticon auth zoom
# or: planopticon auth microsoft
# or: planopticon auth google

# List recent recordings
planopticon recordings zoom-list
planopticon recordings teams-list --from 2026-01-01
planopticon recordings meet-list --limit 20

# Process recordings (download + analyze)
planopticon analyze -i "zoom://recording-id" -o ./output
```

**Setup requirements:**

| Platform | What you need |
|----------|--------------|
| Zoom | `ZOOM_CLIENT_ID` + `ZOOM_CLIENT_SECRET` (create an OAuth app at marketplace.zoom.us) |
| Teams | `MICROSOFT_CLIENT_ID` + `MICROSOFT_CLIENT_SECRET` (register an Azure AD app) |
| Meet | `GOOGLE_CLIENT_ID` + `GOOGLE_CLIENT_SECRET` (create OAuth credentials in Google Cloud Console) |

See the [Authentication guide](guide/authentication.md) for detailed setup instructions.

---

## Fully offline processing

**Problem:** You're working with sensitive content that can't leave your network, or you simply don't want to pay for API calls.

**Solution:** Use Ollama for local AI processing with no external API calls.

```bash
# Install Ollama and pull models
ollama pull llama3.2   # Chat/analysis
ollama pull llava      # Vision (diagram detection)

# Install local Whisper for transcription (quotes keep zsh from expanding the brackets)
pip install "planopticon[gpu]"

# Process entirely offline
planopticon analyze -i sensitive-meeting.mp4 -o ./output --provider ollama
```

PlanOpticon auto-detects Ollama when it's running. If no cloud API keys are configured, it uses Ollama automatically. Pair with local Whisper transcription for a fully air-gapped pipeline.
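That fallback amounts to: prefer an explicitly configured cloud key, otherwise probe the local Ollama endpoint. A hedged sketch of the idea (the environment-variable names and return values are assumptions for illustration; only the port 11434 default comes from Ollama's own documentation, not PlanOpticon internals):

```python
import urllib.request

def ollama_running(base_url: str = "http://localhost:11434") -> bool:
    """Probe the default Ollama endpoint; True if it answers at all."""
    try:
        with urllib.request.urlopen(base_url, timeout=1):
            return True
    except OSError:  # connection refused, timeout, HTTP error, etc.
        return False

def pick_provider(env: dict) -> str:
    """Choose a provider: explicit cloud keys win, then local Ollama."""
    if env.get("ANTHROPIC_API_KEY") or env.get("OPENAI_API_KEY"):
        return "cloud"
    if ollama_running():
        return "ollama"
    raise RuntimeError("No provider available: set an API key or start Ollama")
```

With a cloud key present the probe is skipped entirely, so air-gapped machines only ever touch localhost.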
---

## Competitive research

**Problem:** You want to systematically analyze competitor content — conference talks, documentation, blog posts — and identify patterns.

**Solution:** Ingest competitor content from multiple sources and query for patterns.

```bash
# Ingest competitor conference talks from YouTube
planopticon ingest "https://youtube.com/watch?v=competitor-talk-1" --output ./competitive
planopticon ingest "https://youtube.com/watch?v=competitor-talk-2" --output ./competitive

# Ingest their documentation
planopticon ingest "https://competitor.com/docs" --output ./competitive

# Ingest their GitHub repos
planopticon auth github
planopticon ingest "github://competitor/main-product" --output ./competitive

# Analyze patterns
planopticon query "entities --type technology"
planopticon query "What technologies are competitors investing in?"
planopticon companion --kb ./competitive
```

```
planopticon> What are the common architectural patterns across competitors?
planopticon> /entities --type technology
planopticon> Which technologies appear most frequently?
planopticon> /export markdown
```
--- mkdocs.yml
+++ mkdocs.yml
@@ -79,21 +79,32 @@
     - Quick Start: getting-started/quickstart.md
     - Configuration: getting-started/configuration.md
   - User Guide:
     - Single Video Analysis: guide/single-video.md
     - Batch Processing: guide/batch.md
+    - Document Ingestion: guide/document-ingestion.md
     - Cloud Sources: guide/cloud-sources.md
+    - Knowledge Graphs: guide/knowledge-graphs.md
+    - Interactive Companion: guide/companion.md
+    - Planning Agent: guide/planning-agent.md
+    - Authentication: guide/authentication.md
+    - Export & Documents: guide/export.md
     - Output Formats: guide/output-formats.md
+  - Use Cases: use-cases.md
   - CLI Reference: cli-reference.md
   - Architecture:
     - Overview: architecture/overview.md
     - Provider System: architecture/providers.md
     - Processing Pipeline: architecture/pipeline.md
   - API Reference:
     - Models: api/models.md
     - Providers: api/providers.md
     - Analyzers: api/analyzers.md
+    - Agent & Skills: api/agent.md
+    - Sources: api/sources.md
+    - Authentication: api/auth.md
+  - FAQ & Troubleshooting: faq.md
   - Contributing: contributing.md

 extra:
   social:
     - icon: fontawesome/brands/github