# Analyzers API Reference

::: video_processor.analyzers.diagram_analyzer

::: video_processor.analyzers.content_analyzer

::: video_processor.analyzers.action_detector

---

## Overview

The analyzers module contains the core content extraction logic for PlanOpticon. These analyzers process video frames and transcripts to extract structured knowledge: diagrams, key points, action items, and cross-referenced entities.

All analyzers accept an optional `ProviderManager` instance. When one is provided, they use LLM capabilities for richer extraction; without one, they fall back to heuristic/pattern-based methods where possible.

---

## DiagramAnalyzer
```python
from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
```

Vision model-based diagram detection and analysis. Classifies video frames as diagrams, slides, screenshots, or other content, then performs full extraction on high-confidence frames.

### Constructor

```python
def __init__(
    self,
    provider_manager: Optional[ProviderManager] = None,
    confidence_threshold: float = 0.3,
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider_manager` | `Optional[ProviderManager]` | `None` | LLM provider (creates a default if not provided) |
| `confidence_threshold` | `float` | `0.3` | Minimum confidence to process a frame at all |

### classify_frame()

```python
def classify_frame(self, image_path: Union[str, Path]) -> dict
```

Classify a single frame using a vision model. Determines whether the frame contains a diagram, slide, or other visual content worth extracting.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `image_path` | `Union[str, Path]` | Path to the frame image file |

**Returns:** `dict` with the following keys:

| Key | Type | Description |
|---|---|---|
| `is_diagram` | `bool` | Whether the frame contains extractable content |
| `diagram_type` | `str` | One of: `flowchart`, `sequence`, `architecture`, `whiteboard`, `chart`, `table`, `slide`, `screenshot`, `unknown` |
| `confidence` | `float` | Detection confidence from 0.0 to 1.0 |
| `content_type` | `str` | Content category: `slide`, `diagram`, `document`, `screen_share`, `whiteboard`, `chart`, `person`, `other` |
| `brief_description` | `str` | One-sentence description of the frame content |

**Important:** Frames showing people, webcam feeds, or video conference participant views return `confidence: 0.0`. The classifier is tuned to detect only shared/presented content.

```python
analyzer = DiagramAnalyzer()
result = analyzer.classify_frame("/path/to/frame_042.jpg")
if result["confidence"] >= 0.7:
    print(f"Diagram detected: {result['diagram_type']}")
```
### analyze_diagram_single_pass()

```python
def analyze_diagram_single_pass(self, image_path: Union[str, Path]) -> dict
```

Full single-pass diagram analysis. Extracts description, text content, elements, relationships, Mermaid syntax, and chart data in a single LLM call.

**Returns:** `dict` with the following keys:

| Key | Type | Description |
|---|---|---|
| `diagram_type` | `str` | Diagram classification |
| `description` | `str` | Detailed description of the visual content |
| `text_content` | `str` | All visible text, preserving structure |
| `elements` | `list[str]` | Identified elements/components |
| `relationships` | `list[str]` | Relationships in `"A -> B: label"` format |
| `mermaid` | `str` | Valid Mermaid diagram syntax |
| `chart_data` | `dict \| None` | Chart data with `labels`, `values`, `chart_type` (only for data charts) |

Returns an empty `dict` on failure.
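Because failure is signaled by an empty `dict`, callers should check truthiness before indexing into the result. A minimal sketch of result handling (the `result` dict below is a hand-written stand-in shaped like the documented keys, not real analyzer output):

```python
# Hypothetical stand-in shaped like analyze_diagram_single_pass() output;
# a real call returns {} on failure.
result = {
    "diagram_type": "flowchart",
    "description": "Deployment pipeline from commit to production",
    "text_content": "CI\nRegistry\nProd",
    "elements": ["CI", "Registry", "Prod"],
    "relationships": ["CI -> Registry: push image", "Registry -> Prod: deploy"],
    "mermaid": "graph LR\n    CI --> Registry\n    Registry --> Prod",
    "chart_data": None,
}

if not result:
    summary = "analysis failed"
elif result.get("chart_data"):
    # Only data charts populate chart_data
    summary = f"chart with {len(result['chart_data']['values'])} points"
else:
    summary = f"{result['diagram_type']} with {len(result['elements'])} elements"
```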
### caption_frame()

```python
def caption_frame(self, image_path: Union[str, Path]) -> str
```

Get a brief 1-2 sentence caption for a frame. Used as a fallback when full diagram analysis is not warranted.

**Returns:** `str` -- a brief description of the frame content.
### process_frames()

```python
def process_frames(
    self,
    frame_paths: List[Union[str, Path]],
    diagrams_dir: Optional[Path] = None,
    captures_dir: Optional[Path] = None,
) -> Tuple[List[DiagramResult], List[ScreenCapture]]
```

Process a batch of extracted video frames through the full classification and analysis pipeline.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `frame_paths` | `List[Union[str, Path]]` | *required* | Paths to frame images |
| `diagrams_dir` | `Optional[Path]` | `None` | Output directory for diagram files (images, Mermaid, JSON) |
| `captures_dir` | `Optional[Path]` | `None` | Output directory for screengrab fallback files |

**Returns:** `Tuple[List[DiagramResult], List[ScreenCapture]]`

**Confidence thresholds:**

| Confidence Range | Action |
|---|---|
| >= 0.7 | Full diagram analysis -- extracts elements, relationships, Mermaid syntax |
| 0.3 to 0.7 | Screengrab fallback -- saves frame with a brief caption |
| < 0.3 | Skipped entirely |

**Output files (when directories are provided):**

For diagrams (`diagrams_dir`):

- `diagram_N.jpg` -- original frame image
- `diagram_N.mermaid` -- Mermaid source (if generated)
- `diagram_N.json` -- full `DiagramResult` as JSON

For screen captures (`captures_dir`):

- `capture_N.jpg` -- original frame image
- `capture_N.json` -- `ScreenCapture` metadata as JSON

```python
from pathlib import Path
from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
from video_processor.providers.manager import ProviderManager

analyzer = DiagramAnalyzer(
    provider_manager=ProviderManager(),
    confidence_threshold=0.3,
)

frame_paths = list(Path("output/frames").glob("*.jpg"))
diagrams, captures = analyzer.process_frames(
    frame_paths,
    diagrams_dir=Path("output/diagrams"),
    captures_dir=Path("output/captures"),
)

print(f"Found {len(diagrams)} diagrams, {len(captures)} screengrabs")
for d in diagrams:
    print(f"  [{d.diagram_type.value}] {d.description}")
```
---

## ContentAnalyzer

```python
from video_processor.analyzers.content_analyzer import ContentAnalyzer
```

Cross-references transcript and diagram entities for richer knowledge extraction. Merges entities found in different sources and enriches key points with diagram links.

### Constructor

```python
def __init__(self, provider_manager: Optional[ProviderManager] = None)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider_manager` | `Optional[ProviderManager]` | `None` | Required for LLM-based fuzzy matching |

### cross_reference()

```python
def cross_reference(
    self,
    transcript_entities: List[Entity],
    diagram_entities: List[Entity],
) -> List[Entity]
```

Merge entities from transcripts and diagrams into a unified list with source attribution.

**Merge strategy:**

1. Index all transcript entities by lowercase name, marked with `source="transcript"`
2. Merge diagram entities: if a name matches, set `source="both"` and combine descriptions/occurrences; otherwise add as `source="diagram"`
3. If a `ProviderManager` is available, use LLM fuzzy matching to find additional matches among unmatched entities (e.g., "PostgreSQL" from transcript matching "Postgres" from diagram)

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `transcript_entities` | `List[Entity]` | Entities extracted from transcript |
| `diagram_entities` | `List[Entity]` | Entities extracted from diagrams |

**Returns:** `List[Entity]` -- merged entity list with `source` attribution.

```python
from video_processor.analyzers.content_analyzer import ContentAnalyzer
from video_processor.models import Entity
from video_processor.providers.manager import ProviderManager

analyzer = ContentAnalyzer(provider_manager=ProviderManager())

transcript_entities = [
    Entity(name="PostgreSQL", type="technology"),
    Entity(name="Alice", type="person"),
]
diagram_entities = [
    Entity(name="Postgres", type="technology"),
    Entity(name="Redis", type="technology"),
]

merged = analyzer.cross_reference(transcript_entities, diagram_entities)
# "PostgreSQL" and "Postgres" may be fuzzy-matched and merged
```
### enrich_key_points()

```python
def enrich_key_points(
    self,
    key_points: List[KeyPoint],
    diagrams: list,
    transcript_text: str,
) -> List[KeyPoint]
```

Link key points to relevant diagrams by entity overlap. Examines word overlap between key point text and diagram elements/text content.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `key_points` | `List[KeyPoint]` | Key points to enrich |
| `diagrams` | `list` | List of `DiagramResult` objects or dicts |
| `transcript_text` | `str` | Full transcript text (reserved for future use) |

**Returns:** `List[KeyPoint]` -- key points with `related_diagrams` indices populated.

A key point is linked to a diagram when they share 2 or more words (excluding short words) between the key point text/details and the diagram's elements/text content.
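As an illustration, the overlap test can be approximated with a standalone sketch (a simplified re-implementation, not the library's actual code; the 4-character cutoff for "short words" is an assumption):

```python
def shared_words(key_point_text: str, diagram_text: str, min_len: int = 4) -> set:
    """Substantive words appearing in both texts (short words excluded)."""
    kp = {w.lower() for w in key_point_text.split() if len(w) >= min_len}
    dg = {w.lower() for w in diagram_text.split() if len(w) >= min_len}
    return kp & dg

kp_text = "Cache session tokens in Redis"
diagram_text = "API Gateway -> Redis cache -> Postgres"
overlap = shared_words(kp_text, diagram_text)
linked = len(overlap) >= 2  # the documented 2-word threshold
```

Here "Redis" and "cache" overlap, so the key point would be linked to the diagram.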
---

## ActionDetector

```python
from video_processor.analyzers.action_detector import ActionDetector
```

Detects action items from transcripts and diagram content using LLM extraction with a regex pattern fallback.

### Constructor

```python
def __init__(self, provider_manager: Optional[ProviderManager] = None)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider_manager` | `Optional[ProviderManager]` | `None` | Required for LLM-based extraction |

### detect_from_transcript()

```python
def detect_from_transcript(
    self,
    text: str,
    segments: Optional[List[TranscriptSegment]] = None,
) -> List[ActionItem]
```

Detect action items from transcript text.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | *required* | Transcript text to analyze |
| `segments` | `Optional[List[TranscriptSegment]]` | `None` | Transcript segments for timestamp attachment |

**Returns:** `List[ActionItem]` -- detected action items with `source="transcript"`.

**Extraction modes:**

- **LLM mode** (when `provider_manager` is set): Sends the transcript to the LLM with a structured extraction prompt. Extracts action, assignee, deadline, priority, and context.
- **Pattern mode** (fallback): Matches sentences against regex patterns for action-oriented language.
**Pattern matching** detects sentences containing:

- "need/needs to", "should/must/shall"
- "will/going to", "action item/todo/follow-up"
- "assigned to/responsible for", "deadline/due by"
- "let's/let us", "make sure/ensure"
- "can you/could you/please"

**Timestamp attachment:** When `segments` are provided, each action item is matched to the most relevant transcript segment (by word overlap, minimum 3 matching words), and a timestamp is added to `context`.
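The segment-matching step can be sketched as follows (an illustrative approximation; real code receives `TranscriptSegment` objects, plain dicts are used here for brevity):

```python
def best_segment(action_text, segments, min_overlap=3):
    """Return the segment sharing the most words with the action,
    or None when fewer than min_overlap words match."""
    action_words = set(action_text.lower().split())
    best, best_count = None, 0
    for seg in segments:
        count = len(action_words & set(seg["text"].lower().split()))
        if count > best_count:
            best, best_count = seg, count
    return best if best_count >= min_overlap else None

segments = [
    {"start": 12.0, "text": "so Alice needs to update the API docs"},
    {"start": 84.5, "text": "next up is sprint planning"},
]
match = best_segment("Alice needs to update the API docs", segments)
# match["start"] can then be formatted into the action item's context
```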
### detect_from_diagrams()

```python
def detect_from_diagrams(self, diagrams: list) -> List[ActionItem]
```

Extract action items from diagram text content and elements. Processes each diagram's combined text using either LLM or pattern extraction.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `diagrams` | `list` | List of `DiagramResult` objects or dicts |

**Returns:** `List[ActionItem]` -- action items with `source="diagram"`.

### merge_action_items()

```python
def merge_action_items(
    self,
    transcript_items: List[ActionItem],
    diagram_items: List[ActionItem],
) -> List[ActionItem]
```

Merge action items from multiple sources, deduplicating by action text (case-insensitive, whitespace-normalized).

**Returns:** `List[ActionItem]` -- deduplicated merged list.
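The deduplication key can be illustrated with a minimal sketch (action items reduced to plain strings for brevity; real items are `ActionItem` objects keyed by their action text):

```python
def merged_actions(transcript_actions, diagram_actions):
    """First occurrence wins; keys are lowercased with whitespace collapsed."""
    seen = set()
    merged = []
    for action in transcript_actions + diagram_actions:
        key = " ".join(action.lower().split())
        if key not in seen:
            seen.add(key)
            merged.append(action)
    return merged

result = merged_actions(
    ["Update the API docs", "Review the PR"],
    ["UPDATE  the API docs", "Ship release notes"],  # duplicate modulo case/spacing
)
```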
### Usage example

```python
from video_processor.analyzers.action_detector import ActionDetector
from video_processor.providers.manager import ProviderManager

detector = ActionDetector(provider_manager=ProviderManager())

# From transcript (transcript_segments comes from an earlier transcription step)
transcript_items = detector.detect_from_transcript(
    text="Alice needs to update the API docs by Friday. "
         "Bob should review the PR before merging.",
    segments=transcript_segments,
)

# From diagrams (diagram_results comes from DiagramAnalyzer.process_frames())
diagram_items = detector.detect_from_diagrams(diagram_results)

# Merge and deduplicate
all_items = detector.merge_action_items(transcript_items, diagram_items)

for item in all_items:
    print(f"[{item.priority or 'unset'}] {item.action}")
    if item.assignee:
        print(f"  Assignee: {item.assignee}")
    if item.deadline:
        print(f"  Deadline: {item.deadline}")
```

### Pattern fallback (no LLM)

```python
# Works without any API keys
detector = ActionDetector()  # No provider_manager
items = detector.detect_from_transcript(
    "We need to finalize the database schema. "
    "Please update the deployment scripts."
)
# Returns ActionItems matched by regex patterns
```