# PlanOpticon Analyzers API Reference
::: video_processor.analyzers.diagram_analyzer
::: video_processor.analyzers.content_analyzer
::: video_processor.analyzers.action_detector
## Overview
The analyzers module contains the core content extraction logic for PlanOpticon. These analyzers process video frames and transcripts to extract structured knowledge: diagrams, key points, action items, and cross-referenced entities.
All analyzers accept an optional ProviderManager instance. When provided, they use LLM capabilities for richer extraction. Without one, they fall back to heuristic/pattern-based methods where possible.
## DiagramAnalyzer

```python
from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
```

Vision-model-based diagram detection and analysis. Classifies video frames as diagrams, slides, screenshots, or other content, then performs full extraction on high-confidence frames.
### Constructor

```python
def __init__(
    self,
    provider_manager: Optional[ProviderManager] = None,
    confidence_threshold: float = 0.3,
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider_manager` | `Optional[ProviderManager]` | `None` | LLM provider (creates a default if not provided) |
| `confidence_threshold` | `float` | `0.3` | Minimum confidence to process a frame at all |
### classify_frame()

```python
def classify_frame(self, image_path: Union[str, Path]) -> dict
```

Classify a single frame using a vision model. Determines whether the frame contains a diagram, slide, or other visual content worth extracting.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `image_path` | `Union[str, Path]` | Path to the frame image file |

**Returns:** `dict` with the following keys:

| Key | Type | Description |
|---|---|---|
| `is_diagram` | `bool` | Whether the frame contains extractable content |
| `diagram_type` | `str` | One of: `flowchart`, `sequence`, `architecture`, `whiteboard`, `chart`, `table`, `slide`, `screenshot`, `unknown` |
| `confidence` | `float` | Detection confidence from 0.0 to 1.0 |
| `content_type` | `str` | Content category: `slide`, `diagram`, `document`, `screen_share`, `whiteboard`, `chart`, `person`, `other` |
| `brief_description` | `str` | One-sentence description of the frame content |
**Important:** Frames showing people, webcam feeds, or video conference participant views return `confidence: 0.0`. The classifier is tuned to detect only shared/presented content.

```python
analyzer = DiagramAnalyzer()
result = analyzer.classify_frame("/path/to/frame_042.jpg")
if result["confidence"] >= 0.7:
    print(f"Diagram detected: {result['diagram_type']}")
```
### analyze_diagram_single_pass()

```python
def analyze_diagram_single_pass(self, image_path: Union[str, Path]) -> dict
```

Full single-pass diagram analysis. Extracts description, text content, elements, relationships, Mermaid syntax, and chart data in a single LLM call.

**Returns:** `dict` with the following keys:

| Key | Type | Description |
|---|---|---|
| `diagram_type` | `str` | Diagram classification |
| `description` | `str` | Detailed description of the visual content |
| `text_content` | `str` | All visible text, preserving structure |
| `elements` | `list[str]` | Identified elements/components |
| `relationships` | `list[str]` | Relationships in `"A -> B: label"` format |
| `mermaid` | `str` | Valid Mermaid diagram syntax |
| `chart_data` | `dict \| None` | Chart data with labels, values, chart_type (only for data charts) |

Returns an empty dict on failure.
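Because failures surface as an empty dict rather than an exception, callers can guard results with a simple truthiness check. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def describe_analysis(result: dict) -> str:
    # analyze_diagram_single_pass() returns {} on failure, so a
    # truthiness check cleanly separates success from failure.
    if not result:
        return "analysis failed"
    return f"[{result.get('diagram_type', 'unknown')}] {result.get('description', '')}"
```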
### caption_frame()

```python
def caption_frame(self, image_path: Union[str, Path]) -> str
```

Get a brief 1-2 sentence caption for a frame. Used as a fallback when full diagram analysis is not warranted.

**Returns:** `str` -- a brief description of the frame content.
### process_frames()

```python
def process_frames(
    self,
    frame_paths: List[Union[str, Path]],
    diagrams_dir: Optional[Path] = None,
    captures_dir: Optional[Path] = None,
) -> Tuple[List[DiagramResult], List[ScreenCapture]]
```

Process a batch of extracted video frames through the full classification and analysis pipeline.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `frame_paths` | `List[Union[str, Path]]` | required | Paths to frame images |
| `diagrams_dir` | `Optional[Path]` | `None` | Output directory for diagram files (images, Mermaid, JSON) |
| `captures_dir` | `Optional[Path]` | `None` | Output directory for screengrab fallback files |

**Returns:** `Tuple[List[DiagramResult], List[ScreenCapture]]`
**Confidence thresholds:**
| Confidence Range | Action |
|---|---|
| >= 0.7 | Full diagram analysis -- extracts elements, relationships, Mermaid syntax |
| 0.3 to 0.7 | Screengrab fallback -- saves frame with a brief caption |
| < 0.3 | Skipped entirely |
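The routing above can be sketched as plain Python (an illustrative reimplementation of the documented thresholds, not the library's internals; the 0.7 cutoff is fixed, while the lower bound is the constructor's `confidence_threshold`):

```python
FULL_ANALYSIS_CUTOFF = 0.7  # documented fixed cutoff for full diagram analysis

def route_frame(confidence: float, confidence_threshold: float = 0.3) -> str:
    """Decide what process_frames() does with one classified frame."""
    if confidence >= FULL_ANALYSIS_CUTOFF:
        return "diagram"     # full analysis: elements, relationships, Mermaid
    if confidence >= confidence_threshold:
        return "screengrab"  # fallback: save frame with a brief caption
    return "skip"            # below confidence_threshold: ignored entirely
```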
**Output files (when directories are provided):**

For diagrams (`diagrams_dir`):

- `diagram_N.jpg` -- original frame image
- `diagram_N.mermaid` -- Mermaid source (if generated)
- `diagram_N.json` -- full DiagramResult as JSON

For screen captures (`captures_dir`):

- `capture_N.jpg` -- original frame image
- `capture_N.json` -- ScreenCapture metadata as JSON
```python
from pathlib import Path

from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
from video_processor.providers.manager import ProviderManager

analyzer = DiagramAnalyzer(
    provider_manager=ProviderManager(),
    confidence_threshold=0.3,
)

frame_paths = list(Path("output/frames").glob("*.jpg"))
diagrams, captures = analyzer.process_frames(
    frame_paths,
    diagrams_dir=Path("output/diagrams"),
    captures_dir=Path("output/captures"),
)

print(f"Found {len(diagrams)} diagrams, {len(captures)} screengrabs")
for d in diagrams:
    print(f"  [{d.diagram_type.value}] {d.description}")
```
## ContentAnalyzer

```python
from video_processor.analyzers.content_analyzer import ContentAnalyzer
```

Cross-references transcript and diagram entities for richer knowledge extraction. Merges entities found in different sources and enriches key points with diagram links.
### Constructor

```python
def __init__(self, provider_manager: Optional[ProviderManager] = None)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider_manager` | `Optional[ProviderManager]` | `None` | Required for LLM-based fuzzy matching |
### cross_reference()

```python
def cross_reference(
    self,
    transcript_entities: List[Entity],
    diagram_entities: List[Entity],
) -> List[Entity]
```

Merge entities from transcripts and diagrams into a unified list with source attribution.

**Merge strategy:**

- Index all transcript entities by lowercase name, marked with `source="transcript"`
- Merge diagram entities: if a name matches, set `source="both"` and combine descriptions/occurrences; otherwise add as `source="diagram"`
- If a `ProviderManager` is available, use LLM fuzzy matching to find additional matches among unmatched entities (e.g., "PostgreSQL" from transcript matching "Postgres" from diagram)
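The name-based part of this merge can be sketched as follows (a simplified illustration over plain name lists; the real method operates on `Entity` objects, also combines descriptions and occurrences, and the LLM fuzzy-matching step is omitted here):

```python
def attribute_sources(transcript_names: list[str], diagram_names: list[str]) -> dict[str, str]:
    # Index transcript entities by lowercase name.
    merged = {name.lower(): "transcript" for name in transcript_names}
    # Merge diagram entities: exact lowercase match -> "both", else "diagram".
    for name in diagram_names:
        key = name.lower()
        merged[key] = "both" if key in merged else "diagram"
    return merged
```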
**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `transcript_entities` | `List[Entity]` | Entities extracted from the transcript |
| `diagram_entities` | `List[Entity]` | Entities extracted from diagrams |

**Returns:** `List[Entity]` -- merged entity list with source attribution.
```python
from video_processor.analyzers.content_analyzer import ContentAnalyzer
from video_processor.models import Entity
from video_processor.providers.manager import ProviderManager

analyzer = ContentAnalyzer(provider_manager=ProviderManager())

transcript_entities = [
    Entity(name="PostgreSQL", type="technology"),
    Entity(name="Alice", type="person"),
]
diagram_entities = [
    Entity(name="Postgres", type="technology"),
    Entity(name="Redis", type="technology"),
]

merged = analyzer.cross_reference(transcript_entities, diagram_entities)
# "PostgreSQL" and "Postgres" may be fuzzy-matched and merged
```
### enrich_key_points()

```python
def enrich_key_points(
    self,
    key_points: List[KeyPoint],
    diagrams: list,
    transcript_text: str,
) -> List[KeyPoint]
```

Link key points to relevant diagrams by entity overlap. Examines word overlap between key point text and diagram elements/text content.
**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `key_points` | `List[KeyPoint]` | Key points to enrich |
| `diagrams` | `list` | List of DiagramResult objects or dicts |
| `transcript_text` | `str` | Full transcript text (reserved for future use) |

**Returns:** `List[KeyPoint]` -- key points with `related_diagrams` indices populated.
A key point is linked to a diagram when they share 2 or more words (excluding short words) between the key point text/details and the diagram's elements/text content.
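That linking rule can be sketched as follows (an illustrative reimplementation; the cutoff for "short words" is an assumption, here taken as three characters or fewer):

```python
def significant_words(text: str) -> set[str]:
    # Assumption: "short words" are those of three characters or fewer.
    return {w for w in text.lower().split() if len(w) > 3}

def is_linked(key_point_text: str, diagram_text: str, min_overlap: int = 2) -> bool:
    # Documented rule: linked when 2 or more significant words are shared.
    shared = significant_words(key_point_text) & significant_words(diagram_text)
    return len(shared) >= min_overlap
```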
## ActionDetector

```python
from video_processor.analyzers.action_detector import ActionDetector
```

Detects action items from transcripts and diagram content using LLM extraction with a regex pattern fallback.
### Constructor

```python
def __init__(self, provider_manager: Optional[ProviderManager] = None)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `provider_manager` | `Optional[ProviderManager]` | `None` | Required for LLM-based extraction |
### detect_from_transcript()

```python
def detect_from_transcript(
    self,
    text: str,
    segments: Optional[List[TranscriptSegment]] = None,
) -> List[ActionItem]
```

Detect action items from transcript text.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | required | Transcript text to analyze |
| `segments` | `Optional[List[TranscriptSegment]]` | `None` | Transcript segments for timestamp attachment |
**Returns:** `List[ActionItem]` -- detected action items with `source="transcript"`.

**Extraction modes:**

- **LLM mode** (when `provider_manager` is set): Sends the transcript to the LLM with a structured extraction prompt. Extracts action, assignee, deadline, priority, and context.
- **Pattern mode** (fallback): Matches sentences against regex patterns for action-oriented language.
Pattern matching detects sentences containing:
- "need/needs to", "should/must/shall"
- "will/going to", "action item/todo/follow-up"
- "assigned to/responsible for", "deadline/due by"
- "let's/let us", "make sure/ensure"
- "can you/could you/please"
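An illustrative subset of those cues written as regexes (the library's actual expressions may differ):

```python
import re

# Hypothetical patterns mirroring the documented action-language cues;
# not the library's actual regex list.
ACTION_PATTERNS = [
    r"\bneeds?\s+to\b",
    r"\b(?:should|must|shall)\b",
    r"\b(?:will|going\s+to)\b",
    r"\b(?:action\s+item|todo|follow-up)\b",
    r"\b(?:let's|let\s+us)\b",
    r"\bmake\s+sure\b|\bensure\b",
    r"\b(?:can|could)\s+you\b|\bplease\b",
]

def looks_like_action(sentence: str) -> bool:
    return any(re.search(p, sentence, re.IGNORECASE) for p in ACTION_PATTERNS)
```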
**Timestamp attachment:** When segments are provided, each action item is matched to the most relevant transcript segment (by word overlap, minimum 3 matching words), and a timestamp is added to context.
### detect_from_diagrams()

```python
def detect_from_diagrams(self, diagrams: list) -> List[ActionItem]
```

Extract action items from diagram text content and elements. Processes each diagram's combined text using either LLM or pattern extraction.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `diagrams` | `list` | List of DiagramResult objects or dicts |

**Returns:** `List[ActionItem]` -- action items with `source="diagram"`.
### merge_action_items()

```python
def merge_action_items(
    self,
    transcript_items: List[ActionItem],
    diagram_items: List[ActionItem],
) -> List[ActionItem]
```

Merge action items from multiple sources, deduplicating by action text (case-insensitive, whitespace-normalized).

**Returns:** `List[ActionItem]` -- deduplicated merged list.
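The deduplication key can be sketched as follows (a simplified version over plain strings; the real method works with `ActionItem` objects and keeps the first occurrence):

```python
def normalize(action: str) -> str:
    # Case-insensitive, whitespace-normalized key used for deduplication.
    return " ".join(action.lower().split())

def merge_actions(transcript_items: list[str], diagram_items: list[str]) -> list[str]:
    seen, merged = set(), []
    for action in transcript_items + diagram_items:
        key = normalize(action)
        if key not in seen:
            seen.add(key)
            merged.append(action)
    return merged
```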
### Usage example

```python
from video_processor.analyzers.action_detector import ActionDetector
from video_processor.providers.manager import ProviderManager

detector = ActionDetector(provider_manager=ProviderManager())

# From transcript (transcript_segments comes from an earlier transcription step)
transcript_items = detector.detect_from_transcript(
    text="Alice needs to update the API docs by Friday. "
         "Bob should review the PR before merging.",
    segments=transcript_segments,
)

# From diagrams (diagram_results comes from DiagramAnalyzer.process_frames())
diagram_items = detector.detect_from_diagrams(diagram_results)

# Merge and deduplicate
all_items = detector.merge_action_items(transcript_items, diagram_items)
for item in all_items:
    print(f"[{item.priority or 'unset'}] {item.action}")
    if item.assignee:
        print(f"  Assignee: {item.assignee}")
    if item.deadline:
        print(f"  Deadline: {item.deadline}")
```
### Pattern fallback (no LLM)

```python
# Works without any API keys
detector = ActionDetector()  # No provider_manager
items = detector.detect_from_transcript(
    "We need to finalize the database schema. "
    "Please update the deployment scripts."
)
# Returns ActionItems matched by regex patterns
```