PlanOpticon
Prepare repo for open-source publishing

- Fix all ruff lint errors (400 -> 0), auto-format codebase
- Add GitHub community files: issue templates (bug report, feature request), PR template, CONTRIBUTING.md, SECURITY.md, FUNDING.yml
- Fix Windows binary build (add `shell: bash` to PyInstaller step)
- Remove internal planning docs (implementation.md, work_plan.md)
- Remove setup script (scripts/setup.sh)
- Update .gitignore: add AI tools, cloud CLI dirs, .venv, site/
- Exclude prompt_templates.py from E501 (LLM prompt strings)
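The E501 exclusion for the prompt-template module would typically be expressed through Ruff's per-file-ignores table. A hypothetical `pyproject.toml` fragment consistent with this commit (the repo's actual section contents may differ; line length and Python target are taken from CONTRIBUTING.md):

```toml
[tool.ruff]
line-length = 100
target-version = "py310"

[tool.ruff.lint.per-file-ignores]
# LLM prompt strings are intentionally long; skip line-length checks here.
"video_processor/utils/prompt_templates.py" = ["E501"]
```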
Commit: 829e24abdf9a5ae4b33462ec759c18bf0128503bda0cebc7759347b15fd7a32d
Parent: 67de7139b0d6d95…
59 files changed
+ .github/CONTRIBUTING.md
+ .github/FUNDING.yml
+ .github/ISSUE_TEMPLATE/bug_report.yml
+ .github/ISSUE_TEMPLATE/config.yml
+ .github/ISSUE_TEMPLATE/feature_request.yml
+ .github/PULL_REQUEST_TEMPLATE.md
+ .github/SECURITY.md
~ .github/workflows/release-binaries.yml
- implementation.md
~ pyproject.toml
- scripts/setup.sh
~ setup.py
~ tests/test_action_detector.py
~ tests/test_agent.py
~ tests/test_api_cache.py
~ tests/test_audio_extractor.py
~ tests/test_batch.py
~ tests/test_cloud_sources.py
~ tests/test_content_analyzer.py
~ tests/test_diagram_analyzer.py
~ tests/test_frame_extractor.py
~ tests/test_json_parsing.py
~ tests/test_models.py
~ tests/test_output_structure.py
~ tests/test_pipeline.py
~ tests/test_prompt_templates.py
~ tests/test_providers.py
~ tests/test_rendering.py
~ video_processor/agent/orchestrator.py
~ video_processor/analyzers/action_detector.py
~ video_processor/analyzers/content_analyzer.py
~ video_processor/analyzers/diagram_analyzer.py
~ video_processor/cli/commands.py
~ video_processor/cli/output_formatter.py
~ video_processor/extractors/__init__.py
~ video_processor/extractors/audio_extractor.py
~ video_processor/extractors/frame_extractor.py
~ video_processor/extractors/text_extractor.py
~ video_processor/integrators/knowledge_graph.py
~ video_processor/integrators/plan_generator.py
~ video_processor/models.py
~ video_processor/output_structure.py
~ video_processor/pipeline.py
~ video_processor/providers/anthropic_provider.py
~ video_processor/providers/base.py
~ video_processor/providers/discovery.py
~ video_processor/providers/gemini_provider.py
~ video_processor/providers/manager.py
~ video_processor/providers/openai_provider.py
~ video_processor/providers/whisper_local.py
~ video_processor/sources/base.py
~ video_processor/sources/dropbox_source.py
~ video_processor/sources/google_drive.py
~ video_processor/utils/api_cache.py
~ video_processor/utils/export.py
~ video_processor/utils/prompt_templates.py
~ video_processor/utils/rendering.py
~ video_processor/utils/usage_tracker.py
- work_plan.md
.github/CONTRIBUTING.md (new, +79)

--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -0,0 +1,79 @@
+# Contributing to PlanOpticon
+
+Thank you for your interest in contributing to PlanOpticon! This guide will help you get started.
+
+## Development Setup
+
+1. **Fork and clone the repository:**
+
+   ```bash
+   git clone https://github.com/<your-username>/PlanOpticon.git
+   cd PlanOpticon
+   ```
+
+2. **Create a virtual environment:**
+
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+
+3. **Install in editable mode with dev dependencies:**
+
+   ```bash
+   pip install -e ".[dev]"
+   ```
+
+4. **Install FFmpeg** (required for video processing):
+
+   ```bash
+   # macOS
+   brew install ffmpeg
+
+   # Ubuntu/Debian
+   sudo apt install ffmpeg
+   ```
+
+5. **Set up at least one AI provider API key:**
+
+   ```bash
+   export OPENAI_API_KEY="sk-..."
+   # or
+   export ANTHROPIC_API_KEY="sk-ant-..."
+   # or
+   export GEMINI_API_KEY="..."
+   ```
+
+## Running Tests
+
+```bash
+pytest tests/
+```
+
+To run tests with coverage:
+
+```bash
+pytest tests/ --cov=video_processor
+```
+
+## Code Style
+
+This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formatting.
+
+**Check for lint issues:**
+
+```bash
+ruff check .
+```
+
+**Check formatting (without modifying files):**
+
+```bash
+ruff format --check .
+```
+
+The project targets a line length of 100 characters and Python 3.10+. See `pyproject.toml` for the full Ruff configuration.
+
+## Commit Conventions
+
+Write clear, descriptive commit messages. Use the imperative mood in the subject line.
.github/FUNDING.yml (new, +1)

--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -0,0 +1 @@
+github: ConflictHQ
.github/ISSUE_TEMPLATE/bug_report.yml (new, +106)

--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,106 @@
+name: Bug Report
+description: Report a bug in PlanOpticon
+title: "[Bug]: "
+labels: ["bug", "triage"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thank you for taking the time to report a bug. Please fill out the fields below so we can diagnose and fix the issue as quickly as possible.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: A clear and concise description of the bug.
+      placeholder: Describe the bug...
+    validations:
+      required: true
+
+  - type: textarea
+    id: steps-to-reproduce
+    attributes:
+      label: Steps to Reproduce
+      description: The exact steps to reproduce the behavior.
+      placeholder: |
+        1. Run `planopticon analyze -i video.mp4 -o ./output`
+        2. Wait for frame extraction to complete
+        3. Observe error in diagram extraction step
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected-behavior
+    attributes:
+      label: Expected Behavior
+      description: What you expected to happen.
+      placeholder: Describe what you expected...
+    validations:
+      required: true
+
+  - type: textarea
+    id: actual-behavior
+    attributes:
+      label: Actual Behavior
+      description: What actually happened.
+      placeholder: Describe what actually happened...
+    validations:
+      required: true
+
+  - type: dropdown
+    id: os
+    attributes:
+      label: Operating System
+      options:
+        - macOS
+        - Linux (Ubuntu/Debian)
+        - Linux (Fedora/RHEL)
+        - Linux (other)
+        - Windows
+        - Other
+    validations:
+      required: true
+
+  - type: dropdown
+    id: python-version
+    attributes:
+      label: Python Version
+      options:
+        - "3.13"
+        - "3.12"
+        - "3.11"
+        - "3.10"
+    validations:
+      required: true
+
+  - type: input
+    id: planopticon-version
+    attributes:
+      label: PlanOpticon Version
+      description: Run `planopticon --version` or `pip show planopticon` to find this.
+      placeholder: "e.g. 0.2.0"
+    validations:
+      required: true
+
+  - type: dropdown
+    id: provider
+    attributes:
+      label: AI Provider
+      description: Which AI provider were you using when the bug occurred?
+      options:
+        - OpenAI
+        - Anthropic
+        - Google Gemini
+        - Multiple providers
+        - Not applicable
+    validations:
+      required: true
+
+  - type: textarea
+    id: logs
+    attributes:
+      label: Logs
+      description: Paste any relevant log output. This will be automatically formatted as code.
+      render: shell
+    validations:
+      required: false
.github/ISSUE_TEMPLATE/config.yml (new, +5)

--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Discussions
+    url: https://github.com/ConflictHQ/PlanOpticon/discussions
+    about: Ask questions, share ideas, or discuss PlanOpticon with the community.
.github/ISSUE_TEMPLATE/feature_request.yml (new, +37)

--- a/.github/ISSUE_TEMPLATE/feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,37 @@
+name: Feature Request
+description: Suggest a new feature or improvement for PlanOpticon
+title: "[Feature]: "
+labels: ["enhancement"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        We appreciate your ideas for improving PlanOpticon. Please describe your feature request in detail so we can evaluate and prioritize it.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: A clear and concise description of the feature you would like to see.
+      placeholder: Describe the feature...
+    validations:
+      required: true
+
+  - type: textarea
+    id: use-case
+    attributes:
+      label: Use Case
+      description: Explain the problem this feature would solve or the workflow it would improve. Why is this feature important to you?
+      placeholder: |
+        As a user who processes large batches of meeting recordings, I need...
+    validations:
+      required: true
+
+  - type: textarea
+    id: proposed-solution
+    attributes:
+      label: Proposed Solution
+      description: If you have ideas on how this could be implemented, describe them here. This is optional -- we welcome feature requests even without a proposed solution.
+      placeholder: Describe a possible implementation approach...
+    validations:
+      required: false
.github/PULL_REQUEST_TEMPLATE.md (new, +25)

--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,25 @@
+## Summary of Changes
+
+<!-- Briefly describe what this PR does and why. -->
+
+## Type of Change
+
+<!-- Check the one that applies. -->
+
+- [ ] Bug fix (non-breaking change that fixes an issue)
+- [ ] New feature (non-breaking change that adds functionality)
+- [ ] Documentation update
+- [ ] Refactor (no functional changes)
+- [ ] Breaking change (fix or feature that would cause existing functionality to change)
+
+## Test Plan
+
+<!-- Describe how you tested these changes. Include commands, scenarios, or links to CI runs. -->
+
+## Checklist
+
+- [ ] Tests pass locally (`pytest tests/`)
+- [ ] Lint is clean (`ruff check .` and `ruff format --check .`)
+- [ ] Documentation has been updated (if applicable)
+- [ ] Any new dependencies are added to `pyproject.toml`
+- [ ] Commit messages follow the project's conventions
.github/SECURITY.md (new, +40)

--- a/.github/SECURITY.md
+++ b/.github/SECURITY.md
@@ -0,0 +1,40 @@
+# Security Policy
+
+## Reporting a Vulnerability
+
+If you discover a security vulnerability in PlanOpticon, we ask that you report it responsibly. **Please do not open a public GitHub issue for security vulnerabilities.**
+
+Instead, send an email to:
+
+**[email protected]**
+
+Include as much of the following information as possible:
+
+- A description of the vulnerability and its potential impact
+- Steps to reproduce the issue
+- Any relevant logs, screenshots, or proof-of-concept code
+- Your recommended fix, if you have one
+
+## What to Expect
+
+- **Acknowledgment:** We will acknowledge receipt of your report within 2 business days.
+- **Assessment:** We will investigate and assess the severity of the issue. We may reach out to you for additional details.
+- **Resolution:** We will work on a fix and coordinate disclosure with you. We aim to resolve critical issues within 14 days.
+- **Credit:** With your permission, we will credit you in the release notes for the fix.
+
+## Supported Versions
+
+We provide security updates for the latest minor release of PlanOpticon. We recommend always running the most recent version.
+
+| Version | Supported |
+|---------|-----------|
+| Latest  | Yes       |
+| Older   | No        |
+
+## Scope
+
+This security policy covers the PlanOpticon application and its first-party code. Vulnerabilities in third-party dependencies should be reported to the respective upstream projects, though we appreciate being notified so we can update our dependencies promptly.
+
+## Thank You
+
+We value the security research community and appreciate the effort that goes into finding and responsibly disclosing vulnerabilities. Thank you for helping keep PlanOpticon and its users safe.
.github/workflows/release-binaries.yml (modified)

--- a/.github/workflows/release-binaries.yml
+++ b/.github/workflows/release-binaries.yml
@@ -47,10 +47,11 @@
         run: |
           pip install -e ".[all]"
           pip install pyinstaller

       - name: Build binary
+        shell: bash
         run: |
           pyinstaller \
             --name planopticon-${{ matrix.target }} \
             --onefile \
             --console \
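The `shell: bash` line is the Windows build fix from the commit message: on Windows runners, GitHub Actions defaults `run:` steps to PowerShell, where the backslash line continuations in the multi-line `pyinstaller \` command fail. Forcing bash makes the step behave the same on all three platforms. A minimal sketch of the pattern (the entry-point filename here is illustrative, not from this repo):

```yaml
# Illustrative step: force bash so "\" line continuations work on Windows too.
- name: Build binary
  shell: bash
  run: |
    pyinstaller \
      --onefile \
      --console \
      cli.py
```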
implementation.md (deleted, -272)
| --- a/implementation.md | ||
| +++ b/implementation.md | ||
| @@ -1,272 +0,0 @@ | ||
| 1 | -# PlanOpticon Implementation Guide | |
| 2 | -This document provides detailed technical guidance for implementing the PlanOpticon system architecture. The suggested approach balances code quality, performance optimization, and architecture best practices. | |
| 3 | -## System Architecture | |
| 4 | -PlanOpticon follows a modular pipeline architecture with these core components: | |
| 5 | -``` | |
| 6 | -video_processor/ | |
| 7 | -├── extractors/ | |
| 8 | -│ ├── frame_extractor.py | |
| 9 | -│ ├── audio_extractor.py | |
| 10 | -│ └── text_extractor.py | |
| 11 | -├── api/ | |
| 12 | -│ ├── transcription_api.py | |
| 13 | -│ ├── vision_api.py | |
| 14 | -│ ├── llm_api.py | |
| 15 | -│ └── api_manager.py | |
| 16 | -├── analyzers/ | |
| 17 | -│ ├── content_analyzer.py | |
| 18 | -│ ├── diagram_analyzer.py | |
| 19 | -│ └── action_detector.py | |
| 20 | -├── integrators/ | |
| 21 | -│ ├── knowledge_graph.py | |
| 22 | -│ └── plan_generator.py | |
| 23 | -├── utils/ | |
| 24 | -│ ├── api_cache.py | |
| 25 | -│ ├── prompt_templates.py | |
| 26 | -│ └── visualization.py | |
| 27 | -└── cli/ | |
| 28 | - ├── commands.py | |
| 29 | - └── output_formatter.py | |
| 30 | -``` | |
| 31 | -## Implementation Approach | |
| 32 | -When building complex systems like PlanOpticon, it's critical to develop each component with clear boundaries and interfaces. The following approach provides a framework for high-quality implementation: | |
| 33 | -### Video and Audio Processing | |
| 34 | -Video frame extraction should be implemented with performance in mind: | |
| 35 | -``` | |
| 36 | -pythondef extract_frames(video_path, sampling_rate=1.0, change_threshold=0.15): | |
| 37 | - """ | |
| 38 | - Extract frames from video based on sampling rate and visual change detection. | |
| 39 | - | |
| 40 | - Parameters | |
| 41 | - ---------- | |
| 42 | - video_path : str | |
| 43 | - Path to video file | |
| 44 | - sampling_rate : float | |
| 45 | - Frame sampling rate (1.0 = every frame) | |
| 46 | - change_threshold : float | |
| 47 | - Threshold for detecting significant visual changes | |
| 48 | - | |
| 49 | - Returns | |
| 50 | - ------- | |
| 51 | - list | |
| 52 | - List of extracted frames as numpy arrays | |
| 53 | - """ | |
| 54 | - # Implementation details here | |
| 55 | - pass | |
| 56 | -``` | |
| 57 | -Consider using a decorator pattern for GPU acceleration when available: | |
| 58 | -``` | |
| 59 | -pythondef gpu_accelerated(func): | |
| 60 | - """Decorator to use GPU implementation when available.""" | |
| 61 | - @functools.wraps(func) | |
| 62 | - def wrapper(*args, **kwargs): | |
| 63 | - if is_gpu_available() and not kwargs.get('disable_gpu'): | |
| 64 | - return func_gpu(*args, **kwargs) | |
| 65 | - return func(*args, **kwargs) | |
| 66 | - return wrapper | |
| 67 | -``` | |
| 68 | -### Computer Vision Components | |
| 69 | -When implementing diagram detection, consider using a progressive refinement approach: | |
| 70 | -``` | |
| 71 | -pythonclass DiagramDetector: | |
| 72 | - """Detects and extracts diagrams from video frames.""" | |
| 73 | - | |
| 74 | - def __init__(self, model_path, confidence_threshold=0.7): | |
| 75 | - """Initialize detector with pre-trained model.""" | |
| 76 | - # Implementation details | |
| 77 | - | |
| 78 | - def detect(self, frame): | |
| 79 | - """ | |
| 80 | - Detect diagrams in a single frame. | |
| 81 | - | |
| 82 | - Parameters | |
| 83 | - ---------- | |
| 84 | - frame : numpy.ndarray | |
| 85 | - Video frame as numpy array | |
| 86 | - | |
| 87 | - Returns | |
| 88 | - ------- | |
| 89 | - list | |
| 90 | - List of detected diagram regions as bounding boxes | |
| 91 | - """ | |
| 92 | - # 1. Initial region proposal | |
| 93 | - # 2. Feature extraction | |
| 94 | - # A well-designed detection pipeline would incorporate multiple stages | |
| 95 | - # of increasingly refined detection to balance performance and accuracy | |
| 96 | - pass | |
| 97 | - | |
| 98 | - def extract_and_normalize(self, frame, regions): | |
| 99 | - """Extract and normalize detected diagrams.""" | |
| 100 | - # Implementation details | |
| 101 | - pass | |
| 102 | -``` | |
| 103 | -### Speech Processing Pipeline | |
| 104 | -The speech recognition and diarization system should be implemented with careful attention to context: | |
| 105 | -class SpeechProcessor: | |
| 106 | - """Process speech from audio extraction.""" | |
| 107 | - | |
| 108 | - def __init__(self, models_dir, device='auto'): | |
| 109 | - """ | |
| 110 | - Initialize speech processor. | |
| 111 | - | |
| 112 | - Parameters | |
| 113 | - ---------- | |
| 114 | - models_dir : str | |
| 115 | - Directory containing pre-trained models | |
| 116 | - device : str | |
| 117 | - Computing device ('cpu', 'cuda', 'auto') | |
| 118 | - """ | |
| 119 | - # Implementation details | |
| 120 | - | |
| 121 | - def process_audio(self, audio_path): | |
| 122 | - """ | |
| 123 | - Process audio file for transcription and speaker diarization. | |
| 124 | - | |
| 125 | - Parameters | |
| 126 | - ---------- | |
| 127 | - audio_path : str | |
| 128 | - Path to audio file | |
| 129 | - | |
| 130 | - Returns | |
| 131 | - ------- | |
| 132 | - dict | |
| 133 | - Processed speech segments with speaker attribution | |
| 134 | - """ | |
| 135 | - # The key to effective speech processing is maintaining temporal context | |
| 136 | - # throughout the pipeline and handling speaker transitions gracefully | |
| 137 | - pass | |
| 138 | -### Action Item Detection | |
| 139 | -Action item detection requires sophisticated NLP techniques: | |
| 140 | -class ActionItemDetector: | |
| 141 | - """Detect action items from transcript.""" | |
| 142 | - | |
| 143 | - def detect_action_items(self, transcript): | |
| 144 | - """ | |
| 145 | - Detect action items from transcript. | |
| 146 | - | |
| 147 | - Parameters | |
| 148 | - ---------- | |
| 149 | - transcript : list | |
| 150 | - List of transcript segments | |
| 151 | - | |
| 152 | - Returns | |
| 153 | - ------- | |
| 154 | - list | |
| 155 | - Detected action items with metadata | |
| 156 | - """ | |
| 157 | - # A well-designed action item detector would incorporate: | |
| 158 | - # 1. Intent recognition | |
| 159 | - # 2. Commitment language detection | |
| 160 | - # 3. Responsibility attribution | |
| 161 | - # 4. Deadline extraction | |
| 162 | - # 5. Priority estimation | |
| 163 | - pass | |
| 164 | -## Performance Optimization | |
| 165 | -For optimal performance across different hardware targets: | |
| 166 | - | |
| 167 | -ARM Optimization | |
| 168 | - | |
| 169 | -Use vectorized operations with NumPy/SciPy where possible | |
| 170 | -Implement conditional paths for ARM-specific optimizations | |
| 171 | -Consider using PyTorch's mobile optimized models | |
| 172 | - | |
| 173 | - | |
| 174 | -## Memory Management | |
| 175 | - | |
| 176 | -Implement progressive loading for large videos | |
| 177 | -Use memory-mapped file access for large datasets | |
| 178 | -Release resources explicitly when no longer needed | |
| 179 | - | |
| 180 | - | |
| 181 | -## GPU Acceleration | |
| 182 | - | |
| 183 | -Design compute-intensive operations to work in batches | |
| 184 | -Minimize CPU-GPU memory transfers | |
| 185 | -Implement fallback paths for CPU-only environments | |
| 186 | - | |
| 187 | - | |
| 188 | - | |
| 189 | -## Code Quality Guidelines | |
| 190 | -Maintain high code quality through these practices: | |
| 191 | - | |
| 192 | -### PEP 8 Compliance | |
| 193 | - | |
| 194 | -Consistent 4-space indentation | |
| 195 | -Maximum line length of 88 characters (Black formatter standard) | |
| 196 | -Descriptive variable names with snake_case convention | |
| 197 | -Comprehensive docstrings for all public functions and classes | |
| 198 | - | |
| 199 | - | |
| 200 | -### Type Annotations | |
| 201 | - | |
| 202 | -Use Python's type hints consistently throughout codebase | |
| 203 | -Define custom types for complex data structures | |
| 204 | -Validate with mypy during development | |
| 205 | - | |
| 206 | - | |
| 207 | -### Testing Strategy | |
| 208 | - | |
| 209 | -Write unit tests for each module with minimum 80% coverage | |
| 210 | -Create integration tests for component interactions | |
| 211 | -Implement performance benchmarks for critical paths | |
| 212 | - | |
| 213 | - | |
| 214 | - | |
| 215 | -# API Integration Considerations | |
| 216 | -When implementing cloud API components, consider: | |
| 217 | - | |
| 218 | -## API Selection | |
| 219 | - | |
| 220 | -Balance capabilities, cost, and performance requirements | |
| 221 | -Implement appropriate rate limiting and quota management | |
| 222 | -Design with graceful fallbacks between different API providers | |
| 223 | - | |
| 224 | - | |
| 225 | -### Efficient API Usage | |
| 226 | - | |
| 227 | -Create optimized prompts for different content types | |
| 228 | -Batch requests where possible to minimize API calls | |
| 229 | -Implement caching to avoid redundant API calls | |
| 230 | - | |
| 231 | - | |
| 232 | -### Prompt Engineering | |
| 233 | - | |
| 234 | -Design effective prompt templates for consistent results | |
| 235 | -Implement few-shot examples for specialized content understanding | |
| 236 | -Create chain-of-thought prompting for complex analysis tasks | |
| 237 | - | |
| 238 | - | |
| 239 | - | |
| 240 | -## Prompting Guidelines | |
| 241 | -When developing complex AI systems, clear guidance helps ensure effective implementation. Consider these approaches: | |
| 242 | - | |
| 243 | -### Component Breakdown | |
| 244 | - | |
| 245 | -Begin by dividing the system into well-defined modules | |
| 246 | -Define clear interfaces between components | |
| 247 | -Specify expected inputs and outputs for each function | |
| 248 | - | |
| 249 | - | |
| 250 | -### Progressive Development | |
| 251 | - | |
| 252 | -Start with skeleton implementation of core functionality | |
| 253 | -Add refinements iteratively | |
| 254 | -Implement error handling after core functionality works | |
| 255 | - | |
| 256 | - | |
| 257 | -### Example-Driven Design | |
| 258 | - | |
| 259 | -Provide clear examples of expected behaviors | |
| 260 | -Include sample inputs and outputs | |
| 261 | -Demonstrate error cases and handling | |
| 262 | - | |
| 263 | - | |
| 264 | -### Architecture Patterns | |
| 265 | - | |
| 266 | -Use factory patterns for flexible component creation | |
| 267 | -Implement strategy patterns for algorithm selection | |
| 268 | -Apply decorator patterns for cross-cutting concerns | |
| 269 | - | |
| 270 | -Remember that the best implementations come from clear understanding of the problem domain and careful consideration of edge cases. | |
| 271 | - | |
| 272 | -PlanOpticon's implementation requires attention to both high-level architecture and low-level optimization. By following these guidelines, developers can create a robust, performant system that effectively extracts valuable information from video content. |
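The GPU-fallback decorator sketched in the deleted guide (lines 59–66 above) references an undefined `func_gpu`. For reference, a self-contained variant that takes the GPU implementation as an explicit argument; every name below is illustrative and not part of the repo:

```python
import functools


def gpu_accelerated(gpu_impl, is_gpu_available=lambda: False):
    """Return a decorator that dispatches to gpu_impl when a GPU is available.

    Unlike the deleted guide's sketch, the GPU path is passed in explicitly
    instead of being referenced as an undefined func_gpu.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, disable_gpu=False, **kwargs):
            if is_gpu_available() and not disable_gpu:
                return gpu_impl(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def _blur_gpu(frame):  # hypothetical GPU implementation
    return ("gpu", frame)


@gpu_accelerated(_blur_gpu, is_gpu_available=lambda: True)
def blur(frame):  # CPU fallback
    return ("cpu", frame)


print(blur("f1"))                    # ('gpu', 'f1')
print(blur("f1", disable_gpu=True))  # ('cpu', 'f1')
```

Passing the GPU implementation in (rather than closing over a global) keeps the CPU and GPU paths independently testable.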
M
pyproject.toml
+3
| --- pyproject.toml | ||
| +++ pyproject.toml | ||
| @@ -99,10 +99,13 @@ | ||
| 99 | 99 | target-version = "py310" |
| 100 | 100 | |
| 101 | 101 | [tool.ruff.lint] |
| 102 | 102 | select = ["E", "F", "W", "I"] |
| 103 | 103 | |
| 104 | +[tool.ruff.lint.per-file-ignores] | |
| 105 | +"video_processor/utils/prompt_templates.py" = ["E501"] | |
| 106 | + | |
| 104 | 107 | [tool.mypy] |
| 105 | 108 | python_version = "3.10" |
| 106 | 109 | warn_return_any = true |
| 107 | 110 | warn_unused_configs = true |
| 108 | 111 | |
| 109 | 112 | |
| 110 | 113 | |
D
scripts/setup.sh
-120
| --- a/scripts/setup.sh | ||
| +++ b/scripts/setup.sh | ||
| @@ -1,120 +0,0 @@ | ||
| 1 | -#!/bin/bash | |
| 2 | -# PlanOpticon setup script | |
| 3 | -set -e | |
| 4 | - | |
| 5 | -# Detect operating system | |
| 6 | -if [[ "$OSTYPE" == "darwin"* ]]; then | |
| 7 | - OS="macos" | |
| 8 | -elif [[ "$OSTYPE" == "linux-gnu"* ]]; then | |
| 9 | - OS="linux" | |
| 10 | -else | |
| 11 | - echo "Unsupported operating system: $OSTYPE" | |
| 12 | - exit 1 | |
| 13 | -fi | |
| 14 | - | |
| 15 | -# Detect architecture | |
| 16 | -ARCH=$(uname -m) | |
| 17 | -if [[ "$ARCH" == "arm64" ]] || [[ "$ARCH" == "aarch64" ]]; then | |
| 18 | - ARCH="arm64" | |
| 19 | -elif [[ "$ARCH" == "x86_64" ]]; then | |
| 20 | - ARCH="x86_64" | |
| 21 | -else | |
| 22 | - echo "Unsupported architecture: $ARCH" | |
| 23 | - exit 1 | |
| 24 | -fi | |
| 25 | - | |
| 26 | -echo "Setting up PlanOpticon on $OS ($ARCH)..." | |
| 27 | - | |
| 28 | -# Check for Python | |
| 29 | -if ! command -v python3 &> /dev/null; then | |
| 30 | - echo "Python 3 is required but not found." | |
| 31 | - if [[ "$OS" == "macos" ]]; then | |
| 32 | - echo "Please install Python 3 using Homebrew or from python.org." | |
| 33 | - echo " brew install python" | |
| 34 | - elif [[ "$OS" == "linux" ]]; then | |
| 35 | - echo "Please install Python 3 using your package manager." | |
| 36 | - echo " Ubuntu/Debian: sudo apt install python3 python3-pip python3-venv" | |
| 37 | - echo " Fedora: sudo dnf install python3 python3-pip" | |
| 38 | - fi | |
| 39 | - exit 1 | |
| 40 | -fi | |
| 41 | - | |
| 42 | -# Check Python version | |
| 43 | -PY_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")') | |
| 44 | -PY_MAJOR=$(echo $PY_VERSION | cut -d. -f1) | |
| 45 | -PY_MINOR=$(echo $PY_VERSION | cut -d. -f2) | |
| 46 | - | |
| 47 | -if [[ "$PY_MAJOR" -lt 3 ]] || [[ "$PY_MAJOR" -eq 3 && "$PY_MINOR" -lt 9 ]]; then | |
| 48 | - echo "Python 3.9 or higher is required, but found $PY_VERSION." | |
| 49 | - echo "Please upgrade your Python installation." | |
| 50 | - exit 1 | |
| 51 | -fi | |
| 52 | - | |
| 53 | -echo "Using Python $PY_VERSION" | |
| 54 | - | |
| 55 | -# Check for FFmpeg | |
| 56 | -if ! command -v ffmpeg &> /dev/null; then | |
| 57 | - echo "FFmpeg is required but not found." | |
| 58 | - if [[ "$OS" == "macos" ]]; then | |
| 59 | - echo "Please install FFmpeg using Homebrew:" | |
| 60 | - echo " brew install ffmpeg" | |
| 61 | - elif [[ "$OS" == "linux" ]]; then | |
| 62 | - echo "Please install FFmpeg using your package manager:" | |
| 63 | - echo " Ubuntu/Debian: sudo apt install ffmpeg" | |
| 64 | - echo " Fedora: sudo dnf install ffmpeg" | |
| 65 | - fi | |
| 66 | - exit 1 | |
| 67 | -fi | |
| 68 | - | |
| 69 | -echo "FFmpeg found" | |
| 70 | - | |
| 71 | -# Create and activate virtual environment | |
| 72 | -if [[ -d "venv" ]]; then | |
| 73 | - echo "Virtual environment already exists" | |
| 74 | -else | |
| 75 | - echo "Creating virtual environment..." | |
| 76 | - python3 -m venv venv | |
| 77 | -fi | |
| 78 | - | |
| 79 | -# Determine activate script path | |
| 80 | -if [[ "$OS" == "macos" ]] || [[ "$OS" == "linux" ]]; then | |
| 81 | - ACTIVATE="venv/bin/activate" | |
| 82 | -fi | |
| 83 | - | |
| 84 | -echo "Activating virtual environment..." | |
| 85 | -source "$ACTIVATE" | |
| 86 | - | |
| 87 | -# Upgrade pip | |
| 88 | -echo "Upgrading pip..." | |
| 89 | -pip install --upgrade pip | |
| 90 | - | |
| 91 | -# Install dependencies | |
| 92 | -echo "Installing dependencies..." | |
| 93 | -pip install -e . | |
| 94 | - | |
| 95 | -# Install optional GPU dependencies if available | |
| 96 | -if [[ "$OS" == "macos" && "$ARCH" == "arm64" ]]; then | |
| 97 | - echo "Installing optional ARM-specific packages for macOS..." | |
| 98 | - pip install -r requirements-apple.txt 2>/dev/null || echo "No ARM-specific packages found or could not install them." | |
| 99 | -elif [[ "$ARCH" == "x86_64" ]]; then | |
| 100 | - # Check for NVIDIA GPU | |
| 101 | - if [[ "$OS" == "linux" ]] && command -v nvidia-smi &> /dev/null; then | |
| 102 | - echo "NVIDIA GPU detected, installing GPU dependencies..." | |
| 103 | - pip install -r requirements-gpu.txt 2>/dev/null || echo "Could not install GPU packages." | |
| 104 | - fi | |
| 105 | -fi | |
| 106 | - | |
| 107 | -# Create example .env file if it doesn't exist | |
| 108 | -if [[ ! -f ".env" ]]; then | |
| 109 | - echo "Creating example .env file..." | |
| 110 | - cp .env.example .env | |
| 111 | - echo "Please edit the .env file to add your API keys." | |
| 112 | -fi | |
| 113 | - | |
| 114 | -echo "Setup complete! PlanOpticon is ready to use." | |
| 115 | -echo "" | |
| 116 | -echo "To activate the virtual environment, run:" | |
| 117 | -echo " source \"$ACTIVATE\"" | |
| 118 | -echo "" | |
| 119 | -echo "To run PlanOpticon, use:" | |
| 120 | -echo " planopticon --help" |
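The removed script's Python-version gate (lines 43–51) is easy to keep in pure Python if a project still wants it at startup. A sketch, not shipped with the repo:

```python
import sys


def check_python(min_version=(3, 9)):
    """Mirror the deleted setup.sh gate: require Python >= 3.9."""
    if sys.version_info[:2] < min_version:
        raise SystemExit(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}."
        )
    return True


print(check_python())
```

In practice the `requires-python` field in `pyproject.toml` enforces the same constraint at install time, which is why the shell script could be dropped.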
M
setup.py
+1
| --- setup.py | ||
| +++ setup.py | ||
| @@ -1,4 +1,5 @@ | ||
| 1 | 1 | """Backwards-compatible setup.py — all config lives in pyproject.toml.""" |
| 2 | + | |
| 2 | 3 | from setuptools import setup |
| 3 | 4 | |
| 4 | 5 | setup() |
| 5 | 6 |
M
tests/test_action_detector.py
+56
-32
| --- tests/test_action_detector.py | ||
| +++ tests/test_action_detector.py | ||
| @@ -1,20 +1,20 @@ | ||
| 1 | 1 | """Tests for enhanced action item detection.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | 4 | from unittest.mock import MagicMock |
| 5 | 5 | |
| 6 | -import pytest | |
| 7 | - | |
| 8 | 6 | from video_processor.analyzers.action_detector import ActionDetector |
| 9 | 7 | from video_processor.models import ActionItem, TranscriptSegment |
| 10 | 8 | |
| 11 | 9 | |
| 12 | 10 | class TestPatternExtract: |
| 13 | 11 | def test_detects_need_to(self): |
| 14 | 12 | detector = ActionDetector() |
| 15 | - items = detector.detect_from_transcript("We need to update the database schema before release.") | |
| 13 | + items = detector.detect_from_transcript( | |
| 14 | + "We need to update the database schema before release." | |
| 15 | + ) | |
| 16 | 16 | assert len(items) >= 1 |
| 17 | 17 | assert any("database" in i.action.lower() for i in items) |
| 18 | 18 | |
| 19 | 19 | def test_detects_should(self): |
| 20 | 20 | detector = ActionDetector() |
| @@ -21,11 +21,13 @@ | ||
| 21 | 21 | items = detector.detect_from_transcript("Alice should review the pull request by Friday.") |
| 22 | 22 | assert len(items) >= 1 |
| 23 | 23 | |
| 24 | 24 | def test_detects_action_item_keyword(self): |
| 25 | 25 | detector = ActionDetector() |
| 26 | - items = detector.detect_from_transcript("Action item: set up monitoring for the new service.") | |
| 26 | + items = detector.detect_from_transcript( | |
| 27 | + "Action item: set up monitoring for the new service." | |
| 28 | + ) | |
| 27 | 29 | assert len(items) >= 1 |
| 28 | 30 | |
| 29 | 31 | def test_detects_follow_up(self): |
| 30 | 32 | detector = ActionDetector() |
| 31 | 33 | items = detector.detect_from_transcript("Follow up with the client about requirements.") |
| @@ -41,22 +43,16 @@ | ||
| 41 | 43 | items = detector.detect_from_transcript("Do it.") |
| 42 | 44 | assert len(items) == 0 |
| 43 | 45 | |
| 44 | 46 | def test_no_action_patterns(self): |
| 45 | 47 | detector = ActionDetector() |
| 46 | - items = detector.detect_from_transcript( | |
| 47 | - "The weather was nice today. We had lunch at noon." | |
| 48 | - ) | |
| 48 | + items = detector.detect_from_transcript("The weather was nice today. We had lunch at noon.") | |
| 49 | 49 | assert len(items) == 0 |
| 50 | 50 | |
| 51 | 51 | def test_multiple_sentences(self): |
| 52 | 52 | detector = ActionDetector() |
| 53 | - text = ( | |
| 54 | - "We need to deploy the fix. " | |
| 55 | - "Alice should test it first. " | |
| 56 | - "The sky is blue." | |
| 57 | - ) | |
| 53 | + text = "We need to deploy the fix. Alice should test it first. The sky is blue." | |
| 58 | 54 | items = detector.detect_from_transcript(text) |
| 59 | 55 | assert len(items) == 2 |
| 60 | 56 | |
| 61 | 57 | def test_source_is_transcript(self): |
| 62 | 58 | detector = ActionDetector() |
| @@ -66,14 +62,21 @@ | ||
| 66 | 62 | |
| 67 | 63 | |
| 68 | 64 | class TestLLMExtract: |
| 69 | 65 | def test_llm_extraction(self): |
| 70 | 66 | pm = MagicMock() |
| 71 | - pm.chat.return_value = json.dumps([ | |
| 72 | - {"action": "Deploy new version", "assignee": "Bob", "deadline": "Friday", | |
| 73 | - "priority": "high", "context": "Production release"} | |
| 74 | - ]) | |
| 67 | + pm.chat.return_value = json.dumps( | |
| 68 | + [ | |
| 69 | + { | |
| 70 | + "action": "Deploy new version", | |
| 71 | + "assignee": "Bob", | |
| 72 | + "deadline": "Friday", | |
| 73 | + "priority": "high", | |
| 74 | + "context": "Production release", | |
| 75 | + } | |
| 76 | + ] | |
| 77 | + ) | |
| 75 | 78 | detector = ActionDetector(provider_manager=pm) |
| 76 | 79 | items = detector.detect_from_transcript("Deploy new version by Friday.") |
| 77 | 80 | assert len(items) == 1 |
| 78 | 81 | assert items[0].action == "Deploy new version" |
| 79 | 82 | assert items[0].assignee == "Bob" |
| @@ -102,28 +105,37 @@ | ||
| 102 | 105 | items = detector.detect_from_transcript("Update the docs.") |
| 103 | 106 | assert items == [] |
| 104 | 107 | |
| 105 | 108 | def test_llm_skips_items_without_action(self): |
| 106 | 109 | pm = MagicMock() |
| 107 | - pm.chat.return_value = json.dumps([ | |
| 108 | - {"action": "Valid action", "assignee": None}, | |
| 109 | - {"assignee": "Alice"}, # No action field | |
| 110 | - {"action": "", "assignee": "Bob"}, # Empty action | |
| 111 | - ]) | |
| 110 | + pm.chat.return_value = json.dumps( | |
| 111 | + [ | |
| 112 | + {"action": "Valid action", "assignee": None}, | |
| 113 | + {"assignee": "Alice"}, # No action field | |
| 114 | + {"action": "", "assignee": "Bob"}, # Empty action | |
| 115 | + ] | |
| 116 | + ) | |
| 112 | 117 | detector = ActionDetector(provider_manager=pm) |
| 113 | 118 | items = detector.detect_from_transcript("Some text.") |
| 114 | 119 | assert len(items) == 1 |
| 115 | 120 | assert items[0].action == "Valid action" |
| 116 | 121 | |
| 117 | 122 | |
| 118 | 123 | class TestDetectFromDiagrams: |
| 119 | 124 | def test_dict_diagrams(self): |
| 120 | 125 | pm = MagicMock() |
| 121 | - pm.chat.return_value = json.dumps([ | |
| 122 | - {"action": "Migrate database", "assignee": None, "deadline": None, | |
| 123 | - "priority": None, "context": None}, | |
| 124 | - ]) | |
| 126 | + pm.chat.return_value = json.dumps( | |
| 127 | + [ | |
| 128 | + { | |
| 129 | + "action": "Migrate database", | |
| 130 | + "assignee": None, | |
| 131 | + "deadline": None, | |
| 132 | + "priority": None, | |
| 133 | + "context": None, | |
| 134 | + }, | |
| 135 | + ] | |
| 136 | + ) | |
| 125 | 137 | detector = ActionDetector(provider_manager=pm) |
| 126 | 138 | diagrams = [ |
| 127 | 139 | {"text_content": "Step 1: Migrate database", "elements": ["DB", "Migration"]}, |
| 128 | 140 | ] |
| 129 | 141 | items = detector.detect_from_diagrams(diagrams) |
| @@ -130,14 +142,21 @@ | ||
| 130 | 142 | assert len(items) == 1 |
| 131 | 143 | assert items[0].source == "diagram" |
| 132 | 144 | |
| 133 | 145 | def test_object_diagrams(self): |
| 134 | 146 | pm = MagicMock() |
| 135 | - pm.chat.return_value = json.dumps([ | |
| 136 | - {"action": "Update API", "assignee": None, "deadline": None, | |
| 137 | - "priority": None, "context": None}, | |
| 138 | - ]) | |
| 147 | + pm.chat.return_value = json.dumps( | |
| 148 | + [ | |
| 149 | + { | |
| 150 | + "action": "Update API", | |
| 151 | + "assignee": None, | |
| 152 | + "deadline": None, | |
| 153 | + "priority": None, | |
| 154 | + "context": None, | |
| 155 | + }, | |
| 156 | + ] | |
| 157 | + ) | |
| 139 | 158 | detector = ActionDetector(provider_manager=pm) |
| 140 | 159 | |
| 141 | 160 | class FakeDiagram: |
| 142 | 161 | text_content = "Update API endpoints" |
| 143 | 162 | elements = ["API", "Gateway"] |
| @@ -153,11 +172,14 @@ | ||
| 153 | 172 | assert items == [] |
| 154 | 173 | |
| 155 | 174 | def test_pattern_fallback_for_diagrams(self): |
| 156 | 175 | detector = ActionDetector() # No provider |
| 157 | 176 | diagrams = [ |
| 158 | - {"text_content": "We need to update the configuration before deployment.", "elements": []}, | |
| 177 | + { | |
| 178 | + "text_content": "We need to update the configuration before deployment.", | |
| 179 | + "elements": [], | |
| 180 | + }, | |
| 159 | 181 | ] |
| 160 | 182 | items = detector.detect_from_diagrams(diagrams) |
| 161 | 183 | assert len(items) >= 1 |
| 162 | 184 | assert items[0].source == "diagram" |
| 163 | 185 | |
| @@ -191,16 +213,18 @@ | ||
| 191 | 213 | |
| 192 | 214 | |
| 193 | 215 | class TestAttachTimestamps: |
| 194 | 216 | def test_attaches_matching_segment(self): |
| 195 | 217 | detector = ActionDetector() |
| 196 | - items = [ | |
| 218 | + [ | |
| 197 | 219 | ActionItem(action="We need to update the database schema before release"), |
| 198 | 220 | ] |
| 199 | 221 | segments = [ |
| 200 | 222 | TranscriptSegment(start=0.0, end=5.0, text="Welcome to the meeting."), |
| 201 | - TranscriptSegment(start=5.0, end=15.0, text="We need to update the database schema before release."), | |
| 223 | + TranscriptSegment( | |
| 224 | + start=5.0, end=15.0, text="We need to update the database schema before release." | |
| 225 | + ), | |
| 202 | 226 | TranscriptSegment(start=15.0, end=20.0, text="Any questions?"), |
| 203 | 227 | ] |
| 204 | 228 | detector.detect_from_transcript( |
| 205 | 229 | "We need to update the database schema before release.", |
| 206 | 230 | segments=segments, |
| 207 | 231 |
| --- tests/test_agent.py | ||
| +++ tests/test_agent.py | ||
| @@ -1,11 +1,9 @@ | ||
| 1 | 1 | """Tests for the agentic processing orchestrator.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from unittest.mock import MagicMock, patch | |
| 5 | - | |
| 6 | -import pytest | |
| 4 | +from unittest.mock import MagicMock | |
| 7 | 5 | |
| 8 | 6 | from video_processor.agent.orchestrator import AgentOrchestrator |
| 9 | 7 | |
| 10 | 8 | |
| 11 | 9 | class TestPlanCreation: |
| @@ -99,16 +97,18 @@ | ||
| 99 | 97 | agent.insights.append("should not modify internal") |
| 100 | 98 | assert len(agent._insights) == 2 |
| 101 | 99 | |
| 102 | 100 | def test_deep_analysis_populates_insights(self): |
| 103 | 101 | pm = MagicMock() |
| 104 | - pm.chat.return_value = json.dumps({ | |
| 105 | - "decisions": ["Decided to use microservices"], | |
| 106 | - "risks": ["Timeline is tight"], | |
| 107 | - "follow_ups": [], | |
| 108 | - "tensions": [], | |
| 109 | - }) | |
| 102 | + pm.chat.return_value = json.dumps( | |
| 103 | + { | |
| 104 | + "decisions": ["Decided to use microservices"], | |
| 105 | + "risks": ["Timeline is tight"], | |
| 106 | + "follow_ups": [], | |
| 107 | + "tensions": [], | |
| 108 | + } | |
| 109 | + ) | |
| 110 | 110 | agent = AgentOrchestrator(provider_manager=pm) |
| 111 | 111 | agent._results["transcribe"] = {"text": "Some long transcript text here"} |
| 112 | 112 | result = agent._deep_analysis("/tmp") |
| 113 | 113 | assert "decisions" in result |
| 114 | 114 | assert any("microservices" in i for i in agent._insights) |
| 115 | 115 |
| --- tests/test_api_cache.py | ||
| +++ tests/test_api_cache.py | ||
| @@ -1,12 +1,9 @@ | ||
| 1 | 1 | """Tests for API response cache.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import time |
| 5 | 4 | |
| 6 | -import pytest | |
| 7 | - | |
| 8 | 5 | from video_processor.utils.api_cache import ApiCache |
| 9 | 6 | |
| 10 | 7 | |
| 11 | 8 | class TestApiCache: |
| 12 | 9 | def test_set_and_get(self, tmp_path): |
| @@ -71,13 +68,13 @@ | ||
| 71 | 68 | cache_b.set("key", "value_b") |
| 72 | 69 | assert cache_a.get("key") == "value_a" |
| 73 | 70 | assert cache_b.get("key") == "value_b" |
| 74 | 71 | |
| 75 | 72 | def test_creates_namespace_dir(self, tmp_path): |
| 76 | - cache = ApiCache(tmp_path / "sub", namespace="deep") | |
| 73 | + ApiCache(tmp_path / "sub", namespace="deep") | |
| 77 | 74 | assert (tmp_path / "sub" / "deep").exists() |
| 78 | 75 | |
| 79 | 76 | def test_cache_path_uses_hash(self, tmp_path): |
| 80 | 77 | cache = ApiCache(tmp_path, namespace="test") |
| 81 | 78 | path = cache.get_cache_path("my_key") |
| 82 | 79 | assert path.suffix == ".json" |
| 83 | 80 | assert path.parent.name == "test" |
| 84 | 81 |
| --- tests/test_audio_extractor.py | ||
| +++ tests/test_audio_extractor.py | ||
| @@ -1,65 +1,65 @@ | ||
| 1 | 1 | """Tests for the audio extractor module.""" |
| 2 | -import os | |
| 2 | + | |
| 3 | 3 | import tempfile |
| 4 | 4 | from pathlib import Path |
| 5 | -from unittest.mock import patch, MagicMock | |
| 5 | +from unittest.mock import MagicMock, patch | |
| 6 | 6 | |
| 7 | 7 | import numpy as np |
| 8 | -import pytest | |
| 9 | 8 | |
| 10 | 9 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 10 | + | |
| 11 | 11 | |
| 12 | 12 | class TestAudioExtractor: |
| 13 | 13 | """Test suite for AudioExtractor class.""" |
| 14 | - | |
| 14 | + | |
| 15 | 15 | def test_init(self): |
| 16 | 16 | """Test initialization of AudioExtractor.""" |
| 17 | 17 | # Default parameters |
| 18 | 18 | extractor = AudioExtractor() |
| 19 | 19 | assert extractor.sample_rate == 16000 |
| 20 | 20 | assert extractor.mono is True |
| 21 | - | |
| 21 | + | |
| 22 | 22 | # Custom parameters |
| 23 | 23 | extractor = AudioExtractor(sample_rate=44100, mono=False) |
| 24 | 24 | assert extractor.sample_rate == 44100 |
| 25 | 25 | assert extractor.mono is False |
| 26 | - | |
| 27 | - @patch('subprocess.run') | |
| 26 | + | |
| 27 | + @patch("subprocess.run") | |
| 28 | 28 | def test_extract_audio(self, mock_run): |
| 29 | 29 | """Test audio extraction from video.""" |
| 30 | 30 | # Mock the subprocess.run call |
| 31 | 31 | mock_result = MagicMock() |
| 32 | 32 | mock_result.returncode = 0 |
| 33 | 33 | mock_run.return_value = mock_result |
| 34 | - | |
| 34 | + | |
| 35 | 35 | with tempfile.TemporaryDirectory() as temp_dir: |
| 36 | 36 | # Create a dummy video file |
| 37 | 37 | video_path = Path(temp_dir) / "test_video.mp4" |
| 38 | 38 | with open(video_path, "wb") as f: |
| 39 | 39 | f.write(b"dummy video content") |
| 40 | - | |
| 40 | + | |
| 41 | 41 | # Extract audio |
| 42 | 42 | extractor = AudioExtractor() |
| 43 | - | |
| 43 | + | |
| 44 | 44 | # Test with default output path |
| 45 | 45 | output_path = extractor.extract_audio(video_path) |
| 46 | 46 | assert output_path == video_path.with_suffix(".wav") |
| 47 | - | |
| 47 | + | |
| 48 | 48 | # Test with custom output path |
| 49 | 49 | custom_output = Path(temp_dir) / "custom_audio.wav" |
| 50 | 50 | output_path = extractor.extract_audio(video_path, custom_output) |
| 51 | 51 | assert output_path == custom_output |
| 52 | - | |
| 52 | + | |
| 53 | 53 | # Verify subprocess.run was called with correct arguments |
| 54 | 54 | mock_run.assert_called() |
| 55 | 55 | args, kwargs = mock_run.call_args |
| 56 | 56 | assert "ffmpeg" in args[0] |
| 57 | 57 | assert "-i" in args[0] |
| 58 | 58 | assert str(video_path) in args[0] |
| 59 | - | |
| 60 | - @patch('soundfile.info') | |
| 59 | + | |
| 60 | + @patch("soundfile.info") | |
| 61 | 61 | def test_get_audio_properties(self, mock_sf_info): |
| 62 | 62 | """Test getting audio properties.""" |
| 63 | 63 | # Mock soundfile.info |
| 64 | 64 | mock_info = MagicMock() |
| 65 | 65 | mock_info.duration = 10.5 |
| @@ -66,55 +66,49 @@ | ||
| 66 | 66 | mock_info.samplerate = 16000 |
| 67 | 67 | mock_info.channels = 1 |
| 68 | 68 | mock_info.format = "WAV" |
| 69 | 69 | mock_info.subtype = "PCM_16" |
| 70 | 70 | mock_sf_info.return_value = mock_info |
| 71 | - | |
| 71 | + | |
| 72 | 72 | with tempfile.TemporaryDirectory() as temp_dir: |
| 73 | 73 | # Create a dummy audio file |
| 74 | 74 | audio_path = Path(temp_dir) / "test_audio.wav" |
| 75 | 75 | with open(audio_path, "wb") as f: |
| 76 | 76 | f.write(b"dummy audio content") |
| 77 | - | |
| 77 | + | |
| 78 | 78 | # Get properties |
| 79 | 79 | extractor = AudioExtractor() |
| 80 | 80 | props = extractor.get_audio_properties(audio_path) |
| 81 | - | |
| 81 | + | |
| 82 | 82 | # Verify properties |
| 83 | 83 | assert props["duration"] == 10.5 |
| 84 | 84 | assert props["sample_rate"] == 16000 |
| 85 | 85 | assert props["channels"] == 1 |
| 86 | 86 | assert props["format"] == "WAV" |
| 87 | 87 | assert props["subtype"] == "PCM_16" |
| 88 | 88 | assert props["path"] == str(audio_path) |
| 89 | - | |
| 89 | + | |
| 90 | 90 | def test_segment_audio(self): |
| 91 | 91 | """Test audio segmentation.""" |
| 92 | 92 | # Create a dummy audio array (1 second at 16kHz) |
| 93 | 93 | audio_data = np.ones(16000) |
| 94 | 94 | sample_rate = 16000 |
| 95 | - | |
| 95 | + | |
| 96 | 96 | extractor = AudioExtractor() |
| 97 | - | |
| 97 | + | |
| 98 | 98 | # Test with 500ms segments, no overlap |
| 99 | 99 | segments = extractor.segment_audio( |
| 100 | - audio_data, | |
| 101 | - sample_rate, | |
| 102 | - segment_length_ms=500, | |
| 103 | - overlap_ms=0 | |
| 100 | + audio_data, sample_rate, segment_length_ms=500, overlap_ms=0 | |
| 104 | 101 | ) |
| 105 | - | |
| 102 | + | |
| 106 | 103 | # Should produce 2 segments of 8000 samples each |
| 107 | 104 | assert len(segments) == 2 |
| 108 | 105 | assert len(segments[0]) == 8000 |
| 109 | 106 | assert len(segments[1]) == 8000 |
| 110 | - | |
| 107 | + | |
| 111 | 108 | # Test with 600ms segments, 100ms overlap |
| 112 | 109 | segments = extractor.segment_audio( |
| 113 | - audio_data, | |
| 114 | - sample_rate, | |
| 115 | - segment_length_ms=600, | |
| 116 | - overlap_ms=100 | |
| 117 | - ) | |
| 118 | - | |
| 119 | - # Should produce 2 segments (with overlap) | |
| 120 | - assert len(segments) == 2 | |
| 110 | + audio_data, sample_rate, segment_length_ms=600, overlap_ms=100 | |
| 111 | + ) | |
| 112 | + | |
| 113 | + # Should produce 2 segments (with overlap) | |
| 114 | + assert len(segments) == 2 | |
| 121 | 115 |
| --- tests/test_audio_extractor.py | |
| +++ tests/test_audio_extractor.py | |
| @@ -1,65 +1,65 @@ | |
| 1 | """Tests for the audio extractor module.""" |
| 2 | import os |
| 3 | import tempfile |
| 4 | from pathlib import Path |
| 5 | from unittest.mock import patch, MagicMock |
| 6 | |
| 7 | import numpy as np |
| 8 | import pytest |
| 9 | |
| 10 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 11 | |
| 12 | class TestAudioExtractor: |
| 13 | """Test suite for AudioExtractor class.""" |
| 14 | |
| 15 | def test_init(self): |
| 16 | """Test initialization of AudioExtractor.""" |
| 17 | # Default parameters |
| 18 | extractor = AudioExtractor() |
| 19 | assert extractor.sample_rate == 16000 |
| 20 | assert extractor.mono is True |
| 21 | |
| 22 | # Custom parameters |
| 23 | extractor = AudioExtractor(sample_rate=44100, mono=False) |
| 24 | assert extractor.sample_rate == 44100 |
| 25 | assert extractor.mono is False |
| 26 | |
| 27 | @patch('subprocess.run') |
| 28 | def test_extract_audio(self, mock_run): |
| 29 | """Test audio extraction from video.""" |
| 30 | # Mock the subprocess.run call |
| 31 | mock_result = MagicMock() |
| 32 | mock_result.returncode = 0 |
| 33 | mock_run.return_value = mock_result |
| 34 | |
| 35 | with tempfile.TemporaryDirectory() as temp_dir: |
| 36 | # Create a dummy video file |
| 37 | video_path = Path(temp_dir) / "test_video.mp4" |
| 38 | with open(video_path, "wb") as f: |
| 39 | f.write(b"dummy video content") |
| 40 | |
| 41 | # Extract audio |
| 42 | extractor = AudioExtractor() |
| 43 | |
| 44 | # Test with default output path |
| 45 | output_path = extractor.extract_audio(video_path) |
| 46 | assert output_path == video_path.with_suffix(".wav") |
| 47 | |
| 48 | # Test with custom output path |
| 49 | custom_output = Path(temp_dir) / "custom_audio.wav" |
| 50 | output_path = extractor.extract_audio(video_path, custom_output) |
| 51 | assert output_path == custom_output |
| 52 | |
| 53 | # Verify subprocess.run was called with correct arguments |
| 54 | mock_run.assert_called() |
| 55 | args, kwargs = mock_run.call_args |
| 56 | assert "ffmpeg" in args[0] |
| 57 | assert "-i" in args[0] |
| 58 | assert str(video_path) in args[0] |
| 59 | |
| 60 | @patch('soundfile.info') |
| 61 | def test_get_audio_properties(self, mock_sf_info): |
| 62 | """Test getting audio properties.""" |
| 63 | # Mock soundfile.info |
| 64 | mock_info = MagicMock() |
| 65 | mock_info.duration = 10.5 |
| @@ -66,55 +66,49 @@ | |
| 66 | mock_info.samplerate = 16000 |
| 67 | mock_info.channels = 1 |
| 68 | mock_info.format = "WAV" |
| 69 | mock_info.subtype = "PCM_16" |
| 70 | mock_sf_info.return_value = mock_info |
| 71 | |
| 72 | with tempfile.TemporaryDirectory() as temp_dir: |
| 73 | # Create a dummy audio file |
| 74 | audio_path = Path(temp_dir) / "test_audio.wav" |
| 75 | with open(audio_path, "wb") as f: |
| 76 | f.write(b"dummy audio content") |
| 77 | |
| 78 | # Get properties |
| 79 | extractor = AudioExtractor() |
| 80 | props = extractor.get_audio_properties(audio_path) |
| 81 | |
| 82 | # Verify properties |
| 83 | assert props["duration"] == 10.5 |
| 84 | assert props["sample_rate"] == 16000 |
| 85 | assert props["channels"] == 1 |
| 86 | assert props["format"] == "WAV" |
| 87 | assert props["subtype"] == "PCM_16" |
| 88 | assert props["path"] == str(audio_path) |
| 89 | |
| 90 | def test_segment_audio(self): |
| 91 | """Test audio segmentation.""" |
| 92 | # Create a dummy audio array (1 second at 16kHz) |
| 93 | audio_data = np.ones(16000) |
| 94 | sample_rate = 16000 |
| 95 | |
| 96 | extractor = AudioExtractor() |
| 97 | |
| 98 | # Test with 500ms segments, no overlap |
| 99 | segments = extractor.segment_audio( |
| 100 | audio_data, |
| 101 | sample_rate, |
| 102 | segment_length_ms=500, |
| 103 | overlap_ms=0 |
| 104 | ) |
| 105 | |
| 106 | # Should produce 2 segments of 8000 samples each |
| 107 | assert len(segments) == 2 |
| 108 | assert len(segments[0]) == 8000 |
| 109 | assert len(segments[1]) == 8000 |
| 110 | |
| 111 | # Test with 600ms segments, 100ms overlap |
| 112 | segments = extractor.segment_audio( |
| 113 | audio_data, |
| 114 | sample_rate, |
| 115 | segment_length_ms=600, |
| 116 | overlap_ms=100 |
| 117 | ) |
| 118 | |
| 119 | # Should produce 2 segments (with overlap) |
| 120 | assert len(segments) == 2 |
| 121 |
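The segment counts asserted in this test follow from simple window arithmetic: one second at 16 kHz is 16000 samples, and windows advance by `segment_length - overlap`. A minimal sketch of that segmentation, assuming a clipped final window (a hypothetical stand-in, not the real `AudioExtractor.segment_audio`):

```python
import numpy as np

def segment_audio(audio, sample_rate, segment_length_ms, overlap_ms=0):
    # Hypothetical stand-in for AudioExtractor.segment_audio:
    # fixed-length windows advancing by (length - overlap) samples.
    seg_len = sample_rate * segment_length_ms // 1000
    step = seg_len - sample_rate * overlap_ms // 1000
    return [audio[start:start + seg_len] for start in range(0, len(audio), step)]

audio = np.ones(16000)  # 1 second at 16 kHz
print([len(s) for s in segment_audio(audio, 16000, 500)])                  # [8000, 8000]
print([len(s) for s in segment_audio(audio, 16000, 600, overlap_ms=100)])  # [9600, 8000]
```

With 500 ms windows the step is 8000 samples, yielding exactly two full segments; with 600 ms windows and 100 ms overlap the step is still 8000 samples, so the second window starts at sample 8000 and is clipped at the end of the array, matching the two-segment assertion.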
| --- tests/test_batch.py | ||
| +++ tests/test_batch.py | ||
| @@ -1,11 +1,8 @@ | ||
| 1 | 1 | """Tests for batch processing and knowledge graph merging.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from pathlib import Path | |
| 5 | - | |
| 6 | -import pytest | |
| 7 | 4 | |
| 8 | 5 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 9 | 6 | from video_processor.integrators.plan_generator import PlanGenerator |
| 10 | 7 | from video_processor.models import ( |
| 11 | 8 | ActionItem, |
| 12 | 9 |
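The two deleted imports above are classic findings of ruff's `F401` (unused import) rule. A rough approximation of that check using the standard `ast` module (greatly simplified; ruff's real analysis handles re-exports, `__all__`, string annotations, and much more):

```python
import ast

def unused_imports(source: str) -> list[str]:
    """Report imported names that never appear as a Name node afterwards."""
    tree = ast.parse(source)
    imported, used = [], set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported += [a.asname or a.name.split(".")[0] for a in node.names]
        elif isinstance(node, ast.ImportFrom):
            imported += [a.asname or a.name for a in node.names]
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return [name for name in imported if name not in used]

src = "import json\nfrom pathlib import Path\n\nprint(json.dumps({}))\n"
print(unused_imports(src))  # ['Path']
```

In practice the same cleanup is done automatically with `ruff check --select F401 --fix`, which is presumably how the 400-to-0 lint fix in this commit handled it.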
| --- tests/test_cloud_sources.py | ||
| +++ tests/test_cloud_sources.py | ||
| @@ -134,11 +134,13 @@ | ||
| 134 | 134 | @patch("video_processor.sources.google_drive.GoogleDriveSource._auth_service_account") |
| 135 | 135 | def test_authenticate_import_error(self, mock_auth): |
| 136 | 136 | from video_processor.sources.google_drive import GoogleDriveSource |
| 137 | 137 | |
| 138 | 138 | source = GoogleDriveSource() |
| 139 | - with patch.dict("sys.modules", {"google.oauth2": None, "google.oauth2.service_account": None}): | |
| 139 | + with patch.dict( | |
| 140 | + "sys.modules", {"google.oauth2": None, "google.oauth2.service_account": None} | |
| 141 | + ): | |
| 140 | 142 | # The import will fail inside authenticate |
| 141 | 143 | result = source.authenticate() |
| 142 | 144 | assert result is False |
| 143 | 145 | |
| 144 | 146 | |
| @@ -188,19 +190,24 @@ | ||
| 188 | 190 | def test_auth_saved_token(self, tmp_path): |
| 189 | 191 | pytest.importorskip("dropbox") |
| 190 | 192 | from video_processor.sources.dropbox_source import DropboxSource |
| 191 | 193 | |
| 192 | 194 | token_file = tmp_path / "token.json" |
| 193 | - token_file.write_text(json.dumps({ | |
| 194 | - "refresh_token": "rt_test", | |
| 195 | - "app_key": "key", | |
| 196 | - "app_secret": "secret", | |
| 197 | - })) | |
| 195 | + token_file.write_text( | |
| 196 | + json.dumps( | |
| 197 | + { | |
| 198 | + "refresh_token": "rt_test", | |
| 199 | + "app_key": "key", | |
| 200 | + "app_secret": "secret", | |
| 201 | + } | |
| 202 | + ) | |
| 203 | + ) | |
| 198 | 204 | |
| 199 | 205 | source = DropboxSource(token_path=token_file, app_key="key", app_secret="secret") |
| 200 | 206 | |
| 201 | 207 | mock_dbx = MagicMock() |
| 202 | 208 | with patch("dropbox.Dropbox", return_value=mock_dbx): |
| 203 | 209 | import dropbox |
| 210 | + | |
| 204 | 211 | result = source._auth_saved_token(dropbox) |
| 205 | 212 | assert result is True |
| 206 | 213 | assert source.dbx is mock_dbx |
| 207 | 214 |
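The `patch.dict("sys.modules", {...: None})` idiom in `test_authenticate_import_error` relies on documented import-system behavior: if `sys.modules[name]` is `None`, a subsequent `import name` raises `ModuleNotFoundError` (a subclass of `ImportError`). A self-contained sketch, using `json` as a stand-in for the optional `google.oauth2` dependency:

```python
from unittest.mock import patch

def load_optional():
    # Mirrors the authenticate() pattern: import inside the function,
    # fall back gracefully when the dependency is unavailable.
    try:
        import json  # stand-in for the optional google.oauth2 import
        return True
    except ImportError:
        return False

print(load_optional())  # True: json imports normally
with patch.dict("sys.modules", {"json": None}):
    print(load_optional())  # False: the None entry forces ImportError
```

Because `patch.dict` restores the original `sys.modules` entries on exit, the rest of the test suite is unaffected.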
| --- tests/test_content_analyzer.py | ||
| +++ tests/test_content_analyzer.py | ||
| @@ -1,11 +1,9 @@ | ||
| 1 | 1 | """Tests for content cross-referencing between transcript and diagram entities.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from unittest.mock import MagicMock, patch | |
| 5 | - | |
| 6 | -import pytest | |
| 4 | +from unittest.mock import MagicMock | |
| 7 | 5 | |
| 8 | 6 | from video_processor.analyzers.content_analyzer import ContentAnalyzer |
| 9 | 7 | from video_processor.models import Entity, KeyPoint |
| 10 | 8 | |
| 11 | 9 | |
| @@ -74,13 +72,15 @@ | ||
| 74 | 72 | |
| 75 | 73 | |
| 76 | 74 | class TestFuzzyMatch: |
| 77 | 75 | def test_fuzzy_match_with_llm(self): |
| 78 | 76 | pm = MagicMock() |
| 79 | - pm.chat.return_value = json.dumps([ | |
| 80 | - {"transcript": "K8s", "diagram": "Kubernetes"}, | |
| 81 | - ]) | |
| 77 | + pm.chat.return_value = json.dumps( | |
| 78 | + [ | |
| 79 | + {"transcript": "K8s", "diagram": "Kubernetes"}, | |
| 80 | + ] | |
| 81 | + ) | |
| 82 | 82 | analyzer = ContentAnalyzer(provider_manager=pm) |
| 83 | 83 | |
| 84 | 84 | t_entities = [ |
| 85 | 85 | Entity(name="K8s", type="technology", descriptions=["Container orchestration"]), |
| 86 | 86 | ] |
| @@ -189,11 +189,13 @@ | ||
| 189 | 189 | assert len(result[0].related_diagrams) == 2 |
| 190 | 190 | |
| 191 | 191 | def test_details_used_for_matching(self): |
| 192 | 192 | analyzer = ContentAnalyzer() |
| 193 | 193 | kps = [ |
| 194 | - KeyPoint(point="Architecture overview", details="Uses Docker and Kubernetes for deployment"), | |
| 194 | + KeyPoint( | |
| 195 | + point="Architecture overview", details="Uses Docker and Kubernetes for deployment" | |
| 196 | + ), | |
| 195 | 197 | ] |
| 196 | 198 | diagrams = [ |
| 197 | 199 | {"elements": ["Docker", "Kubernetes"], "text_content": "deployment infrastructure"}, |
| 198 | 200 | ] |
| 199 | 201 | result = analyzer.enrich_key_points(kps, diagrams, "") |
| 200 | 202 |
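The mocked-provider pattern in `test_fuzzy_match_with_llm` can be distilled: the provider's `chat` returns a JSON list of name pairs, which the analyzer turns into a transcript-to-diagram mapping. A simplified stand-in (the `fuzzy_match` helper below is hypothetical, not the real `ContentAnalyzer` method):

```python
import json
from unittest.mock import MagicMock

def fuzzy_match(pm, transcript_names, diagram_names):
    # Ask the provider which names co-refer; parse its JSON reply into a dict.
    reply = pm.chat(f"Match {transcript_names} to {diagram_names}; reply as JSON pairs.")
    return {p["transcript"]: p["diagram"] for p in json.loads(reply)}

pm = MagicMock()
pm.chat.return_value = json.dumps([{"transcript": "K8s", "diagram": "Kubernetes"}])
print(fuzzy_match(pm, ["K8s"], ["Kubernetes"]))  # {'K8s': 'Kubernetes'}
```

Setting `return_value` on the `MagicMock` lets the test exercise the JSON-parsing and mapping logic without any real LLM call.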
| --- tests/test_diagram_analyzer.py | ||
| +++ tests/test_diagram_analyzer.py | ||
| @@ -1,18 +1,17 @@ | ||
| 1 | 1 | """Tests for the rewritten diagram analyzer.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from pathlib import Path | |
| 5 | -from unittest.mock import MagicMock, patch | |
| 4 | +from unittest.mock import MagicMock | |
| 6 | 5 | |
| 7 | 6 | import pytest |
| 8 | 7 | |
| 9 | 8 | from video_processor.analyzers.diagram_analyzer import ( |
| 10 | 9 | DiagramAnalyzer, |
| 11 | 10 | _parse_json_response, |
| 12 | 11 | ) |
| 13 | -from video_processor.models import DiagramResult, DiagramType, ScreenCapture | |
| 12 | +from video_processor.models import DiagramType | |
| 14 | 13 | |
| 15 | 14 | |
| 16 | 15 | class TestParseJsonResponse: |
| 17 | 16 | def test_plain_json(self): |
| 18 | 17 | result = _parse_json_response('{"key": "value"}') |
| @@ -50,27 +49,31 @@ | ||
| 50 | 49 | fp = tmp_path / "frame_0.jpg" |
| 51 | 50 | fp.write_bytes(b"\xff\xd8\xff fake image data") |
| 52 | 51 | return fp |
| 53 | 52 | |
| 54 | 53 | def test_classify_frame_diagram(self, analyzer, mock_pm, fake_frame): |
| 55 | - mock_pm.analyze_image.return_value = json.dumps({ | |
| 56 | - "is_diagram": True, | |
| 57 | - "diagram_type": "flowchart", | |
| 58 | - "confidence": 0.85, | |
| 59 | - "brief_description": "A flowchart showing login process" | |
| 60 | - }) | |
| 54 | + mock_pm.analyze_image.return_value = json.dumps( | |
| 55 | + { | |
| 56 | + "is_diagram": True, | |
| 57 | + "diagram_type": "flowchart", | |
| 58 | + "confidence": 0.85, | |
| 59 | + "brief_description": "A flowchart showing login process", | |
| 60 | + } | |
| 61 | + ) | |
| 61 | 62 | result = analyzer.classify_frame(fake_frame) |
| 62 | 63 | assert result["is_diagram"] is True |
| 63 | 64 | assert result["confidence"] == 0.85 |
| 64 | 65 | |
| 65 | 66 | def test_classify_frame_not_diagram(self, analyzer, mock_pm, fake_frame): |
| 66 | - mock_pm.analyze_image.return_value = json.dumps({ | |
| 67 | - "is_diagram": False, | |
| 68 | - "diagram_type": "unknown", | |
| 69 | - "confidence": 0.1, | |
| 70 | - "brief_description": "A person speaking" | |
| 71 | - }) | |
| 67 | + mock_pm.analyze_image.return_value = json.dumps( | |
| 68 | + { | |
| 69 | + "is_diagram": False, | |
| 70 | + "diagram_type": "unknown", | |
| 71 | + "confidence": 0.1, | |
| 72 | + "brief_description": "A person speaking", | |
| 73 | + } | |
| 74 | + ) | |
| 72 | 75 | result = analyzer.classify_frame(fake_frame) |
| 73 | 76 | assert result["is_diagram"] is False |
| 74 | 77 | |
| 75 | 78 | def test_classify_frame_failure(self, analyzer, mock_pm, fake_frame): |
| 76 | 79 | mock_pm.analyze_image.return_value = "I cannot parse this image" |
| @@ -77,19 +80,21 @@ | ||
| 77 | 80 | result = analyzer.classify_frame(fake_frame) |
| 78 | 81 | assert result["is_diagram"] is False |
| 79 | 82 | assert result["confidence"] == 0.0 |
| 80 | 83 | |
| 81 | 84 | def test_analyze_single_pass(self, analyzer, mock_pm, fake_frame): |
| 82 | - mock_pm.analyze_image.return_value = json.dumps({ | |
| 83 | - "diagram_type": "architecture", | |
| 84 | - "description": "Microservices architecture", | |
| 85 | - "text_content": "Service A, Service B", | |
| 86 | - "elements": ["Service A", "Service B"], | |
| 87 | - "relationships": ["A -> B: calls"], | |
| 88 | - "mermaid": "graph LR\n A-->B", | |
| 89 | - "chart_data": None | |
| 90 | - }) | |
| 85 | + mock_pm.analyze_image.return_value = json.dumps( | |
| 86 | + { | |
| 87 | + "diagram_type": "architecture", | |
| 88 | + "description": "Microservices architecture", | |
| 89 | + "text_content": "Service A, Service B", | |
| 90 | + "elements": ["Service A", "Service B"], | |
| 91 | + "relationships": ["A -> B: calls"], | |
| 92 | + "mermaid": "graph LR\n A-->B", | |
| 93 | + "chart_data": None, | |
| 94 | + } | |
| 95 | + ) | |
| 91 | 96 | result = analyzer.analyze_diagram_single_pass(fake_frame) |
| 92 | 97 | assert result["diagram_type"] == "architecture" |
| 93 | 98 | assert result["mermaid"] == "graph LR\n A-->B" |
| 94 | 99 | |
| 95 | 100 | def test_process_frames_high_confidence_diagram(self, analyzer, mock_pm, tmp_path): |
| @@ -105,38 +110,62 @@ | ||
| 105 | 110 | |
| 106 | 111 | # Frame 0: high confidence diagram |
| 107 | 112 | # Frame 1: low confidence (skip) |
| 108 | 113 | # Frame 2: medium confidence (screengrab) |
| 109 | 114 | classify_responses = [ |
| 110 | - json.dumps({"is_diagram": True, "diagram_type": "flowchart", "confidence": 0.9, "brief_description": "flow"}), | |
| 111 | - json.dumps({"is_diagram": False, "diagram_type": "unknown", "confidence": 0.1, "brief_description": "nothing"}), | |
| 112 | - json.dumps({"is_diagram": True, "diagram_type": "slide", "confidence": 0.5, "brief_description": "a slide"}), | |
| 113 | - ] | |
| 114 | - analysis_response = json.dumps({ | |
| 115 | - "diagram_type": "flowchart", | |
| 116 | - "description": "Login flow", | |
| 117 | - "text_content": "Start -> End", | |
| 118 | - "elements": ["Start", "End"], | |
| 119 | - "relationships": ["Start -> End"], | |
| 120 | - "mermaid": "graph LR\n Start-->End", | |
| 121 | - "chart_data": None | |
| 122 | - }) | |
| 115 | + json.dumps( | |
| 116 | + { | |
| 117 | + "is_diagram": True, | |
| 118 | + "diagram_type": "flowchart", | |
| 119 | + "confidence": 0.9, | |
| 120 | + "brief_description": "flow", | |
| 121 | + } | |
| 122 | + ), | |
| 123 | + json.dumps( | |
| 124 | + { | |
| 125 | + "is_diagram": False, | |
| 126 | + "diagram_type": "unknown", | |
| 127 | + "confidence": 0.1, | |
| 128 | + "brief_description": "nothing", | |
| 129 | + } | |
| 130 | + ), | |
| 131 | + json.dumps( | |
| 132 | + { | |
| 133 | + "is_diagram": True, | |
| 134 | + "diagram_type": "slide", | |
| 135 | + "confidence": 0.5, | |
| 136 | + "brief_description": "a slide", | |
| 137 | + } | |
| 138 | + ), | |
| 139 | + ] | |
| 140 | + analysis_response = json.dumps( | |
| 141 | + { | |
| 142 | + "diagram_type": "flowchart", | |
| 143 | + "description": "Login flow", | |
| 144 | + "text_content": "Start -> End", | |
| 145 | + "elements": ["Start", "End"], | |
| 146 | + "relationships": ["Start -> End"], | |
| 147 | + "mermaid": "graph LR\n Start-->End", | |
| 148 | + "chart_data": None, | |
| 149 | + } | |
| 150 | + ) | |
| 123 | 151 | |
| 124 | 152 | # Calls are interleaved per-frame: |
| 125 | 153 | # call 0: classify frame 0 (high conf) |
| 126 | 154 | # call 1: analyze frame 0 (full analysis) |
| 127 | 155 | # call 2: classify frame 1 (low conf - skip) |
| 128 | 156 | # call 3: classify frame 2 (medium conf) |
| 129 | 157 | # call 4: caption frame 2 (screengrab) |
| 130 | 158 | call_sequence = [ |
| 131 | - classify_responses[0], # classify frame 0 | |
| 132 | - analysis_response, # analyze frame 0 | |
| 133 | - classify_responses[1], # classify frame 1 | |
| 134 | - classify_responses[2], # classify frame 2 | |
| 159 | + classify_responses[0], # classify frame 0 | |
| 160 | + analysis_response, # analyze frame 0 | |
| 161 | + classify_responses[1], # classify frame 1 | |
| 162 | + classify_responses[2], # classify frame 2 | |
| 135 | 163 | "A slide about something", # caption frame 2 |
| 136 | 164 | ] |
| 137 | 165 | call_count = [0] |
| 166 | + | |
| 138 | 167 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 139 | 168 | idx = call_count[0] |
| 140 | 169 | call_count[0] += 1 |
| 141 | 170 | return call_sequence[idx] |
| 142 | 171 | |
| @@ -164,15 +193,23 @@ | ||
| 164 | 193 | fp.write_bytes(b"\xff\xd8\xff fake") |
| 165 | 194 | captures_dir = tmp_path / "captures" |
| 166 | 195 | |
| 167 | 196 | # High confidence classification but analysis fails |
| 168 | 197 | call_count = [0] |
| 198 | + | |
| 169 | 199 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 170 | 200 | idx = call_count[0] |
| 171 | 201 | call_count[0] += 1 |
| 172 | 202 | if idx == 0: |
| 173 | - return json.dumps({"is_diagram": True, "diagram_type": "chart", "confidence": 0.8, "brief_description": "chart"}) | |
| 203 | + return json.dumps( | |
| 204 | + { | |
| 205 | + "is_diagram": True, | |
| 206 | + "diagram_type": "chart", | |
| 207 | + "confidence": 0.8, | |
| 208 | + "brief_description": "chart", | |
| 209 | + } | |
| 210 | + ) | |
| 174 | 211 | if idx == 1: |
| 175 | 212 | return "This is not valid JSON" # Analysis fails |
| 176 | 213 | return "A chart showing data" # Caption |
| 177 | 214 | |
| 178 | 215 | mock_pm.analyze_image.side_effect = side_effect |
| 179 | 216 |
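The hand-rolled `call_count` counter in these tests is one way to script a sequence of mock replies; `unittest.mock` also accepts an iterable `side_effect`, yielding one canned return per call regardless of arguments (a function-based `side_effect`, as used above, remains the better fit when the reply must depend on the call index or arguments):

```python
import json
from unittest.mock import MagicMock

pm = MagicMock()
pm.analyze_image.side_effect = [
    json.dumps({"is_diagram": True, "confidence": 0.9}),  # call 0: classify
    json.dumps({"diagram_type": "flowchart"}),            # call 1: analyze
    "A slide about something",                            # call 2: caption
]

print(json.loads(pm.analyze_image(b"img", "classify"))["confidence"])   # 0.9
print(json.loads(pm.analyze_image(b"img", "analyze"))["diagram_type"])  # flowchart
print(pm.analyze_image(b"img", "caption", max_tokens=4096))             # A slide about something
```

Note that an iterable `side_effect` raises `StopIteration` once exhausted, which doubles as a guard against unexpected extra calls.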
| 135 | "confidence": 0.5, |
| 136 | "brief_description": "a slide", |
| 137 | } |
| 138 | ), |
| 139 | ] |
| 140 | analysis_response = json.dumps( |
| 141 | { |
| 142 | "diagram_type": "flowchart", |
| 143 | "description": "Login flow", |
| 144 | "text_content": "Start -> End", |
| 145 | "elements": ["Start", "End"], |
| 146 | "relationships": ["Start -> End"], |
| 147 | "mermaid": "graph LR\n Start-->End", |
| 148 | "chart_data": None, |
| 149 | } |
| 150 | ) |
| 151 | |
| 152 | # Calls are interleaved per-frame: |
| 153 | # call 0: classify frame 0 (high conf) |
| 154 | # call 1: analyze frame 0 (full analysis) |
| 155 | # call 2: classify frame 1 (low conf - skip) |
| 156 | # call 3: classify frame 2 (medium conf) |
| 157 | # call 4: caption frame 2 (screengrab) |
| 158 | call_sequence = [ |
| 159 | classify_responses[0], # classify frame 0 |
| 160 | analysis_response, # analyze frame 0 |
| 161 | classify_responses[1], # classify frame 1 |
| 162 | classify_responses[2], # classify frame 2 |
| 163 | "A slide about something", # caption frame 2 |
| 164 | ] |
| 165 | call_count = [0] |
| 166 | |
| 167 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 168 | idx = call_count[0] |
| 169 | call_count[0] += 1 |
| 170 | return call_sequence[idx] |
| 171 | |
| @@ -164,15 +193,23 @@ | |
| 193 | fp.write_bytes(b"\xff\xd8\xff fake") |
| 194 | captures_dir = tmp_path / "captures" |
| 195 | |
| 196 | # High confidence classification but analysis fails |
| 197 | call_count = [0] |
| 198 | |
| 199 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 200 | idx = call_count[0] |
| 201 | call_count[0] += 1 |
| 202 | if idx == 0: |
| 203 | return json.dumps( |
| 204 | { |
| 205 | "is_diagram": True, |
| 206 | "diagram_type": "chart", |
| 207 | "confidence": 0.8, |
| 208 | "brief_description": "chart", |
| 209 | } |
| 210 | ) |
| 211 | if idx == 1: |
| 212 | return "This is not valid JSON" # Analysis fails |
| 213 | return "A chart showing data" # Caption |
| 214 | |
| 215 | mock_pm.analyze_image.side_effect = side_effect |
| 216 |
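The tests above drive a multi-call mock with a hand-rolled `call_count` list closed over by `side_effect`. `unittest.mock` also accepts any iterable for `side_effect`, which removes the bookkeeping entirely: each call consumes the next value. A minimal sketch of the failure-path scenario (the `mock_pm` / `analyze_image` names mirror the tests; the response strings are the same ones used there):

```python
import json
from unittest.mock import MagicMock

# side_effect given an iterable: call 1 returns the first element,
# call 2 the second, and so on — no manual index needed.
mock_pm = MagicMock()
mock_pm.analyze_image.side_effect = [
    json.dumps({"is_diagram": True, "diagram_type": "chart",
                "confidence": 0.8, "brief_description": "chart"}),
    "This is not valid JSON",   # analysis step fails to parse
    "A chart showing data",     # caption fallback
]

first = mock_pm.analyze_image(b"bytes", "classify prompt")
second = mock_pm.analyze_image(b"bytes", "analyze prompt")
third = mock_pm.analyze_image(b"bytes", "caption prompt")
```

One trade-off: once the iterable is exhausted, further calls raise `StopIteration`, so the explicit `call_sequence` + counter style in the diff is the safer choice when the exact number of calls is itself under test.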
+12
-9
| --- tests/test_frame_extractor.py | ||
| +++ tests/test_frame_extractor.py | ||
| @@ -1,19 +1,19 @@ | ||
| 1 | 1 | """Tests for the frame extractor module.""" |
| 2 | + | |
| 2 | 3 | import os |
| 3 | 4 | import tempfile |
| 4 | -from pathlib import Path | |
| 5 | 5 | |
| 6 | 6 | import numpy as np |
| 7 | 7 | import pytest |
| 8 | 8 | |
| 9 | 9 | from video_processor.extractors.frame_extractor import ( |
| 10 | 10 | calculate_frame_difference, |
| 11 | - extract_frames, | |
| 12 | 11 | is_gpu_available, |
| 13 | - save_frames | |
| 12 | + save_frames, | |
| 14 | 13 | ) |
| 14 | + | |
| 15 | 15 | |
| 16 | 16 | # Create dummy test frames |
| 17 | 17 | @pytest.fixture |
| 18 | 18 | def dummy_frames(): |
| 19 | 19 | # Create a list of dummy frames with different content |
| @@ -21,42 +21,45 @@ | ||
| 21 | 21 | for i in range(3): |
| 22 | 22 | # Create frame with different intensity for each |
| 23 | 23 | frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50) |
| 24 | 24 | frames.append(frame) |
| 25 | 25 | return frames |
| 26 | + | |
| 26 | 27 | |
| 27 | 28 | def test_calculate_frame_difference(): |
| 28 | 29 | """Test frame difference calculation.""" |
| 29 | 30 | # Create two frames with some difference |
| 30 | 31 | frame1 = np.zeros((100, 100, 3), dtype=np.uint8) |
| 31 | 32 | frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128 # 50% intensity |
| 32 | - | |
| 33 | + | |
| 33 | 34 | # Calculate difference |
| 34 | 35 | diff = calculate_frame_difference(frame1, frame2) |
| 35 | - | |
| 36 | + | |
| 36 | 37 | # Expected difference is around 128/255 = 0.5 |
| 37 | 38 | assert 0.45 <= diff <= 0.55 |
| 38 | - | |
| 39 | + | |
| 39 | 40 | # Test identical frames |
| 40 | 41 | diff_identical = calculate_frame_difference(frame1, frame1.copy()) |
| 41 | 42 | assert diff_identical < 0.001 # Should be very close to 0 |
| 43 | + | |
| 42 | 44 | |
| 43 | 45 | def test_is_gpu_available(): |
| 44 | 46 | """Test GPU availability check.""" |
| 45 | 47 | # This just tests that the function runs without error |
| 46 | 48 | # We don't assert the result because it depends on the system |
| 47 | 49 | result = is_gpu_available() |
| 48 | 50 | assert isinstance(result, bool) |
| 51 | + | |
| 49 | 52 | |
| 50 | 53 | def test_save_frames(dummy_frames): |
| 51 | 54 | """Test saving frames to disk.""" |
| 52 | 55 | with tempfile.TemporaryDirectory() as temp_dir: |
| 53 | 56 | # Save frames |
| 54 | 57 | paths = save_frames(dummy_frames, temp_dir, "test_frame") |
| 55 | - | |
| 58 | + | |
| 56 | 59 | # Check that we got the correct number of paths |
| 57 | 60 | assert len(paths) == len(dummy_frames) |
| 58 | - | |
| 61 | + | |
| 59 | 62 | # Check that files were created |
| 60 | 63 | for path in paths: |
| 61 | 64 | assert os.path.exists(path) |
| 62 | - assert os.path.getsize(path) > 0 # Files should have content | |
| 65 | + assert os.path.getsize(path) > 0 # Files should have content | |
| 63 | 66 |
| --- tests/test_frame_extractor.py | |
| +++ tests/test_frame_extractor.py | |
| @@ -1,19 +1,19 @@ | |
| 1 | """Tests for the frame extractor module.""" |
| 2 | import os |
| 3 | import tempfile |
| 4 | from pathlib import Path |
| 5 | |
| 6 | import numpy as np |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.extractors.frame_extractor import ( |
| 10 | calculate_frame_difference, |
| 11 | extract_frames, |
| 12 | is_gpu_available, |
| 13 | save_frames |
| 14 | ) |
| 15 | |
| 16 | # Create dummy test frames |
| 17 | @pytest.fixture |
| 18 | def dummy_frames(): |
| 19 | # Create a list of dummy frames with different content |
| @@ -21,42 +21,45 @@ | |
| 21 | for i in range(3): |
| 22 | # Create frame with different intensity for each |
| 23 | frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50) |
| 24 | frames.append(frame) |
| 25 | return frames |
| 26 | |
| 27 | def test_calculate_frame_difference(): |
| 28 | """Test frame difference calculation.""" |
| 29 | # Create two frames with some difference |
| 30 | frame1 = np.zeros((100, 100, 3), dtype=np.uint8) |
| 31 | frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128 # 50% intensity |
| 32 | |
| 33 | # Calculate difference |
| 34 | diff = calculate_frame_difference(frame1, frame2) |
| 35 | |
| 36 | # Expected difference is around 128/255 = 0.5 |
| 37 | assert 0.45 <= diff <= 0.55 |
| 38 | |
| 39 | # Test identical frames |
| 40 | diff_identical = calculate_frame_difference(frame1, frame1.copy()) |
| 41 | assert diff_identical < 0.001 # Should be very close to 0 |
| 42 | |
| 43 | def test_is_gpu_available(): |
| 44 | """Test GPU availability check.""" |
| 45 | # This just tests that the function runs without error |
| 46 | # We don't assert the result because it depends on the system |
| 47 | result = is_gpu_available() |
| 48 | assert isinstance(result, bool) |
| 49 | |
| 50 | def test_save_frames(dummy_frames): |
| 51 | """Test saving frames to disk.""" |
| 52 | with tempfile.TemporaryDirectory() as temp_dir: |
| 53 | # Save frames |
| 54 | paths = save_frames(dummy_frames, temp_dir, "test_frame") |
| 55 | |
| 56 | # Check that we got the correct number of paths |
| 57 | assert len(paths) == len(dummy_frames) |
| 58 | |
| 59 | # Check that files were created |
| 60 | for path in paths: |
| 61 | assert os.path.exists(path) |
| 62 | assert os.path.getsize(path) > 0 # Files should have content |
| 63 |
| --- tests/test_frame_extractor.py | |
| +++ tests/test_frame_extractor.py | |
| @@ -1,19 +1,19 @@ | |
| 1 | """Tests for the frame extractor module.""" |
| 2 | |
| 3 | import os |
| 4 | import tempfile |
| 5 | |
| 6 | import numpy as np |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.extractors.frame_extractor import ( |
| 10 | calculate_frame_difference, |
| 11 | is_gpu_available, |
| 12 | save_frames, |
| 13 | ) |
| 14 | |
| 15 | |
| 16 | # Create dummy test frames |
| 17 | @pytest.fixture |
| 18 | def dummy_frames(): |
| 19 | # Create a list of dummy frames with different content |
| @@ -21,42 +21,45 @@ | |
| 21 | for i in range(3): |
| 22 | # Create frame with different intensity for each |
| 23 | frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50) |
| 24 | frames.append(frame) |
| 25 | return frames |
| 26 | |
| 27 | |
| 28 | def test_calculate_frame_difference(): |
| 29 | """Test frame difference calculation.""" |
| 30 | # Create two frames with some difference |
| 31 | frame1 = np.zeros((100, 100, 3), dtype=np.uint8) |
| 32 | frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128 # 50% intensity |
| 33 | |
| 34 | # Calculate difference |
| 35 | diff = calculate_frame_difference(frame1, frame2) |
| 36 | |
| 37 | # Expected difference is around 128/255 = 0.5 |
| 38 | assert 0.45 <= diff <= 0.55 |
| 39 | |
| 40 | # Test identical frames |
| 41 | diff_identical = calculate_frame_difference(frame1, frame1.copy()) |
| 42 | assert diff_identical < 0.001 # Should be very close to 0 |
| 43 | |
| 44 | |
| 45 | def test_is_gpu_available(): |
| 46 | """Test GPU availability check.""" |
| 47 | # This just tests that the function runs without error |
| 48 | # We don't assert the result because it depends on the system |
| 49 | result = is_gpu_available() |
| 50 | assert isinstance(result, bool) |
| 51 | |
| 52 | |
| 53 | def test_save_frames(dummy_frames): |
| 54 | """Test saving frames to disk.""" |
| 55 | with tempfile.TemporaryDirectory() as temp_dir: |
| 56 | # Save frames |
| 57 | paths = save_frames(dummy_frames, temp_dir, "test_frame") |
| 58 | |
| 59 | # Check that we got the correct number of paths |
| 60 | assert len(paths) == len(dummy_frames) |
| 61 | |
| 62 | # Check that files were created |
| 63 | for path in paths: |
| 64 | assert os.path.exists(path) |
| 65 | assert os.path.getsize(path) > 0 # Files should have content |
| 66 |
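`test_calculate_frame_difference` pins the result for a black frame vs. a uniform 128-intensity frame to the 0.45–0.55 band, which implies a mean absolute pixel difference normalized by 255. The repo's actual implementation is not shown in this diff; a minimal sketch consistent with those assertions (the `frame_difference` name is a stand-in, not the module's real function):

```python
import numpy as np

def frame_difference(frame1: np.ndarray, frame2: np.ndarray) -> float:
    """Mean absolute per-pixel difference, normalized to [0, 1]."""
    # Widen to int16 first so the uint8 subtraction cannot wrap around.
    diff = np.abs(frame1.astype(np.int16) - frame2.astype(np.int16))
    return float(diff.mean() / 255.0)

black = np.zeros((100, 100, 3), dtype=np.uint8)
grey = np.ones((100, 100, 3), dtype=np.uint8) * 128

d = frame_difference(black, grey)   # 128/255 ~= 0.502, inside 0.45..0.55
```

The `astype(np.int16)` cast is the important detail: subtracting two `uint8` arrays directly wraps modulo 256 and would silently understate the difference.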
+2
-4
| --- tests/test_json_parsing.py | ||
| +++ tests/test_json_parsing.py | ||
| @@ -1,25 +1,23 @@ | ||
| 1 | 1 | """Tests for robust JSON parsing from LLM responses.""" |
| 2 | 2 | |
| 3 | -import pytest | |
| 4 | - | |
| 5 | 3 | from video_processor.utils.json_parsing import parse_json_from_response |
| 6 | 4 | |
| 7 | 5 | |
| 8 | 6 | class TestParseJsonFromResponse: |
| 9 | 7 | def test_direct_dict(self): |
| 10 | 8 | assert parse_json_from_response('{"key": "value"}') == {"key": "value"} |
| 11 | 9 | |
| 12 | 10 | def test_direct_array(self): |
| 13 | - assert parse_json_from_response('[1, 2, 3]') == [1, 2, 3] | |
| 11 | + assert parse_json_from_response("[1, 2, 3]") == [1, 2, 3] | |
| 14 | 12 | |
| 15 | 13 | def test_markdown_fenced_json(self): |
| 16 | 14 | text = '```json\n{"key": "value"}\n```' |
| 17 | 15 | assert parse_json_from_response(text) == {"key": "value"} |
| 18 | 16 | |
| 19 | 17 | def test_markdown_fenced_no_lang(self): |
| 20 | - text = '```\n[1, 2]\n```' | |
| 18 | + text = "```\n[1, 2]\n```" | |
| 21 | 19 | assert parse_json_from_response(text) == [1, 2] |
| 22 | 20 | |
| 23 | 21 | def test_json_embedded_in_text(self): |
| 24 | 22 | text = 'Here is the result:\n{"name": "test", "value": 42}\nEnd of result.' |
| 25 | 23 | result = parse_json_from_response(text) |
| 26 | 24 |
| --- tests/test_json_parsing.py | |
| +++ tests/test_json_parsing.py | |
| @@ -1,25 +1,23 @@ | |
| 1 | """Tests for robust JSON parsing from LLM responses.""" |
| 2 | |
| 3 | import pytest |
| 4 | |
| 5 | from video_processor.utils.json_parsing import parse_json_from_response |
| 6 | |
| 7 | |
| 8 | class TestParseJsonFromResponse: |
| 9 | def test_direct_dict(self): |
| 10 | assert parse_json_from_response('{"key": "value"}') == {"key": "value"} |
| 11 | |
| 12 | def test_direct_array(self): |
| 13 | assert parse_json_from_response('[1, 2, 3]') == [1, 2, 3] |
| 14 | |
| 15 | def test_markdown_fenced_json(self): |
| 16 | text = '```json\n{"key": "value"}\n```' |
| 17 | assert parse_json_from_response(text) == {"key": "value"} |
| 18 | |
| 19 | def test_markdown_fenced_no_lang(self): |
| 20 | text = '```\n[1, 2]\n```' |
| 21 | assert parse_json_from_response(text) == [1, 2] |
| 22 | |
| 23 | def test_json_embedded_in_text(self): |
| 24 | text = 'Here is the result:\n{"name": "test", "value": 42}\nEnd of result.' |
| 25 | result = parse_json_from_response(text) |
| 26 |
| --- tests/test_json_parsing.py | |
| +++ tests/test_json_parsing.py | |
| @@ -1,25 +1,23 @@ | |
| 1 | """Tests for robust JSON parsing from LLM responses.""" |
| 2 | |
| 3 | from video_processor.utils.json_parsing import parse_json_from_response |
| 4 | |
| 5 | |
| 6 | class TestParseJsonFromResponse: |
| 7 | def test_direct_dict(self): |
| 8 | assert parse_json_from_response('{"key": "value"}') == {"key": "value"} |
| 9 | |
| 10 | def test_direct_array(self): |
| 11 | assert parse_json_from_response("[1, 2, 3]") == [1, 2, 3] |
| 12 | |
| 13 | def test_markdown_fenced_json(self): |
| 14 | text = '```json\n{"key": "value"}\n```' |
| 15 | assert parse_json_from_response(text) == {"key": "value"} |
| 16 | |
| 17 | def test_markdown_fenced_no_lang(self): |
| 18 | text = "```\n[1, 2]\n```" |
| 19 | assert parse_json_from_response(text) == [1, 2] |
| 20 | |
| 21 | def test_json_embedded_in_text(self): |
| 22 | text = 'Here is the result:\n{"name": "test", "value": 42}\nEnd of result.' |
| 23 | result = parse_json_from_response(text) |
| 24 |
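The `test_json_parsing.py` cases cover three LLM-response shapes: bare JSON, a markdown-fenced block (with or without a language tag), and JSON embedded in surrounding prose. A minimal sketch of a parser satisfying all three, under the assumption that the repo's `parse_json_from_response` follows the same try-then-fall-back order (the `parse_json` name here is hypothetical):

```python
import json
import re

def parse_json(text: str):
    """Parse JSON out of an LLM reply: direct, fenced, or embedded."""
    # 1. Happy path: the whole reply is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. Markdown fence, with or without a language tag: ```json ... ```
    m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Last resort: the outermost {...} or [...] span in the text.
    m = re.search(r"(\{.*\}|\[.*\])", text, re.DOTALL)
    if m:
        return json.loads(m.group(1))
    raise ValueError("no JSON found in response")
```

The greedy `.*` in step 3 grabs from the first opening brace to the last closing one, which handles nested objects but would misfire if the reply contained two unrelated JSON blobs — an edge the tests above do not exercise.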
+11
-7
| --- tests/test_models.py | ||
| +++ tests/test_models.py | ||
| @@ -1,11 +1,7 @@ | ||
| 1 | 1 | """Tests for pydantic data models.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | - | |
| 5 | -import pytest | |
| 6 | - | |
| 7 | 3 | from video_processor.models import ( |
| 8 | 4 | ActionItem, |
| 9 | 5 | BatchManifest, |
| 10 | 6 | BatchVideoEntry, |
| 11 | 7 | DiagramResult, |
| @@ -66,11 +62,13 @@ | ||
| 66 | 62 | assert restored == item |
| 67 | 63 | |
| 68 | 64 | |
| 69 | 65 | class TestKeyPoint: |
| 70 | 66 | def test_with_related_diagrams(self): |
| 71 | - kp = KeyPoint(point="System uses microservices", topic="Architecture", related_diagrams=[0, 2]) | |
| 67 | + kp = KeyPoint( | |
| 68 | + point="System uses microservices", topic="Architecture", related_diagrams=[0, 2] | |
| 69 | + ) | |
| 72 | 70 | assert kp.related_diagrams == [0, 2] |
| 73 | 71 | |
| 74 | 72 | def test_round_trip(self): |
| 75 | 73 | kp = KeyPoint(point="Test", details="Detail", timestamp=42.0, source="diagram") |
| 76 | 74 | restored = KeyPoint.model_validate_json(kp.model_dump_json()) |
| @@ -120,11 +118,15 @@ | ||
| 120 | 118 | sc = ScreenCapture(frame_index=10, caption="Architecture overview slide", confidence=0.5) |
| 121 | 119 | assert sc.image_path is None |
| 122 | 120 | |
| 123 | 121 | def test_round_trip(self): |
| 124 | 122 | sc = ScreenCapture( |
| 125 | - frame_index=7, timestamp=30.0, caption="Timeline", image_path="captures/capture_0.jpg", confidence=0.45 | |
| 123 | + frame_index=7, | |
| 124 | + timestamp=30.0, | |
| 125 | + caption="Timeline", | |
| 126 | + image_path="captures/capture_0.jpg", | |
| 127 | + confidence=0.45, | |
| 126 | 128 | ) |
| 127 | 129 | restored = ScreenCapture.model_validate_json(sc.model_dump_json()) |
| 128 | 130 | assert restored == sc |
| 129 | 131 | |
| 130 | 132 | |
| @@ -171,11 +173,13 @@ | ||
| 171 | 173 | assert m.screen_captures == [] |
| 172 | 174 | assert m.stats.frames_extracted == 0 |
| 173 | 175 | |
| 174 | 176 | def test_full_round_trip(self): |
| 175 | 177 | m = VideoManifest( |
| 176 | - video=VideoMetadata(title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0), | |
| 178 | + video=VideoMetadata( | |
| 179 | + title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0 | |
| 180 | + ), | |
| 177 | 181 | stats=ProcessingStats( |
| 178 | 182 | frames_extracted=50, |
| 179 | 183 | diagrams_detected=3, |
| 180 | 184 | screen_captures=2, |
| 181 | 185 | models_used={"vision": "gpt-4o", "chat": "claude-sonnet-4-5"}, |
| 182 | 186 |
| --- tests/test_models.py | |
| +++ tests/test_models.py | |
| @@ -1,11 +1,7 @@ | |
| 1 | """Tests for pydantic data models.""" |
| 2 | |
| 3 | import json |
| 4 | |
| 5 | import pytest |
| 6 | |
| 7 | from video_processor.models import ( |
| 8 | ActionItem, |
| 9 | BatchManifest, |
| 10 | BatchVideoEntry, |
| 11 | DiagramResult, |
| @@ -66,11 +62,13 @@ | |
| 66 | assert restored == item |
| 67 | |
| 68 | |
| 69 | class TestKeyPoint: |
| 70 | def test_with_related_diagrams(self): |
| 71 | kp = KeyPoint(point="System uses microservices", topic="Architecture", related_diagrams=[0, 2]) |
| 72 | assert kp.related_diagrams == [0, 2] |
| 73 | |
| 74 | def test_round_trip(self): |
| 75 | kp = KeyPoint(point="Test", details="Detail", timestamp=42.0, source="diagram") |
| 76 | restored = KeyPoint.model_validate_json(kp.model_dump_json()) |
| @@ -120,11 +118,15 @@ | |
| 120 | sc = ScreenCapture(frame_index=10, caption="Architecture overview slide", confidence=0.5) |
| 121 | assert sc.image_path is None |
| 122 | |
| 123 | def test_round_trip(self): |
| 124 | sc = ScreenCapture( |
| 125 | frame_index=7, timestamp=30.0, caption="Timeline", image_path="captures/capture_0.jpg", confidence=0.45 |
| 126 | ) |
| 127 | restored = ScreenCapture.model_validate_json(sc.model_dump_json()) |
| 128 | assert restored == sc |
| 129 | |
| 130 | |
| @@ -171,11 +173,13 @@ | |
| 171 | assert m.screen_captures == [] |
| 172 | assert m.stats.frames_extracted == 0 |
| 173 | |
| 174 | def test_full_round_trip(self): |
| 175 | m = VideoManifest( |
| 176 | video=VideoMetadata(title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0), |
| 177 | stats=ProcessingStats( |
| 178 | frames_extracted=50, |
| 179 | diagrams_detected=3, |
| 180 | screen_captures=2, |
| 181 | models_used={"vision": "gpt-4o", "chat": "claude-sonnet-4-5"}, |
| 182 |
| --- tests/test_models.py | |
| +++ tests/test_models.py | |
| @@ -1,11 +1,7 @@ | |
| 1 | """Tests for pydantic data models.""" |
| 2 | |
| 3 | from video_processor.models import ( |
| 4 | ActionItem, |
| 5 | BatchManifest, |
| 6 | BatchVideoEntry, |
| 7 | DiagramResult, |
| @@ -66,11 +62,13 @@ | |
| 62 | assert restored == item |
| 63 | |
| 64 | |
| 65 | class TestKeyPoint: |
| 66 | def test_with_related_diagrams(self): |
| 67 | kp = KeyPoint( |
| 68 | point="System uses microservices", topic="Architecture", related_diagrams=[0, 2] |
| 69 | ) |
| 70 | assert kp.related_diagrams == [0, 2] |
| 71 | |
| 72 | def test_round_trip(self): |
| 73 | kp = KeyPoint(point="Test", details="Detail", timestamp=42.0, source="diagram") |
| 74 | restored = KeyPoint.model_validate_json(kp.model_dump_json()) |
| @@ -120,11 +118,15 @@ | |
| 118 | sc = ScreenCapture(frame_index=10, caption="Architecture overview slide", confidence=0.5) |
| 119 | assert sc.image_path is None |
| 120 | |
| 121 | def test_round_trip(self): |
| 122 | sc = ScreenCapture( |
| 123 | frame_index=7, |
| 124 | timestamp=30.0, |
| 125 | caption="Timeline", |
| 126 | image_path="captures/capture_0.jpg", |
| 127 | confidence=0.45, |
| 128 | ) |
| 129 | restored = ScreenCapture.model_validate_json(sc.model_dump_json()) |
| 130 | assert restored == sc |
| 131 | |
| 132 | |
| @@ -171,11 +173,13 @@ | |
| 173 | assert m.screen_captures == [] |
| 174 | assert m.stats.frames_extracted == 0 |
| 175 | |
| 176 | def test_full_round_trip(self): |
| 177 | m = VideoManifest( |
| 178 | video=VideoMetadata( |
| 179 | title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0 |
| 180 | ), |
| 181 | stats=ProcessingStats( |
| 182 | frames_extracted=50, |
| 183 | diagrams_detected=3, |
| 184 | screen_captures=2, |
| 185 | models_used={"vision": "gpt-4o", "chat": "claude-sonnet-4-5"}, |
| 186 |
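The recurring pattern in `test_models.py` is a serialization round trip: `model_dump_json` followed by `model_validate_json`, then equality against the original. A minimal sketch with a hypothetical stand-in model (the real `KeyPoint` has more fields; this only illustrates the Pydantic v2 round-trip idiom the tests rely on):

```python
from typing import Optional

from pydantic import BaseModel

class KeyPointSketch(BaseModel):
    """Hypothetical stand-in for the repo's KeyPoint model."""
    point: str
    topic: Optional[str] = None
    related_diagrams: list[int] = []  # Pydantic copies mutable defaults per-instance

kp = KeyPointSketch(point="System uses microservices", related_diagrams=[0, 2])

# Serialize to a JSON string, then rebuild a fresh instance from it.
restored = KeyPointSketch.model_validate_json(kp.model_dump_json())
```

Pydantic models compare field-by-field, so `restored == kp` holds after the round trip — which is exactly what the `test_round_trip` cases assert.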
| --- tests/test_output_structure.py | ||
| +++ tests/test_output_structure.py | ||
| @@ -1,12 +1,8 @@ | ||
| 1 | 1 | """Tests for output structure and manifest I/O.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -import tempfile | |
| 5 | -from pathlib import Path | |
| 6 | - | |
| 7 | -import pytest | |
| 8 | 4 | |
| 9 | 5 | from video_processor.models import ( |
| 10 | 6 | ActionItem, |
| 11 | 7 | BatchManifest, |
| 12 | 8 | BatchVideoEntry, |
| 13 | 9 |
| --- tests/test_output_structure.py | |
| +++ tests/test_output_structure.py | |
| @@ -1,12 +1,8 @@ | |
| 1 | """Tests for output structure and manifest I/O.""" |
| 2 | |
| 3 | import json |
| 4 | import tempfile |
| 5 | from pathlib import Path |
| 6 | |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.models import ( |
| 10 | ActionItem, |
| 11 | BatchManifest, |
| 12 | BatchVideoEntry, |
| 13 |
| --- tests/test_output_structure.py | |
| +++ tests/test_output_structure.py | |
| @@ -1,12 +1,8 @@ | |
| 1 | """Tests for output structure and manifest I/O.""" |
| 2 | |
| 3 | import json |
| 4 | |
| 5 | from video_processor.models import ( |
| 6 | ActionItem, |
| 7 | BatchManifest, |
| 8 | BatchVideoEntry, |
| 9 |
+33
-23
| --- tests/test_pipeline.py | ||
| +++ tests/test_pipeline.py | ||
| @@ -1,14 +1,11 @@ | ||
| 1 | 1 | """Tests for the core video processing pipeline.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from pathlib import Path | |
| 5 | -from unittest.mock import MagicMock, patch | |
| 4 | +from unittest.mock import MagicMock | |
| 6 | 5 | |
| 7 | -import pytest | |
| 8 | - | |
| 9 | -from video_processor.pipeline import _extract_key_points, _extract_action_items, _format_srt_time | |
| 6 | +from video_processor.pipeline import _extract_action_items, _extract_key_points, _format_srt_time | |
| 10 | 7 | |
| 11 | 8 | |
| 12 | 9 | class TestFormatSrtTime: |
| 13 | 10 | def test_zero(self): |
| 14 | 11 | assert _format_srt_time(0) == "00:00:00,000" |
| @@ -28,27 +25,31 @@ | ||
| 28 | 25 | |
| 29 | 26 | |
| 30 | 27 | class TestExtractKeyPoints: |
| 31 | 28 | def test_parses_valid_response(self): |
| 32 | 29 | pm = MagicMock() |
| 33 | - pm.chat.return_value = json.dumps([ | |
| 34 | - {"point": "Main point", "topic": "Architecture", "details": "Some details"}, | |
| 35 | - {"point": "Second point", "topic": None, "details": None}, | |
| 36 | - ]) | |
| 30 | + pm.chat.return_value = json.dumps( | |
| 31 | + [ | |
| 32 | + {"point": "Main point", "topic": "Architecture", "details": "Some details"}, | |
| 33 | + {"point": "Second point", "topic": None, "details": None}, | |
| 34 | + ] | |
| 35 | + ) | |
| 37 | 36 | result = _extract_key_points(pm, "Some transcript text here") |
| 38 | 37 | assert len(result) == 2 |
| 39 | 38 | assert result[0].point == "Main point" |
| 40 | 39 | assert result[0].topic == "Architecture" |
| 41 | 40 | assert result[1].point == "Second point" |
| 42 | 41 | |
| 43 | 42 | def test_skips_invalid_items(self): |
| 44 | 43 | pm = MagicMock() |
| 45 | - pm.chat.return_value = json.dumps([ | |
| 46 | - {"point": "Valid", "topic": None}, | |
| 47 | - {"topic": "No point field"}, | |
| 48 | - {"point": "", "topic": "Empty point"}, | |
| 49 | - ]) | |
| 44 | + pm.chat.return_value = json.dumps( | |
| 45 | + [ | |
| 46 | + {"point": "Valid", "topic": None}, | |
| 47 | + {"topic": "No point field"}, | |
| 48 | + {"point": "", "topic": "Empty point"}, | |
| 49 | + ] | |
| 50 | + ) | |
| 50 | 51 | result = _extract_key_points(pm, "text") |
| 51 | 52 | assert len(result) == 1 |
| 52 | 53 | assert result[0].point == "Valid" |
| 53 | 54 | |
| 54 | 55 | def test_handles_error(self): |
| @@ -65,29 +66,38 @@ | ||
| 65 | 66 | |
| 66 | 67 | |
| 67 | 68 | class TestExtractActionItems: |
| 68 | 69 | def test_parses_valid_response(self): |
| 69 | 70 | pm = MagicMock() |
| 70 | - pm.chat.return_value = json.dumps([ | |
| 71 | - {"action": "Deploy fix", "assignee": "Bob", "deadline": "Friday", | |
| 72 | - "priority": "high", "context": "Production"}, | |
| 73 | - ]) | |
| 71 | + pm.chat.return_value = json.dumps( | |
| 72 | + [ | |
| 73 | + { | |
| 74 | + "action": "Deploy fix", | |
| 75 | + "assignee": "Bob", | |
| 76 | + "deadline": "Friday", | |
| 77 | + "priority": "high", | |
| 78 | + "context": "Production", | |
| 79 | + }, | |
| 80 | + ] | |
| 81 | + ) | |
| 74 | 82 | result = _extract_action_items(pm, "Some transcript text") |
| 75 | 83 | assert len(result) == 1 |
| 76 | 84 | assert result[0].action == "Deploy fix" |
| 77 | 85 | assert result[0].assignee == "Bob" |
| 78 | 86 | |
| 79 | 87 | def test_skips_invalid_items(self): |
| 80 | 88 | pm = MagicMock() |
| 81 | - pm.chat.return_value = json.dumps([ | |
| 82 | - {"action": "Valid action"}, | |
| 83 | - {"assignee": "No action field"}, | |
| 84 | - {"action": ""}, | |
| 85 | - ]) | |
| 89 | + pm.chat.return_value = json.dumps( | |
| 90 | + [ | |
| 91 | + {"action": "Valid action"}, | |
| 92 | + {"assignee": "No action field"}, | |
| 93 | + {"action": ""}, | |
| 94 | + ] | |
| 95 | + ) | |
| 86 | 96 | result = _extract_action_items(pm, "text") |
| 87 | 97 | assert len(result) == 1 |
| 88 | 98 | |
| 89 | 99 | def test_handles_error(self): |
| 90 | 100 | pm = MagicMock() |
| 91 | 101 | pm.chat.side_effect = Exception("API down") |
| 92 | 102 | result = _extract_action_items(pm, "text") |
| 93 | 103 | assert result == [] |
| 94 | 104 |
| --- tests/test_pipeline.py | |
| +++ tests/test_pipeline.py | |
| @@ -1,14 +1,11 @@ | |
| 1 | """Tests for the core video processing pipeline.""" |
| 2 | |
| 3 | import json |
| 4 | from pathlib import Path |
| 5 | from unittest.mock import MagicMock, patch |
| 6 | |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.pipeline import _extract_key_points, _extract_action_items, _format_srt_time |
| 10 | |
| 11 | |
| 12 | class TestFormatSrtTime: |
| 13 | def test_zero(self): |
| 14 | assert _format_srt_time(0) == "00:00:00,000" |
| @@ -28,27 +25,31 @@ | |
| 28 | |
| 29 | |
| 30 | class TestExtractKeyPoints: |
| 31 | def test_parses_valid_response(self): |
| 32 | pm = MagicMock() |
| 33 | pm.chat.return_value = json.dumps([ |
| 34 | {"point": "Main point", "topic": "Architecture", "details": "Some details"}, |
| 35 | {"point": "Second point", "topic": None, "details": None}, |
| 36 | ]) |
| 37 | result = _extract_key_points(pm, "Some transcript text here") |
| 38 | assert len(result) == 2 |
| 39 | assert result[0].point == "Main point" |
| 40 | assert result[0].topic == "Architecture" |
| 41 | assert result[1].point == "Second point" |
| 42 | |
| 43 | def test_skips_invalid_items(self): |
| 44 | pm = MagicMock() |
| 45 | pm.chat.return_value = json.dumps([ |
| 46 | {"point": "Valid", "topic": None}, |
| 47 | {"topic": "No point field"}, |
| 48 | {"point": "", "topic": "Empty point"}, |
| 49 | ]) |
| 50 | result = _extract_key_points(pm, "text") |
| 51 | assert len(result) == 1 |
| 52 | assert result[0].point == "Valid" |
| 53 | |
| 54 | def test_handles_error(self): |
| @@ -65,29 +66,38 @@ | |
| 65 | |
| 66 | |
| 67 | class TestExtractActionItems: |
| 68 | def test_parses_valid_response(self): |
| 69 | pm = MagicMock() |
| 70 | pm.chat.return_value = json.dumps([ |
| 71 | {"action": "Deploy fix", "assignee": "Bob", "deadline": "Friday", |
| 72 | "priority": "high", "context": "Production"}, |
| 73 | ]) |
| 74 | result = _extract_action_items(pm, "Some transcript text") |
| 75 | assert len(result) == 1 |
| 76 | assert result[0].action == "Deploy fix" |
| 77 | assert result[0].assignee == "Bob" |
| 78 | |
| 79 | def test_skips_invalid_items(self): |
| 80 | pm = MagicMock() |
| 81 | pm.chat.return_value = json.dumps([ |
| 82 | {"action": "Valid action"}, |
| 83 | {"assignee": "No action field"}, |
| 84 | {"action": ""}, |
| 85 | ]) |
| 86 | result = _extract_action_items(pm, "text") |
| 87 | assert len(result) == 1 |
| 88 | |
| 89 | def test_handles_error(self): |
| 90 | pm = MagicMock() |
| 91 | pm.chat.side_effect = Exception("API down") |
| 92 | result = _extract_action_items(pm, "text") |
| 93 | assert result == [] |
| 94 |
| --- tests/test_pipeline.py | |
| +++ tests/test_pipeline.py | |
| @@ -1,14 +1,11 @@ | |
| 1 | """Tests for the core video processing pipeline.""" |
| 2 | |
| 3 | import json |
| 4 | from unittest.mock import MagicMock |
| 5 | |
| 6 | from video_processor.pipeline import _extract_action_items, _extract_key_points, _format_srt_time |
| 7 | |
| 8 | |
| 9 | class TestFormatSrtTime: |
| 10 | def test_zero(self): |
| --- tests/test_prompt_templates.py | ||
| +++ tests/test_prompt_templates.py | ||
| @@ -1,9 +1,7 @@ | ||
| 1 | 1 | """Tests for prompt template management.""" |
| 2 | 2 | |
| 3 | -import pytest | |
| 4 | - | |
| 5 | 3 | from video_processor.utils.prompt_templates import ( |
| 6 | 4 | DEFAULT_TEMPLATES, |
| 7 | 5 | PromptTemplate, |
| 8 | 6 | default_prompt_manager, |
| 9 | 7 | ) |
| 10 | 8 |
| --- tests/test_providers.py | ||
| +++ tests/test_providers.py | ||
| @@ -1,11 +1,9 @@ | ||
| 1 | 1 | """Tests for the provider abstraction layer.""" |
| 2 | 2 | |
| 3 | 3 | from unittest.mock import MagicMock, patch |
| 4 | 4 | |
| 5 | -import pytest | |
| 6 | - | |
| 7 | 5 | from video_processor.providers.base import BaseProvider, ModelInfo |
| 8 | 6 | from video_processor.providers.manager import ProviderManager |
| 9 | 7 | |
| 10 | 8 | |
| 11 | 9 | class TestModelInfo: |
| @@ -13,11 +11,16 @@ | ||
| 13 | 11 | m = ModelInfo(id="gpt-4o", provider="openai", capabilities=["chat", "vision"]) |
| 14 | 12 | assert m.id == "gpt-4o" |
| 15 | 13 | assert "vision" in m.capabilities |
| 16 | 14 | |
| 17 | 15 | def test_round_trip(self): |
| 18 | - m = ModelInfo(id="claude-sonnet-4-5-20250929", provider="anthropic", display_name="Claude Sonnet", capabilities=["chat", "vision"]) | |
| 16 | + m = ModelInfo( | |
| 17 | + id="claude-sonnet-4-5-20250929", | |
| 18 | + provider="anthropic", | |
| 19 | + display_name="Claude Sonnet", | |
| 20 | + capabilities=["chat", "vision"], | |
| 21 | + ) | |
| 19 | 22 | restored = ModelInfo.model_validate_json(m.model_dump_json()) |
| 20 | 23 | assert restored == m |
| 21 | 24 | |
| 22 | 25 | |
| 23 | 26 | class TestProviderManager: |
| @@ -107,23 +110,26 @@ | ||
| 107 | 110 | class TestDiscovery: |
| 108 | 111 | @patch("video_processor.providers.discovery._cached_models", None) |
| 109 | 112 | @patch.dict("os.environ", {}, clear=True) |
| 110 | 113 | def test_discover_skips_missing_keys(self): |
| 111 | 114 | from video_processor.providers.discovery import discover_available_models |
| 115 | + | |
| 112 | 116 | # No API keys -> empty list, no errors |
| 113 | 117 | models = discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""}) |
| 114 | 118 | assert models == [] |
| 115 | 119 | |
| 116 | 120 | @patch.dict("os.environ", {}, clear=True) |
| 117 | 121 | @patch("video_processor.providers.discovery._cached_models", None) |
| 118 | 122 | def test_discover_caches_results(self): |
| 119 | 123 | from video_processor.providers import discovery |
| 120 | 124 | |
| 121 | - models = discovery.discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""}) | |
| 125 | + models = discovery.discover_available_models( | |
| 126 | + api_keys={"openai": "", "anthropic": "", "gemini": ""} | |
| 127 | + ) | |
| 122 | 128 | assert models == [] |
| 123 | 129 | # Second call should use cache |
| 124 | 130 | models2 = discovery.discover_available_models(api_keys={"openai": "key"}) |
| 125 | 131 | assert models2 == [] # Still cached empty result |
| 126 | 132 | |
| 127 | 133 | # Force refresh |
| 128 | 134 | discovery.clear_discovery_cache() |
| 129 | 135 | # Would try to connect with real key, so skip that test |
| 130 | 136 |
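The discovery tests above patch a module-level `_cached_models` and pass explicit `api_keys` so no real provider is contacted. A minimal sketch of that cache-and-skip pattern (illustrative names, not the actual `video_processor` API):

```python
# Module-level cache, as exercised by the discovery tests.
_cached_models = None


def discover_available_models(api_keys, force_refresh=False):
    """Return provider names with non-empty API keys, caching the result."""
    global _cached_models
    if _cached_models is not None and not force_refresh:
        return _cached_models  # second call hits the cache
    # Providers with missing or empty keys are skipped without error
    _cached_models = [name for name, key in api_keys.items() if key]
    return _cached_models


def clear_discovery_cache():
    """Force the next discovery call to re-query providers."""
    global _cached_models
    _cached_models = None
```

Patching `_cached_models` to `None` in each test (as the hunk does with `@patch`) keeps tests independent of call order.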
| --- tests/test_rendering.py | ||
| +++ tests/test_rendering.py | ||
| @@ -1,12 +1,8 @@ | ||
| 1 | 1 | """Tests for rendering and export utilities.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | -from pathlib import Path | |
| 5 | -from unittest.mock import MagicMock, patch | |
| 6 | - | |
| 7 | -import pytest | |
| 3 | +from unittest.mock import patch | |
| 8 | 4 | |
| 9 | 5 | from video_processor.models import ( |
| 10 | 6 | ActionItem, |
| 11 | 7 | DiagramResult, |
| 12 | 8 | DiagramType, |
| @@ -101,11 +97,11 @@ | ||
| 101 | 97 | assert result == {} |
| 102 | 98 | |
| 103 | 99 | def test_creates_output_dir(self, tmp_path): |
| 104 | 100 | nested = tmp_path / "charts" / "output" |
| 105 | 101 | data = {"labels": ["A"], "values": [1], "chart_type": "bar"} |
| 106 | - result = reproduce_chart(data, nested, "test") | |
| 102 | + reproduce_chart(data, nested, "test") | |
| 107 | 103 | assert nested.exists() |
| 108 | 104 | |
| 109 | 105 | |
| 110 | 106 | class TestExportAllFormats: |
| 111 | 107 | def _make_manifest(self) -> VideoManifest: |
| @@ -180,11 +176,11 @@ | ||
| 180 | 176 | ], |
| 181 | 177 | ) |
| 182 | 178 | (tmp_path / "results").mkdir() |
| 183 | 179 | (tmp_path / "diagrams").mkdir() |
| 184 | 180 | |
| 185 | - result = export_all_formats(tmp_path, manifest) | |
| 181 | + export_all_formats(tmp_path, manifest) | |
| 186 | 182 | # Chart should be reproduced |
| 187 | 183 | chart_svg = tmp_path / "diagrams" / "diagram_0_chart.svg" |
| 188 | 184 | assert chart_svg.exists() |
| 189 | 185 | |
| 190 | 186 | |
| 191 | 187 |
| --- video_processor/agent/orchestrator.py | ||
| +++ video_processor/agent/orchestrator.py | ||
| @@ -5,17 +5,13 @@ | ||
| 5 | 5 | import time |
| 6 | 6 | from pathlib import Path |
| 7 | 7 | from typing import Any, Dict, List, Optional |
| 8 | 8 | |
| 9 | 9 | from video_processor.models import ( |
| 10 | - ActionItem, | |
| 11 | - DiagramResult, | |
| 12 | - KeyPoint, | |
| 13 | - ScreenCapture, | |
| 10 | + ProcessingStats, | |
| 14 | 11 | VideoManifest, |
| 15 | 12 | VideoMetadata, |
| 16 | - ProcessingStats, | |
| 17 | 13 | ) |
| 18 | 14 | from video_processor.providers.manager import ProviderManager |
| 19 | 15 | |
| 20 | 16 | logger = logging.getLogger(__name__) |
| 21 | 17 | |
| @@ -107,13 +103,11 @@ | ||
| 107 | 103 | plan.append({"step": "generate_reports", "priority": "required"}) |
| 108 | 104 | |
| 109 | 105 | self._plan = plan |
| 110 | 106 | return plan |
| 111 | 107 | |
| 112 | - def _execute_step( | |
| 113 | - self, step: Dict[str, Any], input_path: Path, output_dir: Path | |
| 114 | - ) -> None: | |
| 108 | + def _execute_step(self, step: Dict[str, Any], input_path: Path, output_dir: Path) -> None: | |
| 115 | 109 | """Execute a single step with retry logic.""" |
| 116 | 110 | step_name = step["step"] |
| 117 | 111 | logger.info(f"Agent step: {step_name}") |
| 118 | 112 | |
| 119 | 113 | for attempt in range(1, self.max_retries + 1): |
| @@ -141,13 +135,11 @@ | ||
| 141 | 135 | result = self._run_step(fallback, input_path, output_dir) |
| 142 | 136 | self._results[step_name] = result |
| 143 | 137 | except Exception as fe: |
| 144 | 138 | logger.error(f"Fallback {fallback} also failed: {fe}") |
| 145 | 139 | |
| 146 | - def _run_step( | |
| 147 | - self, step_name: str, input_path: Path, output_dir: Path | |
| 148 | - ) -> Any: | |
| 140 | + def _run_step(self, step_name: str, input_path: Path, output_dir: Path) -> Any: | |
| 149 | 141 | """Run a specific processing step.""" |
| 150 | 142 | from video_processor.output_structure import create_video_output_dirs |
| 151 | 143 | |
| 152 | 144 | dirs = create_video_output_dirs(output_dir, input_path.stem) |
| 153 | 145 | |
| @@ -177,13 +169,11 @@ | ||
| 177 | 169 | transcription = self.pm.transcribe_audio(audio_path) |
| 178 | 170 | text = transcription.get("text", "") |
| 179 | 171 | |
| 180 | 172 | # Save transcript |
| 181 | 173 | dirs["transcript"].mkdir(parents=True, exist_ok=True) |
| 182 | - (dirs["transcript"] / "transcript.json").write_text( | |
| 183 | - json.dumps(transcription, indent=2) | |
| 184 | - ) | |
| 174 | + (dirs["transcript"] / "transcript.json").write_text(json.dumps(transcription, indent=2)) | |
| 185 | 175 | (dirs["transcript"] / "transcript.txt").write_text(text) |
| 186 | 176 | return transcription |
| 187 | 177 | |
| 188 | 178 | elif step_name == "detect_diagrams": |
| 189 | 179 | from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer |
| @@ -256,23 +246,19 @@ | ||
| 256 | 246 | """Adapt the plan based on step results.""" |
| 257 | 247 | |
| 258 | 248 | if completed_step == "transcribe": |
| 259 | 249 | text = result.get("text", "") if isinstance(result, dict) else "" |
| 260 | 250 | # If transcript is very long, add deep analysis |
| 261 | - if len(text) > 10000 and not any( | |
| 262 | - s["step"] == "deep_analysis" for s in self._plan | |
| 263 | - ): | |
| 251 | + if len(text) > 10000 and not any(s["step"] == "deep_analysis" for s in self._plan): | |
| 264 | 252 | self._plan.append({"step": "deep_analysis", "priority": "adaptive"}) |
| 265 | 253 | logger.info("Agent adapted: adding deep analysis for long transcript") |
| 266 | 254 | |
| 267 | 255 | elif completed_step == "detect_diagrams": |
| 268 | 256 | diagrams = result.get("diagrams", []) if isinstance(result, dict) else [] |
| 269 | 257 | captures = result.get("captures", []) if isinstance(result, dict) else [] |
| 270 | 258 | # If many diagrams found, ensure cross-referencing |
| 271 | - if len(diagrams) >= 3 and not any( | |
| 272 | - s["step"] == "cross_reference" for s in self._plan | |
| 273 | - ): | |
| 259 | + if len(diagrams) >= 3 and not any(s["step"] == "cross_reference" for s in self._plan): | |
| 274 | 260 | self._plan.append({"step": "cross_reference", "priority": "adaptive"}) |
| 275 | 261 | logger.info("Agent adapted: adding cross-reference for diagram-heavy video") |
| 276 | 262 | |
| 277 | 263 | if len(captures) > len(diagrams): |
| 278 | 264 | self._insights.append( |
| @@ -358,11 +344,11 @@ | ||
| 358 | 344 | |
| 359 | 345 | transcript = self._results.get("transcribe", {}) |
| 360 | 346 | kp_result = self._results.get("extract_key_points", {}) |
| 361 | 347 | key_points = kp_result.get("key_points", []) |
| 362 | 348 | ai_result = self._results.get("extract_action_items", {}) |
| 363 | - action_items = ai_result.get("action_items", []) | |
| 349 | + ai_result.get("action_items", []) | |
| 364 | 350 | diagram_result = self._results.get("detect_diagrams", {}) |
| 365 | 351 | diagrams = diagram_result.get("diagrams", []) |
| 366 | 352 | kg_result = self._results.get("build_knowledge_graph", {}) |
| 367 | 353 | kg = kg_result.get("knowledge_graph") |
| 368 | 354 | |
| 369 | 355 |
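The reflowed `_adapt_plan` conditions above follow one pattern: append a step only when a threshold is crossed and the step is not already planned. A standalone sketch of that check (function name and threshold are illustrative):

```python
def maybe_add_deep_analysis(plan, transcript_text, threshold=10000):
    """Append a deep_analysis step for long transcripts, at most once."""
    if len(transcript_text) > threshold and not any(
        s["step"] == "deep_analysis" for s in plan
    ):
        plan.append({"step": "deep_analysis", "priority": "adaptive"})
    return plan
```

The `not any(...)` guard is what makes the adaptation idempotent when `_adapt_plan` runs after every completed step.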
| --- video_processor/analyzers/action_detector.py | ||
| +++ video_processor/analyzers/action_detector.py | ||
| @@ -150,23 +150,25 @@ | ||
| 150 | 150 | return [] |
| 151 | 151 | |
| 152 | 152 | def _pattern_extract(self, text: str) -> List[ActionItem]: |
| 153 | 153 | """Extract action items using regex pattern matching.""" |
| 154 | 154 | items: List[ActionItem] = [] |
| 155 | - sentences = re.split(r'[.!?]\s+', text) | |
| 155 | + sentences = re.split(r"[.!?]\s+", text) | |
| 156 | 156 | |
| 157 | 157 | for sentence in sentences: |
| 158 | 158 | sentence = sentence.strip() |
| 159 | 159 | if not sentence or len(sentence) < 10: |
| 160 | 160 | continue |
| 161 | 161 | |
| 162 | 162 | for pattern in _ACTION_PATTERNS: |
| 163 | 163 | if pattern.search(sentence): |
| 164 | - items.append(ActionItem( | |
| 165 | - action=sentence, | |
| 166 | - source="transcript", | |
| 167 | - )) | |
| 164 | + items.append( | |
| 165 | + ActionItem( | |
| 166 | + action=sentence, | |
| 167 | + source="transcript", | |
| 168 | + ) | |
| 169 | + ) | |
| 168 | 170 | break # One match per sentence is enough |
| 169 | 171 | |
| 170 | 172 | return items |
| 171 | 173 | |
| 172 | 174 | def _attach_timestamps( |
| 173 | 175 |
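The `_pattern_extract` hunk above splits the transcript into sentences with `re.split(r"[.!?]\s+", text)` and keeps the first pattern match per sentence. A minimal standalone sketch (the patterns here are illustrative; the real `_ACTION_PATTERNS` list lives in `action_detector.py`):

```python
import re

# Illustrative action-signal patterns, not the project's actual list.
_ACTION_PATTERNS = [
    re.compile(r"\b(will|should|must|need to|todo)\b", re.IGNORECASE),
]


def pattern_extract(text):
    """Return sentences that look like action items."""
    items = []
    # Split on sentence-ending punctuation followed by whitespace
    for sentence in re.split(r"[.!?]\s+", text):
        sentence = sentence.strip()
        if len(sentence) < 10:  # skip fragments
            continue
        if any(p.search(sentence) for p in _ACTION_PATTERNS):
            items.append(sentence)  # one match per sentence is enough
    return items
```

Note that `re.split` consumes the punctuation between sentences, so only a trailing sentence keeps its final period.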
| --- video_processor/analyzers/content_analyzer.py | ||
| +++ video_processor/analyzers/content_analyzer.py | ||
| @@ -58,18 +58,18 @@ | ||
| 58 | 58 | ) |
| 59 | 59 | |
| 60 | 60 | # LLM fuzzy matching for unmatched entities |
| 61 | 61 | if self.pm: |
| 62 | 62 | unmatched_t = [ |
| 63 | - e for e in transcript_entities if e.name.lower() not in { | |
| 64 | - d.name.lower() for d in diagram_entities | |
| 65 | - } | |
| 63 | + e | |
| 64 | + for e in transcript_entities | |
| 65 | + if e.name.lower() not in {d.name.lower() for d in diagram_entities} | |
| 66 | 66 | ] |
| 67 | 67 | unmatched_d = [ |
| 68 | - e for e in diagram_entities if e.name.lower() not in { | |
| 69 | - t.name.lower() for t in transcript_entities | |
| 70 | - } | |
| 68 | + e | |
| 69 | + for e in diagram_entities | |
| 70 | + if e.name.lower() not in {t.name.lower() for t in transcript_entities} | |
| 71 | 71 | ] |
| 72 | 72 | |
| 73 | 73 | if unmatched_t and unmatched_d: |
| 74 | 74 | matches = self._fuzzy_match(unmatched_t, unmatched_d) |
| 75 | 75 | for t_name, d_name in matches: |
| @@ -136,11 +136,13 @@ | ||
| 136 | 136 | |
| 137 | 137 | # Build diagram entity index |
| 138 | 138 | diagram_entities: dict[int, set[str]] = {} |
| 139 | 139 | for i, d in enumerate(diagrams): |
| 140 | 140 | elements = d.get("elements", []) if isinstance(d, dict) else getattr(d, "elements", []) |
| 141 | - text = d.get("text_content", "") if isinstance(d, dict) else getattr(d, "text_content", "") | |
| 141 | + text = ( | |
| 142 | + d.get("text_content", "") if isinstance(d, dict) else getattr(d, "text_content", "") | |
| 143 | + ) | |
| 142 | 144 | entities = set(str(e).lower() for e in elements) |
| 143 | 145 | if text: |
| 144 | 146 | entities.update(word.lower() for word in text.split() if len(word) > 3) |
| 145 | 147 | diagram_entities[i] = entities |
| 146 | 148 | |
| 147 | 149 |
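The reflowed comprehensions above compute, case-insensitively, which transcript entities have no diagram counterpart and vice versa. The same logic on plain strings (helper name is illustrative):

```python
def split_unmatched(transcript_names, diagram_names):
    """Return (transcript-only, diagram-only) names, compared case-insensitively."""
    diagram_lower = {d.lower() for d in diagram_names}
    transcript_lower = {t.lower() for t in transcript_names}
    unmatched_t = [t for t in transcript_names if t.lower() not in diagram_lower]
    unmatched_d = [d for d in diagram_names if d.lower() not in transcript_lower]
    return unmatched_t, unmatched_d
```

Building the lowercase sets once keeps each membership test O(1) instead of re-scanning the other list per entity.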
| --- video_processor/analyzers/diagram_analyzer.py | ||
| +++ video_processor/analyzers/diagram_analyzer.py | ||
| @@ -24,23 +24,25 @@ | ||
| 24 | 24 | shared/presented content, NOT people or camera views. |
| 25 | 25 | |
| 26 | 26 | Return ONLY a JSON object (no markdown fences): |
| 27 | 27 | { |
| 28 | 28 | "is_diagram": true/false, |
| 29 | - "diagram_type": "flowchart"|"sequence"|"architecture"|"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 29 | + "diagram_type": "flowchart"|"sequence"|"architecture" | |
| 30 | + |"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 30 | 31 | "confidence": 0.0 to 1.0, |
| 31 | 32 | "content_type": "slide"|"diagram"|"document"|"screen_share"|"whiteboard"|"chart"|"person"|"other", |
| 32 | 33 | "brief_description": "one-sentence description of what you see" |
| 33 | 34 | } |
| 34 | 35 | """ |
| 35 | 36 | |
| 36 | 37 | # Single-pass analysis prompt — extracts everything in one call |
| 37 | 38 | _ANALYSIS_PROMPT = """\ |
| 38 | -Analyze this diagram/visual content comprehensively. Extract ALL of the following in a single JSON response (no markdown fences): | |
| 39 | - | |
| 39 | +Analyze this diagram/visual content comprehensively. Extract ALL of the | |
| 40 | +following in a single JSON response (no markdown fences): | |
| 40 | 41 | { |
| 41 | - "diagram_type": "flowchart"|"sequence"|"architecture"|"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 42 | + "diagram_type": "flowchart"|"sequence"|"architecture" | |
| 43 | + |"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 42 | 44 | "description": "detailed description of the visual content", |
| 43 | 45 | "text_content": "all visible text, preserving structure", |
| 44 | 46 | "elements": ["list", "of", "identified", "elements/components"], |
| 45 | 47 | "relationships": ["element A -> element B: relationship", ...], |
| 46 | 48 | "mermaid": "mermaid diagram syntax representing this visual (graph LR, sequenceDiagram, etc.)", |
| @@ -68,12 +70,11 @@ | ||
| 68 | 70 | # Strip markdown fences |
| 69 | 71 | cleaned = text.strip() |
| 70 | 72 | if cleaned.startswith("```"): |
| 71 | 73 | lines = cleaned.split("\n") |
| 72 | 74 | # Remove first and last fence lines |
| 73 | - lines = [l for l in lines if not l.strip().startswith("```")] | |
| 74 | - cleaned = "\n".join(lines) | |
| 75 | + lines = [line for line in lines if not line.strip().startswith("```")] | |
| 75 | 76 | try: |
| 76 | 77 | return json.loads(cleaned) |
| 77 | 78 | except json.JSONDecodeError: |
| 78 | 79 | # Try to find JSON object in the text |
| 79 | 80 | start = cleaned.find("{") |
| @@ -105,11 +106,16 @@ | ||
| 105 | 106 | """ |
| 106 | 107 | image_bytes = _read_image_bytes(image_path) |
| 107 | 108 | raw = self.pm.analyze_image(image_bytes, _CLASSIFY_PROMPT, max_tokens=512) |
| 108 | 109 | result = _parse_json_response(raw) |
| 109 | 110 | if result is None: |
| 110 | - return {"is_diagram": False, "diagram_type": "unknown", "confidence": 0.0, "brief_description": ""} | |
| 111 | + return { | |
| 112 | + "is_diagram": False, | |
| 113 | + "diagram_type": "unknown", | |
| 114 | + "confidence": 0.0, | |
| 115 | + "brief_description": "", | |
| 116 | + } | |
| 111 | 117 | return result |
| 112 | 118 | |
| 113 | 119 | def analyze_diagram_single_pass(self, image_path: Union[str, Path]) -> dict: |
| 114 | 120 | """ |
| 115 | 121 | Full single-pass diagram analysis — description, text, mermaid, chart data. |
| @@ -163,15 +169,19 @@ | ||
| 163 | 169 | logger.debug(f"Frame {i}: confidence {confidence:.2f} below threshold, skipping") |
| 164 | 170 | continue |
| 165 | 171 | |
| 166 | 172 | if confidence >= 0.7: |
| 167 | 173 | # Full diagram analysis |
| 168 | - logger.info(f"Frame {i}: diagram detected (confidence {confidence:.2f}), analyzing...") | |
| 174 | + logger.info( | |
| 175 | + f"Frame {i}: diagram detected (confidence {confidence:.2f}), analyzing..." | |
| 176 | + ) | |
| 169 | 177 | try: |
| 170 | 178 | analysis = self.analyze_diagram_single_pass(fp) |
| 171 | 179 | except Exception as e: |
| 172 | - logger.warning(f"Diagram analysis failed for frame {i}: {e}, falling back to screengrab") | |
| 180 | + logger.warning( | |
| 181 | + f"Diagram analysis failed for frame {i}: {e}, falling back to screengrab" | |
| 182 | + ) | |
| 173 | 183 | analysis = {} |
| 174 | 184 | |
| 175 | 185 | if not analysis: |
| 176 | 186 | # Analysis failed — fall back to screengrab |
| 177 | 187 | capture = self._save_screengrab(fp, i, capture_idx, captures_dir, confidence) |
| @@ -221,16 +231,20 @@ | ||
| 221 | 231 | diagrams.append(dr) |
| 222 | 232 | diagram_idx += 1 |
| 223 | 233 | |
| 224 | 234 | else: |
| 225 | 235 | # Screengrab fallback (0.3 <= confidence < 0.7) |
| 226 | - logger.info(f"Frame {i}: uncertain (confidence {confidence:.2f}), saving as screengrab") | |
| 236 | + logger.info( | |
| 237 | + f"Frame {i}: uncertain (confidence {confidence:.2f}), saving as screengrab" | |
| 238 | + ) | |
| 227 | 239 | capture = self._save_screengrab(fp, i, capture_idx, captures_dir, confidence) |
| 228 | 240 | captures.append(capture) |
| 229 | 241 | capture_idx += 1 |
| 230 | 242 | |
| 231 | - logger.info(f"Diagram processing complete: {len(diagrams)} diagrams, {len(captures)} screengrabs") | |
| 243 | + logger.info( | |
| 244 | + f"Diagram processing complete: {len(diagrams)} diagrams, {len(captures)} screengrabs" | |
| 245 | + ) | |
| 232 | 246 | return diagrams, captures |
| 233 | 247 | |
| 234 | 248 | def _save_screengrab( |
| 235 | 249 | self, |
| 236 | 250 | frame_path: Path, |
| 237 | 251 |
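The `_parse_json_response` hunk above strips markdown fences before parsing and falls back to extracting the outermost brace-delimited span. A self-contained sketch of that approach (the function name and exact fallback behavior are an approximation of the real module, not a copy of it):

```python
import json


def parse_json_response(text: str):
    """Parse an LLM reply that may wrap JSON in ``` fences or extra prose."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop fence lines, then rejoin the remainder.
        lines = [line for line in cleaned.split("\n") if not line.strip().startswith("```")]
        cleaned = "\n".join(lines)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, if any.
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(cleaned[start : end + 1])
            except json.JSONDecodeError:
                return None
        return None
```

Returning `None` on failure lets callers substitute a default payload, as the classifier's `{"is_diagram": False, ...}` fallback in the diff does.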
| --- video_processor/cli/commands.py | ||
| +++ video_processor/cli/commands.py | ||
| @@ -2,13 +2,11 @@ | ||
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | 4 | import logging |
| 5 | 5 | import os |
| 6 | 6 | import sys |
| 7 | -import time | |
| 8 | 7 | from pathlib import Path |
| 9 | -from typing import List, Optional | |
| 10 | 8 | |
| 11 | 9 | import click |
| 12 | 10 | import colorlog |
| 13 | 11 | from tqdm import tqdm |
| 14 | 12 | |
| @@ -49,23 +47,32 @@ | ||
| 49 | 47 | if ctx.invoked_subcommand is None: |
| 50 | 48 | _interactive_menu(ctx) |
| 51 | 49 | |
| 52 | 50 | |
| 53 | 51 | @cli.command() |
| 54 | -@click.option("--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path") | |
| 52 | +@click.option( | |
| 53 | + "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path" | |
| 54 | +) | |
| 55 | 55 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 56 | 56 | @click.option( |
| 57 | 57 | "--depth", |
| 58 | 58 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 59 | 59 | default="standard", |
| 60 | 60 | help="Processing depth", |
| 61 | 61 | ) |
| 62 | -@click.option("--focus", type=str, help='Comma-separated focus areas (e.g., "diagrams,action-items")') | |
| 62 | +@click.option( | |
| 63 | + "--focus", type=str, help='Comma-separated focus areas (e.g., "diagrams,action-items")' | |
| 64 | +) | |
| 63 | 65 | @click.option("--use-gpu", is_flag=True, help="Enable GPU acceleration if available") |
| 64 | 66 | @click.option("--sampling-rate", type=float, default=0.5, help="Frame sampling rate") |
| 65 | 67 | @click.option("--change-threshold", type=float, default=0.15, help="Visual change threshold") |
| 66 | -@click.option("--periodic-capture", type=float, default=30.0, help="Capture a frame every N seconds regardless of change (0 to disable)") | |
| 68 | +@click.option( | |
| 69 | + "--periodic-capture", | |
| 70 | + type=float, | |
| 71 | + default=30.0, | |
| 72 | + help="Capture a frame every N seconds regardless of change (0 to disable)", | |
| 73 | +) | |
| 67 | 74 | @click.option("--title", type=str, help="Title for the analysis report") |
| 68 | 75 | @click.option( |
| 69 | 76 | "--provider", |
| 70 | 77 | "-p", |
| 71 | 78 | type=click.Choice(["auto", "openai", "anthropic", "gemini"]), |
| @@ -102,11 +109,11 @@ | ||
| 102 | 109 | chat_model=chat_model, |
| 103 | 110 | provider=prov, |
| 104 | 111 | ) |
| 105 | 112 | |
| 106 | 113 | try: |
| 107 | - manifest = process_single_video( | |
| 114 | + process_single_video( | |
| 108 | 115 | input_path=input, |
| 109 | 116 | output_dir=output, |
| 110 | 117 | provider_manager=pm, |
| 111 | 118 | depth=depth, |
| 112 | 119 | focus_areas=focus_areas, |
| @@ -127,11 +134,13 @@ | ||
| 127 | 134 | traceback.print_exc() |
| 128 | 135 | sys.exit(1) |
| 129 | 136 | |
| 130 | 137 | |
| 131 | 138 | @cli.command() |
| 132 | -@click.option("--input-dir", "-i", type=click.Path(), default=None, help="Local directory of videos") | |
| 139 | +@click.option( | |
| 140 | + "--input-dir", "-i", type=click.Path(), default=None, help="Local directory of videos" | |
| 141 | +) | |
| 133 | 142 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 134 | 143 | @click.option( |
| 135 | 144 | "--depth", |
| 136 | 145 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 137 | 146 | default="standard", |
| @@ -159,20 +168,35 @@ | ||
| 159 | 168 | default="local", |
| 160 | 169 | help="Video source (local directory, Google Drive, or Dropbox)", |
| 161 | 170 | ) |
| 162 | 171 | @click.option("--folder-id", type=str, default=None, help="Google Drive folder ID") |
| 163 | 172 | @click.option("--folder-path", type=str, default=None, help="Cloud folder path") |
| 164 | -@click.option("--recursive/--no-recursive", default=True, help="Recurse into subfolders (default: recursive)") | |
| 173 | +@click.option( | |
| 174 | + "--recursive/--no-recursive", default=True, help="Recurse into subfolders (default: recursive)" | |
| 175 | +) | |
| 165 | 176 | @click.pass_context |
| 166 | -def batch(ctx, input_dir, output, depth, pattern, title, provider, vision_model, chat_model, source, folder_id, folder_path, recursive): | |
| 177 | +def batch( | |
| 178 | + ctx, | |
| 179 | + input_dir, | |
| 180 | + output, | |
| 181 | + depth, | |
| 182 | + pattern, | |
| 183 | + title, | |
| 184 | + provider, | |
| 185 | + vision_model, | |
| 186 | + chat_model, | |
| 187 | + source, | |
| 188 | + folder_id, | |
| 189 | + folder_path, | |
| 190 | + recursive, | |
| 191 | +): | |
| 167 | 192 | """Process a folder of videos in batch.""" |
| 168 | 193 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 169 | 194 | from video_processor.integrators.plan_generator import PlanGenerator |
| 170 | 195 | from video_processor.models import BatchManifest, BatchVideoEntry |
| 171 | 196 | from video_processor.output_structure import ( |
| 172 | 197 | create_batch_output_dirs, |
| 173 | - read_video_manifest, | |
| 174 | 198 | write_batch_manifest, |
| 175 | 199 | ) |
| 176 | 200 | from video_processor.pipeline import process_single_video |
| 177 | 201 | from video_processor.providers.manager import ProviderManager |
| 178 | 202 | |
| @@ -190,21 +214,23 @@ | ||
| 190 | 214 | |
| 191 | 215 | cloud = GoogleDriveSource() |
| 192 | 216 | if not cloud.authenticate(): |
| 193 | 217 | logging.error("Google Drive authentication failed") |
| 194 | 218 | sys.exit(1) |
| 195 | - cloud_files = cloud.list_videos(folder_id=folder_id, folder_path=folder_path, patterns=patterns, recursive=recursive) | |
| 196 | - local_paths = cloud.download_all(cloud_files, download_dir) | |
| 219 | + cloud_files = cloud.list_videos( | |
| 220 | + folder_id=folder_id, folder_path=folder_path, patterns=patterns, recursive=recursive | |
| 221 | + ) | |
| 222 | + cloud.download_all(cloud_files, download_dir) | |
| 197 | 223 | elif source == "dropbox": |
| 198 | 224 | from video_processor.sources.dropbox_source import DropboxSource |
| 199 | 225 | |
| 200 | 226 | cloud = DropboxSource() |
| 201 | 227 | if not cloud.authenticate(): |
| 202 | 228 | logging.error("Dropbox authentication failed") |
| 203 | 229 | sys.exit(1) |
| 204 | 230 | cloud_files = cloud.list_videos(folder_path=folder_path, patterns=patterns) |
| 205 | - local_paths = cloud.download_all(cloud_files, download_dir) | |
| 231 | + cloud.download_all(cloud_files, download_dir) | |
| 206 | 232 | else: |
| 207 | 233 | logging.error(f"Unknown source: {source}") |
| 208 | 234 | sys.exit(1) |
| 209 | 235 | |
| 210 | 236 | input_dir = download_dir |
| @@ -302,11 +328,14 @@ | ||
| 302 | 328 | batch_summary_md="batch_summary.md", |
| 303 | 329 | merged_knowledge_graph_json="knowledge_graph.json", |
| 304 | 330 | ) |
| 305 | 331 | write_batch_manifest(batch_manifest, output) |
| 306 | 332 | click.echo(pm.usage.format_summary()) |
| 307 | - click.echo(f"\n Batch complete: {batch_manifest.completed_videos}/{batch_manifest.total_videos} succeeded") | |
| 333 | + click.echo( | |
| 334 | + f"\n Batch complete: {batch_manifest.completed_videos}" | |
| 335 | + f"/{batch_manifest.total_videos} succeeded" | |
| 336 | + ) | |
| 308 | 337 | click.echo(f" Results: {output}/batch_manifest.json") |
| 309 | 338 | |
| 310 | 339 | |
| 311 | 340 | @cli.command("list-models") |
| 312 | 341 | @click.pass_context |
| @@ -374,11 +403,13 @@ | ||
| 374 | 403 | traceback.print_exc() |
| 375 | 404 | sys.exit(1) |
| 376 | 405 | |
| 377 | 406 | |
| 378 | 407 | @cli.command("agent-analyze") |
| 379 | -@click.option("--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path") | |
| 408 | +@click.option( | |
| 409 | + "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path" | |
| 410 | +) | |
| 380 | 411 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 381 | 412 | @click.option( |
| 382 | 413 | "--depth", |
| 383 | 414 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 384 | 415 | default="standard", |
| 385 | 416 |
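The batch-summary change in the diff above splits a long f-string into adjacent literals inside parentheses, relying on Python's compile-time concatenation of adjacent string literals, so the rendered output is unchanged. A minimal stdlib-only illustration (the counts here are made up):

```python
completed_videos, total_videos = 7, 9

# Adjacent f-string literals inside parentheses are concatenated at
# compile time, so the wrapped form renders identically to one long line.
wrapped = (
    f"\n Batch complete: {completed_videos}"
    f"/{total_videos} succeeded"
)
single_line = f"\n Batch complete: {completed_videos}/{total_videos} succeeded"
print(wrapped == single_line)  # True
```

This is the standard way to satisfy ruff's E501 line-length rule without introducing `+` concatenation at runtime.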
| 338 | |
| 339 | |
| 340 | @cli.command("list-models") |
| 341 | @click.pass_context |
| @@ -374,11 +403,13 @@ | |
| 403 | traceback.print_exc() |
| 404 | sys.exit(1) |
| 405 | |
| 406 | |
| 407 | @cli.command("agent-analyze") |
| 408 | @click.option( |
| 409 | "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path" |
| 410 | ) |
| 411 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 412 | @click.option( |
| 413 | "--depth", |
| 414 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 415 | default="standard", |
| 416 |
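The `batch` command above dispatches on `--source`, authenticating a cloud source and bailing out with `sys.exit(1)` on failure or on an unknown source value. A minimal sketch of that dispatch pattern — `FakeSource` is a stand-in for the real `GoogleDriveSource`/`DropboxSource` classes, and the option keys (`"gdrive"`, `"dropbox"`) are illustrative:

```python
import logging
import sys


class FakeSource:
    """Stand-in for GoogleDriveSource / DropboxSource from video_processor.sources."""

    def authenticate(self):
        return True

    def list_videos(self, **kwargs):
        return ["a.mp4", "b.mp4"]

    def download_all(self, files, dest):
        return [f"{dest}/{name}" for name in files]


def resolve_source(source, factories):
    """Look up the cloud source by name; exit on unknown source or failed auth."""
    try:
        cloud = factories[source]()
    except KeyError:
        logging.error(f"Unknown source: {source}")
        sys.exit(1)
    if not cloud.authenticate():
        logging.error(f"{source} authentication failed")
        sys.exit(1)
    return cloud


cloud = resolve_source("gdrive", {"gdrive": FakeSource, "dropbox": FakeSource})
files = cloud.list_videos(folder_id=None, recursive=True)
downloaded = cloud.download_all(files, "downloads")
```

Keeping the authenticate/list/download surface identical across sources is what lets the command fall through to the same `input_dir = download_dir` line regardless of provider.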
| --- video_processor/cli/output_formatter.py | ||
| +++ video_processor/cli/output_formatter.py | ||
| @@ -1,42 +1,42 @@ | ||
| 1 | 1 | """Output formatting for PlanOpticon analysis results.""" |
| 2 | 2 | |
| 3 | 3 | import html |
| 4 | -import json | |
| 5 | 4 | import logging |
| 6 | 5 | import shutil |
| 7 | 6 | from pathlib import Path |
| 8 | 7 | from typing import Dict, List, Optional, Union |
| 9 | 8 | |
| 10 | 9 | logger = logging.getLogger(__name__) |
| 10 | + | |
| 11 | 11 | |
| 12 | 12 | class OutputFormatter: |
| 13 | 13 | """Formats and organizes output from video analysis.""" |
| 14 | - | |
| 14 | + | |
| 15 | 15 | def __init__(self, output_dir: Union[str, Path]): |
| 16 | 16 | """ |
| 17 | 17 | Initialize output formatter. |
| 18 | - | |
| 18 | + | |
| 19 | 19 | Parameters |
| 20 | 20 | ---------- |
| 21 | 21 | output_dir : str or Path |
| 22 | 22 | Output directory for formatted content |
| 23 | 23 | """ |
| 24 | 24 | self.output_dir = Path(output_dir) |
| 25 | 25 | self.output_dir.mkdir(parents=True, exist_ok=True) |
| 26 | - | |
| 26 | + | |
| 27 | 27 | def organize_outputs( |
| 28 | 28 | self, |
| 29 | 29 | markdown_path: Union[str, Path], |
| 30 | 30 | knowledge_graph_path: Union[str, Path], |
| 31 | 31 | diagrams: List[Dict], |
| 32 | 32 | frames_dir: Optional[Union[str, Path]] = None, |
| 33 | - transcript_path: Optional[Union[str, Path]] = None | |
| 33 | + transcript_path: Optional[Union[str, Path]] = None, | |
| 34 | 34 | ) -> Dict: |
| 35 | 35 | """ |
| 36 | 36 | Organize outputs into a consistent structure. |
| 37 | - | |
| 37 | + | |
| 38 | 38 | Parameters |
| 39 | 39 | ---------- |
| 40 | 40 | markdown_path : str or Path |
| 41 | 41 | Path to markdown analysis |
| 42 | 42 | knowledge_graph_path : str or Path |
| @@ -45,84 +45,84 @@ | ||
| 45 | 45 | List of diagram analysis results |
| 46 | 46 | frames_dir : str or Path, optional |
| 47 | 47 | Directory with extracted frames |
| 48 | 48 | transcript_path : str or Path, optional |
| 49 | 49 | Path to transcript file |
| 50 | - | |
| 50 | + | |
| 51 | 51 | Returns |
| 52 | 52 | ------- |
| 53 | 53 | dict |
| 54 | 54 | Dictionary with organized output paths |
| 55 | 55 | """ |
| 56 | 56 | # Create output structure |
| 57 | 57 | md_dir = self.output_dir / "markdown" |
| 58 | 58 | diagrams_dir = self.output_dir / "diagrams" |
| 59 | 59 | data_dir = self.output_dir / "data" |
| 60 | - | |
| 60 | + | |
| 61 | 61 | md_dir.mkdir(exist_ok=True) |
| 62 | 62 | diagrams_dir.mkdir(exist_ok=True) |
| 63 | 63 | data_dir.mkdir(exist_ok=True) |
| 64 | - | |
| 64 | + | |
| 65 | 65 | # Copy markdown file |
| 66 | 66 | markdown_path = Path(markdown_path) |
| 67 | 67 | md_output = md_dir / markdown_path.name |
| 68 | 68 | shutil.copy2(markdown_path, md_output) |
| 69 | - | |
| 69 | + | |
| 70 | 70 | # Copy knowledge graph |
| 71 | 71 | kg_path = Path(knowledge_graph_path) |
| 72 | 72 | kg_output = data_dir / kg_path.name |
| 73 | 73 | shutil.copy2(kg_path, kg_output) |
| 74 | - | |
| 74 | + | |
| 75 | 75 | # Copy diagram images if available |
| 76 | 76 | diagram_images = [] |
| 77 | 77 | for diagram in diagrams: |
| 78 | 78 | if "image_path" in diagram and diagram["image_path"]: |
| 79 | 79 | img_path = Path(diagram["image_path"]) |
| 80 | 80 | if img_path.exists(): |
| 81 | 81 | img_output = diagrams_dir / img_path.name |
| 82 | 82 | shutil.copy2(img_path, img_output) |
| 83 | 83 | diagram_images.append(str(img_output)) |
| 84 | - | |
| 84 | + | |
| 85 | 85 | # Copy transcript if provided |
| 86 | 86 | transcript_output = None |
| 87 | 87 | if transcript_path: |
| 88 | 88 | transcript_path = Path(transcript_path) |
| 89 | 89 | if transcript_path.exists(): |
| 90 | 90 | transcript_output = data_dir / transcript_path.name |
| 91 | 91 | shutil.copy2(transcript_path, transcript_output) |
| 92 | - | |
| 92 | + | |
| 93 | 93 | # Copy selected frames if provided |
| 94 | 94 | frame_outputs = [] |
| 95 | 95 | if frames_dir: |
| 96 | 96 | frames_dir = Path(frames_dir) |
| 97 | 97 | if frames_dir.exists(): |
| 98 | 98 | frames_output_dir = self.output_dir / "frames" |
| 99 | 99 | frames_output_dir.mkdir(exist_ok=True) |
| 100 | - | |
| 100 | + | |
| 101 | 101 | # Copy a limited number of representative frames |
| 102 | 102 | frame_files = sorted(list(frames_dir.glob("*.jpg"))) |
| 103 | 103 | max_frames = min(10, len(frame_files)) |
| 104 | 104 | step = max(1, len(frame_files) // max_frames) |
| 105 | - | |
| 105 | + | |
| 106 | 106 | for i in range(0, len(frame_files), step): |
| 107 | 107 | if len(frame_outputs) >= max_frames: |
| 108 | 108 | break |
| 109 | - | |
| 109 | + | |
| 110 | 110 | frame = frame_files[i] |
| 111 | 111 | frame_output = frames_output_dir / frame.name |
| 112 | 112 | shutil.copy2(frame, frame_output) |
| 113 | 113 | frame_outputs.append(str(frame_output)) |
| 114 | - | |
| 114 | + | |
| 115 | 115 | # Return organized paths |
| 116 | 116 | return { |
| 117 | 117 | "markdown": str(md_output), |
| 118 | 118 | "knowledge_graph": str(kg_output), |
| 119 | 119 | "diagram_images": diagram_images, |
| 120 | 120 | "frames": frame_outputs, |
| 121 | - "transcript": str(transcript_output) if transcript_output else None | |
| 121 | + "transcript": str(transcript_output) if transcript_output else None, | |
| 122 | 122 | } |
| 123 | - | |
| 123 | + | |
| 124 | 124 | def create_html_index(self, outputs: Dict) -> Path: |
| 125 | 125 | """ |
| 126 | 126 | Create HTML index page for outputs. |
| 127 | 127 | |
| 128 | 128 | Parameters |
| @@ -142,11 +142,12 @@ | ||
| 142 | 142 | "<!DOCTYPE html>", |
| 143 | 143 | "<html>", |
| 144 | 144 | "<head>", |
| 145 | 145 | " <title>PlanOpticon Analysis Results</title>", |
| 146 | 146 | " <style>", |
| 147 | - " body { font-family: Arial, sans-serif; margin: 0; padding: 20px; line-height: 1.6; }", | |
| 147 | + " body { font-family: Arial, sans-serif;" | |
| 148 | + " margin: 0; padding: 20px; line-height: 1.6; }", | |
| 148 | 149 | " .container { max-width: 1200px; margin: 0 auto; }", |
| 149 | 150 | " h1 { color: #333; }", |
| 150 | 151 | " h2 { color: #555; margin-top: 30px; }", |
| 151 | 152 | " .section { margin-bottom: 30px; }", |
| 152 | 153 | " .files { display: flex; flex-wrap: wrap; }", |
| @@ -158,11 +159,11 @@ | ||
| 158 | 159 | " </style>", |
| 159 | 160 | "</head>", |
| 160 | 161 | "<body>", |
| 161 | 162 | "<div class='container'>", |
| 162 | 163 | " <h1>PlanOpticon Analysis Results</h1>", |
| 163 | - "" | |
| 164 | + "", | |
| 164 | 165 | ] |
| 165 | 166 | |
| 166 | 167 | # Add markdown section |
| 167 | 168 | if outputs.get("markdown"): |
| 168 | 169 | md_path = Path(outputs["markdown"]) |
| @@ -228,11 +229,13 @@ | ||
| 228 | 229 | lines.append(" <ul>") |
| 229 | 230 | |
| 230 | 231 | for data_path in data_files: |
| 231 | 232 | data_rel = esc(str(data_path.relative_to(self.output_dir))) |
| 232 | 233 | data_name = esc(data_path.name) |
| 233 | - lines.append(f" <li><a href='{data_rel}' target='_blank'>{data_name}</a></li>") | |
| 234 | + lines.append( | |
| 235 | + f" <li><a href='{data_rel}' target='_blank'>{data_name}</a></li>" | |
| 236 | + ) | |
| 234 | 237 | |
| 235 | 238 | lines.append(" </ul>") |
| 236 | 239 | lines.append(" </div>") |
| 237 | 240 | |
| 238 | 241 | # Close HTML |
| 239 | 242 |
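`organize_outputs` above copies at most 10 evenly spaced "representative" frames (`max_frames = min(10, len)`, `step = len // max_frames`). That selection can be distilled into a hypothetical helper — the empty-input guard is an addition here, since `len(items) // min(10, 0)` would raise `ZeroDivisionError` on an empty frames directory:

```python
def pick_representative(items, limit=10):
    """Evenly spaced sampling: cap at `limit` items, stepping through the sorted list."""
    if not items:  # guard the len(items) // max_items division below
        return []
    max_items = min(limit, len(items))
    step = max(1, len(items) // max_items)
    picked = []
    for i in range(0, len(items), step):
        if len(picked) >= max_items:
            break
        picked.append(items[i])
    return picked


print(pick_representative(list(range(25))))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The same indices drive which `*.jpg` files get copied into `frames/`, so the HTML index shows a spread across the whole video rather than the first ten frames.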
| --- video_processor/extractors/__init__.py | ||
| +++ video_processor/extractors/__init__.py | ||
| @@ -1,17 +1,17 @@ | ||
| 1 | +from video_processor.extractors.audio_extractor import AudioExtractor | |
| 1 | 2 | from video_processor.extractors.frame_extractor import ( |
| 2 | - extract_frames, | |
| 3 | - save_frames, | |
| 4 | - calculate_frame_difference, | |
| 5 | - is_gpu_available | |
| 3 | + calculate_frame_difference, | |
| 4 | + extract_frames, | |
| 5 | + is_gpu_available, | |
| 6 | + save_frames, | |
| 6 | 7 | ) |
| 7 | -from video_processor.extractors.audio_extractor import AudioExtractor | |
| 8 | 8 | from video_processor.extractors.text_extractor import TextExtractor |
| 9 | 9 | |
| 10 | 10 | __all__ = [ |
| 11 | - 'extract_frames', | |
| 12 | - 'save_frames', | |
| 13 | - 'calculate_frame_difference', | |
| 14 | - 'is_gpu_available', | |
| 15 | - 'AudioExtractor', | |
| 16 | - 'TextExtractor', | |
| 11 | + "extract_frames", | |
| 12 | + "save_frames", | |
| 13 | + "calculate_frame_difference", | |
| 14 | + "is_gpu_available", | |
| 15 | + "AudioExtractor", | |
| 16 | + "TextExtractor", | |
| 17 | 17 | ] |
| 18 | 18 |
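The reordered imports and quote changes above are ruff's isort and formatter rules at work, and the commit message notes that `prompt_templates.py` is excluded from E501 because its LLM prompt strings run long. A plausible `pyproject.toml` fragment for that setup — the line length, rule selection, and module path are assumptions, not copied from the repo:

```toml
[tool.ruff]
line-length = 100          # assumed; the repo's actual limit may differ

[tool.ruff.lint]
select = ["E", "F", "I"]   # pycodestyle, pyflakes, import sorting (isort)

[tool.ruff.lint.per-file-ignores]
# LLM prompt strings are kept on single lines for readability (path assumed)
"video_processor/prompt_templates.py" = ["E501"]
```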
| --- video_processor/extractors/audio_extractor.py | ||
| +++ video_processor/extractors/audio_extractor.py | ||
| @@ -1,172 +1,170 @@ | ||
| 1 | 1 | """Audio extraction and processing module for video analysis.""" |
| 2 | + | |
| 2 | 3 | import logging |
| 3 | -import os | |
| 4 | 4 | import subprocess |
| 5 | 5 | from pathlib import Path |
| 6 | 6 | from typing import Dict, Optional, Tuple, Union |
| 7 | 7 | |
| 8 | 8 | import librosa |
| 9 | 9 | import numpy as np |
| 10 | 10 | import soundfile as sf |
| 11 | 11 | |
| 12 | 12 | logger = logging.getLogger(__name__) |
| 13 | + | |
| 13 | 14 | |
| 14 | 15 | class AudioExtractor: |
| 15 | 16 | """Extract and process audio from video files.""" |
| 16 | - | |
| 17 | + | |
| 17 | 18 | def __init__(self, sample_rate: int = 16000, mono: bool = True): |
| 18 | 19 | """ |
| 19 | 20 | Initialize the audio extractor. |
| 20 | - | |
| 21 | + | |
| 21 | 22 | Parameters |
| 22 | 23 | ---------- |
| 23 | 24 | sample_rate : int |
| 24 | 25 | Target sample rate for extracted audio |
| 25 | 26 | mono : bool |
| 26 | 27 | Whether to convert audio to mono |
| 27 | 28 | """ |
| 28 | 29 | self.sample_rate = sample_rate |
| 29 | 30 | self.mono = mono |
| 30 | - | |
| 31 | + | |
| 31 | 32 | def extract_audio( |
| 32 | - self, | |
| 33 | - video_path: Union[str, Path], | |
| 34 | - output_path: Optional[Union[str, Path]] = None, | |
| 35 | - format: str = "wav" | |
| 33 | + self, | |
| 34 | + video_path: Union[str, Path], | |
| 35 | + output_path: Optional[Union[str, Path]] = None, | |
| 36 | + format: str = "wav", | |
| 36 | 37 | ) -> Path: |
| 37 | 38 | """ |
| 38 | 39 | Extract audio from video file. |
| 39 | - | |
| 40 | + | |
| 40 | 41 | Parameters |
| 41 | 42 | ---------- |
| 42 | 43 | video_path : str or Path |
| 43 | 44 | Path to video file |
| 44 | 45 | output_path : str or Path, optional |
| 45 | 46 | Path to save extracted audio (if None, saves alongside video) |
| 46 | 47 | format : str |
| 47 | 48 | Audio format to save (wav, mp3, etc.) |
| 48 | - | |
| 49 | + | |
| 49 | 50 | Returns |
| 50 | 51 | ------- |
| 51 | 52 | Path |
| 52 | 53 | Path to extracted audio file |
| 53 | 54 | """ |
| 54 | 55 | video_path = Path(video_path) |
| 55 | 56 | if not video_path.exists(): |
| 56 | 57 | raise FileNotFoundError(f"Video file not found: {video_path}") |
| 57 | - | |
| 58 | + | |
| 58 | 59 | # Generate output path if not provided |
| 59 | 60 | if output_path is None: |
| 60 | 61 | output_path = video_path.with_suffix(f".{format}") |
| 61 | 62 | else: |
| 62 | 63 | output_path = Path(output_path) |
| 63 | - | |
| 64 | + | |
| 64 | 65 | # Ensure output directory exists |
| 65 | 66 | output_path.parent.mkdir(parents=True, exist_ok=True) |
| 66 | - | |
| 67 | + | |
| 67 | 68 | # Extract audio using ffmpeg |
| 68 | 69 | try: |
| 69 | 70 | cmd = [ |
| 70 | - "ffmpeg", | |
| 71 | - "-i", str(video_path), | |
| 72 | - "-vn", # No video | |
| 73 | - "-acodec", "pcm_s16le", # PCM 16-bit little-endian | |
| 74 | - "-ar", str(self.sample_rate), # Sample rate | |
| 75 | - "-ac", "1" if self.mono else "2", # Channels (mono or stereo) | |
| 76 | - "-y", # Overwrite output | |
| 77 | - str(output_path) | |
| 78 | - ] | |
| 79 | - | |
| 80 | - # Run ffmpeg command | |
| 81 | - result = subprocess.run( | |
| 82 | - cmd, | |
| 83 | - stdout=subprocess.PIPE, | |
| 84 | - stderr=subprocess.PIPE, | |
| 85 | - check=True | |
| 86 | - ) | |
| 87 | - | |
| 71 | + "ffmpeg", | |
| 72 | + "-i", | |
| 73 | + str(video_path), | |
| 74 | + "-vn", # No video | |
| 75 | + "-acodec", | |
| 76 | + "pcm_s16le", # PCM 16-bit little-endian | |
| 77 | + "-ar", | |
| 78 | + str(self.sample_rate), # Sample rate | |
| 79 | + "-ac", | |
| 80 | + "1" if self.mono else "2", # Channels (mono or stereo) | |
| 81 | + "-y", # Overwrite output | |
| 82 | + str(output_path), | |
| 83 | + ] | |
| 84 | + | |
| 85 | + # Run ffmpeg command | |
| 86 | + subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True) | |
| 87 | + | |
| 88 | 88 | logger.info(f"Extracted audio from {video_path} to {output_path}") |
| 89 | 89 | return output_path |
| 90 | - | |
| 90 | + | |
| 91 | 91 | except subprocess.CalledProcessError as e: |
| 92 | 92 | logger.error(f"Failed to extract audio: {e.stderr.decode()}") |
| 93 | 93 | raise RuntimeError(f"Failed to extract audio: {e.stderr.decode()}") |
| 94 | 94 | except Exception as e: |
| 95 | 95 | logger.error(f"Error extracting audio: {str(e)}") |
| 96 | 96 | raise |
| 97 | - | |
| 97 | + | |
| 98 | 98 | def load_audio(self, audio_path: Union[str, Path]) -> Tuple[np.ndarray, int]: |
| 99 | 99 | """ |
| 100 | 100 | Load audio file into memory. |
| 101 | - | |
| 101 | + | |
| 102 | 102 | Parameters |
| 103 | 103 | ---------- |
| 104 | 104 | audio_path : str or Path |
| 105 | 105 | Path to audio file |
| 106 | - | |
| 106 | + | |
| 107 | 107 | Returns |
| 108 | 108 | ------- |
| 109 | 109 | tuple |
| 110 | 110 | (audio_data, sample_rate) |
| 111 | 111 | """ |
| 112 | 112 | audio_path = Path(audio_path) |
| 113 | 113 | if not audio_path.exists(): |
| 114 | 114 | raise FileNotFoundError(f"Audio file not found: {audio_path}") |
| 115 | - | |
| 115 | + | |
| 116 | 116 | # Load audio data |
| 117 | 117 | audio_data, sr = librosa.load( |
| 118 | - audio_path, | |
| 119 | - sr=self.sample_rate if self.sample_rate else None, | |
| 120 | - mono=self.mono | |
| 118 | + audio_path, sr=self.sample_rate if self.sample_rate else None, mono=self.mono | |
| 121 | 119 | ) |
| 122 | - | |
| 120 | + | |
| 123 | 121 | logger.info(f"Loaded audio from {audio_path}: shape={audio_data.shape}, sr={sr}") |
| 124 | 122 | return audio_data, sr |
| 125 | - | |
| 123 | + | |
| 126 | 124 | def get_audio_properties(self, audio_path: Union[str, Path]) -> Dict: |
| 127 | 125 | """ |
| 128 | 126 | Get properties of audio file. |
| 129 | - | |
| 127 | + | |
| 130 | 128 | Parameters |
| 131 | 129 | ---------- |
| 132 | 130 | audio_path : str or Path |
| 133 | 131 | Path to audio file |
| 134 | - | |
| 132 | + | |
| 135 | 133 | Returns |
| 136 | 134 | ------- |
| 137 | 135 | dict |
| 138 | 136 | Audio properties (duration, sample_rate, channels, etc.) |
| 139 | 137 | """ |
| 140 | 138 | audio_path = Path(audio_path) |
| 141 | 139 | if not audio_path.exists(): |
| 142 | 140 | raise FileNotFoundError(f"Audio file not found: {audio_path}") |
| 143 | - | |
| 141 | + | |
| 144 | 142 | # Get audio info |
| 145 | 143 | info = sf.info(audio_path) |
| 146 | - | |
| 144 | + | |
| 147 | 145 | properties = { |
| 148 | 146 | "duration": info.duration, |
| 149 | 147 | "sample_rate": info.samplerate, |
| 150 | 148 | "channels": info.channels, |
| 151 | 149 | "format": info.format, |
| 152 | 150 | "subtype": info.subtype, |
| 153 | - "path": str(audio_path) | |
| 151 | + "path": str(audio_path), | |
| 154 | 152 | } |
| 155 | - | |
| 153 | + | |
| 156 | 154 | return properties |
| 157 | - | |
| 155 | + | |
| 158 | 156 | def segment_audio( |
| 159 | 157 | self, |
| 160 | 158 | audio_data: np.ndarray, |
| 161 | 159 | sample_rate: int, |
| 162 | 160 | segment_length_ms: int = 30000, |
| 163 | - overlap_ms: int = 0 | |
| 161 | + overlap_ms: int = 0, | |
| 164 | 162 | ) -> list: |
| 165 | 163 | """ |
| 166 | 164 | Segment audio into chunks. |
| 167 | - | |
| 165 | + | |
| 168 | 166 | Parameters |
| 169 | 167 | ---------- |
| 170 | 168 | audio_data : np.ndarray |
| 171 | 169 | Audio data |
| 172 | 170 | sample_rate : int |
| @@ -173,65 +171,62 @@ | ||
| 173 | 171 | Sample rate of audio |
| 174 | 172 | segment_length_ms : int |
| 175 | 173 | Length of segments in milliseconds |
| 176 | 174 | overlap_ms : int |
| 177 | 175 | Overlap between segments in milliseconds |
| 178 | - | |
| 176 | + | |
| 179 | 177 | Returns |
| 180 | 178 | ------- |
| 181 | 179 | list |
| 182 | 180 | List of audio segments as numpy arrays |
| 183 | 181 | """ |
| 184 | 182 | # Convert ms to samples |
| 185 | 183 | segment_length_samples = int(segment_length_ms * sample_rate / 1000) |
| 186 | 184 | overlap_samples = int(overlap_ms * sample_rate / 1000) |
| 187 | - | |
| 185 | + | |
| 188 | 186 | # Calculate hop length |
| 189 | 187 | hop_length = segment_length_samples - overlap_samples |
| 190 | - | |
| 188 | + | |
| 191 | 189 | # Initialize segments list |
| 192 | 190 | segments = [] |
| 193 | - | |
| 191 | + | |
| 194 | 192 | # Generate segments |
| 195 | 193 | for i in range(0, len(audio_data), hop_length): |
| 196 | 194 | end_idx = min(i + segment_length_samples, len(audio_data)) |
| 197 | 195 | segment = audio_data[i:end_idx] |
| 198 | - | |
| 196 | + | |
| 199 | 197 | # Only add if segment is long enough (at least 50% of target length) |
| 200 | 198 | if len(segment) >= segment_length_samples * 0.5: |
| 201 | 199 | segments.append(segment) |
| 202 | - | |
| 200 | + | |
| 203 | 201 | # Break if we've reached the end |
| 204 | 202 | if end_idx == len(audio_data): |
| 205 | 203 | break |
| 206 | - | |
| 204 | + | |
| 207 | 205 | logger.info(f"Segmented audio into {len(segments)} chunks") |
| 208 | 206 | return segments |
| 209 | - | |
| 207 | + | |
| 210 | 208 | def save_segment( |
| 211 | - self, | |
| 212 | - segment: np.ndarray, | |
| 213 | - output_path: Union[str, Path], | |
| 214 | - sample_rate: int | |
| 209 | + self, segment: np.ndarray, output_path: Union[str, Path], sample_rate: int | |
| 215 | 210 | ) -> Path: |
| 216 | 211 | """ |
| 217 | 212 | Save audio segment to file. |
| 218 | - | |
| 213 | + | |
| 219 | 214 | Parameters |
| 220 | 215 | ---------- |
| 221 | 216 | segment : np.ndarray |
| 222 | 217 | Audio segment data |
| 223 | 218 | output_path : str or Path |
| 224 | 219 | Path to save segment |
| 225 | 220 | sample_rate : int |
| 226 | 221 | Sample rate of segment |
| 227 | - | |
| 222 | + | |
| 228 | 223 | Returns |
| 229 | 224 | ------- |
| 230 | 225 | Path |
| 231 | 226 | Path to saved segment |
| 232 | 227 | """ |
| 233 | 228 | output_path = Path(output_path) |
| 234 | 229 | output_path.parent.mkdir(parents=True, exist_ok=True) |
| 235 | - | |
| 230 | + | |
| 236 | 231 | sf.write(output_path, segment, sample_rate) |
| 237 | 232 | return output_path |
| 238 | 233 |
| --- video_processor/extractors/frame_extractor.py | ||
| +++ video_processor/extractors/frame_extractor.py | ||
| @@ -1,6 +1,7 @@ | ||
| 1 | 1 | """Frame extraction module for video processing.""" |
| 2 | + | |
| 2 | 3 | import functools |
| 3 | 4 | import logging |
| 4 | 5 | from pathlib import Path |
| 5 | 6 | from typing import List, Optional, Tuple, Union |
| 6 | 7 | |
| @@ -112,44 +113,49 @@ | ||
| 112 | 113 | filtered.append(frame) |
| 113 | 114 | |
| 114 | 115 | if removed: |
| 115 | 116 | logger.info(f"Filtered out {removed}/{len(frames)} people/webcam frames") |
| 116 | 117 | return filtered, removed |
| 118 | + | |
| 117 | 119 | |
| 118 | 120 | def is_gpu_available() -> bool: |
| 119 | 121 | """Check if GPU acceleration is available for OpenCV.""" |
| 120 | 122 | try: |
| 121 | 123 | # Check if CUDA is available |
| 122 | 124 | count = cv2.cuda.getCudaEnabledDeviceCount() |
| 123 | 125 | return count > 0 |
| 124 | 126 | except Exception: |
| 125 | 127 | return False |
| 128 | + | |
| 126 | 129 | |
| 127 | 130 | def gpu_accelerated(func): |
| 128 | 131 | """Decorator to use GPU implementation when available.""" |
| 132 | + | |
| 129 | 133 | @functools.wraps(func) |
| 130 | 134 | def wrapper(*args, **kwargs): |
| 131 | - if is_gpu_available() and not kwargs.get('disable_gpu'): | |
| 135 | + if is_gpu_available() and not kwargs.get("disable_gpu"): | |
| 132 | 136 | # Remove the disable_gpu kwarg if it exists |
| 133 | - kwargs.pop('disable_gpu', None) | |
| 137 | + kwargs.pop("disable_gpu", None) | |
| 134 | 138 | return func_gpu(*args, **kwargs) |
| 135 | 139 | # Remove the disable_gpu kwarg if it exists |
| 136 | - kwargs.pop('disable_gpu', None) | |
| 140 | + kwargs.pop("disable_gpu", None) | |
| 137 | 141 | return func(*args, **kwargs) |
| 142 | + | |
| 138 | 143 | return wrapper |
| 144 | + | |
| 139 | 145 | |
| 140 | 146 | def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float: |
| 141 | 147 | """ |
| 142 | 148 | Calculate the difference between two frames. |
| 143 | - | |
| 149 | + | |
| 144 | 150 | Parameters |
| 145 | 151 | ---------- |
| 146 | 152 | prev_frame : np.ndarray |
| 147 | 153 | Previous frame |
| 148 | 154 | curr_frame : np.ndarray |
| 149 | 155 | Current frame |
| 150 | - | |
| 156 | + | |
| 151 | 157 | Returns |
| 152 | 158 | ------- |
| 153 | 159 | float |
| 154 | 160 | Difference score between 0 and 1 |
| 155 | 161 | """ |
| @@ -156,30 +162,31 @@ | ||
| 156 | 162 | # Convert to grayscale |
| 157 | 163 | if len(prev_frame.shape) == 3: |
| 158 | 164 | prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY) |
| 159 | 165 | else: |
| 160 | 166 | prev_gray = prev_frame |
| 161 | - | |
| 167 | + | |
| 162 | 168 | if len(curr_frame.shape) == 3: |
| 163 | 169 | curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY) |
| 164 | 170 | else: |
| 165 | 171 | curr_gray = curr_frame |
| 166 | - | |
| 172 | + | |
| 167 | 173 | # Calculate absolute difference |
| 168 | 174 | diff = cv2.absdiff(prev_gray, curr_gray) |
| 169 | - | |
| 175 | + | |
| 170 | 176 | # Normalize and return mean difference |
| 171 | 177 | return np.mean(diff) / 255.0 |
| 178 | + | |
| 172 | 179 | |
| 173 | 180 | @gpu_accelerated |
| 174 | 181 | def extract_frames( |
| 175 | 182 | video_path: Union[str, Path], |
| 176 | 183 | sampling_rate: float = 1.0, |
| 177 | 184 | change_threshold: float = 0.15, |
| 178 | 185 | periodic_capture_seconds: float = 30.0, |
| 179 | 186 | max_frames: Optional[int] = None, |
| 180 | - resize_to: Optional[Tuple[int, int]] = None | |
| 187 | + resize_to: Optional[Tuple[int, int]] = None, | |
| 181 | 188 | ) -> List[np.ndarray]: |
| 182 | 189 | """ |
| 183 | 190 | Extract frames from video based on visual change detection + periodic capture. |
| 184 | 191 | |
| 185 | 192 | Two capture strategies work together: |
| @@ -273,11 +280,13 @@ | ||
| 273 | 280 | if diff > change_threshold: |
| 274 | 281 | should_capture = True |
| 275 | 282 | reason = f"change={diff:.3f}" |
| 276 | 283 | |
| 277 | 284 | # Periodic capture — even if change is small |
| 278 | - elif periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval: | |
| 285 | + elif ( | |
| 286 | + periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval | |
| 287 | + ): | |
| 279 | 288 | should_capture = True |
| 280 | 289 | reason = "periodic" |
| 281 | 290 | |
| 282 | 291 | if should_capture: |
| 283 | 292 | extracted_frames.append(frame) |
| @@ -299,41 +308,45 @@ | ||
| 299 | 308 | |
| 300 | 309 | pbar.close() |
| 301 | 310 | cap.release() |
| 302 | 311 | logger.info(f"Extracted {len(extracted_frames)} frames from {frame_count} total frames") |
| 303 | 312 | return extracted_frames |
| 313 | + | |
| 304 | 314 | |
| 305 | 315 | def func_gpu(*args, **kwargs): |
| 306 | 316 | """GPU-accelerated version of extract_frames.""" |
| 307 | 317 | # This would be implemented with CUDA acceleration |
| 308 | 318 | # For now, fall back to the unwrapped CPU version |
| 309 | 319 | logger.info("GPU acceleration not yet implemented, falling back to CPU") |
| 310 | 320 | return extract_frames.__wrapped__(*args, **kwargs) |
| 311 | 321 | |
| 312 | -def save_frames(frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame") -> List[Path]: | |
| 322 | + | |
| 323 | +def save_frames( | |
| 324 | + frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame" | |
| 325 | +) -> List[Path]: | |
| 313 | 326 | """ |
| 314 | 327 | Save extracted frames to disk. |
| 315 | - | |
| 328 | + | |
| 316 | 329 | Parameters |
| 317 | 330 | ---------- |
| 318 | 331 | frames : list |
| 319 | 332 | List of frames to save |
| 320 | 333 | output_dir : str or Path |
| 321 | 334 | Directory to save frames in |
| 322 | 335 | base_filename : str |
| 323 | 336 | Base name for frame files |
| 324 | - | |
| 337 | + | |
| 325 | 338 | Returns |
| 326 | 339 | ------- |
| 327 | 340 | list |
| 328 | 341 | List of paths to saved frame files |
| 329 | 342 | """ |
| 330 | 343 | output_dir = Path(output_dir) |
| 331 | 344 | output_dir.mkdir(parents=True, exist_ok=True) |
| 332 | - | |
| 345 | + | |
| 333 | 346 | saved_paths = [] |
| 334 | 347 | for i, frame in enumerate(frames): |
| 335 | 348 | output_path = output_dir / f"{base_filename}_{i:04d}.jpg" |
| 336 | 349 | cv2.imwrite(str(output_path), frame) |
| 337 | 350 | saved_paths.append(output_path) |
| 338 | - | |
| 351 | + | |
| 339 | 352 | return saved_paths |
| 340 | 353 |
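The `calculate_frame_difference` metric in the diff above reduces to the mean absolute pixel difference scaled into [0, 1] for 8-bit frames. A dependency-free sketch on flattened grayscale pixel lists (the OpenCV version additionally converts BGR frames to grayscale with `cv2.cvtColor` first; the list-based form here is illustrative):

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference, normalized to [0, 1] for 8-bit pixels."""
    assert len(prev) == len(curr) and prev, "frames must match and be non-empty"
    total = sum(abs(a - b) for a, b in zip(prev, curr))
    return total / (len(prev) * 255.0)
```

Identical frames score 0.0 and an all-black frame against an all-white frame scores 1.0; `extract_frames` captures whenever this score exceeds `change_threshold` (0.15 by default).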
| --- video_processor/extractors/frame_extractor.py | |
| +++ video_processor/extractors/frame_extractor.py | |
| @@ -1,6 +1,7 @@ | |
| 1 | """Frame extraction module for video processing.""" |
| 2 | import functools |
| 3 | import logging |
| 4 | from pathlib import Path |
| 5 | from typing import List, Optional, Tuple, Union |
| 6 | |
| @@ -112,44 +113,49 @@ | |
| 112 | filtered.append(frame) |
| 113 | |
| 114 | if removed: |
| 115 | logger.info(f"Filtered out {removed}/{len(frames)} people/webcam frames") |
| 116 | return filtered, removed |
| 117 | |
| 118 | def is_gpu_available() -> bool: |
| 119 | """Check if GPU acceleration is available for OpenCV.""" |
| 120 | try: |
| 121 | # Check if CUDA is available |
| 122 | count = cv2.cuda.getCudaEnabledDeviceCount() |
| 123 | return count > 0 |
| 124 | except Exception: |
| 125 | return False |
| 126 | |
| 127 | def gpu_accelerated(func): |
| 128 | """Decorator to use GPU implementation when available.""" |
| 129 | @functools.wraps(func) |
| 130 | def wrapper(*args, **kwargs): |
| 131 | if is_gpu_available() and not kwargs.get('disable_gpu'): |
| 132 | # Remove the disable_gpu kwarg if it exists |
| 133 | kwargs.pop('disable_gpu', None) |
| 134 | return func_gpu(*args, **kwargs) |
| 135 | # Remove the disable_gpu kwarg if it exists |
| 136 | kwargs.pop('disable_gpu', None) |
| 137 | return func(*args, **kwargs) |
| 138 | return wrapper |
| 139 | |
| 140 | def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float: |
| 141 | """ |
| 142 | Calculate the difference between two frames. |
| 143 | |
| 144 | Parameters |
| 145 | ---------- |
| 146 | prev_frame : np.ndarray |
| 147 | Previous frame |
| 148 | curr_frame : np.ndarray |
| 149 | Current frame |
| 150 | |
| 151 | Returns |
| 152 | ------- |
| 153 | float |
| 154 | Difference score between 0 and 1 |
| 155 | """ |
| @@ -156,30 +162,31 @@ | |
| 156 | # Convert to grayscale |
| 157 | if len(prev_frame.shape) == 3: |
| 158 | prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY) |
| 159 | else: |
| 160 | prev_gray = prev_frame |
| 161 | |
| 162 | if len(curr_frame.shape) == 3: |
| 163 | curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY) |
| 164 | else: |
| 165 | curr_gray = curr_frame |
| 166 | |
| 167 | # Calculate absolute difference |
| 168 | diff = cv2.absdiff(prev_gray, curr_gray) |
| 169 | |
| 170 | # Normalize and return mean difference |
| 171 | return np.mean(diff) / 255.0 |
| 172 | |
| 173 | @gpu_accelerated |
| 174 | def extract_frames( |
| 175 | video_path: Union[str, Path], |
| 176 | sampling_rate: float = 1.0, |
| 177 | change_threshold: float = 0.15, |
| 178 | periodic_capture_seconds: float = 30.0, |
| 179 | max_frames: Optional[int] = None, |
| 180 | resize_to: Optional[Tuple[int, int]] = None |
| 181 | ) -> List[np.ndarray]: |
| 182 | """ |
| 183 | Extract frames from video based on visual change detection + periodic capture. |
| 184 | |
| 185 | Two capture strategies work together: |
| @@ -273,11 +280,13 @@ | |
| 273 | if diff > change_threshold: |
| 274 | should_capture = True |
| 275 | reason = f"change={diff:.3f}" |
| 276 | |
| 277 | # Periodic capture — even if change is small |
| 278 | elif periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval: |
| 279 | should_capture = True |
| 280 | reason = "periodic" |
| 281 | |
| 282 | if should_capture: |
| 283 | extracted_frames.append(frame) |
| @@ -299,41 +308,45 @@ | |
| 299 | |
| 300 | pbar.close() |
| 301 | cap.release() |
| 302 | logger.info(f"Extracted {len(extracted_frames)} frames from {frame_count} total frames") |
| 303 | return extracted_frames |
| 304 | |
| 305 | def func_gpu(*args, **kwargs): |
| 306 | """GPU-accelerated version of extract_frames.""" |
| 307 | # This would be implemented with CUDA acceleration |
| 308 | # For now, fall back to the unwrapped CPU version |
| 309 | logger.info("GPU acceleration not yet implemented, falling back to CPU") |
| 310 | return extract_frames.__wrapped__(*args, **kwargs) |
| 311 | |
| 312 | def save_frames(frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame") -> List[Path]: |
| 313 | """ |
| 314 | Save extracted frames to disk. |
| 315 | |
| 316 | Parameters |
| 317 | ---------- |
| 318 | frames : list |
| 319 | List of frames to save |
| 320 | output_dir : str or Path |
| 321 | Directory to save frames in |
| 322 | base_filename : str |
| 323 | Base name for frame files |
| 324 | |
| 325 | Returns |
| 326 | ------- |
| 327 | list |
| 328 | List of paths to saved frame files |
| 329 | """ |
| 330 | output_dir = Path(output_dir) |
| 331 | output_dir.mkdir(parents=True, exist_ok=True) |
| 332 | |
| 333 | saved_paths = [] |
| 334 | for i, frame in enumerate(frames): |
| 335 | output_path = output_dir / f"{base_filename}_{i:04d}.jpg" |
| 336 | cv2.imwrite(str(output_path), frame) |
| 337 | saved_paths.append(output_path) |
| 338 | |
| 339 | return saved_paths |
| 340 |
| --- video_processor/extractors/frame_extractor.py | |
| +++ video_processor/extractors/frame_extractor.py | |
| @@ -1,6 +1,7 @@ | |
| 1 | """Frame extraction module for video processing.""" |
| 2 | |
| 3 | import functools |
| 4 | import logging |
| 5 | from pathlib import Path |
| 6 | from typing import List, Optional, Tuple, Union |
| 7 | |
| @@ -112,44 +113,49 @@ | |
| 113 | filtered.append(frame) |
| 114 | |
| 115 | if removed: |
| 116 | logger.info(f"Filtered out {removed}/{len(frames)} people/webcam frames") |
| 117 | return filtered, removed |
| 118 | |
| 119 | |
| 120 | def is_gpu_available() -> bool: |
| 121 | """Check if GPU acceleration is available for OpenCV.""" |
| 122 | try: |
| 123 | # Check if CUDA is available |
| 124 | count = cv2.cuda.getCudaEnabledDeviceCount() |
| 125 | return count > 0 |
| 126 | except Exception: |
| 127 | return False |
| 128 | |
| 129 | |
| 130 | def gpu_accelerated(func): |
| 131 | """Decorator to use GPU implementation when available.""" |
| 132 | |
| 133 | @functools.wraps(func) |
| 134 | def wrapper(*args, **kwargs): |
| 135 | if is_gpu_available() and not kwargs.get("disable_gpu"): |
| 136 | # Remove the disable_gpu kwarg if it exists |
| 137 | kwargs.pop("disable_gpu", None) |
| 138 | return func_gpu(*args, **kwargs) |
| 139 | # Remove the disable_gpu kwarg if it exists |
| 140 | kwargs.pop("disable_gpu", None) |
| 141 | return func(*args, **kwargs) |
| 142 | |
| 143 | return wrapper |
| 144 | |
| 145 | |
| 146 | def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float: |
| 147 | """ |
| 148 | Calculate the difference between two frames. |
| 149 | |
| 150 | Parameters |
| 151 | ---------- |
| 152 | prev_frame : np.ndarray |
| 153 | Previous frame |
| 154 | curr_frame : np.ndarray |
| 155 | Current frame |
| 156 | |
| 157 | Returns |
| 158 | ------- |
| 159 | float |
| 160 | Difference score between 0 and 1 |
| 161 | """ |
| @@ -156,30 +162,31 @@ | |
| 162 | # Convert to grayscale |
| 163 | if len(prev_frame.shape) == 3: |
| 164 | prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY) |
| 165 | else: |
| 166 | prev_gray = prev_frame |
| 167 | |
| 168 | if len(curr_frame.shape) == 3: |
| 169 | curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY) |
| 170 | else: |
| 171 | curr_gray = curr_frame |
| 172 | |
| 173 | # Calculate absolute difference |
| 174 | diff = cv2.absdiff(prev_gray, curr_gray) |
| 175 | |
| 176 | # Normalize and return mean difference |
| 177 | return np.mean(diff) / 255.0 |
| 178 | |
| 179 | |
| 180 | @gpu_accelerated |
| 181 | def extract_frames( |
| 182 | video_path: Union[str, Path], |
| 183 | sampling_rate: float = 1.0, |
| 184 | change_threshold: float = 0.15, |
| 185 | periodic_capture_seconds: float = 30.0, |
| 186 | max_frames: Optional[int] = None, |
| 187 | resize_to: Optional[Tuple[int, int]] = None, |
| 188 | ) -> List[np.ndarray]: |
| 189 | """ |
| 190 | Extract frames from video based on visual change detection + periodic capture. |
| 191 | |
| 192 | Two capture strategies work together: |
| @@ -273,11 +280,13 @@ | |
| 280 | if diff > change_threshold: |
| 281 | should_capture = True |
| 282 | reason = f"change={diff:.3f}" |
| 283 | |
| 284 | # Periodic capture — even if change is small |
| 285 | elif ( |
| 286 | periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval |
| 287 | ): |
| 288 | should_capture = True |
| 289 | reason = "periodic" |
| 290 | |
| 291 | if should_capture: |
| 292 | extracted_frames.append(frame) |
| @@ -299,41 +308,45 @@ | |
| 308 | |
| 309 | pbar.close() |
| 310 | cap.release() |
| 311 | logger.info(f"Extracted {len(extracted_frames)} frames from {frame_count} total frames") |
| 312 | return extracted_frames |
| 313 | |
| 314 | |
| 315 | def func_gpu(*args, **kwargs): |
| 316 | """GPU-accelerated version of extract_frames.""" |
| 317 | # This would be implemented with CUDA acceleration |
| 318 | # For now, fall back to the unwrapped CPU version |
| 319 | logger.info("GPU acceleration not yet implemented, falling back to CPU") |
| 320 | return extract_frames.__wrapped__(*args, **kwargs) |
| 321 | |
| 322 | |
| 323 | def save_frames( |
| 324 | frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame" |
| 325 | ) -> List[Path]: |
| 326 | """ |
| 327 | Save extracted frames to disk. |
| 328 | |
| 329 | Parameters |
| 330 | ---------- |
| 331 | frames : list |
| 332 | List of frames to save |
| 333 | output_dir : str or Path |
| 334 | Directory to save frames in |
| 335 | base_filename : str |
| 336 | Base name for frame files |
| 337 | |
| 338 | Returns |
| 339 | ------- |
| 340 | list |
| 341 | List of paths to saved frame files |
| 342 | """ |
| 343 | output_dir = Path(output_dir) |
| 344 | output_dir.mkdir(parents=True, exist_ok=True) |
| 345 | |
| 346 | saved_paths = [] |
| 347 | for i, frame in enumerate(frames): |
| 348 | output_path = output_dir / f"{base_filename}_{i:04d}.jpg" |
| 349 | cv2.imwrite(str(output_path), frame) |
| 350 | saved_paths.append(output_path) |
| 351 | |
| 352 | return saved_paths |
| 353 |
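The `extract_frames` hunks above combine two capture strategies: a visual-change threshold and a periodic fallback for static scenes. A minimal, OpenCV-free sketch of that decision logic — `periodic_interval` here stands in for `periodic_capture_seconds * fps`, which the real code derives from the video:

```python
import numpy as np


def frame_difference(prev_gray: np.ndarray, curr_gray: np.ndarray) -> float:
    """Mean absolute pixel difference, normalized to [0, 1] for 8-bit frames."""
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    return float(diff.mean() / 255.0)


def should_capture(diff: float, frames_since_capture: int,
                   change_threshold: float = 0.15,
                   periodic_interval: int = 900) -> tuple:
    """Capture on visual change, or periodically even when the scene is static."""
    if diff > change_threshold:
        return True, f"change={diff:.3f}"
    if periodic_interval > 0 and frames_since_capture >= periodic_interval:
        return True, "periodic"
    return False, ""


black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
print(should_capture(frame_difference(black, white), 0))    # (True, 'change=1.000')
print(should_capture(frame_difference(black, black), 900))  # (True, 'periodic')
```

Identical frames score 0.0 and a full black-to-white change scores 1.0, matching the `np.mean(diff) / 255.0` normalization in the diff.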
| --- video_processor/extractors/text_extractor.py | ||
| +++ video_processor/extractors/text_extractor.py | ||
| @@ -1,48 +1,51 @@ | ||
| 1 | 1 | """Text extraction module for frames and diagrams.""" |
| 2 | + | |
| 2 | 3 | import logging |
| 3 | 4 | from pathlib import Path |
| 4 | 5 | from typing import Dict, List, Optional, Tuple, Union |
| 5 | 6 | |
| 6 | 7 | import cv2 |
| 7 | 8 | import numpy as np |
| 8 | 9 | |
| 9 | 10 | logger = logging.getLogger(__name__) |
| 11 | + | |
| 10 | 12 | |
| 11 | 13 | class TextExtractor: |
| 12 | 14 | """Extract text from images, frames, and diagrams.""" |
| 13 | - | |
| 15 | + | |
| 14 | 16 | def __init__(self, tesseract_path: Optional[str] = None): |
| 15 | 17 | """ |
| 16 | 18 | Initialize text extractor. |
| 17 | - | |
| 19 | + | |
| 18 | 20 | Parameters |
| 19 | 21 | ---------- |
| 20 | 22 | tesseract_path : str, optional |
| 21 | 23 | Path to tesseract executable for local OCR |
| 22 | 24 | """ |
| 23 | 25 | self.tesseract_path = tesseract_path |
| 24 | - | |
| 26 | + | |
| 25 | 27 | # Check if we're using tesseract locally |
| 26 | 28 | self.use_local_ocr = False |
| 27 | 29 | if tesseract_path: |
| 28 | 30 | try: |
| 29 | 31 | import pytesseract |
| 32 | + | |
| 30 | 33 | pytesseract.pytesseract.tesseract_cmd = tesseract_path |
| 31 | 34 | self.use_local_ocr = True |
| 32 | 35 | except ImportError: |
| 33 | 36 | logger.warning("pytesseract not installed, local OCR unavailable") |
| 34 | - | |
| 37 | + | |
| 35 | 38 | def preprocess_image(self, image: np.ndarray) -> np.ndarray: |
| 36 | 39 | """ |
| 37 | 40 | Preprocess image for better text extraction. |
| 38 | - | |
| 41 | + | |
| 39 | 42 | Parameters |
| 40 | 43 | ---------- |
| 41 | 44 | image : np.ndarray |
| 42 | 45 | Input image |
| 43 | - | |
| 46 | + | |
| 44 | 47 | Returns |
| 45 | 48 | ------- |
| 46 | 49 | np.ndarray |
| 47 | 50 | Preprocessed image |
| 48 | 51 | """ |
| @@ -49,66 +52,61 @@ | ||
| 49 | 52 | # Convert to grayscale if not already |
| 50 | 53 | if len(image.shape) == 3: |
| 51 | 54 | gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) |
| 52 | 55 | else: |
| 53 | 56 | gray = image |
| 54 | - | |
| 57 | + | |
| 55 | 58 | # Apply adaptive thresholding |
| 56 | 59 | thresh = cv2.adaptiveThreshold( |
| 57 | - gray, | |
| 58 | - 255, | |
| 59 | - cv2.ADAPTIVE_THRESH_GAUSSIAN_C, | |
| 60 | - cv2.THRESH_BINARY_INV, | |
| 61 | - 11, | |
| 62 | - 2 | |
| 63 | - ) | |
| 64 | - | |
| 60 | + gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2 | |
| 61 | + ) | |
| 62 | + | |
| 65 | 63 | # Noise removal |
| 66 | 64 | kernel = np.ones((1, 1), np.uint8) |
| 67 | 65 | opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel) |
| 68 | - | |
| 66 | + | |
| 69 | 67 | # Invert back |
| 70 | 68 | result = cv2.bitwise_not(opening) |
| 71 | - | |
| 69 | + | |
| 72 | 70 | return result |
| 73 | - | |
| 71 | + | |
| 74 | 72 | def extract_text_local(self, image: np.ndarray) -> str: |
| 75 | 73 | """ |
| 76 | 74 | Extract text from image using local OCR (Tesseract). |
| 77 | - | |
| 75 | + | |
| 78 | 76 | Parameters |
| 79 | 77 | ---------- |
| 80 | 78 | image : np.ndarray |
| 81 | 79 | Input image |
| 82 | - | |
| 80 | + | |
| 83 | 81 | Returns |
| 84 | 82 | ------- |
| 85 | 83 | str |
| 86 | 84 | Extracted text |
| 87 | 85 | """ |
| 88 | 86 | if not self.use_local_ocr: |
| 89 | 87 | raise RuntimeError("Local OCR not configured") |
| 90 | - | |
| 88 | + | |
| 91 | 89 | import pytesseract |
| 92 | - | |
| 90 | + | |
| 93 | 91 | # Preprocess image |
| 94 | 92 | processed = self.preprocess_image(image) |
| 95 | - | |
| 93 | + | |
| 96 | 94 | # Extract text |
| 97 | 95 | text = pytesseract.image_to_string(processed) |
| 98 | - | |
| 96 | + | |
| 99 | 97 | return text |
| 100 | - | |
| 98 | + | |
| 101 | 99 | def detect_text_regions(self, image: np.ndarray) -> List[Tuple[int, int, int, int]]: |
| 102 | 100 | """ |
| 103 | 101 | Detect potential text regions in image. |
| 104 | - | |
| 102 | + | |
| 105 | 103 | Parameters |
| 106 | 104 | ---------- |
| 107 | 105 | image : np.ndarray |
| 108 | 106 | Input image |
| 109 | - | |
| 107 | + | |
| 110 | 108 | Returns |
| 111 | 109 | ------- |
| 112 | 110 | list |
| 113 | 111 | List of bounding boxes for text regions (x, y, w, h) |
| 114 | 112 | """ |
| @@ -115,179 +113,182 @@ | ||
| 115 | 113 | # Convert to grayscale |
| 116 | 114 | if len(image.shape) == 3: |
| 117 | 115 | gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) |
| 118 | 116 | else: |
| 119 | 117 | gray = image |
| 120 | - | |
| 118 | + | |
| 121 | 119 | # Apply MSER (Maximally Stable Extremal Regions) |
| 122 | 120 | mser = cv2.MSER_create() |
| 123 | 121 | regions, _ = mser.detectRegions(gray) |
| 124 | - | |
| 122 | + | |
| 125 | 123 | # Convert regions to bounding boxes |
| 126 | 124 | bboxes = [] |
| 127 | 125 | for region in regions: |
| 128 | 126 | x, y, w, h = cv2.boundingRect(region.reshape(-1, 1, 2)) |
| 129 | - | |
| 127 | + | |
| 130 | 128 | # Apply filtering criteria for text-like regions |
| 131 | 129 | aspect_ratio = w / float(h) |
| 132 | 130 | if 0.1 < aspect_ratio < 10 and h > 5 and w > 5: |
| 133 | 131 | bboxes.append((x, y, w, h)) |
| 134 | - | |
| 132 | + | |
| 135 | 133 | # Merge overlapping boxes |
| 136 | 134 | merged_bboxes = self._merge_overlapping_boxes(bboxes) |
| 137 | - | |
| 135 | + | |
| 138 | 136 | logger.debug(f"Detected {len(merged_bboxes)} text regions") |
| 139 | 137 | return merged_bboxes |
| 140 | - | |
| 141 | - def _merge_overlapping_boxes(self, boxes: List[Tuple[int, int, int, int]]) -> List[Tuple[int, int, int, int]]: | |
| 138 | + | |
| 139 | + def _merge_overlapping_boxes( | |
| 140 | + self, boxes: List[Tuple[int, int, int, int]] | |
| 141 | + ) -> List[Tuple[int, int, int, int]]: | |
| 142 | 142 | """ |
| 143 | 143 | Merge overlapping bounding boxes. |
| 144 | - | |
| 144 | + | |
| 145 | 145 | Parameters |
| 146 | 146 | ---------- |
| 147 | 147 | boxes : list |
| 148 | 148 | List of bounding boxes (x, y, w, h) |
| 149 | - | |
| 149 | + | |
| 150 | 150 | Returns |
| 151 | 151 | ------- |
| 152 | 152 | list |
| 153 | 153 | Merged bounding boxes |
| 154 | 154 | """ |
| 155 | 155 | if not boxes: |
| 156 | 156 | return [] |
| 157 | - | |
| 157 | + | |
| 158 | 158 | # Sort boxes by x coordinate |
| 159 | 159 | sorted_boxes = sorted(boxes, key=lambda b: b[0]) |
| 160 | - | |
| 160 | + | |
| 161 | 161 | merged = [] |
| 162 | 162 | current = list(sorted_boxes[0]) |
| 163 | - | |
| 163 | + | |
| 164 | 164 | for box in sorted_boxes[1:]: |
| 165 | 165 | # Check if current box overlaps with the next one |
| 166 | - if (current[0] <= box[0] + box[2] and | |
| 167 | - box[0] <= current[0] + current[2] and | |
| 168 | - current[1] <= box[1] + box[3] and | |
| 169 | - box[1] <= current[1] + current[3]): | |
| 170 | - | |
| 166 | + if ( | |
| 167 | + current[0] <= box[0] + box[2] | |
| 168 | + and box[0] <= current[0] + current[2] | |
| 169 | + and current[1] <= box[1] + box[3] | |
| 170 | + and box[1] <= current[1] + current[3] | |
| 171 | + ): | |
| 171 | 172 | # Calculate merged box |
| 172 | 173 | x1 = min(current[0], box[0]) |
| 173 | 174 | y1 = min(current[1], box[1]) |
| 174 | 175 | x2 = max(current[0] + current[2], box[0] + box[2]) |
| 175 | 176 | y2 = max(current[1] + current[3], box[1] + box[3]) |
| 176 | - | |
| 177 | + | |
| 177 | 178 | # Update current box |
| 178 | 179 | current = [x1, y1, x2 - x1, y2 - y1] |
| 179 | 180 | else: |
| 180 | 181 | # Add current box to merged list and update current |
| 181 | 182 | merged.append(tuple(current)) |
| 182 | 183 | current = list(box) |
| 183 | - | |
| 184 | + | |
| 184 | 185 | # Add the last box |
| 185 | 186 | merged.append(tuple(current)) |
| 186 | - | |
| 187 | + | |
| 187 | 188 | return merged |
| 188 | - | |
| 189 | + | |
| 189 | 190 | def extract_text_from_regions( |
| 190 | - self, | |
| 191 | - image: np.ndarray, | |
| 192 | - regions: List[Tuple[int, int, int, int]] | |
| 191 | + self, image: np.ndarray, regions: List[Tuple[int, int, int, int]] | |
| 193 | 192 | ) -> Dict[Tuple[int, int, int, int], str]: |
| 194 | 193 | """ |
| 195 | 194 | Extract text from specified regions in image. |
| 196 | - | |
| 195 | + | |
| 197 | 196 | Parameters |
| 198 | 197 | ---------- |
| 199 | 198 | image : np.ndarray |
| 200 | 199 | Input image |
| 201 | 200 | regions : list |
| 202 | 201 | List of regions as (x, y, w, h) |
| 203 | - | |
| 202 | + | |
| 204 | 203 | Returns |
| 205 | 204 | ------- |
| 206 | 205 | dict |
| 207 | 206 | Dictionary of {region: text} |
| 208 | 207 | """ |
| 209 | 208 | results = {} |
| 210 | - | |
| 209 | + | |
| 211 | 210 | for region in regions: |
| 212 | 211 | x, y, w, h = region |
| 213 | - | |
| 212 | + | |
| 214 | 213 | # Extract region |
| 215 | - roi = image[y:y+h, x:x+w] | |
| 216 | - | |
| 214 | + roi = image[y : y + h, x : x + w] | |
| 215 | + | |
| 217 | 216 | # Skip empty regions |
| 218 | 217 | if roi.size == 0: |
| 219 | 218 | continue |
| 220 | - | |
| 219 | + | |
| 221 | 220 | # Extract text |
| 222 | 221 | if self.use_local_ocr: |
| 223 | 222 | text = self.extract_text_local(roi) |
| 224 | 223 | else: |
| 225 | 224 | text = "API-based text extraction not yet implemented" |
| 226 | - | |
| 225 | + | |
| 227 | 226 | # Store non-empty results |
| 228 | 227 | if text.strip(): |
| 229 | 228 | results[region] = text.strip() |
| 230 | - | |
| 229 | + | |
| 231 | 230 | return results |
| 232 | - | |
| 231 | + | |
| 233 | 232 | def extract_text_from_image(self, image: np.ndarray, detect_regions: bool = True) -> str: |
| 234 | 233 | """ |
| 235 | 234 | Extract text from entire image. |
| 236 | - | |
| 235 | + | |
| 237 | 236 | Parameters |
| 238 | 237 | ---------- |
| 239 | 238 | image : np.ndarray |
| 240 | 239 | Input image |
| 241 | 240 | detect_regions : bool |
| 242 | 241 | Whether to detect and process text regions separately |
| 243 | - | |
| 242 | + | |
| 244 | 243 | Returns |
| 245 | 244 | ------- |
| 246 | 245 | str |
| 247 | 246 | Extracted text |
| 248 | 247 | """ |
| 249 | 248 | if detect_regions: |
| 250 | 249 | # Detect regions and extract text from each |
| 251 | 250 | regions = self.detect_text_regions(image) |
| 252 | 251 | region_texts = self.extract_text_from_regions(image, regions) |
| 253 | - | |
| 252 | + | |
| 254 | 253 | # Combine text from all regions |
| 255 | 254 | text = "\n".join(region_texts.values()) |
| 256 | 255 | else: |
| 257 | 256 | # Extract text from entire image |
| 258 | 257 | if self.use_local_ocr: |
| 259 | 258 | text = self.extract_text_local(image) |
| 260 | 259 | else: |
| 261 | 260 | text = "API-based text extraction not yet implemented" |
| 262 | - | |
| 261 | + | |
| 263 | 262 | return text |
| 264 | - | |
| 265 | - def extract_text_from_file(self, image_path: Union[str, Path], detect_regions: bool = True) -> str: | |
| 263 | + | |
| 264 | + def extract_text_from_file( | |
| 265 | + self, image_path: Union[str, Path], detect_regions: bool = True | |
| 266 | + ) -> str: | |
| 266 | 267 | """ |
| 267 | 268 | Extract text from image file. |
| 268 | - | |
| 269 | + | |
| 269 | 270 | Parameters |
| 270 | 271 | ---------- |
| 271 | 272 | image_path : str or Path |
| 272 | 273 | Path to image file |
| 273 | 274 | detect_regions : bool |
| 274 | 275 | Whether to detect and process text regions separately |
| 275 | - | |
| 276 | + | |
| 276 | 277 | Returns |
| 277 | 278 | ------- |
| 278 | 279 | str |
| 279 | 280 | Extracted text |
| 280 | 281 | """ |
| 281 | 282 | image_path = Path(image_path) |
| 282 | 283 | if not image_path.exists(): |
| 283 | 284 | raise FileNotFoundError(f"Image file not found: {image_path}") |
| 284 | - | |
| 285 | + | |
| 285 | 286 | # Load image |
| 286 | 287 | image = cv2.imread(str(image_path)) |
| 287 | 288 | if image is None: |
| 288 | 289 | raise ValueError(f"Failed to load image: {image_path}") |
| 289 | - | |
| 290 | + | |
| 290 | 291 | # Extract text |
| 291 | 292 | text = self.extract_text_from_image(image, detect_regions) |
| 292 | - | |
| 293 | + | |
| 293 | 294 | return text |
| 294 | 295 |
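The `_merge_overlapping_boxes` method reformatted in this diff is a single left-to-right sweep over boxes sorted by x. A standalone sketch of the same algorithm, with plain tuples in place of the class context:

```python
def merge_overlapping_boxes(boxes):
    """Merge overlapping (x, y, w, h) boxes in one sweep sorted by x coordinate."""
    if not boxes:
        return []
    sorted_boxes = sorted(boxes, key=lambda b: b[0])
    merged = []
    current = list(sorted_boxes[0])
    for x, y, w, h in sorted_boxes[1:]:
        cx, cy, cw, ch = current
        # Axis-aligned rectangle overlap test
        if cx <= x + w and x <= cx + cw and cy <= y + h and y <= cy + ch:
            # Grow the current box to the union of the two rectangles
            x1, y1 = min(cx, x), min(cy, y)
            x2, y2 = max(cx + cw, x + w), max(cy + ch, y + h)
            current = [x1, y1, x2 - x1, y2 - y1]
        else:
            merged.append(tuple(current))
            current = [x, y, w, h]
    merged.append(tuple(current))
    return merged


print(merge_overlapping_boxes([(0, 0, 10, 10), (5, 5, 10, 10), (30, 0, 5, 5)]))
# [(0, 0, 15, 15), (30, 0, 5, 5)]
```

Note the sweep only compares each box against the current accumulator, so it merges chains of overlaps along x but, like the original, does not revisit earlier merged boxes.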
| --- video_processor/integrators/knowledge_graph.py | ||
| +++ video_processor/integrators/knowledge_graph.py | ||
| @@ -1,8 +1,7 @@ | ||
| 1 | 1 | """Knowledge graph integration for organizing extracted content.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Dict, List, Optional, Union |
| 7 | 6 | |
| 8 | 7 | from tqdm import tqdm |
| @@ -33,18 +32,24 @@ | ||
| 33 | 32 | [{"role": "user", "content": prompt}], |
| 34 | 33 | max_tokens=4096, |
| 35 | 34 | temperature=temperature, |
| 36 | 35 | ) |
| 37 | 36 | |
| 38 | - def extract_entities_and_relationships(self, text: str) -> tuple[List[Entity], List[Relationship]]: | |
| 37 | + def extract_entities_and_relationships( | |
| 38 | + self, text: str | |
| 39 | + ) -> tuple[List[Entity], List[Relationship]]: | |
| 39 | 40 | """Extract entities and relationships in a single LLM call.""" |
| 40 | 41 | prompt = ( |
| 41 | 42 | "Extract all notable entities and relationships from the following content.\n\n" |
| 42 | 43 | f"CONTENT:\n{text}\n\n" |
| 43 | 44 | "Return a JSON object with two keys:\n" |
| 44 | - '- "entities": array of {"name": "...", "type": "person|concept|technology|organization|time", "description": "brief description"}\n' | |
| 45 | - '- "relationships": array of {"source": "entity name", "target": "entity name", "type": "relationship description"}\n\n' | |
| 45 | + '- "entities": array of {"name": "...", ' | |
| 46 | + '"type": "person|concept|technology|organization|time", ' | |
| 47 | + '"description": "brief description"}\n' | |
| 48 | + '- "relationships": array of {"source": "entity name", ' | |
| 49 | + '"target": "entity name", ' | |
| 50 | + '"type": "relationship description"}\n\n' | |
| 46 | 51 | "Return ONLY the JSON object." |
| 47 | 52 | ) |
| 48 | 53 | raw = self._chat(prompt) |
| 49 | 54 | parsed = parse_json_from_response(raw) |
| 50 | 55 | |
| @@ -52,32 +57,38 @@ | ||
| 52 | 57 | rels = [] |
| 53 | 58 | |
| 54 | 59 | if isinstance(parsed, dict): |
| 55 | 60 | for item in parsed.get("entities", []): |
| 56 | 61 | if isinstance(item, dict) and "name" in item: |
| 57 | - entities.append(Entity( | |
| 58 | - name=item["name"], | |
| 59 | - type=item.get("type", "concept"), | |
| 60 | - descriptions=[item["description"]] if item.get("description") else [], | |
| 61 | - )) | |
| 62 | - entity_names = {e.name for e in entities} | |
| 62 | + entities.append( | |
| 63 | + Entity( | |
| 64 | + name=item["name"], | |
| 65 | + type=item.get("type", "concept"), | |
| 66 | + descriptions=[item["description"]] if item.get("description") else [], | |
| 67 | + ) | |
| 68 | + ) | |
| 69 | + {e.name for e in entities} | |
| 63 | 70 | for item in parsed.get("relationships", []): |
| 64 | 71 | if isinstance(item, dict) and "source" in item and "target" in item: |
| 65 | - rels.append(Relationship( | |
| 66 | - source=item["source"], | |
| 67 | - target=item["target"], | |
| 68 | - type=item.get("type", "related_to"), | |
| 69 | - )) | |
| 72 | + rels.append( | |
| 73 | + Relationship( | |
| 74 | + source=item["source"], | |
| 75 | + target=item["target"], | |
| 76 | + type=item.get("type", "related_to"), | |
| 77 | + ) | |
| 78 | + ) | |
| 70 | 79 | elif isinstance(parsed, list): |
| 71 | 80 | # Fallback: if model returns a flat entity list |
| 72 | 81 | for item in parsed: |
| 73 | 82 | if isinstance(item, dict) and "name" in item: |
| 74 | - entities.append(Entity( | |
| 75 | - name=item["name"], | |
| 76 | - type=item.get("type", "concept"), | |
| 77 | - descriptions=[item["description"]] if item.get("description") else [], | |
| 78 | - )) | |
| 83 | + entities.append( | |
| 84 | + Entity( | |
| 85 | + name=item["name"], | |
| 86 | + type=item.get("type", "concept"), | |
| 87 | + descriptions=[item["description"]] if item.get("description") else [], | |
| 88 | + ) | |
| 89 | + ) | |
| 79 | 90 | |
| 80 | 91 | return entities, rels |
| 81 | 92 | |
| 82 | 93 | def add_content(self, text: str, source: str, timestamp: Optional[float] = None) -> None: |
| 83 | 94 | """Add content to knowledge graph by extracting entities and relationships.""" |
| @@ -84,39 +95,45 @@ | ||
| 84 | 95 | entities, relationships = self.extract_entities_and_relationships(text) |
| 85 | 96 | |
| 86 | 97 | for entity in entities: |
| 87 | 98 | eid = entity.name |
| 88 | 99 | if eid in self.nodes: |
| 89 | - self.nodes[eid]["occurrences"].append({ | |
| 90 | - "source": source, | |
| 91 | - "timestamp": timestamp, | |
| 92 | - "text": text[:100] + "..." if len(text) > 100 else text, | |
| 93 | - }) | |
| 100 | + self.nodes[eid]["occurrences"].append( | |
| 101 | + { | |
| 102 | + "source": source, | |
| 103 | + "timestamp": timestamp, | |
| 104 | + "text": text[:100] + "..." if len(text) > 100 else text, | |
| 105 | + } | |
| 106 | + ) | |
| 94 | 107 | if entity.descriptions: |
| 95 | 108 | self.nodes[eid]["descriptions"].update(entity.descriptions) |
| 96 | 109 | else: |
| 97 | 110 | self.nodes[eid] = { |
| 98 | 111 | "id": eid, |
| 99 | 112 | "name": entity.name, |
| 100 | 113 | "type": entity.type, |
| 101 | 114 | "descriptions": set(entity.descriptions), |
| 102 | - "occurrences": [{ | |
| 103 | - "source": source, | |
| 104 | - "timestamp": timestamp, | |
| 105 | - "text": text[:100] + "..." if len(text) > 100 else text, | |
| 106 | - }], | |
| 115 | + "occurrences": [ | |
| 116 | + { | |
| 117 | + "source": source, | |
| 118 | + "timestamp": timestamp, | |
| 119 | + "text": text[:100] + "..." if len(text) > 100 else text, | |
| 120 | + } | |
| 121 | + ], | |
| 107 | 122 | } |
| 108 | 123 | |
| 109 | 124 | for rel in relationships: |
| 110 | 125 | if rel.source in self.nodes and rel.target in self.nodes: |
| 111 | - self.relationships.append({ | |
| 112 | - "source": rel.source, | |
| 113 | - "target": rel.target, | |
| 114 | - "type": rel.type, | |
| 115 | - "content_source": source, | |
| 116 | - "timestamp": timestamp, | |
| 117 | - }) | |
| 126 | + self.relationships.append( | |
| 127 | + { | |
| 128 | + "source": rel.source, | |
| 129 | + "target": rel.target, | |
| 130 | + "type": rel.type, | |
| 131 | + "content_source": source, | |
| 132 | + "timestamp": timestamp, | |
| 133 | + } | |
| 134 | + ) | |
| 118 | 135 | |
| 119 | 136 | def process_transcript(self, transcript: Dict, batch_size: int = 10) -> None: |
| 120 | 137 | """Process transcript segments into knowledge graph, batching for efficiency.""" |
| 121 | 138 | if "segments" not in transcript: |
| 122 | 139 | logger.warning("Transcript missing segments") |
| @@ -137,17 +154,15 @@ | ||
| 137 | 154 | } |
| 138 | 155 | |
| 139 | 156 | # Batch segments together for fewer API calls |
| 140 | 157 | batches = [] |
| 141 | 158 | for start in range(0, len(segments), batch_size): |
| 142 | - batches.append(segments[start:start + batch_size]) | |
| 159 | + batches.append(segments[start : start + batch_size]) | |
| 143 | 160 | |
| 144 | 161 | for batch in tqdm(batches, desc="Building knowledge graph", unit="batch"): |
| 145 | 162 | # Combine batch text |
| 146 | - combined_text = " ".join( | |
| 147 | - seg["text"] for seg in batch if "text" in seg | |
| 148 | - ) | |
| 163 | + combined_text = " ".join(seg["text"] for seg in batch if "text" in seg) | |
| 149 | 164 | if not combined_text.strip(): |
| 150 | 165 | continue |
| 151 | 166 | |
| 152 | 167 | # Use first segment's timestamp as batch timestamp |
| 153 | 168 | batch_start_idx = segments.index(batch[0]) |
| @@ -169,29 +184,33 @@ | ||
| 169 | 184 | self.nodes[diagram_id] = { |
| 170 | 185 | "id": diagram_id, |
| 171 | 186 | "name": f"Diagram {i}", |
| 172 | 187 | "type": "diagram", |
| 173 | 188 | "descriptions": {"Visual diagram from video"}, |
| 174 | - "occurrences": [{ | |
| 175 | - "source": source if text_content else f"diagram_{i}", | |
| 176 | - "frame_index": diagram.get("frame_index"), | |
| 177 | - }], | |
| 189 | + "occurrences": [ | |
| 190 | + { | |
| 191 | + "source": source if text_content else f"diagram_{i}", | |
| 192 | + "frame_index": diagram.get("frame_index"), | |
| 193 | + } | |
| 194 | + ], | |
| 178 | 195 | } |
| 179 | 196 | |
| 180 | 197 | def to_data(self) -> KnowledgeGraphData: |
| 181 | 198 | """Convert to pydantic KnowledgeGraphData model.""" |
| 182 | 199 | nodes = [] |
| 183 | 200 | for node in self.nodes.values(): |
| 184 | 201 | descs = node.get("descriptions", set()) |
| 185 | 202 | if isinstance(descs, set): |
| 186 | 203 | descs = list(descs) |
| 187 | - nodes.append(Entity( | |
| 188 | - name=node["name"], | |
| 189 | - type=node.get("type", "concept"), | |
| 190 | - descriptions=descs, | |
| 191 | - occurrences=node.get("occurrences", []), | |
| 192 | - )) | |
| 204 | + nodes.append( | |
| 205 | + Entity( | |
| 206 | + name=node["name"], | |
| 207 | + type=node.get("type", "concept"), | |
| 208 | + descriptions=descs, | |
| 209 | + occurrences=node.get("occurrences", []), | |
| 210 | + ) | |
| 211 | + ) | |
| 193 | 212 | |
| 194 | 213 | rels = [ |
| 195 | 214 | Relationship( |
| 196 | 215 | source=r["source"], |
| 197 | 216 | target=r["target"], |
| @@ -280,11 +299,12 @@ | ||
| 280 | 299 | def generate_mermaid(self, max_nodes: int = 30) -> str: |
| 281 | 300 | """Generate Mermaid visualization code.""" |
| 282 | 301 | node_importance = {} |
| 283 | 302 | for node_id in self.nodes: |
| 284 | 303 | count = sum( |
| 285 | - 1 for rel in self.relationships | |
| 304 | + 1 | |
| 305 | + for rel in self.relationships | |
| 286 | 306 | if rel["source"] == node_id or rel["target"] == node_id |
| 287 | 307 | ) |
| 288 | 308 | node_importance[node_id] = count |
| 289 | 309 | |
| 290 | 310 | important = sorted(node_importance.items(), key=lambda x: x[1], reverse=True) |
| 291 | 311 |
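The reflowed prompt above still requests the same JSON shape, and the extractor keeps its dict-or-flat-list fallback. A self-contained sketch of that parsing contract, using plain `json.loads` and tuples in place of the project's `parse_json_from_response` helper and pydantic models (both simplifications are assumptions for illustration):

```python
import json


def parse_extraction(raw: str):
    """Parse the {"entities": [...], "relationships": [...]} shape the
    prompt requests, with the same fallback as
    extract_entities_and_relationships: a bare list is treated as a
    flat entity list with no relationships."""
    parsed = json.loads(raw)
    entities, rels = [], []
    if isinstance(parsed, dict):
        entities = [e["name"] for e in parsed.get("entities", [])
                    if isinstance(e, dict) and "name" in e]
        rels = [(r["source"], r["target"])
                for r in parsed.get("relationships", [])
                if isinstance(r, dict) and "source" in r and "target" in r]
    elif isinstance(parsed, list):
        entities = [e["name"] for e in parsed
                    if isinstance(e, dict) and "name" in e]
    return entities, rels


raw = ('{"entities": [{"name": "PlanOpticon", "type": "technology"}], '
       '"relationships": [{"source": "PlanOpticon", "target": "OCR", '
       '"type": "uses"}]}')
print(parse_extraction(raw))
```

The guards on each item mean a partially malformed response degrades to fewer entities rather than an exception, which matches the defensive style of the real method.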
| --- video_processor/integrators/plan_generator.py | ||
| +++ video_processor/integrators/plan_generator.py | ||
| @@ -1,14 +1,13 @@ | ||
| 1 | 1 | """Plan generation for creating structured markdown output.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Dict, List, Optional, Union |
| 7 | 6 | |
| 8 | 7 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 9 | -from video_processor.models import BatchManifest, VideoManifest | |
| 8 | +from video_processor.models import VideoManifest | |
| 10 | 9 | from video_processor.providers.manager import ProviderManager |
| 11 | 10 | |
| 12 | 11 | logger = logging.getLogger(__name__) |
| 13 | 12 | |
| 14 | 13 | |
| @@ -36,11 +35,13 @@ | ||
| 36 | 35 | """Generate summary from transcript.""" |
| 37 | 36 | full_text = "" |
| 38 | 37 | if "segments" in transcript: |
| 39 | 38 | for segment in transcript["segments"]: |
| 40 | 39 | if "text" in segment: |
| 41 | - speaker = f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else "" | |
| 40 | + speaker = ( | |
| 41 | + f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else "" | |
| 42 | + ) | |
| 42 | 43 | full_text += f"{speaker}{segment['text']}\n\n" |
| 43 | 44 | |
| 44 | 45 | if not full_text.strip(): |
| 45 | 46 | full_text = transcript.get("text", "") |
| 46 | 47 | |
| 47 | 48 |
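The wrapped conditional above only changes layout; the flattening logic is unchanged. A standalone sketch of that behavior, with a hypothetical `flatten_transcript` name for what is an inline loop in the real method:

```python
def flatten_transcript(transcript: dict) -> str:
    """Prefix each segment with its speaker when one is present,
    and fall back to the top-level "text" field when no segment
    yields any content, mirroring the summary-generation loop."""
    full_text = ""
    for segment in transcript.get("segments", []):
        if "text" in segment:
            speaker = (
                f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else ""
            )
            full_text += f"{speaker}{segment['text']}\n\n"
    if not full_text.strip():
        full_text = transcript.get("text", "")
    return full_text


demo = {"segments": [{"speaker": "Alice", "text": "Kickoff"}, {"text": "Next steps"}]}
print(flatten_transcript(demo))
```

Segments without a `speaker` key get no prefix at all, rather than a generic "Speaker:" label; the default inside `segment.get` only applies in the (unusual) case where `"speaker"` exists with a falsy value.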
| --- video_processor/models.py | ||
| +++ video_processor/models.py | ||
| @@ -1,17 +1,17 @@ | ||
| 1 | 1 | """Pydantic data models for PlanOpticon output.""" |
| 2 | 2 | |
| 3 | 3 | from datetime import datetime |
| 4 | 4 | from enum import Enum |
| 5 | -from pathlib import Path | |
| 6 | 5 | from typing import Any, Dict, List, Optional |
| 7 | 6 | |
| 8 | 7 | from pydantic import BaseModel, Field |
| 9 | 8 | |
| 10 | 9 | |
| 11 | 10 | class DiagramType(str, Enum): |
| 12 | 11 | """Types of visual content detected in video frames.""" |
| 12 | + | |
| 13 | 13 | flowchart = "flowchart" |
| 14 | 14 | sequence = "sequence" |
| 15 | 15 | architecture = "architecture" |
| 16 | 16 | whiteboard = "whiteboard" |
| 17 | 17 | chart = "chart" |
| @@ -21,10 +21,11 @@ | ||
| 21 | 21 | unknown = "unknown" |
| 22 | 22 | |
| 23 | 23 | |
| 24 | 24 | class OutputFormat(str, Enum): |
| 25 | 25 | """Available output formats.""" |
| 26 | + | |
| 26 | 27 | markdown = "markdown" |
| 27 | 28 | json = "json" |
| 28 | 29 | html = "html" |
| 29 | 30 | pdf = "pdf" |
| 30 | 31 | svg = "svg" |
| @@ -31,39 +32,47 @@ | ||
| 31 | 32 | png = "png" |
| 32 | 33 | |
| 33 | 34 | |
| 34 | 35 | class TranscriptSegment(BaseModel): |
| 35 | 36 | """A single segment of transcribed audio.""" |
| 37 | + | |
| 36 | 38 | start: float = Field(description="Start time in seconds") |
| 37 | 39 | end: float = Field(description="End time in seconds") |
| 38 | 40 | text: str = Field(description="Transcribed text") |
| 39 | 41 | speaker: Optional[str] = Field(default=None, description="Speaker identifier") |
| 40 | 42 | confidence: Optional[float] = Field(default=None, description="Transcription confidence 0-1") |
| 41 | 43 | |
| 42 | 44 | |
| 43 | 45 | class ActionItem(BaseModel): |
| 44 | 46 | """An action item extracted from content.""" |
| 47 | + | |
| 45 | 48 | action: str = Field(description="The action to be taken") |
| 46 | 49 | assignee: Optional[str] = Field(default=None, description="Person responsible") |
| 47 | 50 | deadline: Optional[str] = Field(default=None, description="Deadline or timeframe") |
| 48 | 51 | priority: Optional[str] = Field(default=None, description="Priority level") |
| 49 | 52 | context: Optional[str] = Field(default=None, description="Additional context") |
| 50 | - source: Optional[str] = Field(default=None, description="Where this was found (transcript/diagram)") | |
| 53 | + source: Optional[str] = Field( | |
| 54 | + default=None, description="Where this was found (transcript/diagram)" | |
| 55 | + ) | |
| 51 | 56 | |
| 52 | 57 | |
| 53 | 58 | class KeyPoint(BaseModel): |
| 54 | 59 | """A key point extracted from content.""" |
| 60 | + | |
| 55 | 61 | point: str = Field(description="The key point") |
| 56 | 62 | topic: Optional[str] = Field(default=None, description="Topic or category") |
| 57 | 63 | details: Optional[str] = Field(default=None, description="Supporting details") |
| 58 | 64 | timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)") |
| 59 | 65 | source: Optional[str] = Field(default=None, description="Where this was found") |
| 60 | - related_diagrams: List[int] = Field(default_factory=list, description="Indices of related diagrams") | |
| 66 | + related_diagrams: List[int] = Field( | |
| 67 | + default_factory=list, description="Indices of related diagrams" | |
| 68 | + ) | |
| 61 | 69 | |
| 62 | 70 | |
| 63 | 71 | class DiagramResult(BaseModel): |
| 64 | 72 | """Result from diagram extraction and analysis.""" |
| 73 | + | |
| 65 | 74 | frame_index: int = Field(description="Index of the source frame") |
| 66 | 75 | timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)") |
| 67 | 76 | diagram_type: DiagramType = Field(default=DiagramType.unknown, description="Type of diagram") |
| 68 | 77 | confidence: float = Field(default=0.0, description="Detection confidence 0-1") |
| 69 | 78 | description: Optional[str] = Field(default=None, description="Description of the diagram") |
| @@ -70,85 +79,95 @@ | ||
| 70 | 79 | text_content: Optional[str] = Field(default=None, description="Text visible in the diagram") |
| 71 | 80 | elements: List[str] = Field(default_factory=list, description="Identified elements") |
| 72 | 81 | relationships: List[str] = Field(default_factory=list, description="Identified relationships") |
| 73 | 82 | mermaid: Optional[str] = Field(default=None, description="Mermaid syntax representation") |
| 74 | 83 | chart_data: Optional[Dict[str, Any]] = Field( |
| 75 | - default=None, | |
| 76 | - description="Chart data for reproduction (labels, values, chart_type)" | |
| 84 | + default=None, description="Chart data for reproduction (labels, values, chart_type)" | |
| 77 | 85 | ) |
| 78 | 86 | image_path: Optional[str] = Field(default=None, description="Relative path to original frame") |
| 79 | 87 | svg_path: Optional[str] = Field(default=None, description="Relative path to rendered SVG") |
| 80 | 88 | png_path: Optional[str] = Field(default=None, description="Relative path to rendered PNG") |
| 81 | 89 | mermaid_path: Optional[str] = Field(default=None, description="Relative path to mermaid source") |
| 82 | 90 | |
| 83 | 91 | |
| 84 | 92 | class ScreenCapture(BaseModel): |
| 85 | 93 | """A screengrab fallback when diagram extraction fails or is uncertain.""" |
| 94 | + | |
| 86 | 95 | frame_index: int = Field(description="Index of the source frame") |
| 87 | 96 | timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)") |
| 88 | 97 | caption: Optional[str] = Field(default=None, description="Brief description of the content") |
| 89 | 98 | image_path: Optional[str] = Field(default=None, description="Relative path to screenshot") |
| 90 | - confidence: float = Field(default=0.0, description="Detection confidence that triggered fallback") | |
| 99 | + confidence: float = Field( | |
| 100 | + default=0.0, description="Detection confidence that triggered fallback" | |
| 101 | + ) | |
| 91 | 102 | |
| 92 | 103 | |
| 93 | 104 | class Entity(BaseModel): |
| 94 | 105 | """An entity in the knowledge graph.""" |
| 106 | + | |
| 95 | 107 | name: str = Field(description="Entity name") |
| 96 | 108 | type: str = Field(default="concept", description="Entity type (person, concept, time, diagram)") |
| 97 | 109 | descriptions: List[str] = Field(default_factory=list, description="Descriptions of this entity") |
| 98 | - source: Optional[str] = Field(default=None, description="Source attribution (transcript/diagram/both)") | |
| 110 | + source: Optional[str] = Field( | |
| 111 | + default=None, description="Source attribution (transcript/diagram/both)" | |
| 112 | + ) | |
| 99 | 113 | occurrences: List[Dict[str, Any]] = Field( |
| 100 | - default_factory=list, | |
| 101 | - description="List of occurrences with source, timestamp, text" | |
| 114 | + default_factory=list, description="List of occurrences with source, timestamp, text" | |
| 102 | 115 | ) |
| 103 | 116 | |
| 104 | 117 | |
| 105 | 118 | class Relationship(BaseModel): |
| 106 | 119 | """A relationship between entities in the knowledge graph.""" |
| 120 | + | |
| 107 | 121 | source: str = Field(description="Source entity name") |
| 108 | 122 | target: str = Field(description="Target entity name") |
| 109 | 123 | type: str = Field(default="related_to", description="Relationship type") |
| 110 | 124 | content_source: Optional[str] = Field(default=None, description="Content source identifier") |
| 111 | 125 | timestamp: Optional[float] = Field(default=None, description="Timestamp in seconds") |
| 112 | 126 | |
| 113 | 127 | |
| 114 | 128 | class KnowledgeGraphData(BaseModel): |
| 115 | 129 | """Serializable knowledge graph data.""" |
| 130 | + | |
| 116 | 131 | nodes: List[Entity] = Field(default_factory=list, description="Graph nodes/entities") |
| 117 | - relationships: List[Relationship] = Field(default_factory=list, description="Graph relationships") | |
| 132 | + relationships: List[Relationship] = Field( | |
| 133 | + default_factory=list, description="Graph relationships" | |
| 134 | + ) | |
| 118 | 135 | |
| 119 | 136 | |
| 120 | 137 | class ProcessingStats(BaseModel): |
| 121 | 138 | """Statistics about a processing run.""" |
| 139 | + | |
| 122 | 140 | start_time: Optional[str] = Field(default=None, description="ISO format start time") |
| 123 | 141 | end_time: Optional[str] = Field(default=None, description="ISO format end time") |
| 124 | 142 | duration_seconds: Optional[float] = Field(default=None, description="Total processing time") |
| 125 | 143 | frames_extracted: int = Field(default=0) |
| 126 | 144 | people_frames_filtered: int = Field(default=0) |
| 127 | 145 | diagrams_detected: int = Field(default=0) |
| 128 | 146 | screen_captures: int = Field(default=0) |
| 129 | 147 | transcript_duration_seconds: Optional[float] = Field(default=None) |
| 130 | 148 | models_used: Dict[str, str] = Field( |
| 131 | - default_factory=dict, | |
| 132 | - description="Map of task to model used (e.g. vision: gpt-4o)" | |
| 149 | + default_factory=dict, description="Map of task to model used (e.g. vision: gpt-4o)" | |
| 133 | 150 | ) |
| 134 | 151 | |
| 135 | 152 | |
| 136 | 153 | class VideoMetadata(BaseModel): |
| 137 | 154 | """Metadata about the source video.""" |
| 155 | + | |
| 138 | 156 | title: str = Field(description="Video title") |
| 139 | 157 | source_path: Optional[str] = Field(default=None, description="Original video file path") |
| 140 | 158 | duration_seconds: Optional[float] = Field(default=None, description="Video duration") |
| 141 | 159 | resolution: Optional[str] = Field(default=None, description="Video resolution (e.g. 1920x1080)") |
| 142 | 160 | processed_at: str = Field( |
| 143 | 161 | default_factory=lambda: datetime.now().isoformat(), |
| 144 | - description="ISO format processing timestamp" | |
| 162 | + description="ISO format processing timestamp", | |
| 145 | 163 | ) |
| 146 | 164 | |
| 147 | 165 | |
| 148 | 166 | class VideoManifest(BaseModel): |
| 149 | 167 | """Manifest for a single video processing run - the single source of truth.""" |
| 168 | + | |
| 150 | 169 | version: str = Field(default="1.0", description="Manifest schema version") |
| 151 | 170 | video: VideoMetadata = Field(description="Source video metadata") |
| 152 | 171 | stats: ProcessingStats = Field(default_factory=ProcessingStats) |
| 153 | 172 | |
| 154 | 173 | # Relative paths to output files |
| @@ -167,15 +186,18 @@ | ||
| 167 | 186 | action_items: List[ActionItem] = Field(default_factory=list) |
| 168 | 187 | diagrams: List[DiagramResult] = Field(default_factory=list) |
| 169 | 188 | screen_captures: List[ScreenCapture] = Field(default_factory=list) |
| 170 | 189 | |
| 171 | 190 | # Frame paths |
| 172 | - frame_paths: List[str] = Field(default_factory=list, description="Relative paths to extracted frames") | |
| 191 | + frame_paths: List[str] = Field( | |
| 192 | + default_factory=list, description="Relative paths to extracted frames" | |
| 193 | + ) | |
| 173 | 194 | |
| 174 | 195 | |
| 175 | 196 | class BatchVideoEntry(BaseModel): |
| 176 | 197 | """Summary of a single video within a batch.""" |
| 198 | + | |
| 177 | 199 | video_name: str |
| 178 | 200 | manifest_path: str = Field(description="Relative path to video manifest") |
| 179 | 201 | status: str = Field(default="pending", description="pending/completed/failed") |
| 180 | 202 | error: Optional[str] = Field(default=None, description="Error message if failed") |
| 181 | 203 | diagrams_count: int = Field(default=0) |
| @@ -184,15 +206,14 @@ | ||
| 184 | 206 | duration_seconds: Optional[float] = Field(default=None) |
| 185 | 207 | |
| 186 | 208 | |
| 187 | 209 | class BatchManifest(BaseModel): |
| 188 | 210 | """Manifest for a batch processing run.""" |
| 211 | + | |
| 189 | 212 | version: str = Field(default="1.0") |
| 190 | 213 | title: str = Field(default="Batch Processing Results") |
| 191 | - processed_at: str = Field( | |
| 192 | - default_factory=lambda: datetime.now().isoformat() | |
| 193 | - ) | |
| 214 | + processed_at: str = Field(default_factory=lambda: datetime.now().isoformat()) | |
| 194 | 215 | stats: ProcessingStats = Field(default_factory=ProcessingStats) |
| 195 | 216 | |
| 196 | 217 | videos: List[BatchVideoEntry] = Field(default_factory=list) |
| 197 | 218 | |
| 198 | 219 | # Aggregated counts |
| 199 | 220 |
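These are pydantic v2 models (the pipeline elsewhere in this commit calls `model_validate_json`), so a `VideoManifest` round-trips through JSON losslessly. A stdlib-only sketch of that round-trip — field names taken from the diff above, sample values invented purely for illustration:

```python
import json

# Hypothetical sample shaped like the manifest fields in the diff above;
# the real code serializes with pydantic's model_dump_json()/model_validate_json().
manifest = {
    "version": "1.0",
    "video": {"title": "example.mp4", "duration_seconds": 912.4},
    "stats": {"frames_extracted": 48, "diagrams_detected": 3},
    "key_points": [{"point": "Ship v1", "timestamp": 120.0}],
    "frame_paths": ["frames/frame_0001.png"],
}

serialized = json.dumps(manifest, indent=2)
restored = json.loads(serialized)
assert restored == manifest  # lossless round-trip
```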
| --- video_processor/output_structure.py | ||
| +++ video_processor/output_structure.py | ||
| @@ -1,8 +1,7 @@ | ||
| 1 | 1 | """Standardized output directory structure and manifest I/O for PlanOpticon.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Dict |
| 7 | 6 | |
| 8 | 7 | from video_processor.models import BatchManifest, VideoManifest |
| 9 | 8 |
| --- video_processor/pipeline.py | ||
| +++ video_processor/pipeline.py | ||
| @@ -9,11 +9,15 @@ | ||
| 9 | 9 | |
| 10 | 10 | from tqdm import tqdm |
| 11 | 11 | |
| 12 | 12 | from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer |
| 13 | 13 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 14 | -from video_processor.extractors.frame_extractor import extract_frames, filter_people_frames, save_frames | |
| 14 | +from video_processor.extractors.frame_extractor import ( | |
| 15 | + extract_frames, | |
| 16 | + filter_people_frames, | |
| 17 | + save_frames, | |
| 18 | +) | |
| 15 | 19 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 16 | 20 | from video_processor.integrators.plan_generator import PlanGenerator |
| 17 | 21 | from video_processor.models import ( |
| 18 | 22 | ActionItem, |
| 19 | 23 | KeyPoint, |
| @@ -145,13 +149,11 @@ | ||
| 145 | 149 | srt_lines = [] |
| 146 | 150 | for i, seg in enumerate(segments): |
| 147 | 151 | start = seg.get("start", 0) |
| 148 | 152 | end = seg.get("end", 0) |
| 149 | 153 | srt_lines.append(str(i + 1)) |
| 150 | - srt_lines.append( | |
| 151 | - f"{_format_srt_time(start)} --> {_format_srt_time(end)}" | |
| 152 | - ) | |
| 154 | + srt_lines.append(f"{_format_srt_time(start)} --> {_format_srt_time(end)}") | |
| 153 | 155 | srt_lines.append(seg.get("text", "").strip()) |
| 154 | 156 | srt_lines.append("") |
| 155 | 157 | transcript_srt.write_text("\n".join(srt_lines)) |
| 156 | 158 | pipeline_bar.update(1) |
| 157 | 159 | |
| @@ -158,14 +160,17 @@ | ||
| 158 | 160 | # --- Step 4: Diagram extraction --- |
| 159 | 161 | pm.usage.start_step("Visual analysis") |
| 160 | 162 | pipeline_bar.set_description("Pipeline: analyzing visuals") |
| 161 | 163 | diagrams = [] |
| 162 | 164 | screen_captures = [] |
| 163 | - existing_diagrams = sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else [] | |
| 165 | + existing_diagrams = ( | |
| 166 | + sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else [] | |
| 167 | + ) | |
| 164 | 168 | if existing_diagrams: |
| 165 | 169 | logger.info(f"Resuming: found {len(existing_diagrams)} diagrams on disk, skipping analysis") |
| 166 | 170 | from video_processor.models import DiagramResult |
| 171 | + | |
| 167 | 172 | for dj in existing_diagrams: |
| 168 | 173 | try: |
| 169 | 174 | diagrams.append(DiagramResult.model_validate_json(dj.read_text())) |
| 170 | 175 | except Exception as e: |
| 171 | 176 | logger.warning(f"Failed to load diagram {dj}: {e}") |
| @@ -208,16 +213,12 @@ | ||
| 208 | 213 | pipeline_bar.set_description("Pipeline: extracting key points") |
| 209 | 214 | kp_path = dirs["results"] / "key_points.json" |
| 210 | 215 | ai_path = dirs["results"] / "action_items.json" |
| 211 | 216 | if kp_path.exists() and ai_path.exists(): |
| 212 | 217 | logger.info("Resuming: found key points and action items on disk") |
| 213 | - key_points = [ | |
| 214 | - KeyPoint(**item) for item in json.loads(kp_path.read_text()) | |
| 215 | - ] | |
| 216 | - action_items = [ | |
| 217 | - ActionItem(**item) for item in json.loads(ai_path.read_text()) | |
| 218 | - ] | |
| 218 | + key_points = [KeyPoint(**item) for item in json.loads(kp_path.read_text())] | |
| 219 | + action_items = [ActionItem(**item) for item in json.loads(ai_path.read_text())] | |
| 219 | 220 | else: |
| 220 | 221 | key_points = _extract_key_points(pm, transcript_text) |
| 221 | 222 | action_items = _extract_action_items(pm, transcript_text) |
| 222 | 223 | |
| 223 | 224 | kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2)) |
| @@ -286,13 +287,15 @@ | ||
| 286 | 287 | pipeline_bar.close() |
| 287 | 288 | |
| 288 | 289 | # Write manifest |
| 289 | 290 | write_video_manifest(manifest, output_dir) |
| 290 | 291 | |
| 291 | - logger.info(f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, " | |
| 292 | - f"{len(screen_captures)} captures, {len(key_points)} key points, " | |
| 293 | - f"{len(action_items)} action items") | |
| 292 | + logger.info( | |
| 293 | + f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, " | |
| 294 | + f"{len(screen_captures)} captures, {len(key_points)} key points, " | |
| 295 | + f"{len(action_items)} action items" | |
| 296 | + ) | |
| 294 | 297 | |
| 295 | 298 | return manifest |
| 296 | 299 | |
| 297 | 300 | |
| 298 | 301 | def _extract_key_points(pm: ProviderManager, text: str) -> list[KeyPoint]: |
| 299 | 302 |
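`_format_srt_time` is called in the SRT-writing hunk above but its body is not part of this diff. A minimal sketch of what such a helper typically looks like — the name and exact rounding behavior are assumptions; SRT timestamps use `HH:MM:SS,mmm` with a comma before the milliseconds:

```python
def format_srt_time(seconds: float) -> str:
    # Assumed implementation of the _format_srt_time helper referenced above.
    ms = int(round(seconds * 1000))
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

print(format_srt_time(3661.5))  # → 01:01:01,500
```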
| --- video_processor/providers/anthropic_provider.py | ||
| +++ video_processor/providers/anthropic_provider.py | ||
| @@ -97,14 +97,16 @@ | ||
| 97 | 97 | try: |
| 98 | 98 | page = self.client.models.list(limit=100) |
| 99 | 99 | for m in page.data: |
| 100 | 100 | mid = m.id |
| 101 | 101 | caps = ["chat", "vision"] # All Claude models support chat + vision |
| 102 | - models.append(ModelInfo( | |
| 103 | - id=mid, | |
| 104 | - provider="anthropic", | |
| 105 | - display_name=getattr(m, "display_name", mid), | |
| 106 | - capabilities=caps, | |
| 107 | - )) | |
| 102 | + models.append( | |
| 103 | + ModelInfo( | |
| 104 | + id=mid, | |
| 105 | + provider="anthropic", | |
| 106 | + display_name=getattr(m, "display_name", mid), | |
| 107 | + capabilities=caps, | |
| 108 | + ) | |
| 109 | + ) | |
| 108 | 110 | except Exception as e: |
| 109 | 111 | logger.warning(f"Failed to list Anthropic models: {e}") |
| 110 | 112 | return sorted(models, key=lambda m: m.id) |
| 111 | 113 |
| --- video_processor/providers/base.py | ||
| +++ video_processor/providers/base.py | ||
| @@ -7,16 +7,16 @@ | ||
| 7 | 7 | from pydantic import BaseModel, Field |
| 8 | 8 | |
| 9 | 9 | |
| 10 | 10 | class ModelInfo(BaseModel): |
| 11 | 11 | """Information about an available model.""" |
| 12 | + | |
| 12 | 13 | id: str = Field(description="Model identifier (e.g. gpt-4o)") |
| 13 | 14 | provider: str = Field(description="Provider name (openai, anthropic, gemini)") |
| 14 | 15 | display_name: str = Field(default="", description="Human-readable name") |
| 15 | 16 | capabilities: List[str] = Field( |
| 16 | - default_factory=list, | |
| 17 | - description="Model capabilities: chat, vision, audio, embedding" | |
| 17 | + default_factory=list, description="Model capabilities: chat, vision, audio, embedding" | |
| 18 | 18 | ) |
| 19 | 19 | |
| 20 | 20 | |
| 21 | 21 | class BaseProvider(ABC): |
| 22 | 22 | """Abstract base for all provider implementations.""" |
| 23 | 23 |
| --- video_processor/providers/discovery.py | ||
| +++ video_processor/providers/discovery.py | ||
| @@ -38,10 +38,11 @@ | ||
| 38 | 38 | |
| 39 | 39 | # OpenAI |
| 40 | 40 | if keys.get("openai"): |
| 41 | 41 | try: |
| 42 | 42 | from video_processor.providers.openai_provider import OpenAIProvider |
| 43 | + | |
| 43 | 44 | provider = OpenAIProvider(api_key=keys["openai"]) |
| 44 | 45 | models = provider.list_models() |
| 45 | 46 | logger.info(f"Discovered {len(models)} OpenAI models") |
| 46 | 47 | all_models.extend(models) |
| 47 | 48 | except Exception as e: |
| @@ -49,10 +50,11 @@ | ||
| 49 | 50 | |
| 50 | 51 | # Anthropic |
| 51 | 52 | if keys.get("anthropic"): |
| 52 | 53 | try: |
| 53 | 54 | from video_processor.providers.anthropic_provider import AnthropicProvider |
| 55 | + | |
| 54 | 56 | provider = AnthropicProvider(api_key=keys["anthropic"]) |
| 55 | 57 | models = provider.list_models() |
| 56 | 58 | logger.info(f"Discovered {len(models)} Anthropic models") |
| 57 | 59 | all_models.extend(models) |
| 58 | 60 | except Exception as e: |
| @@ -62,10 +64,11 @@ | ||
| 62 | 64 | gemini_key = keys.get("gemini") |
| 63 | 65 | gemini_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS", "") |
| 64 | 66 | if gemini_key or gemini_creds: |
| 65 | 67 | try: |
| 66 | 68 | from video_processor.providers.gemini_provider import GeminiProvider |
| 69 | + | |
| 67 | 70 | provider = GeminiProvider( |
| 68 | 71 | api_key=gemini_key or None, |
| 69 | 72 | credentials_path=gemini_creds or None, |
| 70 | 73 | ) |
| 71 | 74 | models = provider.list_models() |
| 72 | 75 |
| --- video_processor/providers/gemini_provider.py | ||
| +++ video_processor/providers/gemini_provider.py | ||
| @@ -29,16 +29,15 @@ | ||
| 29 | 29 | ): |
| 30 | 30 | self.api_key = api_key or os.getenv("GEMINI_API_KEY") |
| 31 | 31 | self.credentials_path = credentials_path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS") |
| 32 | 32 | |
| 33 | 33 | if not self.api_key and not self.credentials_path: |
| 34 | - raise ValueError( | |
| 35 | - "Neither GEMINI_API_KEY nor GOOGLE_APPLICATION_CREDENTIALS is set" | |
| 36 | - ) | |
| 34 | + raise ValueError("Neither GEMINI_API_KEY nor GOOGLE_APPLICATION_CREDENTIALS is set") | |
| 37 | 35 | |
| 38 | 36 | try: |
| 39 | 37 | from google import genai |
| 38 | + | |
| 40 | 39 | self._genai = genai |
| 41 | 40 | |
| 42 | 41 | if self.api_key: |
| 43 | 42 | self.client = genai.Client(api_key=self.api_key) |
| 44 | 43 | else: |
| @@ -55,12 +54,11 @@ | ||
| 55 | 54 | project=project, |
| 56 | 55 | location=location, |
| 57 | 56 | ) |
| 58 | 57 | except ImportError: |
| 59 | 58 | raise ImportError( |
| 60 | - "google-genai package not installed. " | |
| 61 | - "Install with: pip install google-genai" | |
| 59 | + "google-genai package not installed. Install with: pip install google-genai" | |
| 62 | 60 | ) |
| 63 | 61 | |
| 64 | 62 | def chat( |
| 65 | 63 | self, |
| 66 | 64 | messages: list[dict], |
| @@ -73,14 +71,16 @@ | ||
| 73 | 71 | model = model or "gemini-2.5-flash" |
| 74 | 72 | # Convert OpenAI-style messages to Gemini contents |
| 75 | 73 | contents = [] |
| 76 | 74 | for msg in messages: |
| 77 | 75 | role = "user" if msg["role"] == "user" else "model" |
| 78 | - contents.append(types.Content( | |
| 79 | - role=role, | |
| 80 | - parts=[types.Part.from_text(text=msg["content"])], | |
| 81 | - )) | |
| 76 | + contents.append( | |
| 77 | + types.Content( | |
| 78 | + role=role, | |
| 79 | + parts=[types.Part.from_text(text=msg["content"])], | |
| 80 | + ) | |
| 81 | + ) | |
| 82 | 82 | |
| 83 | 83 | response = self.client.models.generate_content( |
| 84 | 84 | model=model, |
| 85 | 85 | contents=contents, |
| 86 | 86 | config=types.GenerateContentConfig( |
| @@ -168,10 +168,11 @@ | ||
| 168 | 168 | ), |
| 169 | 169 | ) |
| 170 | 170 | |
| 171 | 171 | # Parse JSON response |
| 172 | 172 | import json |
| 173 | + | |
| 173 | 174 | try: |
| 174 | 175 | data = json.loads(response.text) |
| 175 | 176 | except (json.JSONDecodeError, TypeError): |
| 176 | 177 | data = {"text": response.text or "", "segments": []} |
| 177 | 178 | |
| @@ -190,11 +191,11 @@ | ||
| 190 | 191 | for m in self.client.models.list(): |
| 191 | 192 | mid = m.name or "" |
| 192 | 193 | # Strip prefix variants from different API modes |
| 193 | 194 | for prefix in ("models/", "publishers/google/models/"): |
| 194 | 195 | if mid.startswith(prefix): |
| 195 | - mid = mid[len(prefix):] | |
| 196 | + mid = mid[len(prefix) :] | |
| 196 | 197 | break |
| 197 | 198 | display = getattr(m, "display_name", mid) or mid |
| 198 | 199 | |
| 199 | 200 | caps = [] |
| 200 | 201 | mid_lower = mid.lower() |
| @@ -206,14 +207,16 @@ | ||
| 206 | 207 | caps.append("audio") |
| 207 | 208 | if "embedding" in mid_lower: |
| 208 | 209 | caps.append("embedding") |
| 209 | 210 | |
| 210 | 211 | if caps: |
| 211 | - models.append(ModelInfo( | |
| 212 | - id=mid, | |
| 213 | - provider="gemini", | |
| 214 | - display_name=display, | |
| 215 | - capabilities=caps, | |
| 216 | - )) | |
| 212 | + models.append( | |
| 213 | + ModelInfo( | |
| 214 | + id=mid, | |
| 215 | + provider="gemini", | |
| 216 | + display_name=display, | |
| 217 | + capabilities=caps, | |
| 218 | + ) | |
| 219 | + ) | |
| 217 | 220 | except Exception as e: |
| 218 | 221 | logger.warning(f"Failed to list Gemini models: {e}") |
| 219 | 222 | return sorted(models, key=lambda m: m.id) |
| 220 | 223 |
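The prefix-stripping loop in `list_models` (reformatted above to ruff's slice style, `mid[len(prefix) :]`) can also be expressed with `str.removeprefix`, available since Python 3.9, which returns the string unchanged when the prefix is absent. A stdlib-only sketch of the equivalent logic (the function name is illustrative, not from the codebase):

```python
def strip_model_prefix(mid: str) -> str:
    """Drop the first matching API-mode prefix from a Gemini model name."""
    for prefix in ("models/", "publishers/google/models/"):
        if mid.startswith(prefix):
            # removeprefix replaces the manual mid[len(prefix):] slice.
            return mid.removeprefix(prefix)
    return mid
```

Either form avoids the off-by-one risk of hand-computed slice bounds; the explicit `startswith` check preserves the original's first-match-wins behavior.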
| --- video_processor/providers/manager.py | ||
| +++ video_processor/providers/manager.py | ||
| @@ -1,9 +1,8 @@ | ||
| 1 | 1 | """ProviderManager - unified interface for routing API calls to the best available provider.""" |
| 2 | 2 | |
| 3 | 3 | import logging |
| 4 | -import os | |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Optional |
| 7 | 6 | |
| 8 | 7 | from dotenv import load_dotenv |
| 9 | 8 | |
| @@ -67,11 +66,13 @@ | ||
| 67 | 66 | |
| 68 | 67 | # If a single provider is forced, apply it |
| 69 | 68 | if provider: |
| 70 | 69 | self.vision_model = vision_model or self._default_for_provider(provider, "vision") |
| 71 | 70 | self.chat_model = chat_model or self._default_for_provider(provider, "chat") |
| 72 | - self.transcription_model = transcription_model or self._default_for_provider(provider, "audio") | |
| 71 | + self.transcription_model = transcription_model or self._default_for_provider( | |
| 72 | + provider, "audio" | |
| 73 | + ) | |
| 73 | 74 | else: |
| 74 | 75 | self.vision_model = vision_model |
| 75 | 76 | self.chat_model = chat_model |
| 76 | 77 | self.transcription_model = transcription_model |
| 77 | 78 | |
| @@ -80,34 +81,51 @@ | ||
| 80 | 81 | @staticmethod |
| 81 | 82 | def _default_for_provider(provider: str, capability: str) -> str: |
| 82 | 83 | """Return the default model for a provider/capability combo.""" |
| 83 | 84 | defaults = { |
| 84 | 85 | "openai": {"chat": "gpt-4o", "vision": "gpt-4o", "audio": "whisper-1"}, |
| 85 | - "anthropic": {"chat": "claude-sonnet-4-5-20250929", "vision": "claude-sonnet-4-5-20250929", "audio": ""}, | |
| 86 | - "gemini": {"chat": "gemini-2.5-flash", "vision": "gemini-2.5-flash", "audio": "gemini-2.5-flash"}, | |
| 86 | + "anthropic": { | |
| 87 | + "chat": "claude-sonnet-4-5-20250929", | |
| 88 | + "vision": "claude-sonnet-4-5-20250929", | |
| 89 | + "audio": "", | |
| 90 | + }, | |
| 91 | + "gemini": { | |
| 92 | + "chat": "gemini-2.5-flash", | |
| 93 | + "vision": "gemini-2.5-flash", | |
| 94 | + "audio": "gemini-2.5-flash", | |
| 95 | + }, | |
| 87 | 96 | } |
| 88 | 97 | return defaults.get(provider, {}).get(capability, "") |
| 89 | 98 | |
| 90 | 99 | def _get_provider(self, provider_name: str) -> BaseProvider: |
| 91 | 100 | """Lazily initialize and cache a provider instance.""" |
| 92 | 101 | if provider_name not in self._providers: |
| 93 | 102 | if provider_name == "openai": |
| 94 | 103 | from video_processor.providers.openai_provider import OpenAIProvider |
| 104 | + | |
| 95 | 105 | self._providers[provider_name] = OpenAIProvider() |
| 96 | 106 | elif provider_name == "anthropic": |
| 97 | 107 | from video_processor.providers.anthropic_provider import AnthropicProvider |
| 108 | + | |
| 98 | 109 | self._providers[provider_name] = AnthropicProvider() |
| 99 | 110 | elif provider_name == "gemini": |
| 100 | 111 | from video_processor.providers.gemini_provider import GeminiProvider |
| 112 | + | |
| 101 | 113 | self._providers[provider_name] = GeminiProvider() |
| 102 | 114 | else: |
| 103 | 115 | raise ValueError(f"Unknown provider: {provider_name}") |
| 104 | 116 | return self._providers[provider_name] |
| 105 | 117 | |
| 106 | 118 | def _provider_for_model(self, model_id: str) -> str: |
| 107 | 119 | """Infer the provider from a model id.""" |
| 108 | - if model_id.startswith("gpt-") or model_id.startswith("o1") or model_id.startswith("o3") or model_id.startswith("o4") or model_id.startswith("whisper"): | |
| 120 | + if ( | |
| 121 | + model_id.startswith("gpt-") | |
| 122 | + or model_id.startswith("o1") | |
| 123 | + or model_id.startswith("o3") | |
| 124 | + or model_id.startswith("o4") | |
| 125 | + or model_id.startswith("whisper") | |
| 126 | + ): | |
| 109 | 127 | return "openai" |
| 110 | 128 | if model_id.startswith("claude-"): |
| 111 | 129 | return "anthropic" |
| 112 | 130 | if model_id.startswith("gemini-"): |
| 113 | 131 | return "gemini" |
| @@ -121,11 +139,13 @@ | ||
| 121 | 139 | def _get_available_models(self) -> list[ModelInfo]: |
| 122 | 140 | if self._available_models is None: |
| 123 | 141 | self._available_models = discover_available_models() |
| 124 | 142 | return self._available_models |
| 125 | 143 | |
| 126 | - def _resolve_model(self, explicit: Optional[str], capability: str, preferences: list[tuple[str, str]]) -> tuple[str, str]: | |
| 144 | + def _resolve_model( | |
| 145 | + self, explicit: Optional[str], capability: str, preferences: list[tuple[str, str]] | |
| 146 | + ) -> tuple[str, str]: | |
| 127 | 147 | """ |
| 128 | 148 | Resolve which (provider, model) to use for a capability. |
| 129 | 149 | |
| 130 | 150 | Returns (provider_name, model_id). |
| 131 | 151 | """ |
| @@ -169,11 +189,13 @@ | ||
| 169 | 189 | ) -> str: |
| 170 | 190 | """Send a chat completion to the best available provider.""" |
| 171 | 191 | prov_name, model = self._resolve_model(self.chat_model, "chat", _CHAT_PREFERENCES) |
| 172 | 192 | logger.info(f"Chat: using {prov_name}/{model}") |
| 173 | 193 | provider = self._get_provider(prov_name) |
| 174 | - result = provider.chat(messages, max_tokens=max_tokens, temperature=temperature, model=model) | |
| 194 | + result = provider.chat( | |
| 195 | + messages, max_tokens=max_tokens, temperature=temperature, model=model | |
| 196 | + ) | |
| 175 | 197 | self._track(provider, prov_name, model) |
| 176 | 198 | return result |
| 177 | 199 | |
| 178 | 200 | def analyze_image( |
| 179 | 201 | self, |
| 180 | 202 |
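The wrapped condition in `_provider_for_model` chains five `or`-ed `startswith` calls. Since `str.startswith` also accepts a tuple of prefixes, the same routing fits one call per provider; a sketch under that assumption (the module-level tuple and the empty-string fallthrough are illustrative, not from the commit):

```python
# Prefixes that identify OpenAI model ids, collapsed into one tuple.
_OPENAI_PREFIXES = ("gpt-", "o1", "o3", "o4", "whisper")


def provider_for_model(model_id: str) -> str:
    """Infer the provider from a model id (returns "" when unknown)."""
    if model_id.startswith(_OPENAI_PREFIXES):
        return "openai"
    if model_id.startswith("claude-"):
        return "anthropic"
    if model_id.startswith("gemini-"):
        return "gemini"
    return ""
```

The tuple form sidesteps the line-length wrapping entirely while keeping the prefix list in one place.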
| --- video_processor/providers/openai_provider.py | ||
| +++ video_processor/providers/openai_provider.py | ||
| @@ -13,11 +13,22 @@ | ||
| 13 | 13 | |
| 14 | 14 | load_dotenv() |
| 15 | 15 | logger = logging.getLogger(__name__) |
| 16 | 16 | |
| 17 | 17 | # Models known to have vision capability |
| 18 | -_VISION_MODELS = {"gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "o1", "o3", "o3-mini", "o4-mini"} | |
| 18 | +_VISION_MODELS = { | |
| 19 | + "gpt-4o", | |
| 20 | + "gpt-4o-mini", | |
| 21 | + "gpt-4-turbo", | |
| 22 | + "gpt-4.1", | |
| 23 | + "gpt-4.1-mini", | |
| 24 | + "gpt-4.1-nano", | |
| 25 | + "o1", | |
| 26 | + "o3", | |
| 27 | + "o3-mini", | |
| 28 | + "o4-mini", | |
| 29 | +} | |
| 19 | 30 | _AUDIO_MODELS = {"whisper-1"} |
| 20 | 31 | |
| 21 | 32 | |
| 22 | 33 | class OpenAIProvider(BaseProvider): |
| 23 | 34 | """OpenAI API provider.""" |
| @@ -44,11 +55,13 @@ | ||
| 44 | 55 | max_tokens=max_tokens, |
| 45 | 56 | temperature=temperature, |
| 46 | 57 | ) |
| 47 | 58 | self._last_usage = { |
| 48 | 59 | "input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0, |
| 49 | - "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0, | |
| 60 | + "output_tokens": getattr(response.usage, "completion_tokens", 0) | |
| 61 | + if response.usage | |
| 62 | + else 0, | |
| 50 | 63 | } |
| 51 | 64 | return response.choices[0].message.content or "" |
| 52 | 65 | |
| 53 | 66 | def analyze_image( |
| 54 | 67 | self, |
| @@ -75,11 +88,13 @@ | ||
| 75 | 88 | ], |
| 76 | 89 | max_tokens=max_tokens, |
| 77 | 90 | ) |
| 78 | 91 | self._last_usage = { |
| 79 | 92 | "input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0, |
| 80 | - "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0, | |
| 93 | + "output_tokens": getattr(response.usage, "completion_tokens", 0) | |
| 94 | + if response.usage | |
| 95 | + else 0, | |
| 81 | 96 | } |
| 82 | 97 | return response.choices[0].message.content or "" |
| 83 | 98 | |
| 84 | 99 | # Whisper API limit is 25MB |
| 85 | 100 | _MAX_FILE_SIZE = 25 * 1024 * 1024 |
| @@ -101,13 +116,11 @@ | ||
| 101 | 116 | logger.info( |
| 102 | 117 | f"Audio file {file_size / 1024 / 1024:.1f}MB exceeds Whisper 25MB limit, chunking..." |
| 103 | 118 | ) |
| 104 | 119 | return self._transcribe_chunked(audio_path, language, model) |
| 105 | 120 | |
| 106 | - def _transcribe_single( | |
| 107 | - self, audio_path: Path, language: Optional[str], model: str | |
| 108 | - ) -> dict: | |
| 121 | + def _transcribe_single(self, audio_path: Path, language: Optional[str], model: str) -> dict: | |
| 109 | 122 | """Transcribe a single audio file.""" |
| 110 | 123 | with open(audio_path, "rb") as f: |
| 111 | 124 | kwargs = {"model": model, "file": f} |
| 112 | 125 | if language: |
| 113 | 126 | kwargs["language"] = language |
| @@ -128,15 +141,14 @@ | ||
| 128 | 141 | "duration": getattr(response, "duration", None), |
| 129 | 142 | "provider": "openai", |
| 130 | 143 | "model": model, |
| 131 | 144 | } |
| 132 | 145 | |
| 133 | - def _transcribe_chunked( | |
| 134 | - self, audio_path: Path, language: Optional[str], model: str | |
| 135 | - ) -> dict: | |
| 146 | + def _transcribe_chunked(self, audio_path: Path, language: Optional[str], model: str) -> dict: | |
| 136 | 147 | """Split audio into chunks under 25MB and transcribe each.""" |
| 137 | 148 | import tempfile |
| 149 | + | |
| 138 | 150 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 139 | 151 | |
| 140 | 152 | extractor = AudioExtractor() |
| 141 | 153 | audio_data, sr = extractor.load_audio(audio_path) |
| 142 | 154 | total_duration = len(audio_data) / sr |
| @@ -164,15 +176,17 @@ | ||
| 164 | 176 | logger.info(f"Transcribing chunk {i + 1}/{len(segments_data)}...") |
| 165 | 177 | result = self._transcribe_single(chunk_path, language, model) |
| 166 | 178 | |
| 167 | 179 | all_text.append(result["text"]) |
| 168 | 180 | for seg in result.get("segments", []): |
| 169 | - all_segments.append({ | |
| 170 | - "start": seg["start"] + time_offset, | |
| 171 | - "end": seg["end"] + time_offset, | |
| 172 | - "text": seg["text"], | |
| 173 | - }) | |
| 181 | + all_segments.append( | |
| 182 | + { | |
| 183 | + "start": seg["start"] + time_offset, | |
| 184 | + "end": seg["end"] + time_offset, | |
| 185 | + "text": seg["text"], | |
| 186 | + } | |
| 187 | + ) | |
| 174 | 188 | |
| 175 | 189 | if not detected_language and result.get("language"): |
| 176 | 190 | detected_language = result["language"] |
| 177 | 191 | |
| 178 | 192 | time_offset += len(chunk) / sr |
| @@ -200,14 +214,16 @@ | ||
| 200 | 214 | if mid in _AUDIO_MODELS or mid.startswith("whisper"): |
| 201 | 215 | caps.append("audio") |
| 202 | 216 | if "embedding" in mid: |
| 203 | 217 | caps.append("embedding") |
| 204 | 218 | if caps: |
| 205 | - models.append(ModelInfo( | |
| 206 | - id=mid, | |
| 207 | - provider="openai", | |
| 208 | - display_name=mid, | |
| 209 | - capabilities=caps, | |
| 210 | - )) | |
| 219 | + models.append( | |
| 220 | + ModelInfo( | |
| 221 | + id=mid, | |
| 222 | + provider="openai", | |
| 223 | + display_name=mid, | |
| 224 | + capabilities=caps, | |
| 225 | + ) | |
| 226 | + ) | |
| 211 | 227 | except Exception as e: |
| 212 | 228 | logger.warning(f"Failed to list OpenAI models: {e}") |
| 213 | 229 | return sorted(models, key=lambda m: m.id) |
| 214 | 230 |
| --- video_processor/providers/whisper_local.py | ||
| +++ video_processor/providers/whisper_local.py | ||
| @@ -69,13 +69,11 @@ | ||
| 69 | 69 | return |
| 70 | 70 | |
| 71 | 71 | try: |
| 72 | 72 | import whisper |
| 73 | 73 | except ImportError: |
| 74 | - raise ImportError( | |
| 75 | - "openai-whisper not installed. Run: pip install openai-whisper torch" | |
| 76 | - ) | |
| 74 | + raise ImportError("openai-whisper not installed. Run: pip install openai-whisper torch") | |
| 77 | 75 | |
| 78 | 76 | logger.info(f"Loading Whisper {self.model_size} model on {self.device}...") |
| 79 | 77 | self._model = whisper.load_model(self.model_size, device=self.device) |
| 80 | 78 | logger.info("Whisper model loaded") |
| 81 | 79 | |
| @@ -125,10 +123,11 @@ | ||
| 125 | 123 | |
| 126 | 124 | @staticmethod |
| 127 | 125 | def is_available() -> bool: |
| 128 | 126 | """Check if local Whisper is installed and usable.""" |
| 129 | 127 | try: |
| 130 | - import whisper | |
| 131 | - import torch | |
| 128 | + import torch # noqa: F401 | |
| 129 | + import whisper # noqa: F401 | |
| 130 | + | |
| 132 | 131 | return True |
| 133 | 132 | except ImportError: |
| 134 | 133 | return False |
| 135 | 134 |
| --- video_processor/sources/base.py | ||
| +++ video_processor/sources/base.py | ||
| @@ -10,10 +10,11 @@ | ||
| 10 | 10 | logger = logging.getLogger(__name__) |
| 11 | 11 | |
| 12 | 12 | |
| 13 | 13 | class SourceFile(BaseModel): |
| 14 | 14 | """A file available in a cloud source.""" |
| 15 | + | |
| 15 | 16 | name: str = Field(description="File name") |
| 16 | 17 | id: str = Field(description="Provider-specific file identifier") |
| 17 | 18 | size_bytes: Optional[int] = Field(default=None, description="File size in bytes") |
| 18 | 19 | mime_type: Optional[str] = Field(default=None, description="MIME type") |
| 19 | 20 | modified_at: Optional[str] = Field(default=None, description="Last modified timestamp") |
| 20 | 21 |
| --- video_processor/sources/dropbox_source.py | ||
| +++ video_processor/sources/dropbox_source.py | ||
| @@ -56,13 +56,11 @@ | ||
| 56 | 56 | def authenticate(self) -> bool: |
| 57 | 57 | """Authenticate with Dropbox API.""" |
| 58 | 58 | try: |
| 59 | 59 | import dropbox |
| 60 | 60 | except ImportError: |
| 61 | - logger.error( | |
| 62 | - "Dropbox SDK not installed. Run: pip install planopticon[dropbox]" | |
| 63 | - ) | |
| 61 | + logger.error("Dropbox SDK not installed. Run: pip install planopticon[dropbox]") | |
| 64 | 62 | return False |
| 65 | 63 | |
| 66 | 64 | # Try direct access token first |
| 67 | 65 | if self.access_token: |
| 68 | 66 | return self._auth_token(dropbox) |
| @@ -109,13 +107,11 @@ | ||
| 109 | 107 | return False |
| 110 | 108 | |
| 111 | 109 | def _auth_oauth(self, dropbox) -> bool: |
| 112 | 110 | """Run OAuth2 PKCE flow.""" |
| 113 | 111 | if not self.app_key: |
| 114 | - logger.error( | |
| 115 | - "Dropbox app key not configured. Set DROPBOX_APP_KEY env var." | |
| 116 | - ) | |
| 112 | + logger.error("Dropbox app key not configured. Set DROPBOX_APP_KEY env var.") | |
| 117 | 113 | return False |
| 118 | 114 | |
| 119 | 115 | try: |
| 120 | 116 | flow = dropbox.DropboxOAuth2FlowNoRedirect( |
| 121 | 117 | consumer_key=self.app_key, |
| @@ -187,13 +183,11 @@ | ||
| 187 | 183 | ext = Path(entry.name).suffix.lower() |
| 188 | 184 | if ext not in VIDEO_EXTENSIONS: |
| 189 | 185 | continue |
| 190 | 186 | |
| 191 | 187 | if patterns: |
| 192 | - if not any( | |
| 193 | - entry.name.endswith(p.replace("*", "")) for p in patterns | |
| 194 | - ): | |
| 188 | + if not any(entry.name.endswith(p.replace("*", "")) for p in patterns): | |
| 195 | 189 | continue |
| 196 | 190 | |
| 197 | 191 | files.append( |
| 198 | 192 | SourceFile( |
| 199 | 193 | name=entry.name, |
| 200 | 194 |
| --- video_processor/sources/google_drive.py | ||
| +++ video_processor/sources/google_drive.py | ||
| @@ -65,27 +65,23 @@ | ||
| 65 | 65 | If True, force service account auth. If False, force OAuth. |
| 66 | 66 | If None, auto-detect from credentials file. |
| 67 | 67 | token_path : Path, optional |
| 68 | 68 | Where to store/load OAuth tokens. Defaults to ~/.planopticon/google_drive_token.json |
| 69 | 69 | """ |
| 70 | - self.credentials_path = credentials_path or os.environ.get( | |
| 71 | - "GOOGLE_APPLICATION_CREDENTIALS" | |
| 72 | - ) | |
| 70 | + self.credentials_path = credentials_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") | |
| 73 | 71 | self.use_service_account = use_service_account |
| 74 | 72 | self.token_path = token_path or _TOKEN_PATH |
| 75 | 73 | self.service = None |
| 76 | 74 | self._creds = None |
| 77 | 75 | |
| 78 | 76 | def authenticate(self) -> bool: |
| 79 | 77 | """Authenticate with Google Drive API.""" |
| 80 | 78 | try: |
| 81 | - from google.oauth2 import service_account as sa_module | |
| 79 | + from google.oauth2 import service_account as sa_module # noqa: F401 | |
| 82 | 80 | from googleapiclient.discovery import build |
| 83 | 81 | except ImportError: |
| 84 | - logger.error( | |
| 85 | - "Google API client not installed. Run: pip install planopticon[gdrive]" | |
| 86 | - ) | |
| 82 | + logger.error("Google API client not installed. Run: pip install planopticon[gdrive]") | |
| 87 | 83 | return False |
| 88 | 84 | |
| 89 | 85 | # Determine auth method |
| 90 | 86 | if self.use_service_account is True or ( |
| 91 | 87 | self.use_service_account is None and self._is_service_account() |
| @@ -130,23 +126,19 @@ | ||
| 130 | 126 | try: |
| 131 | 127 | from google.auth.transport.requests import Request |
| 132 | 128 | from google.oauth2.credentials import Credentials |
| 133 | 129 | from google_auth_oauthlib.flow import InstalledAppFlow |
| 134 | 130 | except ImportError: |
| 135 | - logger.error( | |
| 136 | - "OAuth libraries not installed. Run: pip install planopticon[gdrive]" | |
| 137 | - ) | |
| 131 | + logger.error("OAuth libraries not installed. Run: pip install planopticon[gdrive]") | |
| 138 | 132 | return False |
| 139 | 133 | |
| 140 | 134 | creds = None |
| 141 | 135 | |
| 142 | 136 | # Load existing token |
| 143 | 137 | if self.token_path.exists(): |
| 144 | 138 | try: |
| 145 | - creds = Credentials.from_authorized_user_file( | |
| 146 | - str(self.token_path), SCOPES | |
| 147 | - ) | |
| 139 | + creds = Credentials.from_authorized_user_file(str(self.token_path), SCOPES) | |
| 148 | 140 | except Exception: |
| 149 | 141 | pass |
| 150 | 142 | |
| 151 | 143 | # Refresh or run new flow |
| 152 | 144 | if creds and creds.expired and creds.refresh_token: |
| @@ -251,13 +243,11 @@ | ||
| 251 | 243 | query_parts = [] |
| 252 | 244 | |
| 253 | 245 | if folder_id: |
| 254 | 246 | query_parts.append(f"'{folder_id}' in parents") |
| 255 | 247 | |
| 256 | - mime_conditions = " or ".join( | |
| 257 | - f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES | |
| 258 | - ) | |
| 248 | + mime_conditions = " or ".join(f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES) | |
| 259 | 249 | query_parts.append(f"({mime_conditions})") |
| 260 | 250 | query_parts.append("trashed=false") |
| 261 | 251 | |
| 262 | 252 | query = " and ".join(query_parts) |
| 263 | 253 | page_token = None |
| @@ -275,13 +265,11 @@ | ||
| 275 | 265 | .execute() |
| 276 | 266 | ) |
| 277 | 267 | |
| 278 | 268 | for f in response.get("files", []): |
| 279 | 269 | name = f.get("name", "") |
| 280 | - if patterns and not any( | |
| 281 | - name.endswith(p.replace("*", "")) for p in patterns | |
| 282 | - ): | |
| 270 | + if patterns and not any(name.endswith(p.replace("*", "")) for p in patterns): | |
| 283 | 271 | continue |
| 284 | 272 | |
| 285 | 273 | out.append( |
| 286 | 274 | SourceFile( |
| 287 | 275 | name=name, |
| @@ -336,11 +324,10 @@ | ||
| 336 | 324 | """Download a file from Google Drive.""" |
| 337 | 325 | if not self.service: |
| 338 | 326 | raise RuntimeError("Not authenticated. Call authenticate() first.") |
| 339 | 327 | |
| 340 | 328 | from googleapiclient.http import MediaIoBaseDownload |
| 341 | - import io | |
| 342 | 329 | |
| 343 | 330 | destination = Path(destination) |
| 344 | 331 | destination.parent.mkdir(parents=True, exist_ok=True) |
| 345 | 332 | |
| 346 | 333 | request = self.service.files().get_media(fileId=file.id) |
| 347 | 334 |
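The `list_files` hunk above collapses the MIME filter onto one line without changing what it builds. The query-string assembly can be exercised in isolation; this is a sketch, with `VIDEO_MIME_TYPES` stood in for by a two-entry tuple (an assumption for illustration, not the module's actual constant):

```python
def build_drive_query(folder_id=None, video_mime_types=("video/mp4", "video/quicktime")):
    """Assemble a Drive v3 files.list query as in GoogleDriveSource.list_files."""
    query_parts = []
    if folder_id:
        # Restrict to a single parent folder
        query_parts.append(f"'{folder_id}' in parents")
    # One mimeType clause per video type, OR-ed together
    mime_conditions = " or ".join(f"mimeType='{mt}'" for mt in video_mime_types)
    query_parts.append(f"({mime_conditions})")
    query_parts.append("trashed=false")
    return " and ".join(query_parts)
```

With a folder ID this yields `'<id>' in parents and (mimeType=... or mimeType=...) and trashed=false`, which is the string passed as the `q` parameter to `files().list()`.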
+50
-56
| --- video_processor/utils/api_cache.py | ||
| +++ video_processor/utils/api_cache.py | ||
| @@ -1,28 +1,30 @@ | ||
| 1 | 1 | """Caching system for API responses to reduce API calls and costs.""" |
| 2 | + | |
| 3 | +import hashlib | |
| 2 | 4 | import json |
| 3 | 5 | import logging |
| 4 | 6 | import os |
| 5 | 7 | import time |
| 6 | -import hashlib | |
| 7 | 8 | from pathlib import Path |
| 8 | 9 | from typing import Any, Dict, Optional, Union |
| 9 | 10 | |
| 10 | 11 | logger = logging.getLogger(__name__) |
| 12 | + | |
| 11 | 13 | |
| 12 | 14 | class ApiCache: |
| 13 | 15 | """Disk-based API response cache.""" |
| 14 | - | |
| 16 | + | |
| 15 | 17 | def __init__( |
| 16 | - self, | |
| 17 | - cache_dir: Union[str, Path], | |
| 18 | + self, | |
| 19 | + cache_dir: Union[str, Path], | |
| 18 | 20 | namespace: str = "default", |
| 19 | - ttl: int = 86400 # 24 hours in seconds | |
| 21 | + ttl: int = 86400, # 24 hours in seconds | |
| 20 | 22 | ): |
| 21 | 23 | """ |
| 22 | 24 | Initialize API cache. |
| 23 | - | |
| 25 | + | |
| 24 | 26 | Parameters |
| 25 | 27 | ---------- |
| 26 | 28 | cache_dir : str or Path |
| 27 | 29 | Directory for cache files |
| 28 | 30 | namespace : str |
| @@ -31,206 +33,198 @@ | ||
| 31 | 33 | Time-to-live for cache entries in seconds |
| 32 | 34 | """ |
| 33 | 35 | self.cache_dir = Path(cache_dir) |
| 34 | 36 | self.namespace = namespace |
| 35 | 37 | self.ttl = ttl |
| 36 | - | |
| 38 | + | |
| 37 | 39 | # Ensure namespace directory exists |
| 38 | 40 | self.namespace_dir = self.cache_dir / namespace |
| 39 | 41 | self.namespace_dir.mkdir(parents=True, exist_ok=True) |
| 40 | - | |
| 42 | + | |
| 41 | 43 | logger.debug(f"Initialized API cache in {self.namespace_dir}") |
| 42 | - | |
| 44 | + | |
| 43 | 45 | def get_cache_path(self, key: str) -> Path: |
| 44 | 46 | """ |
| 45 | 47 | Get path to cache file for key. |
| 46 | - | |
| 48 | + | |
| 47 | 49 | Parameters |
| 48 | 50 | ---------- |
| 49 | 51 | key : str |
| 50 | 52 | Cache key |
| 51 | - | |
| 53 | + | |
| 52 | 54 | Returns |
| 53 | 55 | ------- |
| 54 | 56 | Path |
| 55 | 57 | Path to cache file |
| 56 | 58 | """ |
| 57 | 59 | # Hash the key to ensure valid filename |
| 58 | 60 | hashed_key = hashlib.md5(key.encode()).hexdigest() |
| 59 | 61 | return self.namespace_dir / f"{hashed_key}.json" |
| 60 | - | |
| 62 | + | |
| 61 | 63 | def get(self, key: str) -> Optional[Any]: |
| 62 | 64 | """ |
| 63 | 65 | Get value from cache. |
| 64 | - | |
| 66 | + | |
| 65 | 67 | Parameters |
| 66 | 68 | ---------- |
| 67 | 69 | key : str |
| 68 | 70 | Cache key |
| 69 | - | |
| 71 | + | |
| 70 | 72 | Returns |
| 71 | 73 | ------- |
| 72 | 74 | object or None |
| 73 | 75 | Cached value if available and not expired, None otherwise |
| 74 | 76 | """ |
| 75 | 77 | cache_path = self.get_cache_path(key) |
| 76 | - | |
| 78 | + | |
| 77 | 79 | # Check if cache file exists |
| 78 | 80 | if not cache_path.exists(): |
| 79 | 81 | return None |
| 80 | - | |
| 82 | + | |
| 81 | 83 | try: |
| 82 | 84 | # Read cache file |
| 83 | 85 | with open(cache_path, "r", encoding="utf-8") as f: |
| 84 | 86 | cache_data = json.load(f) |
| 85 | - | |
| 87 | + | |
| 86 | 88 | # Check if cache entry is expired |
| 87 | 89 | timestamp = cache_data.get("timestamp", 0) |
| 88 | 90 | now = time.time() |
| 89 | - | |
| 91 | + | |
| 90 | 92 | if now - timestamp > self.ttl: |
| 91 | 93 | logger.debug(f"Cache entry expired for {key}") |
| 92 | 94 | return None |
| 93 | - | |
| 95 | + | |
| 94 | 96 | logger.debug(f"Cache hit for {key}") |
| 95 | 97 | return cache_data.get("value") |
| 96 | - | |
| 98 | + | |
| 97 | 99 | except Exception as e: |
| 98 | 100 | logger.warning(f"Error reading cache: {str(e)}") |
| 99 | 101 | return None |
| 100 | - | |
| 102 | + | |
| 101 | 103 | def set(self, key: str, value: Any) -> bool: |
| 102 | 104 | """ |
| 103 | 105 | Set value in cache. |
| 104 | - | |
| 106 | + | |
| 105 | 107 | Parameters |
| 106 | 108 | ---------- |
| 107 | 109 | key : str |
| 108 | 110 | Cache key |
| 109 | 111 | value : object |
| 110 | 112 | Value to cache (must be JSON serializable) |
| 111 | - | |
| 113 | + | |
| 112 | 114 | Returns |
| 113 | 115 | ------- |
| 114 | 116 | bool |
| 115 | 117 | True if successful, False otherwise |
| 116 | 118 | """ |
| 117 | 119 | cache_path = self.get_cache_path(key) |
| 118 | - | |
| 120 | + | |
| 119 | 121 | try: |
| 120 | 122 | # Prepare cache data |
| 121 | - cache_data = { | |
| 122 | - "timestamp": time.time(), | |
| 123 | - "value": value | |
| 124 | - } | |
| 125 | - | |
| 123 | + cache_data = {"timestamp": time.time(), "value": value} | |
| 124 | + | |
| 126 | 125 | # Write to cache file |
| 127 | 126 | with open(cache_path, "w", encoding="utf-8") as f: |
| 128 | 127 | json.dump(cache_data, f, ensure_ascii=False) |
| 129 | - | |
| 128 | + | |
| 130 | 129 | logger.debug(f"Cached value for {key}") |
| 131 | 130 | return True |
| 132 | - | |
| 131 | + | |
| 133 | 132 | except Exception as e: |
| 134 | 133 | logger.warning(f"Error writing to cache: {str(e)}") |
| 135 | 134 | return False |
| 136 | - | |
| 135 | + | |
| 137 | 136 | def invalidate(self, key: str) -> bool: |
| 138 | 137 | """ |
| 139 | 138 | Invalidate cache entry. |
| 140 | - | |
| 139 | + | |
| 141 | 140 | Parameters |
| 142 | 141 | ---------- |
| 143 | 142 | key : str |
| 144 | 143 | Cache key |
| 145 | - | |
| 144 | + | |
| 146 | 145 | Returns |
| 147 | 146 | ------- |
| 148 | 147 | bool |
| 149 | 148 | True if entry was removed, False otherwise |
| 150 | 149 | """ |
| 151 | 150 | cache_path = self.get_cache_path(key) |
| 152 | - | |
| 151 | + | |
| 153 | 152 | if cache_path.exists(): |
| 154 | 153 | try: |
| 155 | 154 | os.remove(cache_path) |
| 156 | 155 | logger.debug(f"Invalidated cache for {key}") |
| 157 | 156 | return True |
| 158 | 157 | except Exception as e: |
| 159 | 158 | logger.warning(f"Error invalidating cache: {str(e)}") |
| 160 | - | |
| 159 | + | |
| 161 | 160 | return False |
| 162 | - | |
| 161 | + | |
| 163 | 162 | def clear(self, older_than: Optional[int] = None) -> int: |
| 164 | 163 | """ |
| 165 | 164 | Clear all cache entries or entries older than specified time. |
| 166 | - | |
| 165 | + | |
| 167 | 166 | Parameters |
| 168 | 167 | ---------- |
| 169 | 168 | older_than : int, optional |
| 170 | 169 | Clear entries older than this many seconds |
| 171 | - | |
| 170 | + | |
| 172 | 171 | Returns |
| 173 | 172 | ------- |
| 174 | 173 | int |
| 175 | 174 | Number of entries cleared |
| 176 | 175 | """ |
| 177 | 176 | count = 0 |
| 178 | 177 | now = time.time() |
| 179 | - | |
| 178 | + | |
| 180 | 179 | for cache_file in self.namespace_dir.glob("*.json"): |
| 181 | 180 | try: |
| 182 | 181 | # Check file age if criteria provided |
| 183 | 182 | if older_than is not None: |
| 184 | 183 | file_age = now - os.path.getmtime(cache_file) |
| 185 | 184 | if file_age <= older_than: |
| 186 | 185 | continue |
| 187 | - | |
| 186 | + | |
| 188 | 187 | # Remove file |
| 189 | 188 | os.remove(cache_file) |
| 190 | 189 | count += 1 |
| 191 | - | |
| 190 | + | |
| 192 | 191 | except Exception as e: |
| 193 | 192 | logger.warning(f"Error clearing cache file {cache_file}: {str(e)}") |
| 194 | - | |
| 193 | + | |
| 195 | 194 | logger.info(f"Cleared {count} cache entries from {self.namespace}") |
| 196 | 195 | return count |
| 197 | - | |
| 196 | + | |
| 198 | 197 | def get_stats(self) -> Dict: |
| 199 | 198 | """ |
| 200 | 199 | Get cache statistics. |
| 201 | - | |
| 200 | + | |
| 202 | 201 | Returns |
| 203 | 202 | ------- |
| 204 | 203 | dict |
| 205 | 204 | Cache statistics |
| 206 | 205 | """ |
| 207 | 206 | cache_files = list(self.namespace_dir.glob("*.json")) |
| 208 | 207 | total_size = sum(os.path.getsize(f) for f in cache_files) |
| 209 | - | |
| 208 | + | |
| 210 | 209 | # Analyze age distribution |
| 211 | 210 | now = time.time() |
| 212 | - age_distribution = { | |
| 213 | - "1h": 0, | |
| 214 | - "6h": 0, | |
| 215 | - "24h": 0, | |
| 216 | - "older": 0 | |
| 217 | - } | |
| 218 | - | |
| 211 | + age_distribution = {"1h": 0, "6h": 0, "24h": 0, "older": 0} | |
| 212 | + | |
| 219 | 213 | for cache_file in cache_files: |
| 220 | 214 | file_age = now - os.path.getmtime(cache_file) |
| 221 | - | |
| 215 | + | |
| 222 | 216 | if file_age <= 3600: # 1 hour |
| 223 | 217 | age_distribution["1h"] += 1 |
| 224 | 218 | elif file_age <= 21600: # 6 hours |
| 225 | 219 | age_distribution["6h"] += 1 |
| 226 | 220 | elif file_age <= 86400: # 24 hours |
| 227 | 221 | age_distribution["24h"] += 1 |
| 228 | 222 | else: |
| 229 | 223 | age_distribution["older"] += 1 |
| 230 | - | |
| 224 | + | |
| 231 | 225 | return { |
| 232 | 226 | "namespace": self.namespace, |
| 233 | 227 | "entry_count": len(cache_files), |
| 234 | 228 | "total_size_bytes": total_size, |
| 235 | - "age_distribution": age_distribution | |
| 229 | + "age_distribution": age_distribution, | |
| 236 | 230 | } |
| 237 | 231 |
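The `ApiCache` changes above are formatting-only (import ordering, trailing commas, collapsed dict literals); the get/set/TTL behavior is unchanged. That behavior can be sketched standalone with plain functions mirroring the class's logic (function names here are illustrative, not the module's API):

```python
import hashlib
import json
import time
from pathlib import Path


def cache_path(cache_dir: Path, key: str) -> Path:
    # Hash the key so it is always a valid filename (as in ApiCache.get_cache_path)
    return cache_dir / f"{hashlib.md5(key.encode()).hexdigest()}.json"


def cache_set(cache_dir: Path, key: str, value) -> None:
    # Store the value alongside a write timestamp, as ApiCache.set does
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_path(cache_dir, key).write_text(
        json.dumps({"timestamp": time.time(), "value": value}), encoding="utf-8"
    )


def cache_get(cache_dir: Path, key: str, ttl: int = 86400):
    # Return the cached value, or None on a miss or an expired entry
    path = cache_path(cache_dir, key)
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    if time.time() - data.get("timestamp", 0) > ttl:
        return None  # older than the TTL window
    return data.get("value")
```

The TTL check compares wall-clock age against the configured window rather than storing an expiry time, which keeps entries reusable if the TTL is later raised.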
+7
-4
| --- video_processor/utils/export.py | ||
| +++ video_processor/utils/export.py | ||
| @@ -1,15 +1,14 @@ | ||
| 1 | 1 | """Multi-format output orchestration.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Optional |
| 7 | 6 | |
| 8 | 7 | from tqdm import tqdm |
| 9 | 8 | |
| 10 | -from video_processor.models import DiagramResult, VideoManifest | |
| 9 | +from video_processor.models import VideoManifest | |
| 11 | 10 | from video_processor.utils.rendering import render_mermaid, reproduce_chart |
| 12 | 11 | |
| 13 | 12 | logger = logging.getLogger(__name__) |
| 14 | 13 | |
| 15 | 14 | |
| @@ -79,11 +78,13 @@ | ||
| 79 | 78 | svg_path = output_dir / d.svg_path if d.svg_path else None |
| 80 | 79 | if svg_path and svg_path.exists(): |
| 81 | 80 | svg_content = svg_path.read_text() |
| 82 | 81 | diag_html += f'<div class="diagram">{svg_content}</div>' |
| 83 | 82 | elif d.image_path: |
| 84 | - diag_html += f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">' | |
| 83 | + diag_html += ( | |
| 84 | + f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">' | |
| 85 | + ) | |
| 85 | 86 | if d.mermaid: |
| 86 | 87 | diag_html += f'<pre class="mermaid">{d.mermaid}</pre>' |
| 87 | 88 | sections.append(diag_html) |
| 88 | 89 | |
| 89 | 90 | title = manifest.video.title or "PlanOpticon Analysis" |
| @@ -155,11 +156,13 @@ | ||
| 155 | 156 | Updates manifest with output file paths and returns it. |
| 156 | 157 | """ |
| 157 | 158 | output_dir = Path(output_dir) |
| 158 | 159 | |
| 159 | 160 | # Render mermaid diagrams to SVG/PNG |
| 160 | - for i, diagram in enumerate(tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else []): | |
| 161 | + for i, diagram in enumerate( | |
| 162 | + tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else [] | |
| 163 | + ): | |
| 161 | 164 | if diagram.mermaid: |
| 162 | 165 | diagrams_dir = output_dir / "diagrams" |
| 163 | 166 | prefix = f"diagram_{i}" |
| 164 | 167 | paths = render_mermaid(diagram.mermaid, diagrams_dir, prefix) |
| 165 | 168 | if "svg" in paths: |
| 166 | 169 |
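The E501 reflow in `export.py` above wraps a guarded-progress-bar pattern: `tqdm` wraps the iterable only when the diagram list is non-empty, so an empty manifest never prints a stalled progress bar. A standalone sketch of that pattern (the sample list is invented; a no-op fallback stands in for `tqdm` if it is not installed):

```python
# Sketch of the guarded tqdm-enumerate pattern reflowed in export.py.
try:
    from tqdm import tqdm
except ImportError:  # fall back to a no-op wrapper if tqdm is absent
    def tqdm(iterable, **kwargs):
        return iterable

# Hypothetical Mermaid sources; an empty string marks a diagram without one.
diagrams = ["graph TD; A-->B", "", "sequenceDiagram"]

rendered = []
for i, mermaid_src in enumerate(
    tqdm(diagrams, desc="Rendering diagrams", unit="diag") if diagrams else []
):
    if mermaid_src:  # entries without Mermaid source are skipped
        rendered.append(f"diagram_{i}")
print(rendered)  # ['diagram_0', 'diagram_2']
```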

| --- video_processor/utils/prompt_templates.py | ||
| +++ video_processor/utils/prompt_templates.py | ||
| @@ -1,152 +1,153 @@ | ||
| 1 | 1 | """Prompt templates for LLM-based content analysis.""" |
| 2 | -import json | |
| 2 | + | |
| 3 | 3 | import logging |
| 4 | -import os | |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from string import Template |
| 7 | -from typing import Any, Dict, List, Optional, Union | |
| 6 | +from typing import Dict, Optional, Union | |
| 8 | 7 | |
| 9 | 8 | logger = logging.getLogger(__name__) |
| 9 | + | |
| 10 | 10 | |
| 11 | 11 | class PromptTemplate: |
| 12 | 12 | """Template manager for LLM prompts.""" |
| 13 | - | |
| 13 | + | |
| 14 | 14 | def __init__( |
| 15 | - self, | |
| 15 | + self, | |
| 16 | 16 | templates_dir: Optional[Union[str, Path]] = None, |
| 17 | - default_templates: Optional[Dict[str, str]] = None | |
| 17 | + default_templates: Optional[Dict[str, str]] = None, | |
| 18 | 18 | ): |
| 19 | 19 | """ |
| 20 | 20 | Initialize prompt template manager. |
| 21 | - | |
| 21 | + | |
| 22 | 22 | Parameters |
| 23 | 23 | ---------- |
| 24 | 24 | templates_dir : str or Path, optional |
| 25 | 25 | Directory containing template files |
| 26 | 26 | default_templates : dict, optional |
| 27 | 27 | Default templates to use |
| 28 | 28 | """ |
| 29 | 29 | self.templates_dir = Path(templates_dir) if templates_dir else None |
| 30 | 30 | self.templates = {} |
| 31 | - | |
| 31 | + | |
| 32 | 32 | # Load default templates |
| 33 | 33 | if default_templates: |
| 34 | 34 | self.templates.update(default_templates) |
| 35 | - | |
| 35 | + | |
| 36 | 36 | # Load templates from directory if provided |
| 37 | 37 | if self.templates_dir and self.templates_dir.exists(): |
| 38 | 38 | self._load_templates_from_dir() |
| 39 | - | |
| 39 | + | |
| 40 | 40 | def _load_templates_from_dir(self) -> None: |
| 41 | 41 | """Load templates from template directory.""" |
| 42 | 42 | if not self.templates_dir: |
| 43 | 43 | return |
| 44 | - | |
| 44 | + | |
| 45 | 45 | for template_file in self.templates_dir.glob("*.txt"): |
| 46 | 46 | template_name = template_file.stem |
| 47 | 47 | try: |
| 48 | 48 | with open(template_file, "r", encoding="utf-8") as f: |
| 49 | 49 | template_content = f.read() |
| 50 | 50 | self.templates[template_name] = template_content |
| 51 | 51 | logger.debug(f"Loaded template: {template_name}") |
| 52 | 52 | except Exception as e: |
| 53 | 53 | logger.warning(f"Error loading template {template_name}: {str(e)}") |
| 54 | - | |
| 54 | + | |
| 55 | 55 | def get_template(self, template_name: str) -> Optional[Template]: |
| 56 | 56 | """ |
| 57 | 57 | Get template by name. |
| 58 | - | |
| 58 | + | |
| 59 | 59 | Parameters |
| 60 | 60 | ---------- |
| 61 | 61 | template_name : str |
| 62 | 62 | Template name |
| 63 | - | |
| 63 | + | |
| 64 | 64 | Returns |
| 65 | 65 | ------- |
| 66 | 66 | Template or None |
| 67 | 67 | Template object if found, None otherwise |
| 68 | 68 | """ |
| 69 | 69 | if template_name not in self.templates: |
| 70 | 70 | logger.warning(f"Template not found: {template_name}") |
| 71 | 71 | return None |
| 72 | - | |
| 72 | + | |
| 73 | 73 | return Template(self.templates[template_name]) |
| 74 | - | |
| 74 | + | |
| 75 | 75 | def format_prompt(self, template_name: str, **kwargs) -> Optional[str]: |
| 76 | 76 | """ |
| 77 | 77 | Format prompt with provided parameters. |
| 78 | - | |
| 78 | + | |
| 79 | 79 | Parameters |
| 80 | 80 | ---------- |
| 81 | 81 | template_name : str |
| 82 | 82 | Template name |
| 83 | 83 | **kwargs : dict |
| 84 | 84 | Template parameters |
| 85 | - | |
| 85 | + | |
| 86 | 86 | Returns |
| 87 | 87 | ------- |
| 88 | 88 | str or None |
| 89 | 89 | Formatted prompt if template exists, None otherwise |
| 90 | 90 | """ |
| 91 | 91 | template = self.get_template(template_name) |
| 92 | 92 | if not template: |
| 93 | 93 | return None |
| 94 | - | |
| 94 | + | |
| 95 | 95 | try: |
| 96 | 96 | return template.safe_substitute(**kwargs) |
| 97 | 97 | except Exception as e: |
| 98 | 98 | logger.error(f"Error formatting template {template_name}: {str(e)}") |
| 99 | 99 | return None |
| 100 | - | |
| 100 | + | |
| 101 | 101 | def add_template(self, template_name: str, template_content: str) -> None: |
| 102 | 102 | """ |
| 103 | 103 | Add or update template. |
| 104 | - | |
| 104 | + | |
| 105 | 105 | Parameters |
| 106 | 106 | ---------- |
| 107 | 107 | template_name : str |
| 108 | 108 | Template name |
| 109 | 109 | template_content : str |
| 110 | 110 | Template content |
| 111 | 111 | """ |
| 112 | 112 | self.templates[template_name] = template_content |
| 113 | - | |
| 113 | + | |
| 114 | 114 | def save_template(self, template_name: str) -> bool: |
| 115 | 115 | """ |
| 116 | 116 | Save template to file. |
| 117 | - | |
| 117 | + | |
| 118 | 118 | Parameters |
| 119 | 119 | ---------- |
| 120 | 120 | template_name : str |
| 121 | 121 | Template name |
| 122 | - | |
| 122 | + | |
| 123 | 123 | Returns |
| 124 | 124 | ------- |
| 125 | 125 | bool |
| 126 | 126 | True if successful, False otherwise |
| 127 | 127 | """ |
| 128 | 128 | if not self.templates_dir: |
| 129 | 129 | logger.error("Templates directory not set") |
| 130 | 130 | return False |
| 131 | - | |
| 131 | + | |
| 132 | 132 | if template_name not in self.templates: |
| 133 | 133 | logger.warning(f"Template not found: {template_name}") |
| 134 | 134 | return False |
| 135 | - | |
| 135 | + | |
| 136 | 136 | try: |
| 137 | 137 | self.templates_dir.mkdir(parents=True, exist_ok=True) |
| 138 | 138 | template_path = self.templates_dir / f"{template_name}.txt" |
| 139 | - | |
| 139 | + | |
| 140 | 140 | with open(template_path, "w", encoding="utf-8") as f: |
| 141 | 141 | f.write(self.templates[template_name]) |
| 142 | - | |
| 142 | + | |
| 143 | 143 | logger.debug(f"Saved template: {template_name}") |
| 144 | 144 | return True |
| 145 | 145 | except Exception as e: |
| 146 | 146 | logger.error(f"Error saving template {template_name}: {str(e)}") |
| 147 | 147 | return False |
| 148 | + | |
| 148 | 149 | |
| 149 | 150 | # Default prompt templates |
| 150 | 151 | DEFAULT_TEMPLATES = { |
| 151 | 152 | "content_analysis": """ |
| 152 | 153 | Analyze the provided video content and extract key information: |
| @@ -161,50 +162,48 @@ | ||
| 161 | 162 | - Main topics and themes |
| 162 | 163 | - Key points for each topic |
| 163 | 164 | - Important details or facts |
| 164 | 165 | - Action items or follow-ups |
| 165 | 166 | - Relationships between concepts |
| 166 | - | |
| 167 | + | |
| 167 | 168 | Format the output as structured markdown. |
| 168 | 169 | """, |
| 169 | - | |
| 170 | 170 | "diagram_extraction": """ |
| 171 | - Analyze the following image that contains a diagram, whiteboard content, or other visual information. | |
| 172 | - | |
| 171 | + Analyze the following image that contains a diagram, whiteboard content, | |
| 172 | + or other visual information. | |
| 173 | + | |
| 173 | 174 | Extract and convert this visual information into a structured representation. |
| 174 | - | |
| 175 | + | |
| 175 | 176 | If it's a flowchart, process diagram, or similar structured visual: |
| 176 | 177 | - Identify the components and their relationships |
| 177 | 178 | - Preserve the logical flow and structure |
| 178 | 179 | - Convert it to mermaid diagram syntax |
| 179 | - | |
| 180 | + | |
| 180 | 181 | If it's a whiteboard with text, bullet points, or unstructured content: |
| 181 | 182 | - Extract all text elements |
| 182 | 183 | - Preserve hierarchical organization if present |
| 183 | 184 | - Maintain any emphasized or highlighted elements |
| 184 | - | |
| 185 | + | |
| 185 | 186 | Image context: $image_context |
| 186 | - | |
| 187 | + | |
| 187 | 188 | Return the results as markdown with appropriate structure. |
| 188 | 189 | """, |
| 189 | - | |
| 190 | 190 | "action_item_detection": """ |
| 191 | 191 | Review the following transcript and identify all action items, commitments, or follow-up tasks. |
| 192 | - | |
| 192 | + | |
| 193 | 193 | TRANSCRIPT: |
| 194 | 194 | $transcript |
| 195 | - | |
| 195 | + | |
| 196 | 196 | For each action item, extract: |
| 197 | 197 | - The specific action to be taken |
| 198 | 198 | - Who is responsible (if mentioned) |
| 199 | 199 | - Any deadlines or timeframes |
| 200 | 200 | - Priority level (if indicated) |
| 201 | 201 | - Context or additional details |
| 202 | - | |
| 202 | + | |
| 203 | 203 | Format the results as a structured list of action items. |
| 204 | 204 | """, |
| 205 | - | |
| 206 | 205 | "content_summary": """ |
| 207 | 206 | Provide a concise summary of the following content: |
| 208 | 207 | |
| 209 | 208 | $content |
| 210 | 209 | |
| @@ -214,11 +213,10 @@ | ||
| 214 | 213 | - Focus on the most important information |
| 215 | 214 | - Maintain a neutral, objective tone |
| 216 | 215 | |
| 217 | 216 | Format the summary as clear, readable text. |
| 218 | 217 | """, |
| 219 | - | |
| 220 | 218 | "summary_generation": """ |
| 221 | 219 | Generate a comprehensive summary of the following transcript content. |
| 222 | 220 | |
| 223 | 221 | CONTENT: |
| 224 | 222 | $content |
| @@ -229,11 +227,10 @@ | ||
| 229 | 227 | - Notes any important context or background |
| 230 | 228 | - Is 3-5 paragraphs long |
| 231 | 229 | |
| 232 | 230 | Write in clear, professional prose. |
| 233 | 231 | """, |
| 234 | - | |
| 235 | 232 | "key_points_extraction": """ |
| 236 | 233 | Extract the key points from the following content. |
| 237 | 234 | |
| 238 | 235 | CONTENT: |
| 239 | 236 | $content |
| @@ -243,31 +240,30 @@ | ||
| 243 | 240 | - "topic": category or topic area (optional) |
| 244 | 241 | - "details": supporting details (optional) |
| 245 | 242 | |
| 246 | 243 | Example format: |
| 247 | 244 | [ |
| 248 | - {"point": "The system uses microservices architecture", "topic": "Architecture", "details": "Each service handles a specific domain"}, | |
| 249 | - {"point": "Migration is planned for Q2", "topic": "Timeline", "details": null} | |
| 245 | + {"point": "The system uses microservices architecture", | |
| 246 | + "topic": "Architecture", "details": "Each service handles a specific domain"}, | |
| 250 | 247 | ] |
| 251 | 248 | |
| 252 | 249 | Return ONLY the JSON array, no additional text. |
| 253 | 250 | """, |
| 254 | - | |
| 255 | 251 | "entity_extraction": """ |
| 256 | - Extract all notable entities (people, concepts, technologies, organizations, time references) from the following content. | |
| 257 | - | |
| 252 | + Extract all notable entities (people, concepts, technologies, organizations, | |
| 253 | + time references) from the following content. | |
| 258 | 254 | CONTENT: |
| 259 | 255 | $content |
| 260 | 256 | |
| 261 | 257 | Return a JSON array of entity objects: |
| 262 | 258 | [ |
| 263 | - {"name": "entity name", "type": "person|concept|technology|organization|time", "description": "brief description"} | |
| 264 | - ] | |
| 259 | + {"name": "entity name", | |
| 260 | + "type": "person|concept|technology|organization|time", | |
| 261 | + "description": "brief description"} | |
| 265 | 262 | |
| 266 | 263 | Return ONLY the JSON array, no additional text. |
| 267 | 264 | """, |
| 268 | - | |
| 269 | 265 | "relationship_extraction": """ |
| 270 | 266 | Given the following content and entities, identify relationships between them. |
| 271 | 267 | |
| 272 | 268 | CONTENT: |
| 273 | 269 | $content |
| @@ -275,16 +271,15 @@ | ||
| 275 | 271 | KNOWN ENTITIES: |
| 276 | 272 | $entities |
| 277 | 273 | |
| 278 | 274 | Return a JSON array of relationship objects: |
| 279 | 275 | [ |
| 280 | - {"source": "entity A", "target": "entity B", "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"} | |
| 281 | - ] | |
| 276 | + {"source": "entity A", "target": "entity B", | |
| 277 | + "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"} | |
| 282 | 278 | |
| 283 | 279 | Return ONLY the JSON array, no additional text. |
| 284 | 280 | """, |
| 285 | - | |
| 286 | 281 | "diagram_analysis": """ |
| 287 | 282 | Analyze the following text extracted from a diagram or visual element. |
| 288 | 283 | |
| 289 | 284 | DIAGRAM TEXT: |
| 290 | 285 | $diagram_text |
| @@ -303,11 +298,10 @@ | ||
| 303 | 298 | "summary": "brief description of what the diagram shows" |
| 304 | 299 | } |
| 305 | 300 | |
| 306 | 301 | Return ONLY the JSON object, no additional text. |
| 307 | 302 | """, |
| 308 | - | |
| 309 | 303 | "mermaid_generation": """ |
| 310 | 304 | Convert the following diagram information into valid Mermaid diagram syntax. |
| 311 | 305 | |
| 312 | 306 | Diagram Type: $diagram_type |
| 313 | 307 | Text Content: $text_content |
| @@ -315,10 +309,10 @@ | ||
| 315 | 309 | |
| 316 | 310 | Generate a Mermaid diagram that accurately represents the visual structure. |
| 317 | 311 | Use the appropriate Mermaid diagram type (graph, sequenceDiagram, classDiagram, etc.). |
| 318 | 312 | |
| 319 | 313 | Return ONLY the Mermaid code, no markdown fences or explanations. |
| 320 | - """ | |
| 314 | + """, | |
| 321 | 315 | } |
| 322 | 316 | |
| 323 | 317 | # Create default prompt template manager |
| 324 | 318 | default_prompt_manager = PromptTemplate(default_templates=DEFAULT_TEMPLATES) |
| 325 | 319 |
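The `PromptTemplate` class in the diff above leans on `string.Template.safe_substitute`, which leaves unresolved `$placeholders` in place instead of raising `KeyError` — this is why `format_prompt` rarely reaches its `except` branch even with missing parameters. A minimal sketch (template text and parameter names are invented):

```python
# Sketch of the substitution behavior PromptTemplate relies on.
# safe_substitute never raises on a missing key; the unresolved
# placeholder simply survives in the output string.
from string import Template

tmpl = Template("Analyze $content for $audience")

partial = tmpl.safe_substitute(content="the Q2 transcript")
print(partial)  # Analyze the Q2 transcript for $audience

full = tmpl.safe_substitute(content="the Q2 transcript", audience="executives")
print(full)  # Analyze the Q2 transcript for executives
```

By contrast, `Template.substitute` would raise `KeyError` on the first call; choosing `safe_substitute` trades early failure for prompts that may ship with a literal `$placeholder` left in.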
| 192 | |
| 193 | TRANSCRIPT: |
| 194 | $transcript |
| 195 | |
| 196 | For each action item, extract: |
| 197 | - The specific action to be taken |
| 198 | - Who is responsible (if mentioned) |
| 199 | - Any deadlines or timeframes |
| 200 | - Priority level (if indicated) |
| 201 | - Context or additional details |
| 202 | |
| 203 | Format the results as a structured list of action items. |
| 204 | """, |
| 205 | "content_summary": """ |
| 206 | Provide a concise summary of the following content: |
| 207 | |
| 208 | $content |
| 209 | |
| @@ -214,11 +213,10 @@ | |
| 213 | - Focus on the most important information |
| 214 | - Maintain a neutral, objective tone |
| 215 | |
| 216 | Format the summary as clear, readable text. |
| 217 | """, |
| 218 | "summary_generation": """ |
| 219 | Generate a comprehensive summary of the following transcript content. |
| 220 | |
| 221 | CONTENT: |
| 222 | $content |
| @@ -229,11 +227,10 @@ | |
| 227 | - Notes any important context or background |
| 228 | - Is 3-5 paragraphs long |
| 229 | |
| 230 | Write in clear, professional prose. |
| 231 | """, |
| 232 | "key_points_extraction": """ |
| 233 | Extract the key points from the following content. |
| 234 | |
| 235 | CONTENT: |
| 236 | $content |
| @@ -243,31 +240,30 @@ | |
| 240 | - "topic": category or topic area (optional) |
| 241 | - "details": supporting details (optional) |
| 242 | |
| 243 | Example format: |
| 244 | [ |
| 245 | {"point": "The system uses microservices architecture", |
| 246 | "topic": "Architecture", "details": "Each service handles a specific domain"}, |
| 247 | ] |
| 248 | |
| 249 | Return ONLY the JSON array, no additional text. |
| 250 | """, |
| 251 | "entity_extraction": """ |
| 252 | Extract all notable entities (people, concepts, technologies, organizations, |
| 253 | time references) from the following content. |
| 254 | CONTENT: |
| 255 | $content |
| 256 | |
| 257 | Return a JSON array of entity objects: |
| 258 | [ |
| 259 | {"name": "entity name", |
| 260 | "type": "person|concept|technology|organization|time", |
| 261 | "description": "brief description"} |
| 262 | ] |
| 263 | Return ONLY the JSON array, no additional text. |
| 264 | """, |
| 265 | "relationship_extraction": """ |
| 266 | Given the following content and entities, identify relationships between them. |
| 267 | |
| 268 | CONTENT: |
| 269 | $content |
| @@ -275,16 +271,15 @@ | |
| 271 | KNOWN ENTITIES: |
| 272 | $entities |
| 273 | |
| 274 | Return a JSON array of relationship objects: |
| 275 | [ |
| 276 | {"source": "entity A", "target": "entity B", |
| 277 | "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"} |
| 278 | ] |
| 279 | Return ONLY the JSON array, no additional text. |
| 280 | """, |
| 281 | "diagram_analysis": """ |
| 282 | Analyze the following text extracted from a diagram or visual element. |
| 283 | |
| 284 | DIAGRAM TEXT: |
| 285 | $diagram_text |
| @@ -303,11 +298,10 @@ | |
| 298 | "summary": "brief description of what the diagram shows" |
| 299 | } |
| 300 | |
| 301 | Return ONLY the JSON object, no additional text. |
| 302 | """, |
| 303 | "mermaid_generation": """ |
| 304 | Convert the following diagram information into valid Mermaid diagram syntax. |
| 305 | |
| 306 | Diagram Type: $diagram_type |
| 307 | Text Content: $text_content |
| @@ -315,10 +309,10 @@ | |
| 309 | |
| 310 | Generate a Mermaid diagram that accurately represents the visual structure. |
| 311 | Use the appropriate Mermaid diagram type (graph, sequenceDiagram, classDiagram, etc.). |
| 312 | |
| 313 | Return ONLY the Mermaid code, no markdown fences or explanations. |
| 314 | """, |
| 315 | } |
| 316 | |
| 317 | # Create default prompt template manager |
| 318 | default_prompt_manager = PromptTemplate(default_templates=DEFAULT_TEMPLATES) |
| 319 |
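The template manager above is built on `string.Template`, and `format_prompt` calls `safe_substitute` rather than `substitute`. A standalone sketch of why that matters (the dict and keys below are illustrative, not the real `DEFAULT_TEMPLATES` contents): unknown placeholders are left intact instead of raising `KeyError`.

```python
from string import Template

# Illustrative reproduction of format_prompt's core behavior; this dict
# is an example, not the module's actual DEFAULT_TEMPLATES.
templates = {
    "content_summary": "Provide a concise summary of the following content:\n\n$content",
}

tmpl = Template(templates["content_summary"])
prompt = tmpl.safe_substitute(content="Quarterly planning transcript")

# safe_substitute leaves any placeholder it has no value for untouched,
# so a missing kwarg degrades gracefully instead of raising KeyError.
partial = Template("$content ($image_context)").safe_substitute(content="x")
```

This is also why `format_prompt` wraps the call in a broad `try/except`: with `safe_substitute`, formatting failures are rare, but malformed `$` sequences can still raise `ValueError`.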
| --- video_processor/utils/rendering.py | ||
| +++ video_processor/utils/rendering.py | ||
| @@ -1,10 +1,10 @@ | ||
| 1 | 1 | """Mermaid rendering and chart reproduction utilities.""" |
| 2 | 2 | |
| 3 | 3 | import logging |
| 4 | 4 | from pathlib import Path |
| 5 | -from typing import Dict, Optional | |
| 5 | +from typing import Dict | |
| 6 | 6 | |
| 7 | 7 | logger = logging.getLogger(__name__) |
| 8 | 8 | |
| 9 | 9 | |
| 10 | 10 | def render_mermaid(mermaid_code: str, output_dir: str | Path, name: str) -> Dict[str, Path]: |
| @@ -47,15 +47,20 @@ | ||
| 47 | 47 | png_content = rendered.img_response |
| 48 | 48 | if png_content: |
| 49 | 49 | if isinstance(png_content, bytes): |
| 50 | 50 | png_path.write_bytes(png_content) |
| 51 | 51 | else: |
| 52 | - png_path.write_bytes(png_content.encode() if isinstance(png_content, str) else png_content) | |
| 52 | + png_path.write_bytes( | |
| 53 | + png_content.encode() if isinstance(png_content, str) else png_content | |
| 54 | + ) | |
| 53 | 55 | result["png"] = png_path |
| 54 | 56 | |
| 55 | 57 | except ImportError: |
| 56 | - logger.warning("mermaid-py not installed, skipping SVG/PNG rendering. Install with: pip install mermaid-py") | |
| 58 | + logger.warning( | |
| 59 | + "mermaid-py not installed, skipping SVG/PNG rendering. " | |
| 60 | + "Install with: pip install mermaid-py" | |
| 61 | + ) | |
| 57 | 62 | except Exception as e: |
| 58 | 63 | logger.warning(f"Mermaid rendering failed for '{name}': {e}") |
| 59 | 64 | |
| 60 | 65 | return result |
| 61 | 66 | |
| 62 | 67 |
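The bytes-or-str guard reformatted in this hunk exists because `img_response` from mermaid-py may arrive as either `bytes` or `str` depending on version. A self-contained sketch of the same write pattern (the helper name here is hypothetical, not part of the module):

```python
import tempfile
from pathlib import Path

def write_image(content, path: Path) -> None:
    # Same guard as in render_mermaid: write bytes directly, encode str.
    if isinstance(content, bytes):
        path.write_bytes(content)
    else:
        path.write_bytes(content.encode())

results = []
with tempfile.TemporaryDirectory() as d:
    png = Path(d) / "flow.png"
    write_image(b"\x89PNG", png)             # bytes branch
    results.append(png.read_bytes())
    write_image("<svg>fallback</svg>", png)  # str branch
    results.append(png.read_bytes())
```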
| --- video_processor/utils/usage_tracker.py | ||
| +++ video_processor/utils/usage_tracker.py | ||
| @@ -2,11 +2,10 @@ | ||
| 2 | 2 | |
| 3 | 3 | import time |
| 4 | 4 | from dataclasses import dataclass, field |
| 5 | 5 | from typing import Optional |
| 6 | 6 | |
| 7 | - | |
| 8 | 7 | # Cost per million tokens (USD) — updated Feb 2025 |
| 9 | 8 | _MODEL_PRICING = { |
| 10 | 9 | # Anthropic |
| 11 | 10 | "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00}, |
| 12 | 11 | "claude-haiku-3-5-20241022": {"input": 0.80, "output": 4.00}, |
| @@ -26,10 +25,11 @@ | ||
| 26 | 25 | |
| 27 | 26 | |
| 28 | 27 | @dataclass |
| 29 | 28 | class ModelUsage: |
| 30 | 29 | """Accumulated usage for a single model.""" |
| 30 | + | |
| 31 | 31 | provider: str = "" |
| 32 | 32 | model: str = "" |
| 33 | 33 | calls: int = 0 |
| 34 | 34 | input_tokens: int = 0 |
| 35 | 35 | output_tokens: int = 0 |
| @@ -59,10 +59,11 @@ | ||
| 59 | 59 | |
| 60 | 60 | |
| 61 | 61 | @dataclass |
| 62 | 62 | class StepTiming: |
| 63 | 63 | """Timing for a single pipeline step.""" |
| 64 | + | |
| 64 | 65 | name: str |
| 65 | 66 | start_time: float = 0.0 |
| 66 | 67 | end_time: float = 0.0 |
| 67 | 68 | |
| 68 | 69 | @property |
| @@ -73,10 +74,11 @@ | ||
| 73 | 74 | |
| 74 | 75 | |
| 75 | 76 | @dataclass |
| 76 | 77 | class UsageTracker: |
| 77 | 78 | """Tracks API usage, costs, and timing across a pipeline run.""" |
| 79 | + | |
| 78 | 80 | _models: dict = field(default_factory=dict) |
| 79 | 81 | _steps: list = field(default_factory=list) |
| 80 | 82 | _current_step: Optional[StepTiming] = field(default=None) |
| 81 | 83 | _start_time: float = field(default_factory=time.time) |
| 82 | 84 | |
| @@ -160,25 +162,28 @@ | ||
| 160 | 162 | ) |
| 161 | 163 | |
| 162 | 164 | # API usage |
| 163 | 165 | if self._models: |
| 164 | 166 | lines.append(f"\n API Calls: {self.total_api_calls}") |
| 165 | - lines.append(f" Tokens: {self.total_tokens:,} " | |
| 166 | - f"({self.total_input_tokens:,} in / {self.total_output_tokens:,} out)") | |
| 167 | + lines.append( | |
| 168 | + f" Tokens: {self.total_tokens:,} " | |
| 169 | + f"({self.total_input_tokens:,} in / {self.total_output_tokens:,} out)" | |
| 170 | + ) | |
| 167 | 171 | lines.append("") |
| 168 | 172 | lines.append(f" {'Model':<35} {'Calls':>6} {'In Tok':>8} {'Out Tok':>8} {'Cost':>8}") |
| 169 | - lines.append(f" {'-'*35} {'-'*6} {'-'*8} {'-'*8} {'-'*8}") | |
| 173 | + lines.append(f" {'-' * 35} {'-' * 6} {'-' * 8} {'-' * 8} {'-' * 8}") | |
| 170 | 174 | for key in sorted(self._models.keys()): |
| 171 | 175 | u = self._models[key] |
| 172 | 176 | cost_str = f"${u.estimated_cost:.4f}" if u.estimated_cost > 0 else "free" |
| 173 | 177 | if u.audio_minutes > 0: |
| 174 | 178 | lines.append( |
| 175 | 179 | f" {key:<35} {u.calls:>6} {u.audio_minutes:>7.1f}m {'-':>8} {cost_str:>8}" |
| 176 | 180 | ) |
| 177 | 181 | else: |
| 178 | 182 | lines.append( |
| 179 | - f" {key:<35} {u.calls:>6} {u.input_tokens:>8,} {u.output_tokens:>8,} {cost_str:>8}" | |
| 183 | + f" {key:<35} {u.calls:>6} " | |
| 184 | + f"{u.input_tokens:>8,} {u.output_tokens:>8,} {cost_str:>8}" | |
| 180 | 185 | ) |
| 181 | 186 | |
| 182 | 187 | lines.append(f"\n Estimated total cost: ${self.total_cost:.4f}") |
| 183 | 188 | |
| 184 | 189 | lines.append("=" * 60) |
| 185 | 190 | |
| 186 | 191 |
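The `_MODEL_PRICING` table is keyed per million tokens (USD), so a per-model cost presumably reduces to the arithmetic below. The tracker's actual accumulation methods fall outside this hunk, so `estimate_cost` here is a hypothetical standalone sketch, not the class's API:

```python
# Assumed pricing entry copied from the _MODEL_PRICING hunk above.
PRICING = {"claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Unknown models fall back to zero cost ("free" in the summary table).
    rates = PRICING.get(model, {"input": 0.0, "output": 0.0})
    return (input_tokens / 1_000_000) * rates["input"] + (
        output_tokens / 1_000_000
    ) * rates["output"]

cost = estimate_cost("claude-sonnet-4-5-20250929", 200_000, 50_000)
# 0.2 * 3.00 + 0.05 * 15.00 = 1.35
```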
D
work_plan.md
-188
| --- a/work_plan.md | ||
| +++ b/work_plan.md | ||
| @@ -1,188 +0,0 @@ | ||
| 1 | -PlanOpticon Development Roadmap | |
| 2 | -This document outlines the development milestones and actionable tasks for implementing the PlanOpticon video analysis system, prioritizing rapid delivery of useful outputs. | |
| 3 | -Milestone 1: Core Video Processing & Markdown Output | |
| 4 | -Goal: Process a video and produce markdown notes and mermaid diagrams | |
| 5 | -Infrastructure Setup | |
| 6 | - | |
| 7 | - Initialize project repository structure | |
| 8 | - Implement basic CLI with argparse | |
| 9 | - Create configuration management system | |
| 10 | - Set up logging framework | |
| 11 | - | |
| 12 | -Video & Audio Processing | |
| 13 | - | |
| 14 | - Implement video frame extraction | |
| 15 | - Create audio extraction pipeline | |
| 16 | - Build frame sampling strategy based on visual changes | |
| 17 | - Implement basic scene detection using cloud APIs | |
| 18 | - | |
| 19 | -Transcription & Analysis | |
| 20 | - | |
| 21 | - Integrate with cloud speech-to-text APIs (e.g., OpenAI Whisper API, Google Speech-to-Text) | |
| 22 | - Implement text analysis using LLM APIs (e.g., Claude API, GPT-4 API) | |
| 23 | - Build keyword and key point extraction via API integration | |
| 24 | - Create prompt templates for effective LLM content analysis | |
| 25 | - | |
| 26 | -Diagram Generation | |
| 27 | - | |
| 28 | - Create flow visualization module using mermaid syntax | |
| 29 | - Implement relationship mapping for detected topics | |
| 30 | - Build timeline representation generator | |
| 31 | - Leverage computer vision APIs (e.g., GPT-4 Vision, Google Cloud Vision) for diagram extraction from slides/whiteboards | |
| 32 | - | |
| 33 | -Markdown Output Generation | |
| 34 | - | |
| 35 | - Implement structured markdown generator | |
| 36 | - Create templating system for output | |
| 37 | - Build mermaid diagram integration | |
| 38 | - Develop table of contents generator | |
| 39 | - | |
| 40 | -Testing & Validation | |
| 41 | - | |
| 42 | - Set up basic testing infrastructure | |
| 43 | - Create sample videos for testing | |
| 44 | - Implement quality checks for outputs | |
| 45 | - Build simple validation metrics | |
| 46 | - | |
| 47 | -Success Criteria: | |
| 48 | - | |
| 49 | -Run script with a video input and receive markdown output with embedded mermaid diagrams | |
| 50 | -Content correctly captures main topics and relationships | |
| 51 | -Basic structure includes headings, bullet points, and at least one diagram | |
| 52 | - | |
| 53 | -Milestone 2: Advanced Content Analysis | |
| 54 | -Goal: Enhance extraction quality and content organization | |
| 55 | -Improved Speech Processing | |
| 56 | - | |
| 57 | - Integrate specialized speaker diarization APIs | |
| 58 | - Create transcript segmentation via LLM prompting | |
| 59 | - Build timestamp synchronization with content | |
| 60 | - Implement API-based vocabulary detection and handling | |
| 61 | - | |
| 62 | -Enhanced Visual Analysis | |
| 63 | - | |
| 64 | - Optimize prompts for vision APIs to detect diagrams and charts | |
| 65 | - Create efficient frame selection for API cost management | |
| 66 | - Build structured prompt chains for detailed visual analysis | |
| 67 | - Implement caching mechanism for API responses | |
| 68 | - | |
| 69 | -Content Organization | |
| 70 | - | |
| 71 | - Implement hierarchical topic modeling | |
| 72 | - Create concept relationship mapping | |
| 73 | - Build content categorization | |
| 74 | - Develop importance scoring for extracted points | |
| 75 | - | |
| 76 | -Quality Improvements | |
| 77 | - | |
| 78 | - Implement noise filtering for audio | |
| 79 | - Create redundancy reduction in notes | |
| 80 | - Build context preservation mechanisms | |
| 81 | - Develop content verification systems | |
| 82 | - | |
| 83 | -Milestone 3: Action Item & Knowledge Extraction | |
| 84 | -Goal: Identify action items and build knowledge structures | |
| 85 | -Action Item Detection | |
| 86 | - | |
| 87 | - Implement commitment language recognition | |
| 88 | - Create deadline and timeframe extraction | |
| 89 | - Build responsibility attribution | |
| 90 | - Develop priority estimation | |
| 91 | - | |
| 92 | -Knowledge Organization | |
| 93 | - | |
| 94 | - Implement knowledge graph construction | |
| 95 | - Create entity recognition and linking | |
| 96 | - Build cross-reference system | |
| 97 | - Develop temporal relationship tracking | |
| 98 | - | |
| 99 | -Enhanced Output Options | |
| 100 | - | |
| 101 | - Implement JSON structured data output | |
| 102 | - Create SVG diagram generation | |
| 103 | - Build interactive HTML output option | |
| 104 | - Develop customizable templates | |
| 105 | - | |
| 106 | -Integration Components | |
| 107 | - | |
| 108 | - Implement unified data model | |
| 109 | - Create serialization framework | |
| 110 | - Build persistence layer for results | |
| 111 | - Develop query interface for extracted knowledge | |
| 112 | - | |
| 113 | -Milestone 4: Optimization & Deployment | |
| 114 | -Goal: Enhance performance and create deployment package | |
| 115 | -Performance Optimization | |
| 116 | - | |
| 117 | - Implement GPU acceleration for core algorithms | |
| 118 | - Create ARM-specific optimizations | |
| 119 | - Build memory usage optimization | |
| 120 | - Develop parallel processing capabilities | |
| 121 | - | |
| 122 | -System Packaging | |
| 123 | - | |
| 124 | - Implement dependency management | |
| 125 | - Create installation scripts | |
| 126 | - Build comprehensive documentation | |
| 127 | - Develop container deployment option | |
| 128 | - | |
| 129 | -Advanced Features | |
| 130 | - | |
| 131 | - Implement custom domain adaptation | |
| 132 | - Create multi-video correlation | |
| 133 | - Build confidence scoring for extraction | |
| 134 | - Develop automated quality assessment | |
| 135 | - | |
| 136 | -User Experience | |
| 137 | - | |
| 138 | - Implement progress reporting | |
| 139 | - Create error handling and recovery | |
| 140 | - Build output customization options | |
| 141 | - Develop feedback collection mechanism | |
| 142 | - | |
| 143 | -Priority Matrix | |
| 144 | -FeatureImportanceTechnical ComplexityDependenciesPriorityVideo Frame ExtractionHighLowNoneP0Audio TranscriptionHighMediumAudio ExtractionP0Markdown GenerationHighLowContent AnalysisP0Mermaid Diagram CreationHighMediumContent AnalysisP0Topic ExtractionHighMediumTranscriptionP0Basic CLIHighLowNoneP0Speaker DiarizationMediumHighAudio ExtractionP2Visual Element DetectionHighHighFrame ExtractionP1Action Item DetectionMediumMediumTranscriptionP1GPU AccelerationLowMediumCore ProcessingP3ARM OptimizationMediumMediumCore ProcessingP2Installation PackageMediumLowWorking SystemP2 | |
| 145 | -Implementation Approach | |
| 146 | -To achieve the first milestone efficiently: | |
| 147 | - | |
| 148 | -Leverage Existing Cloud APIs | |
| 149 | - | |
| 150 | -Integrate with cloud speech-to-text services rather than building models | |
| 151 | -Use vision APIs for image/slide/whiteboard analysis | |
| 152 | -Employ LLM APIs (OpenAI, Anthropic, etc.) for content analysis and summarization | |
| 153 | -Implement API fallbacks and retries for robustness | |
| 154 | - | |
| 155 | - | |
| 156 | -Focus on Pipeline Integration | |
| 157 | - | |
| 158 | -Build connectors between components | |
| 159 | -Ensure data flows properly through the system | |
| 160 | -Create uniform data structures for interoperability | |
| 161 | - | |
| 162 | - | |
| 163 | -Build for Extensibility | |
| 164 | - | |
| 165 | -Design plugin architecture from the beginning | |
| 166 | -Use configuration-driven approach where possible | |
| 167 | -Create clear interfaces between components | |
| 168 | - | |
| 169 | - | |
| 170 | -Iterative Refinement | |
| 171 | - | |
| 172 | -Implement basic functionality first | |
| 173 | -Add sophistication in subsequent iterations | |
| 174 | -Collect feedback after each milestone | |
| 175 | - | |
| 176 | - | |
| 177 | - | |
| 178 | -Next Steps | |
| 179 | -After completing this roadmap, potential future enhancements include: | |
| 180 | - | |
| 181 | -Real-time processing capabilities | |
| 182 | -Integration with video conferencing platforms | |
| 183 | -Collaborative annotation and editing features | |
| 184 | -Domain-specific model fine-tuning | |
| 185 | -Multi-language support | |
| 186 | -Customizable output formats | |
| 187 | - | |
| 188 | -This roadmap provides a clear path to developing PlanOpticon with a focus on delivering value quickly through a milestone-based approach, prioritizing the generation of markdown notes and mermaid diagrams as the first outcome. |
| --- a/work_plan.md | |
| +++ b/work_plan.md | |
| @@ -1,188 +0,0 @@ | |
| 1 | PlanOpticon Development Roadmap |
| 2 | This document outlines the development milestones and actionable tasks for implementing the PlanOpticon video analysis system, prioritizing rapid delivery of useful outputs. |
| 3 | Milestone 1: Core Video Processing & Markdown Output |
| 4 | Goal: Process a video and produce markdown notes and mermaid diagrams |
| 5 | Infrastructure Setup |
| 6 | |
| 7 | Initialize project repository structure |
| 8 | Implement basic CLI with argparse |
| 9 | Create configuration management system |
| 10 | Set up logging framework |
| 11 | |
| 12 | Video & Audio Processing |
| 13 | |
| 14 | Implement video frame extraction |
| 15 | Create audio extraction pipeline |
| 16 | Build frame sampling strategy based on visual changes |
| 17 | Implement basic scene detection using cloud APIs |
| 18 | |
| 19 | Transcription & Analysis |
| 20 | |
| 21 | Integrate with cloud speech-to-text APIs (e.g., OpenAI Whisper API, Google Speech-to-Text) |
| 22 | Implement text analysis using LLM APIs (e.g., Claude API, GPT-4 API) |
| 23 | Build keyword and key point extraction via API integration |
| 24 | Create prompt templates for effective LLM content analysis |
| 25 | |
| 26 | Diagram Generation |
| 27 | |
| 28 | Create flow visualization module using mermaid syntax |
| 29 | Implement relationship mapping for detected topics |
| 30 | Build timeline representation generator |
| 31 | Leverage computer vision APIs (e.g., GPT-4 Vision, Google Cloud Vision) for diagram extraction from slides/whiteboards |
| 32 | |
| 33 | Markdown Output Generation |
| 34 | |
| 35 | Implement structured markdown generator |
| 36 | Create templating system for output |
| 37 | Build mermaid diagram integration |
| 38 | Develop table of contents generator |
| 39 | |
| 40 | Testing & Validation |
| 41 | |
| 42 | Set up basic testing infrastructure |
| 43 | Create sample videos for testing |
| 44 | Implement quality checks for outputs |
| 45 | Build simple validation metrics |
| 46 | |
| 47 | Success Criteria: |
| 48 | |
| 49 | Run script with a video input and receive markdown output with embedded mermaid diagrams |
| 50 | Content correctly captures main topics and relationships |
| 51 | Basic structure includes headings, bullet points, and at least one diagram |
| 52 | |
| 53 | Milestone 2: Advanced Content Analysis |
| 54 | Goal: Enhance extraction quality and content organization |
| 55 | Improved Speech Processing |
| 56 | |
| 57 | Integrate specialized speaker diarization APIs |
| 58 | Create transcript segmentation via LLM prompting |
| 59 | Build timestamp synchronization with content |
| 60 | Implement API-based vocabulary detection and handling |
| 61 | |
| 62 | Enhanced Visual Analysis |
| 63 | |
| 64 | Optimize prompts for vision APIs to detect diagrams and charts |
| 65 | Create efficient frame selection for API cost management |
| 66 | Build structured prompt chains for detailed visual analysis |
| 67 | Implement caching mechanism for API responses |
| 68 | |
| 69 | Content Organization |
| 70 | |
| 71 | Implement hierarchical topic modeling |
| 72 | Create concept relationship mapping |
| 73 | Build content categorization |
| 74 | Develop importance scoring for extracted points |
| 75 | |
| 76 | Quality Improvements |
| 77 | |
| 78 | Implement noise filtering for audio |
| 79 | Create redundancy reduction in notes |
| 80 | Build context preservation mechanisms |
| 81 | Develop content verification systems |
| 82 | |
| 83 | Milestone 3: Action Item & Knowledge Extraction |
| 84 | Goal: Identify action items and build knowledge structures |
| 85 | Action Item Detection |
| 86 | |
| 87 | Implement commitment language recognition |
| 88 | Create deadline and timeframe extraction |
| 89 | Build responsibility attribution |
| 90 | Develop priority estimation |
| 91 | |
| 92 | Knowledge Organization |
| 93 | |
| 94 | Implement knowledge graph construction |
| 95 | Create entity recognition and linking |
| 96 | Build cross-reference system |
| 97 | Develop temporal relationship tracking |
| 98 | |
| 99 | Enhanced Output Options |
| 100 | |
| 101 | Implement JSON structured data output |
| 102 | Create SVG diagram generation |
| 103 | Build interactive HTML output option |
| 104 | Develop customizable templates |
| 105 | |
| 106 | Integration Components |
| 107 | |
| 108 | Implement unified data model |
| 109 | Create serialization framework |
| 110 | Build persistence layer for results |
| 111 | Develop query interface for extracted knowledge |
| 112 | |
| 113 | Milestone 4: Optimization & Deployment |
| 114 | Goal: Enhance performance and create deployment package |
| 115 | Performance Optimization |
| 116 | |
| 117 | Implement GPU acceleration for core algorithms |
| 118 | Create ARM-specific optimizations |
| 119 | Build memory usage optimization |
| 120 | Develop parallel processing capabilities |
| 121 | |
| 122 | System Packaging |
| 123 | |
| 124 | Implement dependency management |
| 125 | Create installation scripts |
| 126 | Build comprehensive documentation |
| 127 | Develop container deployment option |
| 128 | |
| 129 | Advanced Features |
| 130 | |
| 131 | Implement custom domain adaptation |
| 132 | Create multi-video correlation |
| 133 | Build confidence scoring for extraction |
| 134 | Develop automated quality assessment |
| 135 | |
| 136 | User Experience |
| 137 | |
| 138 | Implement progress reporting |
| 139 | Create error handling and recovery |
| 140 | Build output customization options |
| 141 | Develop feedback collection mechanism |
| 142 | |
| 143 | Priority Matrix |
| Feature | Importance | Technical Complexity | Dependencies | Priority |
| --- | --- | --- | --- | --- |
| Video Frame Extraction | High | Low | None | P0 |
| Audio Transcription | High | Medium | Audio Extraction | P0 |
| Markdown Generation | High | Low | Content Analysis | P0 |
| Mermaid Diagram Creation | High | Medium | Content Analysis | P0 |
| Topic Extraction | High | Medium | Transcription | P0 |
| Basic CLI | High | Low | None | P0 |
| Speaker Diarization | Medium | High | Audio Extraction | P2 |
| Visual Element Detection | High | High | Frame Extraction | P1 |
| Action Item Detection | Medium | Medium | Transcription | P1 |
| GPU Acceleration | Low | Medium | Core Processing | P3 |
| ARM Optimization | Medium | Medium | Core Processing | P2 |
| Installation Package | Medium | Low | Working System | P2 |
| 145 | Implementation Approach |
| 146 | To achieve the first milestone efficiently: |
| 147 | |
| 148 | Leverage Existing Cloud APIs |
| 149 | |
| 150 | Integrate with cloud speech-to-text services rather than building models |
| 151 | Use vision APIs for image/slide/whiteboard analysis |
| 152 | Employ LLM APIs (OpenAI, Anthropic, etc.) for content analysis and summarization |
| 153 | Implement API fallbacks and retries for robustness |
| 154 | |
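The fallbacks-and-retries point above could be sketched as a wrapper that tries each provider in order with exponential backoff — illustrative only; the provider-callable interface is an assumption, and a real version would catch provider-specific errors rather than bare `Exception`:

```python
import time

def call_with_fallback(providers, prompt, retries=3, base_delay=1.0, sleep=time.sleep):
    """Try each provider in order; retry transient failures with backoff."""
    last_error = None
    for call in providers:
        for attempt in range(retries):
            try:
                return call(prompt)
            except Exception as exc:
                last_error = exc
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"All providers failed: {last_error}")
```

Injecting `sleep` keeps the wrapper testable without real delays.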
| 155 | |
| 156 | Focus on Pipeline Integration |
| 157 | |
| 158 | Build connectors between components |
| 159 | Ensure data flows properly through the system |
| 160 | Create uniform data structures for interoperability |
| 161 | |
| 162 | |
| 163 | Build for Extensibility |
| 164 | |
| 165 | Design plugin architecture from the beginning |
| 166 | Use configuration-driven approach where possible |
| 167 | Create clear interfaces between components |
| 168 | |
| 169 | |
| 170 | Iterative Refinement |
| 171 | |
| 172 | Implement basic functionality first |
| 173 | Add sophistication in subsequent iterations |
| 174 | Collect feedback after each milestone |
| 175 | |
| 176 | |
| 177 | |
| 178 | Next Steps |
| 179 | After completing this roadmap, potential future enhancements include: |
| 180 | |
| 181 | Real-time processing capabilities |
| 182 | Integration with video conferencing platforms |
| 183 | Collaborative annotation and editing features |
| 184 | Domain-specific model fine-tuning |
| 185 | Multi-language support |
| 186 | Customizable output formats |
| 187 | |
| 188 | This roadmap provides a clear, milestone-based path for developing PlanOpticon, prioritizing quick delivery of value by making Markdown notes and Mermaid diagrams the first outcome. |
| --- a/work_plan.md | |
| +++ b/work_plan.md | |
| @@ -1,188 +0,0 @@ | |