PlanOpticon

Prepare repo for open-source publishing

- Fix all ruff lint errors (400 -> 0), auto-format codebase
- Add GitHub community files: issue templates (bug report, feature request), PR template, CONTRIBUTING.md, SECURITY.md, FUNDING.yml
- Fix Windows binary build (add shell: bash to PyInstaller step)
- Remove internal planning docs (implementation.md, work_plan.md)
- Remove setup script (scripts/setup.sh)
- Update .gitignore: add AI tools, cloud CLI dirs, .venv, site/
- Exclude prompt_templates.py from E501 (LLM prompt strings)
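For readers checking the lint setup: the per-file E501 exclusion described above would look roughly like this in `pyproject.toml`. This is a sketch using standard Ruff configuration syntax, informed by the line-length and Python-version targets stated in this commit's CONTRIBUTING.md; the repo's exact section may differ.

```toml
[tool.ruff]
line-length = 100
target-version = "py310"

[tool.ruff.lint.per-file-ignores]
# LLM prompt strings are intentionally long; skip line-length checks here.
"video_processor/utils/prompt_templates.py" = ["E501"]
```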

leo 2026-02-15 05:31 trunk
Commit 829e24abdf9a5ae4b33462ec759c18bf0128503bda0cebc7759347b15fd7a32d
59 files changed
+ .github/CONTRIBUTING.md
+ .github/FUNDING.yml
+ .github/ISSUE_TEMPLATE/bug_report.yml
+ .github/ISSUE_TEMPLATE/config.yml
+ .github/ISSUE_TEMPLATE/feature_request.yml
+ .github/PULL_REQUEST_TEMPLATE.md
+ .github/SECURITY.md
~ .github/workflows/release-binaries.yml
- implementation.md
~ pyproject.toml
- scripts/setup.sh
~ setup.py
~ tests/test_action_detector.py
~ tests/test_agent.py
~ tests/test_api_cache.py
~ tests/test_audio_extractor.py
~ tests/test_batch.py
~ tests/test_cloud_sources.py
~ tests/test_content_analyzer.py
~ tests/test_diagram_analyzer.py
~ tests/test_frame_extractor.py
~ tests/test_json_parsing.py
~ tests/test_models.py
~ tests/test_output_structure.py
~ tests/test_pipeline.py
~ tests/test_prompt_templates.py
~ tests/test_providers.py
~ tests/test_rendering.py
~ video_processor/agent/orchestrator.py
~ video_processor/analyzers/action_detector.py
~ video_processor/analyzers/content_analyzer.py
~ video_processor/analyzers/diagram_analyzer.py
~ video_processor/cli/commands.py
~ video_processor/cli/output_formatter.py
~ video_processor/extractors/__init__.py
~ video_processor/extractors/audio_extractor.py
~ video_processor/extractors/frame_extractor.py
~ video_processor/extractors/text_extractor.py
~ video_processor/integrators/knowledge_graph.py
~ video_processor/integrators/plan_generator.py
~ video_processor/models.py
~ video_processor/output_structure.py
~ video_processor/pipeline.py
~ video_processor/providers/anthropic_provider.py
~ video_processor/providers/base.py
~ video_processor/providers/discovery.py
~ video_processor/providers/gemini_provider.py
~ video_processor/providers/manager.py
~ video_processor/providers/openai_provider.py
~ video_processor/providers/whisper_local.py
~ video_processor/sources/base.py
~ video_processor/sources/dropbox_source.py
~ video_processor/sources/google_drive.py
~ video_processor/utils/api_cache.py
~ video_processor/utils/export.py
~ video_processor/utils/prompt_templates.py
~ video_processor/utils/rendering.py
~ video_processor/utils/usage_tracker.py
- work_plan.md
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -0,0 +1,79 @@
+# Contributing to PlanOpticon
+
+Thank you for your interest in contributing to PlanOpticon! This guide will help you get started.
+
+## Development Setup
+
+1. **Fork and clone the repository:**
+
+   ```bash
+   git clone https://github.com/<your-username>/PlanOpticon.git
+   cd PlanOpticon
+   ```
+
+2. **Create a virtual environment:**
+
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+
+3. **Install in editable mode with dev dependencies:**
+
+   ```bash
+   pip install -e ".[dev]"
+   ```
+
+4. **Install FFmpeg** (required for video processing):
+
+   ```bash
+   # macOS
+   brew install ffmpeg
+
+   # Ubuntu/Debian
+   sudo apt install ffmpeg
+   ```
+
+5. **Set up at least one AI provider API key:**
+
+   ```bash
+   export OPENAI_API_KEY="sk-..."
+   # or
+   export ANTHROPIC_API_KEY="sk-ant-..."
+   # or
+   export GEMINI_API_KEY="..."
+   ```
+
+## Running Tests
+
+```bash
+pytest tests/
+```
+
+To run tests with coverage:
+
+```bash
+pytest tests/ --cov=video_processor
+```
+
+## Code Style
+
+This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formatting.
+
+**Check for lint issues:**
+
+```bash
+ruff check .
+```
+
+**Check formatting (without modifying files):**
+
+```bash
+ruff format --check .
+```
+
+The project targets a line length of 100 characters and Python 3.10+. See `pyproject.toml` for the full Ruff configuration.
+
+## Commit Conventions
+
+Write clear, descriptive commit messages. Use the imperative mood in the subject line.
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -0,0 +1 @@
+github: ConflictHQ
--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,106 @@
+name: Bug Report
+description: Report a bug in PlanOpticon
+title: "[Bug]: "
+labels: ["bug", "triage"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thank you for taking the time to report a bug. Please fill out the fields below so we can diagnose and fix the issue as quickly as possible.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: A clear and concise description of the bug.
+      placeholder: Describe the bug...
+    validations:
+      required: true
+
+  - type: textarea
+    id: steps-to-reproduce
+    attributes:
+      label: Steps to Reproduce
+      description: The exact steps to reproduce the behavior.
+      placeholder: |
+        1. Run `planopticon analyze -i video.mp4 -o ./output`
+        2. Wait for frame extraction to complete
+        3. Observe error in diagram extraction step
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected-behavior
+    attributes:
+      label: Expected Behavior
+      description: What you expected to happen.
+      placeholder: Describe what you expected...
+    validations:
+      required: true
+
+  - type: textarea
+    id: actual-behavior
+    attributes:
+      label: Actual Behavior
+      description: What actually happened.
+      placeholder: Describe what actually happened...
+    validations:
+      required: true
+
+  - type: dropdown
+    id: os
+    attributes:
+      label: Operating System
+      options:
+        - macOS
+        - Linux (Ubuntu/Debian)
+        - Linux (Fedora/RHEL)
+        - Linux (other)
+        - Windows
+        - Other
+    validations:
+      required: true
+
+  - type: dropdown
+    id: python-version
+    attributes:
+      label: Python Version
+      options:
+        - "3.13"
+        - "3.12"
+        - "3.11"
+        - "3.10"
+    validations:
+      required: true
+
+  - type: input
+    id: planopticon-version
+    attributes:
+      label: PlanOpticon Version
+      description: Run `planopticon --version` or `pip show planopticon` to find this.
+      placeholder: "e.g. 0.2.0"
+    validations:
+      required: true
+
+  - type: dropdown
+    id: provider
+    attributes:
+      label: AI Provider
+      description: Which AI provider were you using when the bug occurred?
+      options:
+        - OpenAI
+        - Anthropic
+        - Google Gemini
+        - Multiple providers
+        - Not applicable
+    validations:
+      required: true
+
+  - type: textarea
+    id: logs
+    attributes:
+      label: Logs
+      description: Paste any relevant log output. This will be automatically formatted as code.
+      render: shell
+    validations:
+      required: false
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Discussions
+    url: https://github.com/ConflictHQ/PlanOpticon/discussions
+    about: Ask questions, share ideas, or discuss PlanOpticon with the community.
--- a/.github/ISSUE_TEMPLATE/feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,37 @@
+name: Feature Request
+description: Suggest a new feature or improvement for PlanOpticon
+title: "[Feature]: "
+labels: ["enhancement"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        We appreciate your ideas for improving PlanOpticon. Please describe your feature request in detail so we can evaluate and prioritize it.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: A clear and concise description of the feature you would like to see.
+      placeholder: Describe the feature...
+    validations:
+      required: true
+
+  - type: textarea
+    id: use-case
+    attributes:
+      label: Use Case
+      description: Explain the problem this feature would solve or the workflow it would improve. Why is this feature important to you?
+      placeholder: |
+        As a user who processes large batches of meeting recordings, I need...
+    validations:
+      required: true
+
+  - type: textarea
+    id: proposed-solution
+    attributes:
+      label: Proposed Solution
+      description: If you have ideas on how this could be implemented, describe them here. This is optional -- we welcome feature requests even without a proposed solution.
+      placeholder: Describe a possible implementation approach...
+    validations:
+      required: false
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,25 @@
+## Summary of Changes
+
+<!-- Briefly describe what this PR does and why. -->
+
+## Type of Change
+
+<!-- Check the one that applies. -->
+
+- [ ] Bug fix (non-breaking change that fixes an issue)
+- [ ] New feature (non-breaking change that adds functionality)
+- [ ] Documentation update
+- [ ] Refactor (no functional changes)
+- [ ] Breaking change (fix or feature that would cause existing functionality to change)
+
+## Test Plan
+
+<!-- Describe how you tested these changes. Include commands, scenarios, or links to CI runs. -->
+
+## Checklist
+
+- [ ] Tests pass locally (`pytest tests/`)
+- [ ] Lint is clean (`ruff check .` and `ruff format --check .`)
+- [ ] Documentation has been updated (if applicable)
+- [ ] Any new dependencies are added to `pyproject.toml`
+- [ ] Commit messages follow the project's conventions
--- a/.github/SECURITY.md
+++ b/.github/SECURITY.md
@@ -0,0 +1,40 @@
+# Security Policy
+
+## Reporting a Vulnerability
+
+If you discover a security vulnerability in PlanOpticon, we ask that you report it responsibly. **Please do not open a public GitHub issue for security vulnerabilities.**
+
+Instead, send an email to:
+
+**[email protected]**
+
+Include as much of the following information as possible:
+
+- A description of the vulnerability and its potential impact
+- Steps to reproduce the issue
+- Any relevant logs, screenshots, or proof-of-concept code
+- Your recommended fix, if you have one
+
+## What to Expect
+
+- **Acknowledgment:** We will acknowledge receipt of your report within 2 business days.
+- **Assessment:** We will investigate and assess the severity of the issue. We may reach out to you for additional details.
+- **Resolution:** We will work on a fix and coordinate disclosure with you. We aim to resolve critical issues within 14 days.
+- **Credit:** With your permission, we will credit you in the release notes for the fix.
+
+## Supported Versions
+
+We provide security updates for the latest minor release of PlanOpticon. We recommend always running the most recent version.
+
+| Version | Supported |
+|---------|-----------|
+| Latest  | Yes       |
+| Older   | No        |
+
+## Scope
+
+This security policy covers the PlanOpticon application and its first-party code. Vulnerabilities in third-party dependencies should be reported to the respective upstream projects, though we appreciate being notified so we can update our dependencies promptly.
+
+## Thank You
+
+We value the security research community and appreciate the effort that goes into finding and responsibly disclosing vulnerabilities. Thank you for helping keep PlanOpticon and its users safe.
--- .github/workflows/release-binaries.yml
+++ .github/workflows/release-binaries.yml
@@ -47,10 +47,11 @@
         run: |
           pip install -e ".[all]"
           pip install pyinstaller
 
       - name: Build binary
+        shell: bash
         run: |
           pyinstaller \
             --name planopticon-${{ matrix.target }} \
             --onefile \
             --console \
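On the `shell: bash` fix in the workflow diff above: GitHub's Windows runners default to PowerShell, which does not treat a trailing `\` as a line continuation, so the multi-line `pyinstaller` command breaks there; forcing bash makes the step behave identically across the build matrix. A minimal sketch of the pattern (job name and matrix values are illustrative, not the repo's exact workflow):

```yaml
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Build binary
        shell: bash  # default on windows-latest is PowerShell, which mis-parses `\` continuations
        run: |
          pyinstaller \
            --onefile \
            --console
```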
DELETED implementation.md
--- a/implementation.md
+++ b/implementation.md
@@ -1,272 +0,0 @@
1
-# PlanOpticon Implementation Guide
2
-This document provides detailed technical guidance for implementing the PlanOpticon system architecture. The suggested approach balances code quality, performance optimization, and architecture best practices.
3
-## System Architecture
4
-PlanOpticon follows a modular pipeline architecture with these core components:
5
-```
6
-video_processor/
7
-├── extractors/
8
-│ ├── frame_extractor.py
9
-│ ├── audio_extractor.py
10
-│ └── text_extractor.py
11
-├── api/
12
-│ ├── transcription_api.py
13
-│ ├── vision_api.py
14
-│ ├── llm_api.py
15
-│ └── api_manager.py
16
-├── analyzers/
17
-│ ├── content_analyzer.py
18
-│ ├── diagram_analyzer.py
19
-│ └── action_detector.py
20
-├── integrators/
21
-│ ├── knowledge_graph.py
22
-│ └── plan_generator.py
23
-├── utils/
24
-│ ├── api_cache.py
25
-│ ├── prompt_templates.py
26
-│ └── visualization.py
27
-└── cli/
28
- ├── commands.py
29
- └── output_formatter.py
30
-```
31
-## Implementation Approach
32
-When building complex systems like PlanOpticon, it's critical to develop each component with clear boundaries and interfaces. The following approach provides a framework for high-quality implementation:
33
-### Video and Audio Processing
34
-Video frame extraction should be implemented with performance in mind:
35
-```
36
-pythondef extract_frames(video_path, sampling_rate=1.0, change_threshold=0.15):
37
- """
38
- Extract frames from video based on sampling rate and visual change detection.
39
-
40
- Parameters
41
- ----------
42
- video_path : str
43
- Path to video file
44
- sampling_rate : float
45
- Frame sampling rate (1.0 = every frame)
46
- change_threshold : float
47
- Threshold for detecting significant visual changes
48
-
49
- Returns
50
- -------
51
- list
52
- List of extracted frames as numpy arrays
53
- """
54
- # Implementation details here
55
- pass
56
-```
57
-Consider using a decorator pattern for GPU acceleration when available:
58
-```
59
-pythondef gpu_accelerated(func):
60
- """Decorator to use GPU implementation when available."""
61
- @functools.wraps(func)
62
- def wrapper(*args, **kwargs):
63
- if is_gpu_available() and not kwargs.get('disable_gpu'):
64
- return func_gpu(*args, **kwargs)
65
- return func(*args, **kwargs)
66
- return wrapper
67
-```
68
-### Computer Vision Components
69
-When implementing diagram detection, consider using a progressive refinement approach:
70
-```
71
-pythonclass DiagramDetector:
72
- """Detects and extracts diagrams from video frames."""
73
-
74
- def __init__(self, model_path, confidence_threshold=0.7):
75
- """Initialize detector with pre-trained model."""
76
- # Implementation details
77
-
78
- def detect(self, frame):
79
- """
80
- Detect diagrams in a single frame.
81
-
82
- Parameters
83
- ----------
84
- frame : numpy.ndarray
85
- Video frame as numpy array
86
-
87
- Returns
88
- -------
89
- list
90
- List of detected diagram regions as bounding boxes
91
- """
92
- # 1. Initial region proposal
93
- # 2. Feature extraction
94
- # A well-designed detection pipeline would incorporate multiple stages
95
- # of increasingly refined detection to balance performance and accuracy
96
- pass
97
-
98
- def extract_and_normalize(self, frame, regions):
99
- """Extract and normalize detected diagrams."""
100
- # Implementation details
101
- pass
102
-```
103
-### Speech Processing Pipeline
104
-The speech recognition and diarization system should be implemented with careful attention to context:
105
-pythonclass SpeechProcessor:
106
- """Process speech from audio extraction."""
107
-
108
- def __init__(self, models_dir, device='auto'):
109
- """
110
- Initialize speech processor.
111
-
112
- Parameters
113
- ----------
114
- models_dir : str
115
- Directory containing pre-trained models
116
- device : str
117
- Computing device ('cpu', 'cuda', 'auto')
118
- """
119
- # Implementation details
120
-
121
- def process_audio(self, audio_path):
122
- """
123
- Process audio file for transcription and speaker diarization.
124
-
125
- Parameters
126
- ----------
127
- audio_path : str
128
- Path to audio file
129
-
130
- Returns
131
- -------
132
- dict
133
- Processed speech segments with speaker attribution
134
- """
135
- # The key to effective speech processing is maintaining temporal context
136
- # throughout the pipeline and handling speaker transitions gracefully
137
- pass
138
-### Action Item Detection
139
-Action item detection requires sophisticated NLP techniques:
140
-pythonclass ActionItemDetector:
141
- """Detect action items from transcript."""
142
-
143
- def detect_action_items(self, transcript):
144
- """
145
- Detect action items from transcript.
146
-
147
- Parameters
148
- ----------
149
- transcript : list
150
- List of transcript segments
151
-
152
- Returns
153
- -------
154
- list
155
- Detected action items with metadata
156
- """
157
- # A well-designed action item detector would incorporate:
158
- # 1. Intent recognition
159
- # 2. Commitment language detection
160
- # 3. Responsibility attribution
161
- # 4. Deadline extraction
162
- # 5. Priority estimation
163
- pass
164
-## Performance Optimization
165
-For optimal performance across different hardware targets:
166
-
167
-ARM Optimization
168
-
169
-Use vectorized operations with NumPy/SciPy where possible
170
-Implement conditional paths for ARM-specific optimizations
171
-Consider using PyTorch's mobile optimized models
172
-
173
-
174
-## Memory Management
175
-
176
-Implement progressive loading for large videos
177
-Use memory-mapped file access for large datasets
178
-Release resources explicitly when no longer needed
179
-
180
-
181
-## GPU Acceleration
182
-
183
-Design compute-intensive operations to work in batches
184
-Minimize CPU-GPU memory transfers
185
-Implement fallback paths for CPU-only environments
186
-
187
-
188
-
189
-## Code Quality Guidelines
190
-Maintain high code quality through these practices:
191
-
192
-### PEP 8 Compliance
193
-
194
-Consistent 4-space indentation
195
-Maximum line length of 88 characters (Black formatter standard)
196
-Descriptive variable names with snake_case convention
197
-Comprehensive docstrings for all public functions and classes
198
-
199
-
200
-### Type Annotations
201
-
202
-Use Python's type hints consistently throughout codebase
203
-Define custom types for complex data structures
204
-Validate with mypy during development
205
-
206
-
207
-### Testing Strategy
208
-
209
-Write unit tests for each module with minimum 80% coverage
210
-Create integration tests for component interactions
-Implement performance benchmarks for critical paths
-
-
-
-# API Integration Considerations
-When implementing cloud API components, consider:
-
-## API Selection
-
-Balance capabilities, cost, and performance requirements
-Implement appropriate rate limiting and quota management
-Design with graceful fallbacks between different API providers
-
-
-### Efficient API Usage
-
-Create optimized prompts for different content types
-Batch requests where possible to minimize API calls
-Implement caching to avoid redundant API calls
-
-
-### Prompt Engineering
-
-Design effective prompt templates for consistent results
-Implement few-shot examples for specialized content understanding
-Create chain-of-thought prompting for complex analysis tasks
-
-
-
-## Prompting Guidelines
-When developing complex AI systems, clear guidance helps ensure effective implementation. Consider these approaches:
-
-### Component Breakdown
-
-Begin by dividing the system into well-defined modules
-Define clear interfaces between components
-Specify expected inputs and outputs for each function
-
-
-### Progressive Development
-
-Start with skeleton implementation of core functionality
-Add refinements iteratively
-Implement error handling after core functionality works
-
-
-### Example-Driven Design
-
-Provide clear examples of expected behaviors
-Include sample inputs and outputs
-Demonstrate error cases and handling
-
-
-### Architecture Patterns
-
-Use factory patterns for flexible component creation
-Implement strategy patterns for algorithm selection
-Apply decorator patterns for cross-cutting concerns
-
-Remember that the best implementations come from clear understanding of the problem domain and careful consideration of edge cases.
-
-PlanOpticon's implementation requires attention to both high-level architecture and low-level optimization. By following these guidelines, developers can create a robust, performant system that effectively extracts valuable information from video content.
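An editorial aside on the deleted guide's GPU-dispatch example: as written it calls an undefined `func_gpu`, so it cannot run. A self-contained variant passes the GPU implementation to the decorator explicitly. Everything below is an illustrative sketch, not code from the PlanOpticon tree:

```python
import functools


def is_gpu_available() -> bool:
    """Stub for illustration; a real check might probe torch.cuda.is_available()."""
    return False


def gpu_accelerated(gpu_impl):
    """Decorator factory: route calls to gpu_impl when a GPU is usable."""

    def decorator(cpu_func):
        @functools.wraps(cpu_func)
        def wrapper(*args, disable_gpu=False, **kwargs):
            # Fall back to the CPU path when no GPU is present or it is disabled.
            if is_gpu_available() and not disable_gpu:
                return gpu_impl(*args, **kwargs)
            return cpu_func(*args, **kwargs)

        return wrapper

    return decorator


def _extract_frames_gpu(path):
    return ("gpu", path)


@gpu_accelerated(_extract_frames_gpu)
def extract_frames(path):
    return ("cpu", path)
```

Binding `gpu_impl` at decoration time keeps both code paths importable and testable, without the hidden `func_gpu` global the original sketch relied on.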
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -99,10 +99,13 @@
 target-version = "py310"

 [tool.ruff.lint]
 select = ["E", "F", "W", "I"]

+[tool.ruff.lint.per-file-ignores]
+"video_processor/utils/prompt_templates.py" = ["E501"]
+
 [tool.mypy]
 python_version = "3.10"
 warn_return_any = true
 warn_unused_configs = true


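The `per-file-ignores` entry above exempts long lines only in the prompt templates module, where LLM instructions read better unwrapped. A hypothetical example of the kind of constant being exempted — the real contents of `prompt_templates.py` are not shown in this commit:

```python
# Hypothetical template; the actual strings in
# video_processor/utils/prompt_templates.py are not part of this diff.
ACTION_ITEM_PROMPT = "Extract every action item from the transcript below and return a JSON list of objects with the keys action, assignee, deadline, priority, and context. Return [] when none are present."


def render_prompt(transcript: str) -> str:
    """Append the transcript to the instruction block."""
    return f"{ACTION_ITEM_PROMPT}\n\nTranscript:\n{transcript}"
```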
DELETED scripts/setup.sh
--- a/scripts/setup.sh
+++ b/scripts/setup.sh
@@ -1,120 +0,0 @@
-#!/bin/bash
-# PlanOpticon setup script
-set -e
-
-# Detect operating system
-if [[ "$OSTYPE" == "darwin"* ]]; then
-    OS="macos"
-elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
-    OS="linux"
-else
-    echo "Unsupported operating system: $OSTYPE"
-    exit 1
-fi
-
-# Detect architecture
-ARCH=$(uname -m)
-if [[ "$ARCH" == "arm64" ]] || [[ "$ARCH" == "aarch64" ]]; then
-    ARCH="arm64"
-elif [[ "$ARCH" == "x86_64" ]]; then
-    ARCH="x86_64"
-else
-    echo "Unsupported architecture: $ARCH"
-    exit 1
-fi
-
-echo "Setting up PlanOpticon on $OS ($ARCH)..."
-
-# Check for Python
-if ! command -v python3 &> /dev/null; then
-    echo "Python 3 is required but not found."
-    if [[ "$OS" == "macos" ]]; then
-        echo "Please install Python 3 using Homebrew or from python.org."
-        echo "  brew install python"
-    elif [[ "$OS" == "linux" ]]; then
-        echo "Please install Python 3 using your package manager."
-        echo "  Ubuntu/Debian: sudo apt install python3 python3-pip python3-venv"
-        echo "  Fedora: sudo dnf install python3 python3-pip"
-    fi
-    exit 1
-fi
-
-# Check Python version
-PY_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
-PY_MAJOR=$(echo $PY_VERSION | cut -d. -f1)
-PY_MINOR=$(echo $PY_VERSION | cut -d. -f2)
-
-if [[ "$PY_MAJOR" -lt 3 ]] || [[ "$PY_MAJOR" -eq 3 && "$PY_MINOR" -lt 9 ]]; then
-    echo "Python 3.9 or higher is required, but found $PY_VERSION."
-    echo "Please upgrade your Python installation."
-    exit 1
-fi
-
-echo "Using Python $PY_VERSION"
-
-# Check for FFmpeg
-if ! command -v ffmpeg &> /dev/null; then
-    echo "FFmpeg is required but not found."
-    if [[ "$OS" == "macos" ]]; then
-        echo "Please install FFmpeg using Homebrew:"
-        echo "  brew install ffmpeg"
-    elif [[ "$OS" == "linux" ]]; then
-        echo "Please install FFmpeg using your package manager:"
-        echo "  Ubuntu/Debian: sudo apt install ffmpeg"
-        echo "  Fedora: sudo dnf install ffmpeg"
-    fi
-    exit 1
-fi
-
-echo "FFmpeg found"
-
-# Create and activate virtual environment
-if [[ -d "venv" ]]; then
-    echo "Virtual environment already exists"
-else
-    echo "Creating virtual environment..."
-    python3 -m venv venv
-fi
-
-# Determine activate script path
-if [[ "$OS" == "macos" ]] || [[ "$OS" == "linux" ]]; then
-    ACTIVATE="venv/bin/activate"
-fi
-
-echo "Activating virtual environment..."
-source "$ACTIVATE"
-
-# Upgrade pip
-echo "Upgrading pip..."
-pip install --upgrade pip
-
-# Install dependencies
-echo "Installing dependencies..."
-pip install -e .
-
-# Install optional GPU dependencies if available
-if [[ "$OS" == "macos" && "$ARCH" == "arm64" ]]; then
-    echo "Installing optional ARM-specific packages for macOS..."
-    pip install -r requirements-apple.txt 2>/dev/null || echo "No ARM-specific packages found or could not install them."
-elif [[ "$ARCH" == "x86_64" ]]; then
-    # Check for NVIDIA GPU
-    if [[ "$OS" == "linux" ]] && command -v nvidia-smi &> /dev/null; then
-        echo "NVIDIA GPU detected, installing GPU dependencies..."
-        pip install -r requirements-gpu.txt 2>/dev/null || echo "Could not install GPU packages."
-    fi
-fi
-
-# Create example .env file if it doesn't exist
-if [[ ! -f ".env" ]]; then
-    echo "Creating example .env file..."
-    cp .env.example .env
-    echo "Please edit the .env file to add your API keys."
-fi
-
-echo "Setup complete! PlanOpticon is ready to use."
-echo ""
-echo "To activate the virtual environment, run:"
-echo "  source \"$ACTIVATE\""
-echo ""
-echo "To run PlanOpticon, use:"
-echo "  planopticon --help"
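One argument for dropping the script above: its interpreter gate duplicates what packaging metadata such as `requires-python` can enforce at install time, and the same check is a one-liner in Python. A sketch of the equivalent guard (names are illustrative, not from the PlanOpticon tree):

```python
import sys

MIN_PYTHON = (3, 9)  # mirrors the version gate in the deleted script


def meets_minimum(version_info=None) -> bool:
    """Return True when the interpreter satisfies the minimum version."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) >= MIN_PYTHON
```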
+1
--- a/setup.py
+++ b/setup.py
@@ -1,4 +1,5 @@
 """Backwards-compatible setup.py — all config lives in pyproject.toml."""
+
 from setuptools import setup

 setup()

--- a/tests/test_action_detector.py
+++ b/tests/test_action_detector.py
@@ -1,20 +1,20 @@
 """Tests for enhanced action item detection."""

 import json
 from unittest.mock import MagicMock

-import pytest
-
 from video_processor.analyzers.action_detector import ActionDetector
 from video_processor.models import ActionItem, TranscriptSegment


 class TestPatternExtract:
     def test_detects_need_to(self):
         detector = ActionDetector()
-        items = detector.detect_from_transcript("We need to update the database schema before release.")
+        items = detector.detect_from_transcript(
+            "We need to update the database schema before release."
+        )
         assert len(items) >= 1
         assert any("database" in i.action.lower() for i in items)

     def test_detects_should(self):
         detector = ActionDetector()
@@ -21,11 +21,13 @@
         items = detector.detect_from_transcript("Alice should review the pull request by Friday.")
         assert len(items) >= 1

     def test_detects_action_item_keyword(self):
         detector = ActionDetector()
-        items = detector.detect_from_transcript("Action item: set up monitoring for the new service.")
+        items = detector.detect_from_transcript(
+            "Action item: set up monitoring for the new service."
+        )
         assert len(items) >= 1

     def test_detects_follow_up(self):
         detector = ActionDetector()
         items = detector.detect_from_transcript("Follow up with the client about requirements.")
@@ -41,22 +43,16 @@
         items = detector.detect_from_transcript("Do it.")
         assert len(items) == 0

     def test_no_action_patterns(self):
         detector = ActionDetector()
-        items = detector.detect_from_transcript(
-            "The weather was nice today. We had lunch at noon."
-        )
+        items = detector.detect_from_transcript("The weather was nice today. We had lunch at noon.")
         assert len(items) == 0

     def test_multiple_sentences(self):
         detector = ActionDetector()
-        text = (
-            "We need to deploy the fix. "
-            "Alice should test it first. "
-            "The sky is blue."
-        )
+        text = "We need to deploy the fix. Alice should test it first. The sky is blue."
         items = detector.detect_from_transcript(text)
         assert len(items) == 2

     def test_source_is_transcript(self):
         detector = ActionDetector()
@@ -66,14 +62,21 @@


 class TestLLMExtract:
     def test_llm_extraction(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"action": "Deploy new version", "assignee": "Bob", "deadline": "Friday",
-             "priority": "high", "context": "Production release"}
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {
+                    "action": "Deploy new version",
+                    "assignee": "Bob",
+                    "deadline": "Friday",
+                    "priority": "high",
+                    "context": "Production release",
+                }
+            ]
+        )
         detector = ActionDetector(provider_manager=pm)
         items = detector.detect_from_transcript("Deploy new version by Friday.")
         assert len(items) == 1
         assert items[0].action == "Deploy new version"
         assert items[0].assignee == "Bob"
@@ -102,28 +105,37 @@
         items = detector.detect_from_transcript("Update the docs.")
         assert items == []

     def test_llm_skips_items_without_action(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"action": "Valid action", "assignee": None},
-            {"assignee": "Alice"},  # No action field
-            {"action": "", "assignee": "Bob"},  # Empty action
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {"action": "Valid action", "assignee": None},
+                {"assignee": "Alice"},  # No action field
+                {"action": "", "assignee": "Bob"},  # Empty action
+            ]
+        )
         detector = ActionDetector(provider_manager=pm)
         items = detector.detect_from_transcript("Some text.")
         assert len(items) == 1
         assert items[0].action == "Valid action"


 class TestDetectFromDiagrams:
     def test_dict_diagrams(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"action": "Migrate database", "assignee": None, "deadline": None,
-             "priority": None, "context": None},
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {
+                    "action": "Migrate database",
+                    "assignee": None,
+                    "deadline": None,
+                    "priority": None,
+                    "context": None,
+                },
+            ]
+        )
         detector = ActionDetector(provider_manager=pm)
         diagrams = [
             {"text_content": "Step 1: Migrate database", "elements": ["DB", "Migration"]},
         ]
         items = detector.detect_from_diagrams(diagrams)
@@ -130,14 +142,21 @@
         assert len(items) == 1
         assert items[0].source == "diagram"

     def test_object_diagrams(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"action": "Update API", "assignee": None, "deadline": None,
-             "priority": None, "context": None},
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {
+                    "action": "Update API",
+                    "assignee": None,
+                    "deadline": None,
+                    "priority": None,
+                    "context": None,
+                },
+            ]
+        )
         detector = ActionDetector(provider_manager=pm)

         class FakeDiagram:
             text_content = "Update API endpoints"
             elements = ["API", "Gateway"]
@@ -153,11 +172,14 @@
         assert items == []

     def test_pattern_fallback_for_diagrams(self):
         detector = ActionDetector()  # No provider
         diagrams = [
-            {"text_content": "We need to update the configuration before deployment.", "elements": []},
+            {
+                "text_content": "We need to update the configuration before deployment.",
+                "elements": [],
+            },
         ]
         items = detector.detect_from_diagrams(diagrams)
         assert len(items) >= 1
         assert items[0].source == "diagram"

@@ -191,16 +213,18 @@


 class TestAttachTimestamps:
     def test_attaches_matching_segment(self):
         detector = ActionDetector()
-        items = [
+        [
             ActionItem(action="We need to update the database schema before release"),
         ]
         segments = [
             TranscriptSegment(start=0.0, end=5.0, text="Welcome to the meeting."),
-            TranscriptSegment(start=5.0, end=15.0, text="We need to update the database schema before release."),
+            TranscriptSegment(
+                start=5.0, end=15.0, text="We need to update the database schema before release."
+            ),
             TranscriptSegment(start=15.0, end=20.0, text="Any questions?"),
         ]
         detector.detect_from_transcript(
             "We need to update the database schema before release.",
             segments=segments,
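The pattern tests in the diff above rely on cues like "need to", "should", "action item", and "follow up". `ActionDetector`'s real implementation is not part of this diff, but a minimal sketch of what such a regex first pass could look like:

```python
import re

# Illustrative cue list; the patterns actually used by ActionDetector
# may differ.
ACTION_CUES = re.compile(r"\b(need to|should|action item|follow up)\b", re.IGNORECASE)


def detect_action_sentences(text: str) -> list[str]:
    """Return the sentences of text that contain an action-item cue."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if ACTION_CUES.search(s)]
```

A sentence splitter plus a cue scan reproduces the behavior the tests encode: cue sentences are kept, filler like "The sky is blue." is dropped.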
--- tests/test_action_detector.py
+++ tests/test_action_detector.py
@@ -1,20 +1,20 @@
"""Tests for enhanced action item detection."""

import json
from unittest.mock import MagicMock

from video_processor.analyzers.action_detector import ActionDetector
from video_processor.models import ActionItem, TranscriptSegment


class TestPatternExtract:
    def test_detects_need_to(self):
        detector = ActionDetector()
        items = detector.detect_from_transcript(
            "We need to update the database schema before release."
        )
        assert len(items) >= 1
        assert any("database" in i.action.lower() for i in items)

    def test_detects_should(self):
        detector = ActionDetector()
@@ -21,11 +21,13 @@
        items = detector.detect_from_transcript("Alice should review the pull request by Friday.")
        assert len(items) >= 1

    def test_detects_action_item_keyword(self):
        detector = ActionDetector()
        items = detector.detect_from_transcript(
            "Action item: set up monitoring for the new service."
        )
        assert len(items) >= 1

    def test_detects_follow_up(self):
        detector = ActionDetector()
        items = detector.detect_from_transcript("Follow up with the client about requirements.")
@@ -41,22 +43,16 @@
        items = detector.detect_from_transcript("Do it.")
        assert len(items) == 0

    def test_no_action_patterns(self):
        detector = ActionDetector()
        items = detector.detect_from_transcript("The weather was nice today. We had lunch at noon.")
        assert len(items) == 0

    def test_multiple_sentences(self):
        detector = ActionDetector()
        text = "We need to deploy the fix. Alice should test it first. The sky is blue."
        items = detector.detect_from_transcript(text)
        assert len(items) == 2

    def test_source_is_transcript(self):
        detector = ActionDetector()
@@ -66,14 +62,21 @@


class TestLLMExtract:
    def test_llm_extraction(self):
        pm = MagicMock()
        pm.chat.return_value = json.dumps(
            [
                {
                    "action": "Deploy new version",
                    "assignee": "Bob",
                    "deadline": "Friday",
                    "priority": "high",
                    "context": "Production release",
                }
            ]
        )
        detector = ActionDetector(provider_manager=pm)
        items = detector.detect_from_transcript("Deploy new version by Friday.")
        assert len(items) == 1
        assert items[0].action == "Deploy new version"
        assert items[0].assignee == "Bob"
@@ -102,28 +105,37 @@
        items = detector.detect_from_transcript("Update the docs.")
        assert items == []

    def test_llm_skips_items_without_action(self):
        pm = MagicMock()
        pm.chat.return_value = json.dumps(
            [
                {"action": "Valid action", "assignee": None},
                {"assignee": "Alice"},  # No action field
                {"action": "", "assignee": "Bob"},  # Empty action
            ]
        )
        detector = ActionDetector(provider_manager=pm)
        items = detector.detect_from_transcript("Some text.")
        assert len(items) == 1
        assert items[0].action == "Valid action"


class TestDetectFromDiagrams:
    def test_dict_diagrams(self):
        pm = MagicMock()
        pm.chat.return_value = json.dumps(
            [
                {
                    "action": "Migrate database",
                    "assignee": None,
                    "deadline": None,
                    "priority": None,
                    "context": None,
                },
            ]
        )
        detector = ActionDetector(provider_manager=pm)
        diagrams = [
            {"text_content": "Step 1: Migrate database", "elements": ["DB", "Migration"]},
        ]
        items = detector.detect_from_diagrams(diagrams)
@@ -130,14 +142,21 @@
        assert len(items) == 1
        assert items[0].source == "diagram"

    def test_object_diagrams(self):
        pm = MagicMock()
        pm.chat.return_value = json.dumps(
            [
                {
                    "action": "Update API",
                    "assignee": None,
                    "deadline": None,
                    "priority": None,
                    "context": None,
                },
            ]
        )
        detector = ActionDetector(provider_manager=pm)

        class FakeDiagram:
            text_content = "Update API endpoints"
            elements = ["API", "Gateway"]
@@ -153,11 +172,14 @@
        assert items == []

    def test_pattern_fallback_for_diagrams(self):
        detector = ActionDetector()  # No provider
        diagrams = [
            {
                "text_content": "We need to update the configuration before deployment.",
                "elements": [],
            },
        ]
        items = detector.detect_from_diagrams(diagrams)
        assert len(items) >= 1
        assert items[0].source == "diagram"
@@ -191,16 +213,18 @@


class TestAttachTimestamps:
    def test_attaches_matching_segment(self):
        detector = ActionDetector()
        [
            ActionItem(action="We need to update the database schema before release"),
        ]
        segments = [
            TranscriptSegment(start=0.0, end=5.0, text="Welcome to the meeting."),
            TranscriptSegment(
                start=5.0, end=15.0, text="We need to update the database schema before release."
            ),
            TranscriptSegment(start=15.0, end=20.0, text="Any questions?"),
        ]
        detector.detect_from_transcript(
            "We need to update the database schema before release.",
            segments=segments,
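The pattern-matching tests above can be approximated with a few regexes plus a minimum-length filter. This is a hypothetical sketch of the heuristic the tests exercise; the patterns, threshold, and function name here are illustrative assumptions, not ActionDetector's actual implementation:

```python
import re

# Hypothetical patterns; ActionDetector's real patterns and filters may differ.
ACTION_PATTERNS = [
    r"\bwe need to\b",
    r"\b\w+ should\b",
    r"\baction item\b",
    r"\bfollow up\b",
]


def detect_actions(text: str, min_words: int = 4) -> list[str]:
    """Return sentences that look like action items."""
    items = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if any(re.search(p, lowered) for p in ACTION_PATTERNS):
            if len(sentence.split()) >= min_words:  # drop fragments like "Do it."
                items.append(sentence.strip())
    return items
```

With this sketch, `detect_actions("We need to deploy the fix. Alice should test it first. The sky is blue.")` keeps the first two sentences and drops the third, mirroring `test_multiple_sentences`.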
--- tests/test_agent.py
+++ tests/test_agent.py
@@ -1,11 +1,9 @@
 """Tests for the agentic processing orchestrator."""
 
 import json
-from unittest.mock import MagicMock, patch
-
-import pytest
+from unittest.mock import MagicMock
 
 from video_processor.agent.orchestrator import AgentOrchestrator
 
 
 class TestPlanCreation:
@@ -99,16 +97,18 @@
         agent.insights.append("should not modify internal")
         assert len(agent._insights) == 2
 
     def test_deep_analysis_populates_insights(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps({
-            "decisions": ["Decided to use microservices"],
-            "risks": ["Timeline is tight"],
-            "follow_ups": [],
-            "tensions": [],
-        })
+        pm.chat.return_value = json.dumps(
+            {
+                "decisions": ["Decided to use microservices"],
+                "risks": ["Timeline is tight"],
+                "follow_ups": [],
+                "tensions": [],
+            }
+        )
         agent = AgentOrchestrator(provider_manager=pm)
         agent._results["transcribe"] = {"text": "Some long transcript text here"}
         result = agent._deep_analysis("/tmp")
         assert "decisions" in result
         assert any("microservices" in i for i in agent._insights)
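The reflowed `json.dumps(...)` calls in the test_agent.py diff are the stubbing pattern used throughout this suite: a `MagicMock` stands in for the provider manager, and `chat()` returns a canned JSON payload. A minimal self-contained version of that pattern:

```python
import json
from unittest.mock import MagicMock

# A MagicMock stands in for the provider manager; chat() returns canned JSON,
# so code that consumes an LLM can be tested deterministically.
pm = MagicMock()
pm.chat.return_value = json.dumps(
    {
        "decisions": ["Decided to use microservices"],
        "risks": ["Timeline is tight"],
    }
)

payload = json.loads(pm.chat("any prompt"))
assert payload["decisions"] == ["Decided to use microservices"]
pm.chat.assert_called_once_with("any prompt")
```

Because `return_value` is fixed, the same payload comes back regardless of the prompt, and the mock records the call for later assertions.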
--- tests/test_api_cache.py
+++ tests/test_api_cache.py
@@ -1,12 +1,9 @@
 """Tests for API response cache."""
 
-import json
 import time
 
-import pytest
-
 from video_processor.utils.api_cache import ApiCache
 
 
 class TestApiCache:
     def test_set_and_get(self, tmp_path):
@@ -71,13 +68,13 @@
         cache_b.set("key", "value_b")
         assert cache_a.get("key") == "value_a"
         assert cache_b.get("key") == "value_b"
 
     def test_creates_namespace_dir(self, tmp_path):
-        cache = ApiCache(tmp_path / "sub", namespace="deep")
+        ApiCache(tmp_path / "sub", namespace="deep")
         assert (tmp_path / "sub" / "deep").exists()
 
     def test_cache_path_uses_hash(self, tmp_path):
         cache = ApiCache(tmp_path, namespace="test")
         path = cache.get_cache_path("my_key")
         assert path.suffix == ".json"
         assert path.parent.name == "test"
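The `test_cache_path_uses_hash` assertions suggest a content-addressed layout: one directory per namespace, one JSON file per hashed key, with the namespace directory created eagerly. A sketch under those assumptions (the digest algorithm and file naming are guesses here, not necessarily ApiCache's):

```python
import hashlib
import tempfile
from pathlib import Path


def cache_path(root: Path, namespace: str, key: str) -> Path:
    """Map an arbitrary cache key to a stable on-disk path."""
    ns_dir = root / namespace
    ns_dir.mkdir(parents=True, exist_ok=True)  # namespace dir created eagerly
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return ns_dir / f"{digest}.json"


root = Path(tempfile.mkdtemp())
p = cache_path(root, "test", "my_key")
assert p.suffix == ".json"
assert p.parent.name == "test"
assert cache_path(root, "test", "my_key") == p  # same key, same path
```

Hashing the key sidesteps filesystem-unsafe characters and length limits, which is why the tests assert on the suffix and parent directory rather than on the filename itself.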
--- tests/test_audio_extractor.py
+++ tests/test_audio_extractor.py
@@ -1,65 +1,65 @@
 """Tests for the audio extractor module."""
-import os
+
 import tempfile
 from pathlib import Path
-from unittest.mock import patch, MagicMock
+from unittest.mock import MagicMock, patch
 
 import numpy as np
-import pytest
 
 from video_processor.extractors.audio_extractor import AudioExtractor
+
 
 
 class TestAudioExtractor:
     """Test suite for AudioExtractor class."""
-
+
     def test_init(self):
         """Test initialization of AudioExtractor."""
         # Default parameters
         extractor = AudioExtractor()
         assert extractor.sample_rate == 16000
         assert extractor.mono is True
-
+
         # Custom parameters
         extractor = AudioExtractor(sample_rate=44100, mono=False)
         assert extractor.sample_rate == 44100
         assert extractor.mono is False
-
-    @patch('subprocess.run')
+
+    @patch("subprocess.run")
     def test_extract_audio(self, mock_run):
         """Test audio extraction from video."""
         # Mock the subprocess.run call
         mock_result = MagicMock()
         mock_result.returncode = 0
         mock_run.return_value = mock_result
-
+
         with tempfile.TemporaryDirectory() as temp_dir:
             # Create a dummy video file
             video_path = Path(temp_dir) / "test_video.mp4"
             with open(video_path, "wb") as f:
                 f.write(b"dummy video content")
-
+
             # Extract audio
             extractor = AudioExtractor()
-
+
             # Test with default output path
             output_path = extractor.extract_audio(video_path)
             assert output_path == video_path.with_suffix(".wav")
-
+
             # Test with custom output path
             custom_output = Path(temp_dir) / "custom_audio.wav"
             output_path = extractor.extract_audio(video_path, custom_output)
             assert output_path == custom_output
-
+
             # Verify subprocess.run was called with correct arguments
             mock_run.assert_called()
             args, kwargs = mock_run.call_args
             assert "ffmpeg" in args[0]
             assert "-i" in args[0]
             assert str(video_path) in args[0]
-
-    @patch('soundfile.info')
+
+    @patch("soundfile.info")
     def test_get_audio_properties(self, mock_sf_info):
         """Test getting audio properties."""
         # Mock soundfile.info
         mock_info = MagicMock()
         mock_info.duration = 10.5
@@ -66,55 +66,49 @@
         mock_info.samplerate = 16000
         mock_info.channels = 1
         mock_info.format = "WAV"
         mock_info.subtype = "PCM_16"
         mock_sf_info.return_value = mock_info
-
+
         with tempfile.TemporaryDirectory() as temp_dir:
             # Create a dummy audio file
             audio_path = Path(temp_dir) / "test_audio.wav"
             with open(audio_path, "wb") as f:
                 f.write(b"dummy audio content")
-
+
             # Get properties
             extractor = AudioExtractor()
             props = extractor.get_audio_properties(audio_path)
-
+
             # Verify properties
             assert props["duration"] == 10.5
             assert props["sample_rate"] == 16000
             assert props["channels"] == 1
             assert props["format"] == "WAV"
             assert props["subtype"] == "PCM_16"
             assert props["path"] == str(audio_path)
-
+
     def test_segment_audio(self):
         """Test audio segmentation."""
         # Create a dummy audio array (1 second at 16kHz)
         audio_data = np.ones(16000)
         sample_rate = 16000
-
+
         extractor = AudioExtractor()
-
+
         # Test with 500ms segments, no overlap
         segments = extractor.segment_audio(
-            audio_data,
-            sample_rate,
-            segment_length_ms=500,
-            overlap_ms=0
+            audio_data, sample_rate, segment_length_ms=500, overlap_ms=0
         )
-
+
         # Should produce 2 segments of 8000 samples each
         assert len(segments) == 2
         assert len(segments[0]) == 8000
         assert len(segments[1]) == 8000
-
+
         # Test with 600ms segments, 100ms overlap
         segments = extractor.segment_audio(
-            audio_data,
-            sample_rate,
-            segment_length_ms=600,
-            overlap_ms=100
-        )
-
-        # Should produce 2 segments (with overlap)
-        assert len(segments) == 2
+            audio_data, sample_rate, segment_length_ms=600, overlap_ms=100
+        )
+
+        # Should produce 2 segments (with overlap)
+        assert len(segments) == 2
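The expected counts in `test_segment_audio` follow from simple arithmetic: at 16 kHz a 500 ms window is 8000 samples, and the step between windows is the window length minus the overlap. A sketch of that windowing (edge handling for the trailing partial window is an illustrative choice here, not necessarily what `segment_audio` does):

```python
def segment(samples, sample_rate, segment_length_ms, overlap_ms=0):
    """Slice samples into fixed-length windows with optional overlap."""
    seg_len = sample_rate * segment_length_ms // 1000  # window size in samples
    step = seg_len - sample_rate * overlap_ms // 1000  # hop size in samples
    return [samples[i : i + seg_len] for i in range(0, len(samples), step)]


audio = [0.0] * 16000  # 1 second at 16 kHz

# 500 ms windows, no overlap: two full 8000-sample segments.
assert [len(s) for s in segment(audio, 16000, 500, 0)] == [8000, 8000]

# 600 ms windows, 100 ms overlap: hop is 8000 samples, so two segments fit.
assert len(segment(audio, 16000, 600, 100)) == 2
```

The overlap case works out because the hop is 9600 − 1600 = 8000 samples, so windows start at samples 0 and 8000 within the 16000-sample buffer.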
--- tests/test_batch.py
+++ tests/test_batch.py
@@ -1,11 +1,8 @@
 """Tests for batch processing and knowledge graph merging."""
 
 import json
-from pathlib import Path
-
-import pytest
 
 from video_processor.integrators.knowledge_graph import KnowledgeGraph
 from video_processor.integrators.plan_generator import PlanGenerator
 from video_processor.models import (
     ActionItem,
--- tests/test_cloud_sources.py
+++ tests/test_cloud_sources.py
@@ -134,11 +134,13 @@
     @patch("video_processor.sources.google_drive.GoogleDriveSource._auth_service_account")
     def test_authenticate_import_error(self, mock_auth):
         from video_processor.sources.google_drive import GoogleDriveSource
 
         source = GoogleDriveSource()
-        with patch.dict("sys.modules", {"google.oauth2": None, "google.oauth2.service_account": None}):
+        with patch.dict(
+            "sys.modules", {"google.oauth2": None, "google.oauth2.service_account": None}
+        ):
             # The import will fail inside authenticate
             result = source.authenticate()
             assert result is False
 
 
@@ -188,19 +190,24 @@
     def test_auth_saved_token(self, tmp_path):
         pytest.importorskip("dropbox")
         from video_processor.sources.dropbox_source import DropboxSource
 
         token_file = tmp_path / "token.json"
-        token_file.write_text(json.dumps({
-            "refresh_token": "rt_test",
-            "app_key": "key",
-            "app_secret": "secret",
-        }))
+        token_file.write_text(
+            json.dumps(
+                {
+                    "refresh_token": "rt_test",
+                    "app_key": "key",
+                    "app_secret": "secret",
+                }
+            )
+        )
 
         source = DropboxSource(token_path=token_file, app_key="key", app_secret="secret")
 
         mock_dbx = MagicMock()
         with patch("dropbox.Dropbox", return_value=mock_dbx):
             import dropbox
+
             result = source._auth_saved_token(dropbox)
             assert result is True
             assert source.dbx is mock_dbx
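`test_authenticate_import_error` relies on a standard trick: a `sys.modules` entry of `None` makes a subsequent import of that name raise `ModuleNotFoundError`, so a test can simulate a missing optional dependency without uninstalling anything. `patch.dict` restores the original entries on exit:

```python
import sys
from unittest.mock import patch


def try_import(name):
    """Return True if the module imports, False on ImportError."""
    try:
        __import__(name)
        return True
    except ImportError:  # ModuleNotFoundError is a subclass
        return False


# While patched, the import machinery sees None and refuses the import.
with patch.dict(sys.modules, {"json": None}):
    assert try_import("json") is False

# After the context exits, the original sys.modules entry is restored.
assert try_import("json") is True
```

This works even for modules that were already imported, because the import system consults `sys.modules` before anything else.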
--- tests/test_content_analyzer.py
+++ tests/test_content_analyzer.py
@@ -1,11 +1,9 @@
 """Tests for content cross-referencing between transcript and diagram entities."""
 
 import json
-from unittest.mock import MagicMock, patch
-
-import pytest
+from unittest.mock import MagicMock
 
 from video_processor.analyzers.content_analyzer import ContentAnalyzer
 from video_processor.models import Entity, KeyPoint
 
 
@@ -74,13 +72,15 @@
 
 
 class TestFuzzyMatch:
     def test_fuzzy_match_with_llm(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"transcript": "K8s", "diagram": "Kubernetes"},
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {"transcript": "K8s", "diagram": "Kubernetes"},
+            ]
+        )
         analyzer = ContentAnalyzer(provider_manager=pm)
 
         t_entities = [
             Entity(name="K8s", type="technology", descriptions=["Container orchestration"]),
         ]
@@ -189,11 +189,13 @@
         assert len(result[0].related_diagrams) == 2
 
     def test_details_used_for_matching(self):
         analyzer = ContentAnalyzer()
         kps = [
-            KeyPoint(point="Architecture overview", details="Uses Docker and Kubernetes for deployment"),
+            KeyPoint(
+                point="Architecture overview", details="Uses Docker and Kubernetes for deployment"
+            ),
         ]
         diagrams = [
             {"elements": ["Docker", "Kubernetes"], "text_content": "deployment infrastructure"},
         ]
         result = analyzer.enrich_key_points(kps, diagrams, "")
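`test_details_used_for_matching` checks that a key point's `details` text participates in diagram matching. A naive vocabulary-overlap sketch of that idea (the real ContentAnalyzer is more involved and, as `test_fuzzy_match_with_llm` shows, can delegate fuzzy pairs like "K8s"/"Kubernetes" to an LLM):

```python
def related_diagrams(point_text: str, diagrams: list) -> list:
    """Return indices of diagrams sharing vocabulary with the key point."""
    words = {w.lower().strip(".,:") for w in point_text.split()}
    hits = []
    for idx, d in enumerate(diagrams):
        vocab = {e.lower() for e in d.get("elements", [])}
        vocab |= {w.lower() for w in d.get("text_content", "").split()}
        if words & vocab:  # any shared term links the diagram
            hits.append(idx)
    return hits


kp = "Architecture overview: uses Docker and Kubernetes for deployment"
diagrams = [{"elements": ["Docker", "Kubernetes"], "text_content": "deployment infrastructure"}]
assert related_diagrams(kp, diagrams) == [0]
```

Exact-overlap matching like this misses synonyms and abbreviations, which is precisely the gap the LLM-backed fuzzy matcher in the tests above is meant to fill.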
--- tests/test_diagram_analyzer.py
+++ tests/test_diagram_analyzer.py
@@ -1,18 +1,17 @@
 """Tests for the rewritten diagram analyzer."""
 
 import json
-from pathlib import Path
-from unittest.mock import MagicMock, patch
+from unittest.mock import MagicMock
 
 import pytest
 
 from video_processor.analyzers.diagram_analyzer import (
     DiagramAnalyzer,
     _parse_json_response,
 )
-from video_processor.models import DiagramResult, DiagramType, ScreenCapture
+from video_processor.models import DiagramType
 
 
 class TestParseJsonResponse:
     def test_plain_json(self):
         result = _parse_json_response('{"key": "value"}')
@@ -50,27 +49,31 @@
         fp = tmp_path / "frame_0.jpg"
         fp.write_bytes(b"\xff\xd8\xff fake image data")
         return fp
 
     def test_classify_frame_diagram(self, analyzer, mock_pm, fake_frame):
-        mock_pm.analyze_image.return_value = json.dumps({
-            "is_diagram": True,
-            "diagram_type": "flowchart",
-            "confidence": 0.85,
-            "brief_description": "A flowchart showing login process"
-        })
+        mock_pm.analyze_image.return_value = json.dumps(
+            {
+                "is_diagram": True,
+                "diagram_type": "flowchart",
+                "confidence": 0.85,
+                "brief_description": "A flowchart showing login process",
+            }
+        )
         result = analyzer.classify_frame(fake_frame)
         assert result["is_diagram"] is True
         assert result["confidence"] == 0.85
 
     def test_classify_frame_not_diagram(self, analyzer, mock_pm, fake_frame):
-        mock_pm.analyze_image.return_value = json.dumps({
-            "is_diagram": False,
-            "diagram_type": "unknown",
-            "confidence": 0.1,
-            "brief_description": "A person speaking"
-        })
+        mock_pm.analyze_image.return_value = json.dumps(
+            {
+                "is_diagram": False,
+                "diagram_type": "unknown",
+                "confidence": 0.1,
+                "brief_description": "A person speaking",
+            }
+        )
         result = analyzer.classify_frame(fake_frame)
         assert result["is_diagram"] is False
 
     def test_classify_frame_failure(self, analyzer, mock_pm, fake_frame):
         mock_pm.analyze_image.return_value = "I cannot parse this image"
@@ -77,19 +80,21 @@
         result = analyzer.classify_frame(fake_frame)
         assert result["is_diagram"] is False
         assert result["confidence"] == 0.0
 
     def test_analyze_single_pass(self, analyzer, mock_pm, fake_frame):
-        mock_pm.analyze_image.return_value = json.dumps({
-            "diagram_type": "architecture",
-            "description": "Microservices architecture",
-            "text_content": "Service A, Service B",
-            "elements": ["Service A", "Service B"],
-            "relationships": ["A -> B: calls"],
-            "mermaid": "graph LR\n A-->B",
-            "chart_data": None
-        })
+        mock_pm.analyze_image.return_value = json.dumps(
+            {
+                "diagram_type": "architecture",
+                "description": "Microservices architecture",
+                "text_content": "Service A, Service B",
+                "elements": ["Service A", "Service B"],
+                "relationships": ["A -> B: calls"],
+                "mermaid": "graph LR\n A-->B",
+                "chart_data": None,
+            }
+        )
         result = analyzer.analyze_diagram_single_pass(fake_frame)
         assert result["diagram_type"] == "architecture"
         assert result["mermaid"] == "graph LR\n A-->B"
 
     def test_process_frames_high_confidence_diagram(self, analyzer, mock_pm, tmp_path):
@@ -105,38 +110,62 @@
 
         # Frame 0: high confidence diagram
        # Frame 1: low confidence (skip)
         # Frame 2: medium confidence (screengrab)
         classify_responses = [
-            json.dumps({"is_diagram": True, "diagram_type": "flowchart", "confidence": 0.9, "brief_description": "flow"}),
-            json.dumps({"is_diagram": False, "diagram_type": "unknown", "confidence": 0.1, "brief_description": "nothing"}),
-            json.dumps({"is_diagram": True, "diagram_type": "slide", "confidence": 0.5, "brief_description": "a slide"}),
-        ]
-        analysis_response = json.dumps({
-            "diagram_type": "flowchart",
-            "description": "Login flow",
-            "text_content": "Start -> End",
-            "elements": ["Start", "End"],
-            "relationships": ["Start -> End"],
-            "mermaid": "graph LR\n Start-->End",
-            "chart_data": None
-        })
+            json.dumps(
+                {
+                    "is_diagram": True,
+                    "diagram_type": "flowchart",
+                    "confidence": 0.9,
+                    "brief_description": "flow",
121
+ }
122
+ ),
123
+ json.dumps(
124
+ {
125
+ "is_diagram": False,
126
+ "diagram_type": "unknown",
127
+ "confidence": 0.1,
128
+ "brief_description": "nothing",
129
+ }
130
+ ),
131
+ json.dumps(
132
+ {
133
+ "is_diagram": True,
134
+ "diagram_type": "slide",
135
+ "confidence": 0.5,
136
+ "brief_description": "a slide",
137
+ }
138
+ ),
139
+ ]
140
+ analysis_response = json.dumps(
141
+ {
142
+ "diagram_type": "flowchart",
143
+ "description": "Login flow",
144
+ "text_content": "Start -> End",
145
+ "elements": ["Start", "End"],
146
+ "relationships": ["Start -> End"],
147
+ "mermaid": "graph LR\n Start-->End",
148
+ "chart_data": None,
149
+ }
150
+ )
123151
124152
# Calls are interleaved per-frame:
125153
# call 0: classify frame 0 (high conf)
126154
# call 1: analyze frame 0 (full analysis)
127155
# call 2: classify frame 1 (low conf - skip)
128156
# call 3: classify frame 2 (medium conf)
129157
# call 4: caption frame 2 (screengrab)
130158
call_sequence = [
131
- classify_responses[0], # classify frame 0
132
- analysis_response, # analyze frame 0
133
- classify_responses[1], # classify frame 1
134
- classify_responses[2], # classify frame 2
159
+ classify_responses[0], # classify frame 0
160
+ analysis_response, # analyze frame 0
161
+ classify_responses[1], # classify frame 1
162
+ classify_responses[2], # classify frame 2
135163
"A slide about something", # caption frame 2
136164
]
137165
call_count = [0]
166
+
138167
def side_effect(image_bytes, prompt, max_tokens=4096):
139168
idx = call_count[0]
140169
call_count[0] += 1
141170
return call_sequence[idx]
142171
@@ -164,15 +193,23 @@
164193
fp.write_bytes(b"\xff\xd8\xff fake")
165194
captures_dir = tmp_path / "captures"
166195
167196
# High confidence classification but analysis fails
168197
call_count = [0]
198
+
169199
def side_effect(image_bytes, prompt, max_tokens=4096):
170200
idx = call_count[0]
171201
call_count[0] += 1
172202
if idx == 0:
173
- return json.dumps({"is_diagram": True, "diagram_type": "chart", "confidence": 0.8, "brief_description": "chart"})
203
+ return json.dumps(
204
+ {
205
+ "is_diagram": True,
206
+ "diagram_type": "chart",
207
+ "confidence": 0.8,
208
+ "brief_description": "chart",
209
+ }
210
+ )
174211
if idx == 1:
175212
return "This is not valid JSON" # Analysis fails
176213
return "A chart showing data" # Caption
177214
178215
mock_pm.analyze_image.side_effect = side_effect
179216
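The three-way routing these tests exercise (high-confidence frames get a full analysis, medium-confidence frames become captioned screengrabs, low-confidence frames are skipped) can be sketched as follows. The `route_frame` helper and the 0.7/0.3 thresholds are assumptions inferred from the test data (0.9 and 0.8 route to analysis, 0.5 to caption, 0.1 to skip), not the analyzer's actual cutoffs.

```python
# Hypothetical sketch of the confidence routing exercised by the tests above.
# The threshold constants are assumptions, not the analyzer's real values.
HIGH_CONFIDENCE = 0.7  # at or above: run full diagram analysis
MEDIUM_CONFIDENCE = 0.3  # at or above: keep as a captioned screengrab


def route_frame(confidence: float) -> str:
    """Decide what to do with a frame after classification."""
    if confidence >= HIGH_CONFIDENCE:
        return "analyze"  # e.g. 0.9 and 0.8 in the tests
    if confidence >= MEDIUM_CONFIDENCE:
        return "caption"  # e.g. 0.5
    return "skip"  # e.g. 0.1
```

Each routed frame costs one extra provider call at most, which is why the tests count the interleaved `classify`/`analyze`/`caption` calls so carefully.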
--- tests/test_frame_extractor.py
+++ tests/test_frame_extractor.py
@@ -1,19 +1,19 @@
 """Tests for the frame extractor module."""
+
 import os
 import tempfile
-from pathlib import Path
 
 import numpy as np
 import pytest
 
 from video_processor.extractors.frame_extractor import (
     calculate_frame_difference,
-    extract_frames,
     is_gpu_available,
-    save_frames
+    save_frames,
 )
+
 
 # Create dummy test frames
 @pytest.fixture
 def dummy_frames():
     # Create a list of dummy frames with different content
@@ -21,42 +21,45 @@
     for i in range(3):
         # Create frame with different intensity for each
         frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50)
         frames.append(frame)
     return frames
+
 
 def test_calculate_frame_difference():
     """Test frame difference calculation."""
     # Create two frames with some difference
     frame1 = np.zeros((100, 100, 3), dtype=np.uint8)
     frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128  # 50% intensity
-
+
     # Calculate difference
     diff = calculate_frame_difference(frame1, frame2)
-
+
     # Expected difference is around 128/255 = 0.5
     assert 0.45 <= diff <= 0.55
-
+
     # Test identical frames
     diff_identical = calculate_frame_difference(frame1, frame1.copy())
     assert diff_identical < 0.001  # Should be very close to 0
+
 
 def test_is_gpu_available():
     """Test GPU availability check."""
     # This just tests that the function runs without error
     # We don't assert the result because it depends on the system
     result = is_gpu_available()
     assert isinstance(result, bool)
+
 
 def test_save_frames(dummy_frames):
     """Test saving frames to disk."""
     with tempfile.TemporaryDirectory() as temp_dir:
         # Save frames
         paths = save_frames(dummy_frames, temp_dir, "test_frame")
-
+
         # Check that we got the correct number of paths
         assert len(paths) == len(dummy_frames)
-
+
         # Check that files were created
         for path in paths:
             assert os.path.exists(path)
-            assert os.path.getsize(path) > 0 # Files should have content
+            assert os.path.getsize(path) > 0  # Files should have content
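The contract pinned down by `test_calculate_frame_difference` (about 0.5 for a uniform 128-intensity gap, near 0 for identical frames) is consistent with a normalized mean absolute difference. A minimal sketch under that assumption, not the project's actual implementation:

```python
import numpy as np


def frame_difference_sketch(frame1: np.ndarray, frame2: np.ndarray) -> float:
    """Mean absolute pixel difference, normalized to [0, 1]."""
    # Widen to int16 so the subtraction cannot wrap around uint8.
    delta = np.abs(frame1.astype(np.int16) - frame2.astype(np.int16))
    return float(delta.mean() / 255.0)


frame1 = np.zeros((100, 100, 3), dtype=np.uint8)
frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128
# 128 / 255 is roughly 0.502, inside the test's 0.45-0.55 window
```

The int16 widening matters: subtracting uint8 arrays directly would wrap around and make half the differences look small.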
--- tests/test_json_parsing.py
+++ tests/test_json_parsing.py
@@ -1,25 +1,23 @@
 """Tests for robust JSON parsing from LLM responses."""
 
-import pytest
-
 from video_processor.utils.json_parsing import parse_json_from_response
 
 
 class TestParseJsonFromResponse:
     def test_direct_dict(self):
         assert parse_json_from_response('{"key": "value"}') == {"key": "value"}
 
     def test_direct_array(self):
-        assert parse_json_from_response('[1, 2, 3]') == [1, 2, 3]
+        assert parse_json_from_response("[1, 2, 3]") == [1, 2, 3]
 
     def test_markdown_fenced_json(self):
         text = '```json\n{"key": "value"}\n```'
         assert parse_json_from_response(text) == {"key": "value"}
 
     def test_markdown_fenced_no_lang(self):
-        text = '```\n[1, 2]\n```'
+        text = "```\n[1, 2]\n```"
         assert parse_json_from_response(text) == [1, 2]
 
     def test_json_embedded_in_text(self):
         text = 'Here is the result:\n{"name": "test", "value": 42}\nEnd of result.'
         result = parse_json_from_response(text)
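The fallback chain these tests describe (direct parse, then a markdown code fence, then the first JSON object or array embedded in prose) can be sketched like this; it is a best-effort stand-in, not the module's actual code:

```python
import json
import re


def parse_json_from_response_sketch(text: str):
    """Best-effort JSON extraction from an LLM reply (sketch)."""
    # 1. Try the whole string as JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. Try a markdown code fence, with or without a language tag.
    fence = re.search(r"```(?:json)?\s*\n(.*?)\n```", text, re.DOTALL)
    if fence:
        try:
            return json.loads(fence.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Fall back to the first {...} or [...] span embedded in the text.
    embedded = re.search(r"(\{.*\}|\[.*\])", text, re.DOTALL)
    if embedded:
        try:
            return json.loads(embedded.group(1))
        except json.JSONDecodeError:
            pass
    return None
```

Ordering the attempts from strict to permissive keeps well-formed replies fast while still salvaging chatty ones.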
--- tests/test_models.py
+++ tests/test_models.py
@@ -1,11 +1,7 @@
 """Tests for pydantic data models."""
 
-import json
-
-import pytest
-
 from video_processor.models import (
     ActionItem,
     BatchManifest,
     BatchVideoEntry,
     DiagramResult,
@@ -66,11 +62,13 @@
         assert restored == item
 
 
 class TestKeyPoint:
     def test_with_related_diagrams(self):
-        kp = KeyPoint(point="System uses microservices", topic="Architecture", related_diagrams=[0, 2])
+        kp = KeyPoint(
+            point="System uses microservices", topic="Architecture", related_diagrams=[0, 2]
+        )
         assert kp.related_diagrams == [0, 2]
 
     def test_round_trip(self):
         kp = KeyPoint(point="Test", details="Detail", timestamp=42.0, source="diagram")
         restored = KeyPoint.model_validate_json(kp.model_dump_json())
@@ -120,11 +118,15 @@
         sc = ScreenCapture(frame_index=10, caption="Architecture overview slide", confidence=0.5)
         assert sc.image_path is None
 
     def test_round_trip(self):
         sc = ScreenCapture(
-            frame_index=7, timestamp=30.0, caption="Timeline", image_path="captures/capture_0.jpg", confidence=0.45
+            frame_index=7,
+            timestamp=30.0,
+            caption="Timeline",
+            image_path="captures/capture_0.jpg",
+            confidence=0.45,
         )
         restored = ScreenCapture.model_validate_json(sc.model_dump_json())
         assert restored == sc
 
 
@@ -171,11 +173,13 @@
         assert m.screen_captures == []
         assert m.stats.frames_extracted == 0
 
     def test_full_round_trip(self):
         m = VideoManifest(
-            video=VideoMetadata(title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0),
+            video=VideoMetadata(
+                title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0
+            ),
             stats=ProcessingStats(
                 frames_extracted=50,
                 diagrams_detected=3,
                 screen_captures=2,
                 models_used={"vision": "gpt-4o", "chat": "claude-sonnet-4-5"},
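The round-trip pattern used throughout these tests (`model_dump_json` followed by `model_validate_json` yields an equal object) works for any pydantic v2 model. A self-contained example with a hypothetical stand-in model, since the real `KeyPoint` lives in `video_processor.models`:

```python
from typing import Optional

from pydantic import BaseModel


class KeyPointSketch(BaseModel):
    # Hypothetical stand-in; field names mirror the KeyPoint usage above.
    point: str
    topic: Optional[str] = None
    details: Optional[str] = None
    timestamp: Optional[float] = None


kp = KeyPointSketch(point="Test", details="Detail", timestamp=42.0)
# Serialize to JSON and parse back; equality holds for plain field types.
restored = KeyPointSketch.model_validate_json(kp.model_dump_json())
```

Round-trip equality is what lets the pipeline persist a manifest to disk and reload it without a separate serialization layer.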
--- tests/test_output_structure.py
+++ tests/test_output_structure.py
@@ -1,12 +1,8 @@
 """Tests for output structure and manifest I/O."""
 
 import json
-import tempfile
-from pathlib import Path
-
-import pytest
 
 from video_processor.models import (
     ActionItem,
     BatchManifest,
     BatchVideoEntry,
--- tests/test_pipeline.py
+++ tests/test_pipeline.py
@@ -1,14 +1,11 @@
 """Tests for the core video processing pipeline."""
 
 import json
-from pathlib import Path
-from unittest.mock import MagicMock, patch
+from unittest.mock import MagicMock
 
-import pytest
-
-from video_processor.pipeline import _extract_key_points, _extract_action_items, _format_srt_time
+from video_processor.pipeline import _extract_action_items, _extract_key_points, _format_srt_time
 
 
 class TestFormatSrtTime:
     def test_zero(self):
         assert _format_srt_time(0) == "00:00:00,000"
@@ -28,27 +25,31 @@
 
 
 class TestExtractKeyPoints:
     def test_parses_valid_response(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"point": "Main point", "topic": "Architecture", "details": "Some details"},
-            {"point": "Second point", "topic": None, "details": None},
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {"point": "Main point", "topic": "Architecture", "details": "Some details"},
+                {"point": "Second point", "topic": None, "details": None},
+            ]
+        )
         result = _extract_key_points(pm, "Some transcript text here")
         assert len(result) == 2
         assert result[0].point == "Main point"
         assert result[0].topic == "Architecture"
         assert result[1].point == "Second point"
 
     def test_skips_invalid_items(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"point": "Valid", "topic": None},
-            {"topic": "No point field"},
-            {"point": "", "topic": "Empty point"},
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {"point": "Valid", "topic": None},
+                {"topic": "No point field"},
+                {"point": "", "topic": "Empty point"},
+            ]
+        )
         result = _extract_key_points(pm, "text")
         assert len(result) == 1
         assert result[0].point == "Valid"
 
     def test_handles_error(self):
@@ -65,29 +66,38 @@
 
 
 class TestExtractActionItems:
     def test_parses_valid_response(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"action": "Deploy fix", "assignee": "Bob", "deadline": "Friday",
-             "priority": "high", "context": "Production"},
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {
+                    "action": "Deploy fix",
+                    "assignee": "Bob",
+                    "deadline": "Friday",
+                    "priority": "high",
+                    "context": "Production",
+                },
+            ]
+        )
         result = _extract_action_items(pm, "Some transcript text")
         assert len(result) == 1
         assert result[0].action == "Deploy fix"
         assert result[0].assignee == "Bob"
 
     def test_skips_invalid_items(self):
         pm = MagicMock()
-        pm.chat.return_value = json.dumps([
-            {"action": "Valid action"},
-            {"assignee": "No action field"},
-            {"action": ""},
-        ])
+        pm.chat.return_value = json.dumps(
+            [
+                {"action": "Valid action"},
+                {"assignee": "No action field"},
+                {"action": ""},
+            ]
+        )
         result = _extract_action_items(pm, "text")
         assert len(result) == 1
 
     def test_handles_error(self):
         pm = MagicMock()
         pm.chat.side_effect = Exception("API down")
         result = _extract_action_items(pm, "text")
         assert result == []
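The `TestFormatSrtTime` expectation (`_format_srt_time(0) == "00:00:00,000"`) matches the standard SRT `HH:MM:SS,mmm` layout. A minimal sketch under that assumption, not the pipeline's actual helper:

```python
def format_srt_time_sketch(seconds: float) -> str:
    """Render a seconds offset as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

Note the comma before the milliseconds field: SRT uses `,` where most other subtitle formats use `.`, which is exactly what the zero-case test pins down.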
--- tests/test_prompt_templates.py
+++ tests/test_prompt_templates.py
@@ -1,9 +1,7 @@
 """Tests for prompt template management."""

-import pytest
-
 from video_processor.utils.prompt_templates import (
     DEFAULT_TEMPLATES,
     PromptTemplate,
     default_prompt_manager,
 )

--- tests/test_providers.py
+++ tests/test_providers.py
@@ -1,11 +1,9 @@
 """Tests for the provider abstraction layer."""

 from unittest.mock import MagicMock, patch

-import pytest
-
 from video_processor.providers.base import BaseProvider, ModelInfo
 from video_processor.providers.manager import ProviderManager


 class TestModelInfo:
@@ -13,11 +11,16 @@
         m = ModelInfo(id="gpt-4o", provider="openai", capabilities=["chat", "vision"])
         assert m.id == "gpt-4o"
         assert "vision" in m.capabilities

     def test_round_trip(self):
-        m = ModelInfo(id="claude-sonnet-4-5-20250929", provider="anthropic", display_name="Claude Sonnet", capabilities=["chat", "vision"])
+        m = ModelInfo(
+            id="claude-sonnet-4-5-20250929",
+            provider="anthropic",
+            display_name="Claude Sonnet",
+            capabilities=["chat", "vision"],
+        )
         restored = ModelInfo.model_validate_json(m.model_dump_json())
         assert restored == m


 class TestProviderManager:
@@ -107,23 +110,26 @@
 class TestDiscovery:
     @patch("video_processor.providers.discovery._cached_models", None)
     @patch.dict("os.environ", {}, clear=True)
     def test_discover_skips_missing_keys(self):
         from video_processor.providers.discovery import discover_available_models
+
         # No API keys -> empty list, no errors
         models = discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""})
         assert models == []

     @patch.dict("os.environ", {}, clear=True)
     @patch("video_processor.providers.discovery._cached_models", None)
     def test_discover_caches_results(self):
         from video_processor.providers import discovery

-        models = discovery.discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""})
+        models = discovery.discover_available_models(
+            api_keys={"openai": "", "anthropic": "", "gemini": ""}
+        )
         assert models == []
         # Second call should use cache
         models2 = discovery.discover_available_models(api_keys={"openai": "key"})
         assert models2 == []  # Still cached empty result

         # Force refresh
         discovery.clear_discovery_cache()
         # Would try to connect with real key, so skip that test

--- tests/test_rendering.py
+++ tests/test_rendering.py
@@ -1,12 +1,8 @@
 """Tests for rendering and export utilities."""

-import json
-from pathlib import Path
-from unittest.mock import MagicMock, patch
-
-import pytest
+from unittest.mock import patch

 from video_processor.models import (
     ActionItem,
     DiagramResult,
     DiagramType,
@@ -101,11 +97,11 @@
         assert result == {}

     def test_creates_output_dir(self, tmp_path):
         nested = tmp_path / "charts" / "output"
         data = {"labels": ["A"], "values": [1], "chart_type": "bar"}
-        result = reproduce_chart(data, nested, "test")
+        reproduce_chart(data, nested, "test")
         assert nested.exists()


 class TestExportAllFormats:
     def _make_manifest(self) -> VideoManifest:
@@ -180,11 +176,11 @@
             ],
         )
         (tmp_path / "results").mkdir()
         (tmp_path / "diagrams").mkdir()

-        result = export_all_formats(tmp_path, manifest)
+        export_all_formats(tmp_path, manifest)
         # Chart should be reproduced
         chart_svg = tmp_path / "diagrams" / "diagram_0_chart.svg"
         assert chart_svg.exists()


--- video_processor/agent/orchestrator.py
+++ video_processor/agent/orchestrator.py
@@ -5,17 +5,13 @@
 import time
 from pathlib import Path
 from typing import Any, Dict, List, Optional

 from video_processor.models import (
-    ActionItem,
-    DiagramResult,
-    KeyPoint,
-    ScreenCapture,
+    ProcessingStats,
     VideoManifest,
     VideoMetadata,
-    ProcessingStats,
 )
 from video_processor.providers.manager import ProviderManager

 logger = logging.getLogger(__name__)

@@ -107,13 +103,11 @@
         plan.append({"step": "generate_reports", "priority": "required"})

         self._plan = plan
         return plan

-    def _execute_step(
-        self, step: Dict[str, Any], input_path: Path, output_dir: Path
-    ) -> None:
+    def _execute_step(self, step: Dict[str, Any], input_path: Path, output_dir: Path) -> None:
         """Execute a single step with retry logic."""
         step_name = step["step"]
         logger.info(f"Agent step: {step_name}")

         for attempt in range(1, self.max_retries + 1):
@@ -141,13 +135,11 @@
                     result = self._run_step(fallback, input_path, output_dir)
                     self._results[step_name] = result
                 except Exception as fe:
                     logger.error(f"Fallback {fallback} also failed: {fe}")

-    def _run_step(
-        self, step_name: str, input_path: Path, output_dir: Path
-    ) -> Any:
+    def _run_step(self, step_name: str, input_path: Path, output_dir: Path) -> Any:
         """Run a specific processing step."""
         from video_processor.output_structure import create_video_output_dirs

         dirs = create_video_output_dirs(output_dir, input_path.stem)

@@ -177,13 +169,11 @@
             transcription = self.pm.transcribe_audio(audio_path)
             text = transcription.get("text", "")

             # Save transcript
             dirs["transcript"].mkdir(parents=True, exist_ok=True)
-            (dirs["transcript"] / "transcript.json").write_text(
-                json.dumps(transcription, indent=2)
-            )
+            (dirs["transcript"] / "transcript.json").write_text(json.dumps(transcription, indent=2))
             (dirs["transcript"] / "transcript.txt").write_text(text)
             return transcription

         elif step_name == "detect_diagrams":
             from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
@@ -256,23 +246,19 @@
         """Adapt the plan based on step results."""

         if completed_step == "transcribe":
             text = result.get("text", "") if isinstance(result, dict) else ""
             # If transcript is very long, add deep analysis
-            if len(text) > 10000 and not any(
-                s["step"] == "deep_analysis" for s in self._plan
-            ):
+            if len(text) > 10000 and not any(s["step"] == "deep_analysis" for s in self._plan):
                 self._plan.append({"step": "deep_analysis", "priority": "adaptive"})
                 logger.info("Agent adapted: adding deep analysis for long transcript")

         elif completed_step == "detect_diagrams":
             diagrams = result.get("diagrams", []) if isinstance(result, dict) else []
             captures = result.get("captures", []) if isinstance(result, dict) else []
             # If many diagrams found, ensure cross-referencing
-            if len(diagrams) >= 3 and not any(
-                s["step"] == "cross_reference" for s in self._plan
-            ):
+            if len(diagrams) >= 3 and not any(s["step"] == "cross_reference" for s in self._plan):
                 self._plan.append({"step": "cross_reference", "priority": "adaptive"})
                 logger.info("Agent adapted: adding cross-reference for diagram-heavy video")

             if len(captures) > len(diagrams):
                 self._insights.append(
@@ -358,11 +344,11 @@

         transcript = self._results.get("transcribe", {})
         kp_result = self._results.get("extract_key_points", {})
         key_points = kp_result.get("key_points", [])
         ai_result = self._results.get("extract_action_items", {})
-        action_items = ai_result.get("action_items", [])
+        ai_result.get("action_items", [])
         diagram_result = self._results.get("detect_diagrams", {})
         diagrams = diagram_result.get("diagrams", [])
         kg_result = self._results.get("build_knowledge_graph", {})
         kg = kg_result.get("knowledge_graph")

--- video_processor/analyzers/action_detector.py
+++ video_processor/analyzers/action_detector.py
@@ -150,23 +150,25 @@
             return []

     def _pattern_extract(self, text: str) -> List[ActionItem]:
         """Extract action items using regex pattern matching."""
         items: List[ActionItem] = []
-        sentences = re.split(r'[.!?]\s+', text)
+        sentences = re.split(r"[.!?]\s+", text)

         for sentence in sentences:
             sentence = sentence.strip()
             if not sentence or len(sentence) < 10:
                 continue

             for pattern in _ACTION_PATTERNS:
                 if pattern.search(sentence):
-                    items.append(ActionItem(
-                        action=sentence,
-                        source="transcript",
-                    ))
+                    items.append(
+                        ActionItem(
+                            action=sentence,
+                            source="transcript",
+                        )
+                    )
                     break  # One match per sentence is enough

         return items

     def _attach_timestamps(
--- video_processor/analyzers/content_analyzer.py
+++ video_processor/analyzers/content_analyzer.py
@@ -58,18 +58,18 @@
         )

         # LLM fuzzy matching for unmatched entities
         if self.pm:
             unmatched_t = [
-                e for e in transcript_entities if e.name.lower() not in {
-                    d.name.lower() for d in diagram_entities
-                }
+                e
+                for e in transcript_entities
+                if e.name.lower() not in {d.name.lower() for d in diagram_entities}
             ]
             unmatched_d = [
-                e for e in diagram_entities if e.name.lower() not in {
-                    t.name.lower() for t in transcript_entities
-                }
+                e
+                for e in diagram_entities
+                if e.name.lower() not in {t.name.lower() for t in transcript_entities}
             ]

             if unmatched_t and unmatched_d:
                 matches = self._fuzzy_match(unmatched_t, unmatched_d)
                 for t_name, d_name in matches:
@@ -136,11 +136,13 @@

         # Build diagram entity index
         diagram_entities: dict[int, set[str]] = {}
         for i, d in enumerate(diagrams):
             elements = d.get("elements", []) if isinstance(d, dict) else getattr(d, "elements", [])
-            text = d.get("text_content", "") if isinstance(d, dict) else getattr(d, "text_content", "")
+            text = (
+                d.get("text_content", "") if isinstance(d, dict) else getattr(d, "text_content", "")
+            )
             entities = set(str(e).lower() for e in elements)
             if text:
                 entities.update(word.lower() for word in text.split() if len(word) > 3)
             diagram_entities[i] = entities

--- video_processor/analyzers/diagram_analyzer.py
+++ video_processor/analyzers/diagram_analyzer.py
@@ -24,23 +24,25 @@
 shared/presented content, NOT people or camera views.
 
 Return ONLY a JSON object (no markdown fences):
 {
   "is_diagram": true/false,
-  "diagram_type": "flowchart"|"sequence"|"architecture"|"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown",
+  "diagram_type": "flowchart"|"sequence"|"architecture"
+  |"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown",
   "confidence": 0.0 to 1.0,
   "content_type": "slide"|"diagram"|"document"|"screen_share"|"whiteboard"|"chart"|"person"|"other",
   "brief_description": "one-sentence description of what you see"
 }
 """
 
 # Single-pass analysis prompt — extracts everything in one call
 _ANALYSIS_PROMPT = """\
-Analyze this diagram/visual content comprehensively. Extract ALL of the following in a single JSON response (no markdown fences):
-
+Analyze this diagram/visual content comprehensively. Extract ALL of the
+following in a single JSON response (no markdown fences):
 {
-  "diagram_type": "flowchart"|"sequence"|"architecture"|"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown",
+  "diagram_type": "flowchart"|"sequence"|"architecture"
+  |"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown",
   "description": "detailed description of the visual content",
   "text_content": "all visible text, preserving structure",
   "elements": ["list", "of", "identified", "elements/components"],
   "relationships": ["element A -> element B: relationship", ...],
   "mermaid": "mermaid diagram syntax representing this visual (graph LR, sequenceDiagram, etc.)",
@@ -68,12 +70,11 @@
     # Strip markdown fences
     cleaned = text.strip()
     if cleaned.startswith("```"):
         lines = cleaned.split("\n")
         # Remove first and last fence lines
-        lines = [l for l in lines if not l.strip().startswith("```")]
-        cleaned = "\n".join(lines)
+        lines = [line for line in lines if not line.strip().startswith("```")]
     try:
         return json.loads(cleaned)
     except json.JSONDecodeError:
         # Try to find JSON object in the text
         start = cleaned.find("{")
@@ -105,11 +106,16 @@
         """
         image_bytes = _read_image_bytes(image_path)
         raw = self.pm.analyze_image(image_bytes, _CLASSIFY_PROMPT, max_tokens=512)
         result = _parse_json_response(raw)
         if result is None:
-            return {"is_diagram": False, "diagram_type": "unknown", "confidence": 0.0, "brief_description": ""}
+            return {
+                "is_diagram": False,
+                "diagram_type": "unknown",
+                "confidence": 0.0,
+                "brief_description": "",
+            }
         return result
 
     def analyze_diagram_single_pass(self, image_path: Union[str, Path]) -> dict:
         """
         Full single-pass diagram analysis — description, text, mermaid, chart data.
@@ -163,15 +169,19 @@
                 logger.debug(f"Frame {i}: confidence {confidence:.2f} below threshold, skipping")
                 continue
 
             if confidence >= 0.7:
                 # Full diagram analysis
-                logger.info(f"Frame {i}: diagram detected (confidence {confidence:.2f}), analyzing...")
+                logger.info(
+                    f"Frame {i}: diagram detected (confidence {confidence:.2f}), analyzing..."
+                )
                 try:
                     analysis = self.analyze_diagram_single_pass(fp)
                 except Exception as e:
-                    logger.warning(f"Diagram analysis failed for frame {i}: {e}, falling back to screengrab")
+                    logger.warning(
+                        f"Diagram analysis failed for frame {i}: {e}, falling back to screengrab"
+                    )
                     analysis = {}
 
                 if not analysis:
                     # Analysis failed — fall back to screengrab
                     capture = self._save_screengrab(fp, i, capture_idx, captures_dir, confidence)
@@ -221,16 +231,20 @@
                 diagrams.append(dr)
                 diagram_idx += 1
 
             else:
                 # Screengrab fallback (0.3 <= confidence < 0.7)
-                logger.info(f"Frame {i}: uncertain (confidence {confidence:.2f}), saving as screengrab")
+                logger.info(
+                    f"Frame {i}: uncertain (confidence {confidence:.2f}), saving as screengrab"
+                )
                 capture = self._save_screengrab(fp, i, capture_idx, captures_dir, confidence)
                 captures.append(capture)
                 capture_idx += 1
 
-        logger.info(f"Diagram processing complete: {len(diagrams)} diagrams, {len(captures)} screengrabs")
+        logger.info(
+            f"Diagram processing complete: {len(diagrams)} diagrams, {len(captures)} screengrabs"
+        )
         return diagrams, captures
 
     def _save_screengrab(
         self,
         frame_path: Path,
--- video_processor/cli/commands.py
+++ video_processor/cli/commands.py
@@ -2,13 +2,11 @@
 
 import json
 import logging
 import os
 import sys
-import time
 from pathlib import Path
-from typing import List, Optional
 
 import click
 import colorlog
 from tqdm import tqdm
 
@@ -49,23 +47,32 @@
     if ctx.invoked_subcommand is None:
         _interactive_menu(ctx)
 
 
 @cli.command()
-@click.option("--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path")
+@click.option(
+    "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path"
+)
 @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory")
 @click.option(
     "--depth",
     type=click.Choice(["basic", "standard", "comprehensive"]),
     default="standard",
     help="Processing depth",
 )
-@click.option("--focus", type=str, help='Comma-separated focus areas (e.g., "diagrams,action-items")')
+@click.option(
+    "--focus", type=str, help='Comma-separated focus areas (e.g., "diagrams,action-items")'
+)
 @click.option("--use-gpu", is_flag=True, help="Enable GPU acceleration if available")
 @click.option("--sampling-rate", type=float, default=0.5, help="Frame sampling rate")
 @click.option("--change-threshold", type=float, default=0.15, help="Visual change threshold")
-@click.option("--periodic-capture", type=float, default=30.0, help="Capture a frame every N seconds regardless of change (0 to disable)")
+@click.option(
+    "--periodic-capture",
+    type=float,
+    default=30.0,
+    help="Capture a frame every N seconds regardless of change (0 to disable)",
+)
 @click.option("--title", type=str, help="Title for the analysis report")
 @click.option(
     "--provider",
     "-p",
     type=click.Choice(["auto", "openai", "anthropic", "gemini"]),
@@ -102,11 +109,11 @@
         chat_model=chat_model,
         provider=prov,
     )
 
     try:
-        manifest = process_single_video(
+        process_single_video(
             input_path=input,
             output_dir=output,
             provider_manager=pm,
             depth=depth,
             focus_areas=focus_areas,
@@ -127,11 +134,13 @@
         traceback.print_exc()
         sys.exit(1)
 
 
 @cli.command()
-@click.option("--input-dir", "-i", type=click.Path(), default=None, help="Local directory of videos")
+@click.option(
+    "--input-dir", "-i", type=click.Path(), default=None, help="Local directory of videos"
+)
 @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory")
 @click.option(
     "--depth",
     type=click.Choice(["basic", "standard", "comprehensive"]),
     default="standard",
@@ -159,20 +168,35 @@
     default="local",
     help="Video source (local directory, Google Drive, or Dropbox)",
 )
 @click.option("--folder-id", type=str, default=None, help="Google Drive folder ID")
 @click.option("--folder-path", type=str, default=None, help="Cloud folder path")
-@click.option("--recursive/--no-recursive", default=True, help="Recurse into subfolders (default: recursive)")
+@click.option(
+    "--recursive/--no-recursive", default=True, help="Recurse into subfolders (default: recursive)"
+)
 @click.pass_context
-def batch(ctx, input_dir, output, depth, pattern, title, provider, vision_model, chat_model, source, folder_id, folder_path, recursive):
+def batch(
+    ctx,
+    input_dir,
+    output,
+    depth,
+    pattern,
+    title,
+    provider,
+    vision_model,
+    chat_model,
+    source,
+    folder_id,
+    folder_path,
+    recursive,
+):
     """Process a folder of videos in batch."""
     from video_processor.integrators.knowledge_graph import KnowledgeGraph
     from video_processor.integrators.plan_generator import PlanGenerator
     from video_processor.models import BatchManifest, BatchVideoEntry
     from video_processor.output_structure import (
         create_batch_output_dirs,
-        read_video_manifest,
         write_batch_manifest,
     )
     from video_processor.pipeline import process_single_video
     from video_processor.providers.manager import ProviderManager
 
@@ -190,21 +214,23 @@
 
         cloud = GoogleDriveSource()
         if not cloud.authenticate():
             logging.error("Google Drive authentication failed")
             sys.exit(1)
-        cloud_files = cloud.list_videos(folder_id=folder_id, folder_path=folder_path, patterns=patterns, recursive=recursive)
-        local_paths = cloud.download_all(cloud_files, download_dir)
+        cloud_files = cloud.list_videos(
+            folder_id=folder_id, folder_path=folder_path, patterns=patterns, recursive=recursive
+        )
+        cloud.download_all(cloud_files, download_dir)
     elif source == "dropbox":
         from video_processor.sources.dropbox_source import DropboxSource
 
         cloud = DropboxSource()
         if not cloud.authenticate():
             logging.error("Dropbox authentication failed")
             sys.exit(1)
         cloud_files = cloud.list_videos(folder_path=folder_path, patterns=patterns)
-        local_paths = cloud.download_all(cloud_files, download_dir)
+        cloud.download_all(cloud_files, download_dir)
     else:
         logging.error(f"Unknown source: {source}")
         sys.exit(1)
 
     input_dir = download_dir
@@ -302,11 +328,14 @@
         batch_summary_md="batch_summary.md",
         merged_knowledge_graph_json="knowledge_graph.json",
     )
     write_batch_manifest(batch_manifest, output)
     click.echo(pm.usage.format_summary())
-    click.echo(f"\n Batch complete: {batch_manifest.completed_videos}/{batch_manifest.total_videos} succeeded")
+    click.echo(
+        f"\n Batch complete: {batch_manifest.completed_videos}"
+        f"/{batch_manifest.total_videos} succeeded"
+    )
     click.echo(f" Results: {output}/batch_manifest.json")
 
 
 @cli.command("list-models")
 @click.pass_context
@@ -374,11 +403,13 @@
         traceback.print_exc()
         sys.exit(1)
 
 
 @cli.command("agent-analyze")
-@click.option("--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path")
+@click.option(
+    "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path"
+)
 @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory")
 @click.option(
     "--depth",
     type=click.Choice(["basic", "standard", "comprehensive"]),
     default="standard",
--- video_processor/cli/output_formatter.py
+++ video_processor/cli/output_formatter.py
@@ -1,42 +1,42 @@
 """Output formatting for PlanOpticon analysis results."""
 
 import html
-import json
 import logging
 import shutil
 from pathlib import Path
 from typing import Dict, List, Optional, Union
 
 logger = logging.getLogger(__name__)
+
 
 class OutputFormatter:
     """Formats and organizes output from video analysis."""
-    
+
     def __init__(self, output_dir: Union[str, Path]):
         """
         Initialize output formatter.
-        
+
         Parameters
         ----------
         output_dir : str or Path
             Output directory for formatted content
         """
         self.output_dir = Path(output_dir)
         self.output_dir.mkdir(parents=True, exist_ok=True)
-    
+
     def organize_outputs(
         self,
         markdown_path: Union[str, Path],
         knowledge_graph_path: Union[str, Path],
         diagrams: List[Dict],
         frames_dir: Optional[Union[str, Path]] = None,
-        transcript_path: Optional[Union[str, Path]] = None
+        transcript_path: Optional[Union[str, Path]] = None,
     ) -> Dict:
         """
         Organize outputs into a consistent structure.
-        
+
         Parameters
         ----------
         markdown_path : str or Path
             Path to markdown analysis
         knowledge_graph_path : str or Path
@@ -45,84 +45,84 @@
             List of diagram analysis results
         frames_dir : str or Path, optional
             Directory with extracted frames
         transcript_path : str or Path, optional
             Path to transcript file
-        
+
         Returns
         -------
         dict
             Dictionary with organized output paths
         """
         # Create output structure
         md_dir = self.output_dir / "markdown"
         diagrams_dir = self.output_dir / "diagrams"
         data_dir = self.output_dir / "data"
-        
+
         md_dir.mkdir(exist_ok=True)
         diagrams_dir.mkdir(exist_ok=True)
         data_dir.mkdir(exist_ok=True)
-        
+
         # Copy markdown file
         markdown_path = Path(markdown_path)
         md_output = md_dir / markdown_path.name
         shutil.copy2(markdown_path, md_output)
-        
+
         # Copy knowledge graph
         kg_path = Path(knowledge_graph_path)
         kg_output = data_dir / kg_path.name
         shutil.copy2(kg_path, kg_output)
-        
+
         # Copy diagram images if available
         diagram_images = []
         for diagram in diagrams:
             if "image_path" in diagram and diagram["image_path"]:
                 img_path = Path(diagram["image_path"])
                 if img_path.exists():
                     img_output = diagrams_dir / img_path.name
                     shutil.copy2(img_path, img_output)
                     diagram_images.append(str(img_output))
-        
+
         # Copy transcript if provided
         transcript_output = None
         if transcript_path:
             transcript_path = Path(transcript_path)
             if transcript_path.exists():
                 transcript_output = data_dir / transcript_path.name
                 shutil.copy2(transcript_path, transcript_output)
-        
+
         # Copy selected frames if provided
         frame_outputs = []
         if frames_dir:
             frames_dir = Path(frames_dir)
             if frames_dir.exists():
                 frames_output_dir = self.output_dir / "frames"
                 frames_output_dir.mkdir(exist_ok=True)
-        
+
                 # Copy a limited number of representative frames
                 frame_files = sorted(list(frames_dir.glob("*.jpg")))
                 max_frames = min(10, len(frame_files))
                 step = max(1, len(frame_files) // max_frames)
-        
+
                 for i in range(0, len(frame_files), step):
                     if len(frame_outputs) >= max_frames:
                         break
-        
+
                     frame = frame_files[i]
                     frame_output = frames_output_dir / frame.name
                     shutil.copy2(frame, frame_output)
                     frame_outputs.append(str(frame_output))
-        
+
         # Return organized paths
         return {
             "markdown": str(md_output),
             "knowledge_graph": str(kg_output),
             "diagram_images": diagram_images,
             "frames": frame_outputs,
-            "transcript": str(transcript_output) if transcript_output else None
+            "transcript": str(transcript_output) if transcript_output else None,
         }
-    
+
     def create_html_index(self, outputs: Dict) -> Path:
         """
         Create HTML index page for outputs.
 
         Parameters
@@ -142,11 +142,12 @@
             "<!DOCTYPE html>",
             "<html>",
             "<head>",
             " <title>PlanOpticon Analysis Results</title>",
             " <style>",
-            " body { font-family: Arial, sans-serif; margin: 0; padding: 20px; line-height: 1.6; }",
+            " body { font-family: Arial, sans-serif;"
+            " margin: 0; padding: 20px; line-height: 1.6; }",
             " .container { max-width: 1200px; margin: 0 auto; }",
             " h1 { color: #333; }",
             " h2 { color: #555; margin-top: 30px; }",
             " .section { margin-bottom: 30px; }",
             " .files { display: flex; flex-wrap: wrap; }",
@@ -158,11 +159,11 @@
             " </style>",
             "</head>",
             "<body>",
             "<div class='container'>",
             " <h1>PlanOpticon Analysis Results</h1>",
-            ""
+            "",
         ]
 
         # Add markdown section
         if outputs.get("markdown"):
             md_path = Path(outputs["markdown"])
@@ -228,11 +229,13 @@
             lines.append(" <ul>")
 
             for data_path in data_files:
                 data_rel = esc(str(data_path.relative_to(self.output_dir)))
                 data_name = esc(data_path.name)
-                lines.append(f" <li><a href='{data_rel}' target='_blank'>{data_name}</a></li>")
+                lines.append(
+                    f" <li><a href='{data_rel}' target='_blank'>{data_name}</a></li>"
+                )
 
             lines.append(" </ul>")
             lines.append(" </div>")
 
         # Close HTML
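The frame-copy hunk above caps output at ten representative frames by striding through the sorted frame list. A standalone sketch of that sampling logic (the `pick_representative` helper name is illustrative, not part of the codebase; note that as committed, an existing frames directory with no `*.jpg` files would make `len(frame_files) // max_frames` divide by zero, so the sketch adds an empty-list guard):

```python
from typing import List


def pick_representative(frame_files: List[str], cap: int = 10) -> List[str]:
    """Keep at most `cap` evenly spaced frames from a sorted list."""
    if not frame_files:  # guard: the diffed code would hit 0 // 0 here
        return []
    max_frames = min(cap, len(frame_files))
    step = max(1, len(frame_files) // max_frames)
    picked: List[str] = []
    for i in range(0, len(frame_files), step):
        if len(picked) >= max_frames:
            break
        picked.append(frame_files[i])
    return picked
```

With 25 frames the step is 2, so frames 0, 2, 4, … are kept until the cap of 10 is reached.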
--- video_processor/extractors/__init__.py
+++ video_processor/extractors/__init__.py
@@ -1,17 +1,17 @@
+from video_processor.extractors.audio_extractor import AudioExtractor
 from video_processor.extractors.frame_extractor import (
-    extract_frames,
-    save_frames,
-    calculate_frame_difference,
-    is_gpu_available
+    calculate_frame_difference,
+    extract_frames,
+    is_gpu_available,
+    save_frames,
 )
-from video_processor.extractors.audio_extractor import AudioExtractor
 from video_processor.extractors.text_extractor import TextExtractor
 
 __all__ = [
-    'extract_frames',
-    'save_frames',
-    'calculate_frame_difference',
-    'is_gpu_available',
-    'AudioExtractor',
-    'TextExtractor',
+    "extract_frames",
+    "save_frames",
+    "calculate_frame_difference",
+    "is_gpu_available",
+    "AudioExtractor",
+    "TextExtractor",
 ]
 
--- video_processor/extractors/audio_extractor.py
+++ video_processor/extractors/audio_extractor.py
@@ -1,172 +1,170 @@
 """Audio extraction and processing module for video analysis."""
+
 import logging
-import os
 import subprocess
 from pathlib import Path
 from typing import Dict, Optional, Tuple, Union
 
 import librosa
 import numpy as np
 import soundfile as sf
 
 logger = logging.getLogger(__name__)
+
 
 class AudioExtractor:
     """Extract and process audio from video files."""
-
+
     def __init__(self, sample_rate: int = 16000, mono: bool = True):
        """
         Initialize the audio extractor.
-
+
         Parameters
         ----------
         sample_rate : int
             Target sample rate for extracted audio
         mono : bool
             Whether to convert audio to mono
         """
         self.sample_rate = sample_rate
         self.mono = mono
-
+
     def extract_audio(
-        self,
-        video_path: Union[str, Path],
-        output_path: Optional[Union[str, Path]] = None,
-        format: str = "wav"
+        self,
+        video_path: Union[str, Path],
+        output_path: Optional[Union[str, Path]] = None,
+        format: str = "wav",
     ) -> Path:
         """
         Extract audio from video file.
-
+
         Parameters
         ----------
         video_path : str or Path
             Path to video file
         output_path : str or Path, optional
             Path to save extracted audio (if None, saves alongside video)
         format : str
             Audio format to save (wav, mp3, etc.)
-
+
         Returns
         -------
         Path
             Path to extracted audio file
         """
         video_path = Path(video_path)
         if not video_path.exists():
             raise FileNotFoundError(f"Video file not found: {video_path}")
-
+
         # Generate output path if not provided
         if output_path is None:
             output_path = video_path.with_suffix(f".{format}")
         else:
             output_path = Path(output_path)
-
+
         # Ensure output directory exists
         output_path.parent.mkdir(parents=True, exist_ok=True)
-
+
         # Extract audio using ffmpeg
         try:
             cmd = [
-                "ffmpeg",
-                "-i", str(video_path),
-                "-vn",  # No video
-                "-acodec", "pcm_s16le",  # PCM 16-bit little-endian
-                "-ar", str(self.sample_rate),  # Sample rate
-                "-ac", "1" if self.mono else "2",  # Channels (mono or stereo)
-                "-y",  # Overwrite output
-                str(output_path)
-            ]
-
-            # Run ffmpeg command
-            result = subprocess.run(
-                cmd,
-                stdout=subprocess.PIPE,
-                stderr=subprocess.PIPE,
-                check=True
-            )
-
+                "ffmpeg",
+                "-i",
+                str(video_path),
+                "-vn",  # No video
+                "-acodec",
+                "pcm_s16le",  # PCM 16-bit little-endian
+                "-ar",
+                str(self.sample_rate),  # Sample rate
+                "-ac",
+                "1" if self.mono else "2",  # Channels (mono or stereo)
+                "-y",  # Overwrite output
+                str(output_path),
+            ]
+
+            # Run ffmpeg command
+            subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
+
             logger.info(f"Extracted audio from {video_path} to {output_path}")
             return output_path
-
+
         except subprocess.CalledProcessError as e:
             logger.error(f"Failed to extract audio: {e.stderr.decode()}")
             raise RuntimeError(f"Failed to extract audio: {e.stderr.decode()}")
         except Exception as e:
             logger.error(f"Error extracting audio: {str(e)}")
             raise
-
+
     def load_audio(self, audio_path: Union[str, Path]) -> Tuple[np.ndarray, int]:
         """
         Load audio file into memory.
-
+
         Parameters
         ----------
         audio_path : str or Path
             Path to audio file
-
+
         Returns
         -------
         tuple
             (audio_data, sample_rate)
         """
         audio_path = Path(audio_path)
         if not audio_path.exists():
             raise FileNotFoundError(f"Audio file not found: {audio_path}")
-
+
         # Load audio data
         audio_data, sr = librosa.load(
-            audio_path,
-            sr=self.sample_rate if self.sample_rate else None,
-            mono=self.mono
+            audio_path, sr=self.sample_rate if self.sample_rate else None, mono=self.mono
         )
-
+
         logger.info(f"Loaded audio from {audio_path}: shape={audio_data.shape}, sr={sr}")
         return audio_data, sr
-
+
     def get_audio_properties(self, audio_path: Union[str, Path]) -> Dict:
         """
         Get properties of audio file.
-
+
         Parameters
         ----------
         audio_path : str or Path
             Path to audio file
-
+
         Returns
         -------
         dict
             Audio properties (duration, sample_rate, channels, etc.)
         """
         audio_path = Path(audio_path)
         if not audio_path.exists():
             raise FileNotFoundError(f"Audio file not found: {audio_path}")
-
+
         # Get audio info
         info = sf.info(audio_path)
-
+
         properties = {
             "duration": info.duration,
             "sample_rate": info.samplerate,
             "channels": info.channels,
             "format": info.format,
             "subtype": info.subtype,
-            "path": str(audio_path)
+            "path": str(audio_path),
         }
-
+
         return properties
-
+
     def segment_audio(
         self,
         audio_data: np.ndarray,
         sample_rate: int,
         segment_length_ms: int = 30000,
-        overlap_ms: int = 0
+        overlap_ms: int = 0,
     ) -> list:
         """
         Segment audio into chunks.
-
+
         Parameters
         ----------
         audio_data : np.ndarray
             Audio data
         sample_rate : int
@@ -173,65 +171,62 @@
             Sample rate of audio
         segment_length_ms : int
             Length of segments in milliseconds
         overlap_ms : int
             Overlap between segments in milliseconds
-
+
         Returns
         -------
         list
             List of audio segments as numpy arrays
         """
         # Convert ms to samples
         segment_length_samples = int(segment_length_ms * sample_rate / 1000)
         overlap_samples = int(overlap_ms * sample_rate / 1000)
-
+
         # Calculate hop length
         hop_length = segment_length_samples - overlap_samples
-
+
         # Initialize segments list
         segments = []
-
+
         # Generate segments
         for i in range(0, len(audio_data), hop_length):
             end_idx = min(i + segment_length_samples, len(audio_data))
             segment = audio_data[i:end_idx]
-
+
             # Only add if segment is long enough (at least 50% of target length)
             if len(segment) >= segment_length_samples * 0.5:
                 segments.append(segment)
-
+
             # Break if we've reached the end
             if end_idx == len(audio_data):
                 break
-
+
         logger.info(f"Segmented audio into {len(segments)} chunks")
         return segments
-
+
     def save_segment(
-        self,
-        segment: np.ndarray,
-        output_path: Union[str, Path],
-        sample_rate: int
+        self, segment: np.ndarray, output_path: Union[str, Path], sample_rate: int
     ) -> Path:
         """
         Save audio segment to file.
-
+
         Parameters
         ----------
         segment : np.ndarray
             Audio segment data
         output_path : str or Path
             Path to save segment
         sample_rate : int
             Sample rate of segment
-
+
         Returns
         -------
         Path
             Path to saved segment
         """
         output_path = Path(output_path)
         output_path.parent.mkdir(parents=True, exist_ok=True)
-
+
         sf.write(output_path, segment, sample_rate)
         return output_path
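The reflowed `cmd` list above is plain argv: after formatting, each flag and its value sit in separate elements, which is exactly the form `subprocess.run` expects. A quick way to see the equivalent shell invocation (paths here are illustrative):

```python
import shlex

sample_rate, mono = 16000, True  # the extractor's defaults
cmd = [
    "ffmpeg",
    "-i", "input.mp4",
    "-vn",                        # no video stream
    "-acodec", "pcm_s16le",
    "-ar", str(sample_rate),
    "-ac", "1" if mono else "2",
    "-y",                         # overwrite output
    "output.wav",
]
print(shlex.join(cmd))
# ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 -y output.wav
```

Keeping the command as a list (rather than a joined string with `shell=True`) means paths with spaces need no quoting.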
--- video_processor/extractors/audio_extractor.py
+++ video_processor/extractors/audio_extractor.py
@@ -1,172 +1,170 @@
1 """Audio extraction and processing module for video analysis."""
 
2 import logging
3 import os
4 import subprocess
5 from pathlib import Path
6 from typing import Dict, Optional, Tuple, Union
7
8 import librosa
9 import numpy as np
10 import soundfile as sf
11
12 logger = logging.getLogger(__name__)
 
13
14 class AudioExtractor:
15 """Extract and process audio from video files."""
16
17 def __init__(self, sample_rate: int = 16000, mono: bool = True):
18 """
19 Initialize the audio extractor.
20
21 Parameters
22 ----------
23 sample_rate : int
24 Target sample rate for extracted audio
25 mono : bool
26 Whether to convert audio to mono
27 """
28 self.sample_rate = sample_rate
29 self.mono = mono
30
31 def extract_audio(
32 self,
33 video_path: Union[str, Path],
34 output_path: Optional[Union[str, Path]] = None,
35 format: str = "wav"
36 ) -> Path:
37 """
38 Extract audio from video file.
39
40 Parameters
41 ----------
42 video_path : str or Path
43 Path to video file
44 output_path : str or Path, optional
45 Path to save extracted audio (if None, saves alongside video)
46 format : str
47 Audio format to save (wav, mp3, etc.)
48
49 Returns
50 -------
51 Path
52 Path to extracted audio file
53 """
54 video_path = Path(video_path)
55 if not video_path.exists():
56 raise FileNotFoundError(f"Video file not found: {video_path}")
57
58 # Generate output path if not provided
59 if output_path is None:
60 output_path = video_path.with_suffix(f".{format}")
61 else:
62 output_path = Path(output_path)
63
64 # Ensure output directory exists
65 output_path.parent.mkdir(parents=True, exist_ok=True)
66
67 # Extract audio using ffmpeg
68 try:
69 cmd = [
70 "ffmpeg",
71 "-i", str(video_path),
72 "-vn", # No video
73 "-acodec", "pcm_s16le", # PCM 16-bit little-endian
74 "-ar", str(self.sample_rate), # Sample rate
75 "-ac", "1" if self.mono else "2", # Channels (mono or stereo)
76 "-y", # Overwrite output
77 str(output_path)
78 ]
79
80 # Run ffmpeg command
81 result = subprocess.run(
82 cmd,
83 stdout=subprocess.PIPE,
84 stderr=subprocess.PIPE,
85 check=True
86 )
87
88 logger.info(f"Extracted audio from {video_path} to {output_path}")
89 return output_path
90
91 except subprocess.CalledProcessError as e:
92 logger.error(f"Failed to extract audio: {e.stderr.decode()}")
93 raise RuntimeError(f"Failed to extract audio: {e.stderr.decode()}")
94 except Exception as e:
95 logger.error(f"Error extracting audio: {str(e)}")
96 raise
97
98 def load_audio(self, audio_path: Union[str, Path]) -> Tuple[np.ndarray, int]:
99 """
100 Load audio file into memory.
101
102 Parameters
103 ----------
104 audio_path : str or Path
105 Path to audio file
106
107 Returns
108 -------
109 tuple
110 (audio_data, sample_rate)
111 """
112 audio_path = Path(audio_path)
113 if not audio_path.exists():
114 raise FileNotFoundError(f"Audio file not found: {audio_path}")
115
116 # Load audio data
117 audio_data, sr = librosa.load(
118 audio_path,
119 sr=self.sample_rate if self.sample_rate else None,
120 mono=self.mono
121 )
122
123 logger.info(f"Loaded audio from {audio_path}: shape={audio_data.shape}, sr={sr}")
124 return audio_data, sr
125
126 def get_audio_properties(self, audio_path: Union[str, Path]) -> Dict:
127 """
128 Get properties of audio file.
129
130 Parameters
131 ----------
132 audio_path : str or Path
133 Path to audio file
134
135 Returns
136 -------
137 dict
138 Audio properties (duration, sample_rate, channels, etc.)
139 """
140 audio_path = Path(audio_path)
141 if not audio_path.exists():
142 raise FileNotFoundError(f"Audio file not found: {audio_path}")
143
144 # Get audio info
145 info = sf.info(audio_path)
146
147 properties = {
148 "duration": info.duration,
149 "sample_rate": info.samplerate,
150 "channels": info.channels,
151 "format": info.format,
152 "subtype": info.subtype,
153 "path": str(audio_path)
154 }
155
156 return properties
157
158 def segment_audio(
159 self,
160 audio_data: np.ndarray,
161 sample_rate: int,
162 segment_length_ms: int = 30000,
163 overlap_ms: int = 0
164 ) -> list:
165 """
166 Segment audio into chunks.
167
168 Parameters
169 ----------
170 audio_data : np.ndarray
171 Audio data
172 sample_rate : int
@@ -173,65 +171,62 @@
173 Sample rate of audio
174 segment_length_ms : int
175 Length of segments in milliseconds
176 overlap_ms : int
177 Overlap between segments in milliseconds
178
179 Returns
180 -------
181 list
182 List of audio segments as numpy arrays
183 """
184 # Convert ms to samples
185 segment_length_samples = int(segment_length_ms * sample_rate / 1000)
186 overlap_samples = int(overlap_ms * sample_rate / 1000)
187
188 # Calculate hop length
189 hop_length = segment_length_samples - overlap_samples
190
191 # Initialize segments list
192 segments = []
193
194 # Generate segments
195 for i in range(0, len(audio_data), hop_length):
196 end_idx = min(i + segment_length_samples, len(audio_data))
197 segment = audio_data[i:end_idx]
198
199 # Only add if segment is long enough (at least 50% of target length)
200 if len(segment) >= segment_length_samples * 0.5:
201 segments.append(segment)
202
203 # Break if we've reached the end
204 if end_idx == len(audio_data):
205 break
206
207 logger.info(f"Segmented audio into {len(segments)} chunks")
208 return segments
209
210 def save_segment(
211 self,
212 segment: np.ndarray,
213 output_path: Union[str, Path],
214 sample_rate: int
215 ) -> Path:
216 """
217 Save audio segment to file.
218
219 Parameters
220 ----------
221 segment : np.ndarray
222 Audio segment data
223 output_path : str or Path
224 Path to save segment
225 sample_rate : int
226 Sample rate of segment
227
228 Returns
229 -------
230 Path
231 Path to saved segment
232 """
233 output_path = Path(output_path)
234 output_path.parent.mkdir(parents=True, exist_ok=True)
235
236 sf.write(output_path, segment, sample_rate)
237 return output_path
238
--- video_processor/extractors/audio_extractor.py
+++ video_processor/extractors/audio_extractor.py
@@ -1,172 +1,170 @@
1 """Audio extraction and processing module for video analysis."""
2
3 import logging
 
4 import subprocess
5 from pathlib import Path
6 from typing import Dict, Optional, Tuple, Union
7
8 import librosa
9 import numpy as np
10 import soundfile as sf
11
12 logger = logging.getLogger(__name__)
13
14
15 class AudioExtractor:
16 """Extract and process audio from video files."""
17
18 def __init__(self, sample_rate: int = 16000, mono: bool = True):
19 """
20 Initialize the audio extractor.
21
22 Parameters
23 ----------
24 sample_rate : int
25 Target sample rate for extracted audio
26 mono : bool
27 Whether to convert audio to mono
28 """
29 self.sample_rate = sample_rate
30 self.mono = mono
31
32 def extract_audio(
33 self,
34 video_path: Union[str, Path],
35 output_path: Optional[Union[str, Path]] = None,
36 format: str = "wav",
37 ) -> Path:
38 """
39 Extract audio from video file.
40
41 Parameters
42 ----------
43 video_path : str or Path
44 Path to video file
45 output_path : str or Path, optional
46 Path to save extracted audio (if None, saves alongside video)
47 format : str
48 Audio format to save (wav, mp3, etc.)
49
50 Returns
51 -------
52 Path
53 Path to extracted audio file
54 """
55 video_path = Path(video_path)
56 if not video_path.exists():
57 raise FileNotFoundError(f"Video file not found: {video_path}")
58
59 # Generate output path if not provided
60 if output_path is None:
61 output_path = video_path.with_suffix(f".{format}")
62 else:
63 output_path = Path(output_path)
64
65 # Ensure output directory exists
66 output_path.parent.mkdir(parents=True, exist_ok=True)
67
68 # Extract audio using ffmpeg
69 try:
70 cmd = [
71 "ffmpeg",
72 "-i",
73 str(video_path),
74 "-vn", # No video
75 "-acodec",
76 "pcm_s16le", # PCM 16-bit little-endian
77 "-ar",
78 str(self.sample_rate), # Sample rate
79 "-ac",
80 "1" if self.mono else "2", # Channels (mono or stereo)
81 "-y", # Overwrite output
82 str(output_path),
83 ]
84
85 # Run ffmpeg command
86 subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
87
 
88 logger.info(f"Extracted audio from {video_path} to {output_path}")
89 return output_path
90
91 except subprocess.CalledProcessError as e:
92 logger.error(f"Failed to extract audio: {e.stderr.decode()}")
93 raise RuntimeError(f"Failed to extract audio: {e.stderr.decode()}")
94 except Exception as e:
95 logger.error(f"Error extracting audio: {str(e)}")
96 raise
97
98 def load_audio(self, audio_path: Union[str, Path]) -> Tuple[np.ndarray, int]:
99 """
100 Load audio file into memory.
101
102 Parameters
103 ----------
104 audio_path : str or Path
105 Path to audio file
106
107 Returns
108 -------
109 tuple
110 (audio_data, sample_rate)
111 """
112 audio_path = Path(audio_path)
113 if not audio_path.exists():
114 raise FileNotFoundError(f"Audio file not found: {audio_path}")
115
116 # Load audio data
117 audio_data, sr = librosa.load(
118 audio_path, sr=self.sample_rate if self.sample_rate else None, mono=self.mono
 
 
119 )
120
121 logger.info(f"Loaded audio from {audio_path}: shape={audio_data.shape}, sr={sr}")
122 return audio_data, sr
123
124 def get_audio_properties(self, audio_path: Union[str, Path]) -> Dict:
125 """
126 Get properties of audio file.

        Parameters
        ----------
        audio_path : str or Path
            Path to audio file

        Returns
        -------
        dict
            Audio properties (duration, sample_rate, channels, etc.)
        """
        audio_path = Path(audio_path)
        if not audio_path.exists():
            raise FileNotFoundError(f"Audio file not found: {audio_path}")

        # Get audio info
        info = sf.info(audio_path)

        properties = {
            "duration": info.duration,
            "sample_rate": info.samplerate,
            "channels": info.channels,
            "format": info.format,
            "subtype": info.subtype,
            "path": str(audio_path),
        }

        return properties

    def segment_audio(
        self,
        audio_data: np.ndarray,
        sample_rate: int,
        segment_length_ms: int = 30000,
        overlap_ms: int = 0,
    ) -> list:
        """
        Segment audio into chunks.

        Parameters
        ----------
        audio_data : np.ndarray
            Audio data
        sample_rate : int
@@ -173,65 +171,62 @@
            Sample rate of audio
        segment_length_ms : int
            Length of segments in milliseconds
        overlap_ms : int
            Overlap between segments in milliseconds

        Returns
        -------
        list
            List of audio segments as numpy arrays
        """
        # Convert ms to samples
        segment_length_samples = int(segment_length_ms * sample_rate / 1000)
        overlap_samples = int(overlap_ms * sample_rate / 1000)

        # Calculate hop length
        hop_length = segment_length_samples - overlap_samples

        # Initialize segments list
        segments = []

        # Generate segments
        for i in range(0, len(audio_data), hop_length):
            end_idx = min(i + segment_length_samples, len(audio_data))
            segment = audio_data[i:end_idx]

            # Only add if segment is long enough (at least 50% of target length)
            if len(segment) >= segment_length_samples * 0.5:
                segments.append(segment)

            # Break if we've reached the end
            if end_idx == len(audio_data):
                break

        logger.info(f"Segmented audio into {len(segments)} chunks")
        return segments

    def save_segment(
        self, segment: np.ndarray, output_path: Union[str, Path], sample_rate: int
    ) -> Path:
        """
        Save audio segment to file.

        Parameters
        ----------
        segment : np.ndarray
            Audio segment data
        output_path : str or Path
            Path to save segment
        sample_rate : int
            Sample rate of segment

        Returns
        -------
        Path
            Path to saved segment
        """
        output_path = Path(output_path)
        output_path.parent.mkdir(parents=True, exist_ok=True)

        sf.write(output_path, segment, sample_rate)
        return output_path

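The millisecond-to-sample arithmetic in `segment_audio` above can be exercised in isolation. A minimal sketch, assuming only the logic shown in the diff (the standalone function name is illustrative, not the module's API):

```python
import numpy as np

def segment_audio(audio_data, sample_rate, segment_length_ms=30000, overlap_ms=0):
    # Convert milliseconds to sample counts
    seg_len = int(segment_length_ms * sample_rate / 1000)
    hop = seg_len - int(overlap_ms * sample_rate / 1000)
    segments = []
    for i in range(0, len(audio_data), hop):
        chunk = audio_data[i:i + seg_len]
        # Keep only chunks at least half the target length
        if len(chunk) >= seg_len * 0.5:
            segments.append(chunk)
        # Stop once the final sample has been consumed
        if i + seg_len >= len(audio_data):
            break
    return segments

# 2 s of audio at 1 kHz, 500 ms segments, no overlap -> 4 full segments
audio = np.zeros(2000)
segs = segment_audio(audio, 1000, segment_length_ms=500)
```

With `overlap_ms=250` the hop halves, so the same signal yields 7 overlapping segments.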
--- video_processor/extractors/frame_extractor.py
+++ video_processor/extractors/frame_extractor.py
@@ -1,6 +1,7 @@
 """Frame extraction module for video processing."""
+
 import functools
 import logging
 from pathlib import Path
 from typing import List, Optional, Tuple, Union
 
@@ -112,44 +113,49 @@
         filtered.append(frame)
 
     if removed:
         logger.info(f"Filtered out {removed}/{len(frames)} people/webcam frames")
     return filtered, removed
+
 
 def is_gpu_available() -> bool:
     """Check if GPU acceleration is available for OpenCV."""
     try:
         # Check if CUDA is available
         count = cv2.cuda.getCudaEnabledDeviceCount()
         return count > 0
     except Exception:
         return False
+
 
 def gpu_accelerated(func):
     """Decorator to use GPU implementation when available."""
+
     @functools.wraps(func)
     def wrapper(*args, **kwargs):
-        if is_gpu_available() and not kwargs.get('disable_gpu'):
+        if is_gpu_available() and not kwargs.get("disable_gpu"):
             # Remove the disable_gpu kwarg if it exists
-            kwargs.pop('disable_gpu', None)
+            kwargs.pop("disable_gpu", None)
             return func_gpu(*args, **kwargs)
         # Remove the disable_gpu kwarg if it exists
-        kwargs.pop('disable_gpu', None)
+        kwargs.pop("disable_gpu", None)
         return func(*args, **kwargs)
+
     return wrapper
+
 
 def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
     """
     Calculate the difference between two frames.
-
+
     Parameters
     ----------
     prev_frame : np.ndarray
         Previous frame
     curr_frame : np.ndarray
         Current frame
-
+
     Returns
     -------
     float
         Difference score between 0 and 1
     """
@@ -156,30 +162,31 @@
     # Convert to grayscale
     if len(prev_frame.shape) == 3:
         prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
     else:
         prev_gray = prev_frame
-
+
     if len(curr_frame.shape) == 3:
         curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
     else:
         curr_gray = curr_frame
-
+
     # Calculate absolute difference
     diff = cv2.absdiff(prev_gray, curr_gray)
-
+
     # Normalize and return mean difference
     return np.mean(diff) / 255.0
+
 
 @gpu_accelerated
 def extract_frames(
     video_path: Union[str, Path],
     sampling_rate: float = 1.0,
     change_threshold: float = 0.15,
     periodic_capture_seconds: float = 30.0,
     max_frames: Optional[int] = None,
-    resize_to: Optional[Tuple[int, int]] = None
+    resize_to: Optional[Tuple[int, int]] = None,
 ) -> List[np.ndarray]:
     """
     Extract frames from video based on visual change detection + periodic capture.
 
     Two capture strategies work together:
@@ -273,11 +280,13 @@
         if diff > change_threshold:
             should_capture = True
             reason = f"change={diff:.3f}"
 
         # Periodic capture — even if change is small
-        elif periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval:
+        elif (
+            periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval
+        ):
             should_capture = True
             reason = "periodic"
 
         if should_capture:
             extracted_frames.append(frame)
@@ -299,41 +308,45 @@
 
     pbar.close()
     cap.release()
     logger.info(f"Extracted {len(extracted_frames)} frames from {frame_count} total frames")
     return extracted_frames
+
 
 def func_gpu(*args, **kwargs):
     """GPU-accelerated version of extract_frames."""
     # This would be implemented with CUDA acceleration
     # For now, fall back to the unwrapped CPU version
     logger.info("GPU acceleration not yet implemented, falling back to CPU")
     return extract_frames.__wrapped__(*args, **kwargs)
 
-def save_frames(frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame") -> List[Path]:
+
+def save_frames(
+    frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame"
+) -> List[Path]:
     """
     Save extracted frames to disk.
-
+
     Parameters
     ----------
     frames : list
         List of frames to save
     output_dir : str or Path
         Directory to save frames in
     base_filename : str
         Base name for frame files
-
+
     Returns
     -------
     list
         List of paths to saved frame files
     """
     output_dir = Path(output_dir)
     output_dir.mkdir(parents=True, exist_ok=True)
-
+
     saved_paths = []
     for i, frame in enumerate(frames):
         output_path = output_dir / f"{base_filename}_{i:04d}.jpg"
         cv2.imwrite(str(output_path), frame)
         saved_paths.append(output_path)
-
+
     return saved_paths
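The change score returned by `calculate_frame_difference` in the diff above reduces to a normalized mean absolute pixel difference. A numpy-only sketch of the same math (the OpenCV grayscale conversion is skipped; casting to int16 is my addition, to avoid uint8 wraparound in plain numpy subtraction):

```python
import numpy as np

def frame_difference(prev_gray, curr_gray):
    # Mean absolute pixel difference, normalized to [0, 1] for 8-bit frames
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    return float(np.mean(diff) / 255.0)

black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
# black vs white -> 1.0; identical frames -> 0.0
```

A frame is then captured when this score exceeds `change_threshold` (0.15 by default), or when the periodic-capture interval elapses regardless of the score.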
--- video_processor/extractors/text_extractor.py
+++ video_processor/extractors/text_extractor.py
@@ -1,48 +1,51 @@
 """Text extraction module for frames and diagrams."""
+
 import logging
 from pathlib import Path
 from typing import Dict, List, Optional, Tuple, Union
 
 import cv2
 import numpy as np
 
 logger = logging.getLogger(__name__)
+
 
 class TextExtractor:
     """Extract text from images, frames, and diagrams."""
-
+
     def __init__(self, tesseract_path: Optional[str] = None):
         """
         Initialize text extractor.
-
+
         Parameters
         ----------
         tesseract_path : str, optional
             Path to tesseract executable for local OCR
         """
         self.tesseract_path = tesseract_path
-
+
         # Check if we're using tesseract locally
         self.use_local_ocr = False
         if tesseract_path:
             try:
                 import pytesseract
+
                 pytesseract.pytesseract.tesseract_cmd = tesseract_path
                 self.use_local_ocr = True
             except ImportError:
                 logger.warning("pytesseract not installed, local OCR unavailable")
-
+
     def preprocess_image(self, image: np.ndarray) -> np.ndarray:
         """
         Preprocess image for better text extraction.
-
+
         Parameters
         ----------
         image : np.ndarray
             Input image
-
+
         Returns
         -------
         np.ndarray
             Preprocessed image
         """
@@ -49,66 +52,61 @@
         # Convert to grayscale if not already
         if len(image.shape) == 3:
             gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
         else:
             gray = image
-
+
         # Apply adaptive thresholding
         thresh = cv2.adaptiveThreshold(
-            gray,
-            255,
-            cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
-            cv2.THRESH_BINARY_INV,
-            11,
-            2
-        )
-
+            gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2
+        )
+
         # Noise removal
         kernel = np.ones((1, 1), np.uint8)
         opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
-
+
         # Invert back
         result = cv2.bitwise_not(opening)
-
+
         return result
-
+
     def extract_text_local(self, image: np.ndarray) -> str:
         """
         Extract text from image using local OCR (Tesseract).
-
+
         Parameters
         ----------
         image : np.ndarray
             Input image
-
+
         Returns
         -------
         str
             Extracted text
         """
         if not self.use_local_ocr:
             raise RuntimeError("Local OCR not configured")
-
+
         import pytesseract
-
+
         # Preprocess image
         processed = self.preprocess_image(image)
-
+
         # Extract text
         text = pytesseract.image_to_string(processed)
-
+
         return text
-
+
     def detect_text_regions(self, image: np.ndarray) -> List[Tuple[int, int, int, int]]:
         """
         Detect potential text regions in image.
-
+
         Parameters
         ----------
         image : np.ndarray
             Input image
-
+
         Returns
         -------
         list
             List of bounding boxes for text regions (x, y, w, h)
         """
@@ -115,179 +113,182 @@
         # Convert to grayscale
         if len(image.shape) == 3:
             gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
         else:
             gray = image
-
+
         # Apply MSER (Maximally Stable Extremal Regions)
         mser = cv2.MSER_create()
         regions, _ = mser.detectRegions(gray)
-
+
         # Convert regions to bounding boxes
         bboxes = []
         for region in regions:
             x, y, w, h = cv2.boundingRect(region.reshape(-1, 1, 2))
-
+
             # Apply filtering criteria for text-like regions
             aspect_ratio = w / float(h)
             if 0.1 < aspect_ratio < 10 and h > 5 and w > 5:
                 bboxes.append((x, y, w, h))
-
+
         # Merge overlapping boxes
         merged_bboxes = self._merge_overlapping_boxes(bboxes)
-
+
         logger.debug(f"Detected {len(merged_bboxes)} text regions")
         return merged_bboxes
-
-    def _merge_overlapping_boxes(self, boxes: List[Tuple[int, int, int, int]]) -> List[Tuple[int, int, int, int]]:
+
+    def _merge_overlapping_boxes(
+        self, boxes: List[Tuple[int, int, int, int]]
+    ) -> List[Tuple[int, int, int, int]]:
         """
         Merge overlapping bounding boxes.
-
+
         Parameters
         ----------
         boxes : list
             List of bounding boxes (x, y, w, h)
-
+
         Returns
         -------
         list
             Merged bounding boxes
         """
         if not boxes:
             return []
-
+
         # Sort boxes by x coordinate
         sorted_boxes = sorted(boxes, key=lambda b: b[0])
-
+
         merged = []
         current = list(sorted_boxes[0])
-
+
         for box in sorted_boxes[1:]:
             # Check if current box overlaps with the next one
-            if (current[0] <= box[0] + box[2] and
-                box[0] <= current[0] + current[2] and
-                current[1] <= box[1] + box[3] and
-                box[1] <= current[1] + current[3]):
-
+            if (
+                current[0] <= box[0] + box[2]
+                and box[0] <= current[0] + current[2]
+                and current[1] <= box[1] + box[3]
+                and box[1] <= current[1] + current[3]
+            ):
                 # Calculate merged box
                 x1 = min(current[0], box[0])
                 y1 = min(current[1], box[1])
                 x2 = max(current[0] + current[2], box[0] + box[2])
                 y2 = max(current[1] + current[3], box[1] + box[3])
-
+
                 # Update current box
                 current = [x1, y1, x2 - x1, y2 - y1]
             else:
                 # Add current box to merged list and update current
                 merged.append(tuple(current))
                 current = list(box)
-
+
         # Add the last box
         merged.append(tuple(current))
-
+
         return merged
-
+
     def extract_text_from_regions(
-        self,
-        image: np.ndarray,
-        regions: List[Tuple[int, int, int, int]]
+        self, image: np.ndarray, regions: List[Tuple[int, int, int, int]]
     ) -> Dict[Tuple[int, int, int, int], str]:
         """
         Extract text from specified regions in image.
-
+
         Parameters
         ----------
         image : np.ndarray
             Input image
         regions : list
             List of regions as (x, y, w, h)
-
+
         Returns
         -------
         dict
             Dictionary of {region: text}
         """
         results = {}
-
+
         for region in regions:
             x, y, w, h = region
-
+
             # Extract region
-            roi = image[y:y+h, x:x+w]
-
+            roi = image[y : y + h, x : x + w]
+
             # Skip empty regions
             if roi.size == 0:
                 continue
-
+
             # Extract text
             if self.use_local_ocr:
                 text = self.extract_text_local(roi)
             else:
                 text = "API-based text extraction not yet implemented"
-
+
             # Store non-empty results
             if text.strip():
                 results[region] = text.strip()
-
+
         return results
-
+
     def extract_text_from_image(self, image: np.ndarray, detect_regions: bool = True) -> str:
         """
         Extract text from entire image.
-
+
         Parameters
         ----------
         image : np.ndarray
             Input image
         detect_regions : bool
             Whether to detect and process text regions separately
-
+
         Returns
         -------
         str
             Extracted text
         """
         if detect_regions:
             # Detect regions and extract text from each
             regions = self.detect_text_regions(image)
             region_texts = self.extract_text_from_regions(image, regions)
-
+
             # Combine text from all regions
             text = "\n".join(region_texts.values())
         else:
             # Extract text from entire image
             if self.use_local_ocr:
                 text = self.extract_text_local(image)
             else:
                 text = "API-based text extraction not yet implemented"
-
+
         return text
-
-    def extract_text_from_file(self, image_path: Union[str, Path], detect_regions: bool = True) -> str:
+
+    def extract_text_from_file(
+        self, image_path: Union[str, Path], detect_regions: bool = True
+    ) -> str:
         """
         Extract text from image file.
-
+
         Parameters
         ----------
         image_path : str or Path
             Path to image file
         detect_regions : bool
             Whether to detect and process text regions separately
-
+
         Returns
         -------
         str
             Extracted text
         """
         image_path = Path(image_path)
         if not image_path.exists():
             raise FileNotFoundError(f"Image file not found: {image_path}")
-
+
         # Load image
         image = cv2.imread(str(image_path))
         if image is None:
             raise ValueError(f"Failed to load image: {image_path}")
-
+
         # Extract text
         text = self.extract_text_from_image(image, detect_regions)
-
+
         return text
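The greedy left-to-right merge in `_merge_overlapping_boxes` can be exercised standalone. A minimal sketch of the same logic over `(x, y, w, h)` boxes, with the reformatted overlap test from the diff (module-free; the function name outside the class is illustrative):

```python
def merge_overlapping_boxes(boxes):
    # Greedy merge of overlapping (x, y, w, h) boxes, sorted by x
    if not boxes:
        return []
    boxes = sorted(boxes, key=lambda b: b[0])
    merged = []
    current = list(boxes[0])
    for box in boxes[1:]:
        # Axis-aligned overlap test between current and the next box
        overlaps = (
            current[0] <= box[0] + box[2]
            and box[0] <= current[0] + current[2]
            and current[1] <= box[1] + box[3]
            and box[1] <= current[1] + current[3]
        )
        if overlaps:
            # Grow current to the union of the two boxes
            x1 = min(current[0], box[0])
            y1 = min(current[1], box[1])
            x2 = max(current[0] + current[2], box[0] + box[2])
            y2 = max(current[1] + current[3], box[1] + box[3])
            current = [x1, y1, x2 - x1, y2 - y1]
        else:
            merged.append(tuple(current))
            current = list(box)
    merged.append(tuple(current))
    return merged

# Overlapping boxes collapse into their union; disjoint boxes pass through.
```

Note the merge is single-pass: a later box that overlaps an earlier merged region only through a middle box may not be absorbed, which is an accepted approximation for grouping MSER text candidates.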
168 and box[0] <= current[0] + current[2]
169 and current[1] <= box[1] + box[3]
170 and box[1] <= current[1] + current[3]
171 ):
172 # Calculate merged box
173 x1 = min(current[0], box[0])
174 y1 = min(current[1], box[1])
175 x2 = max(current[0] + current[2], box[0] + box[2])
176 y2 = max(current[1] + current[3], box[1] + box[3])
177
178 # Update current box
179 current = [x1, y1, x2 - x1, y2 - y1]
180 else:
181 # Add current box to merged list and update current
182 merged.append(tuple(current))
183 current = list(box)
184
185 # Add the last box
186 merged.append(tuple(current))
187
188 return merged
189
190 def extract_text_from_regions(
191 self, image: np.ndarray, regions: List[Tuple[int, int, int, int]]
192 ) -> Dict[Tuple[int, int, int, int], str]:
193 """
194 Extract text from specified regions in image.
195
196 Parameters
197 ----------
198 image : np.ndarray
199 Input image
200 regions : list
201 List of regions as (x, y, w, h)
202
203 Returns
204 -------
205 dict
206 Dictionary of {region: text}
207 """
208 results = {}
209
210 for region in regions:
211 x, y, w, h = region
212
213 # Extract region
214 roi = image[y : y + h, x : x + w]
215
216 # Skip empty regions
217 if roi.size == 0:
218 continue
219
220 # Extract text
221 if self.use_local_ocr:
222 text = self.extract_text_local(roi)
223 else:
224 text = "API-based text extraction not yet implemented"
225
226 # Store non-empty results
227 if text.strip():
228 results[region] = text.strip()
229
230 return results
231
232 def extract_text_from_image(self, image: np.ndarray, detect_regions: bool = True) -> str:
233 """
234 Extract text from entire image.
235
236 Parameters
237 ----------
238 image : np.ndarray
239 Input image
240 detect_regions : bool
241 Whether to detect and process text regions separately
242
243 Returns
244 -------
245 str
246 Extracted text
247 """
248 if detect_regions:
249 # Detect regions and extract text from each
250 regions = self.detect_text_regions(image)
251 region_texts = self.extract_text_from_regions(image, regions)
252
253 # Combine text from all regions
254 text = "\n".join(region_texts.values())
255 else:
256 # Extract text from entire image
257 if self.use_local_ocr:
258 text = self.extract_text_local(image)
259 else:
260 text = "API-based text extraction not yet implemented"
261
262 return text
263
264 def extract_text_from_file(
265 self, image_path: Union[str, Path], detect_regions: bool = True
266 ) -> str:
267 """
268 Extract text from image file.
269
270 Parameters
271 ----------
272 image_path : str or Path
273 Path to image file
274 detect_regions : bool
275 Whether to detect and process text regions separately
276
277 Returns
278 -------
279 str
280 Extracted text
281 """
282 image_path = Path(image_path)
283 if not image_path.exists():
284 raise FileNotFoundError(f"Image file not found: {image_path}")
285
286 # Load image
287 image = cv2.imread(str(image_path))
288 if image is None:
289 raise ValueError(f"Failed to load image: {image_path}")
290
291 # Extract text
292 text = self.extract_text_from_image(image, detect_regions)
293
294 return text
295
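The single-pass merge in `_merge_overlapping_boxes` above sorts boxes by x and folds each one into a running accumulator, emitting the accumulator whenever overlap stops. A standalone sketch of that same logic (plain tuples, no OpenCV dependency; the function name mirrors the method but is illustrative):

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def merge_overlapping_boxes(boxes: List[Box]) -> List[Box]:
    """Fold x-sorted boxes into a running accumulator, emitting it when overlap stops."""
    if not boxes:
        return []
    sorted_boxes = sorted(boxes, key=lambda b: b[0])
    merged: List[Box] = []
    current = list(sorted_boxes[0])
    for x, y, w, h in sorted_boxes[1:]:
        overlaps = (
            current[0] <= x + w
            and x <= current[0] + current[2]
            and current[1] <= y + h
            and y <= current[1] + current[3]
        )
        if overlaps:
            # Grow the accumulator to the union of both boxes
            x1 = min(current[0], x)
            y1 = min(current[1], y)
            x2 = max(current[0] + current[2], x + w)
            y2 = max(current[1] + current[3], y + h)
            current = [x1, y1, x2 - x1, y2 - y1]
        else:
            merged.append(tuple(current))
            current = [x, y, w, h]
    merged.append(tuple(current))
    return merged


# Two overlapping boxes collapse into their union; the distant one survives alone.
print(merge_overlapping_boxes([(0, 0, 10, 10), (5, 5, 10, 10), (100, 0, 5, 5)]))
# → [(0, 0, 15, 15), (100, 0, 5, 5)]
```

Because it is a single left-to-right sweep against one accumulator, the result depends on x-order; repeated passes would merge more aggressively, but one pass is enough to group nearby OCR regions.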
--- video_processor/integrators/knowledge_graph.py
+++ video_processor/integrators/knowledge_graph.py
@@ -1,8 +1,7 @@
"""Knowledge graph integration for organizing extracted content."""

-import json
import logging
from pathlib import Path
from typing import Dict, List, Optional, Union

from tqdm import tqdm
@@ -33,18 +32,24 @@
            [{"role": "user", "content": prompt}],
            max_tokens=4096,
            temperature=temperature,
        )

-    def extract_entities_and_relationships(self, text: str) -> tuple[List[Entity], List[Relationship]]:
+    def extract_entities_and_relationships(
+        self, text: str
+    ) -> tuple[List[Entity], List[Relationship]]:
        """Extract entities and relationships in a single LLM call."""
        prompt = (
            "Extract all notable entities and relationships from the following content.\n\n"
            f"CONTENT:\n{text}\n\n"
            "Return a JSON object with two keys:\n"
-            '- "entities": array of {"name": "...", "type": "person|concept|technology|organization|time", "description": "brief description"}\n'
-            '- "relationships": array of {"source": "entity name", "target": "entity name", "type": "relationship description"}\n\n'
+            '- "entities": array of {"name": "...", '
+            '"type": "person|concept|technology|organization|time", '
+            '"description": "brief description"}\n'
+            '- "relationships": array of {"source": "entity name", '
+            '"target": "entity name", '
+            '"type": "relationship description"}\n\n'
            "Return ONLY the JSON object."
        )
        raw = self._chat(prompt)
        parsed = parse_json_from_response(raw)

@@ -52,32 +57,38 @@
        rels = []

        if isinstance(parsed, dict):
            for item in parsed.get("entities", []):
                if isinstance(item, dict) and "name" in item:
-                    entities.append(Entity(
-                        name=item["name"],
-                        type=item.get("type", "concept"),
-                        descriptions=[item["description"]] if item.get("description") else [],
-                    ))
-            entity_names = {e.name for e in entities}
+                    entities.append(
+                        Entity(
+                            name=item["name"],
+                            type=item.get("type", "concept"),
+                            descriptions=[item["description"]] if item.get("description") else [],
+                        )
+                    )
+            {e.name for e in entities}
            for item in parsed.get("relationships", []):
                if isinstance(item, dict) and "source" in item and "target" in item:
-                    rels.append(Relationship(
-                        source=item["source"],
-                        target=item["target"],
-                        type=item.get("type", "related_to"),
-                    ))
+                    rels.append(
+                        Relationship(
+                            source=item["source"],
+                            target=item["target"],
+                            type=item.get("type", "related_to"),
+                        )
+                    )
        elif isinstance(parsed, list):
            # Fallback: if model returns a flat entity list
            for item in parsed:
                if isinstance(item, dict) and "name" in item:
-                    entities.append(Entity(
-                        name=item["name"],
-                        type=item.get("type", "concept"),
-                        descriptions=[item["description"]] if item.get("description") else [],
-                    ))
+                    entities.append(
+                        Entity(
+                            name=item["name"],
+                            type=item.get("type", "concept"),
+                            descriptions=[item["description"]] if item.get("description") else [],
+                        )
+                    )

        return entities, rels

    def add_content(self, text: str, source: str, timestamp: Optional[float] = None) -> None:
        """Add content to knowledge graph by extracting entities and relationships."""
@@ -84,39 +95,45 @@
        entities, relationships = self.extract_entities_and_relationships(text)

        for entity in entities:
            eid = entity.name
            if eid in self.nodes:
-                self.nodes[eid]["occurrences"].append({
-                    "source": source,
-                    "timestamp": timestamp,
-                    "text": text[:100] + "..." if len(text) > 100 else text,
-                })
+                self.nodes[eid]["occurrences"].append(
+                    {
+                        "source": source,
+                        "timestamp": timestamp,
+                        "text": text[:100] + "..." if len(text) > 100 else text,
+                    }
+                )
                if entity.descriptions:
                    self.nodes[eid]["descriptions"].update(entity.descriptions)
            else:
                self.nodes[eid] = {
                    "id": eid,
                    "name": entity.name,
                    "type": entity.type,
                    "descriptions": set(entity.descriptions),
-                    "occurrences": [{
-                        "source": source,
-                        "timestamp": timestamp,
-                        "text": text[:100] + "..." if len(text) > 100 else text,
-                    }],
+                    "occurrences": [
+                        {
+                            "source": source,
+                            "timestamp": timestamp,
+                            "text": text[:100] + "..." if len(text) > 100 else text,
+                        }
+                    ],
                }

        for rel in relationships:
            if rel.source in self.nodes and rel.target in self.nodes:
-                self.relationships.append({
-                    "source": rel.source,
-                    "target": rel.target,
-                    "type": rel.type,
-                    "content_source": source,
-                    "timestamp": timestamp,
-                })
+                self.relationships.append(
+                    {
+                        "source": rel.source,
+                        "target": rel.target,
+                        "type": rel.type,
+                        "content_source": source,
+                        "timestamp": timestamp,
+                    }
+                )

    def process_transcript(self, transcript: Dict, batch_size: int = 10) -> None:
        """Process transcript segments into knowledge graph, batching for efficiency."""
        if "segments" not in transcript:
            logger.warning("Transcript missing segments")
@@ -137,17 +154,15 @@
        }

        # Batch segments together for fewer API calls
        batches = []
        for start in range(0, len(segments), batch_size):
-            batches.append(segments[start:start + batch_size])
+            batches.append(segments[start : start + batch_size])

        for batch in tqdm(batches, desc="Building knowledge graph", unit="batch"):
            # Combine batch text
-            combined_text = " ".join(
-                seg["text"] for seg in batch if "text" in seg
-            )
+            combined_text = " ".join(seg["text"] for seg in batch if "text" in seg)
            if not combined_text.strip():
                continue

            # Use first segment's timestamp as batch timestamp
            batch_start_idx = segments.index(batch[0])
@@ -169,29 +184,33 @@
            self.nodes[diagram_id] = {
                "id": diagram_id,
                "name": f"Diagram {i}",
                "type": "diagram",
                "descriptions": {"Visual diagram from video"},
-                "occurrences": [{
-                    "source": source if text_content else f"diagram_{i}",
-                    "frame_index": diagram.get("frame_index"),
-                }],
+                "occurrences": [
+                    {
+                        "source": source if text_content else f"diagram_{i}",
+                        "frame_index": diagram.get("frame_index"),
+                    }
+                ],
            }

    def to_data(self) -> KnowledgeGraphData:
        """Convert to pydantic KnowledgeGraphData model."""
        nodes = []
        for node in self.nodes.values():
            descs = node.get("descriptions", set())
            if isinstance(descs, set):
                descs = list(descs)
-            nodes.append(Entity(
-                name=node["name"],
-                type=node.get("type", "concept"),
-                descriptions=descs,
-                occurrences=node.get("occurrences", []),
-            ))
+            nodes.append(
+                Entity(
+                    name=node["name"],
+                    type=node.get("type", "concept"),
+                    descriptions=descs,
+                    occurrences=node.get("occurrences", []),
+                )
+            )

        rels = [
            Relationship(
                source=r["source"],
                target=r["target"],
@@ -280,11 +299,12 @@
    def generate_mermaid(self, max_nodes: int = 30) -> str:
        """Generate Mermaid visualization code."""
        node_importance = {}
        for node_id in self.nodes:
            count = sum(
-                1 for rel in self.relationships
+                1
+                for rel in self.relationships
                if rel["source"] == node_id or rel["target"] == node_id
            )
            node_importance[node_id] = count

        important = sorted(node_importance.items(), key=lambda x: x[1], reverse=True)
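`process_transcript` trades API calls for prompt size: it slices the segment list into fixed-size batches and joins each batch's text into a single extraction call. A minimal sketch of that batching, assuming the `{"text": ...}` segment shape used above (names are illustrative):

```python
from typing import Dict, List


def batch_segments(segments: List[Dict], batch_size: int = 10) -> List[List[Dict]]:
    """Slice segments into consecutive batches of at most batch_size items."""
    return [segments[start : start + batch_size] for start in range(0, len(segments), batch_size)]


segments = [{"text": f"segment {i}"} for i in range(25)]
batches = batch_segments(segments, batch_size=10)
print([len(b) for b in batches])  # → [10, 10, 5]

# Each batch becomes a single extraction prompt: 3 LLM calls instead of 25.
combined = " ".join(seg["text"] for seg in batches[-1] if "text" in seg)
print(combined)  # → segment 20 segment 21 segment 22 segment 23 segment 24
```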
--- video_processor/integrators/plan_generator.py
+++ video_processor/integrators/plan_generator.py
@@ -1,14 +1,13 @@
"""Plan generation for creating structured markdown output."""

-import json
import logging
from pathlib import Path
from typing import Dict, List, Optional, Union

from video_processor.integrators.knowledge_graph import KnowledgeGraph
-from video_processor.models import BatchManifest, VideoManifest
+from video_processor.models import VideoManifest
from video_processor.providers.manager import ProviderManager

logger = logging.getLogger(__name__)

@@ -36,11 +35,13 @@
        """Generate summary from transcript."""
        full_text = ""
        if "segments" in transcript:
            for segment in transcript["segments"]:
                if "text" in segment:
-                    speaker = f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else ""
+                    speaker = (
+                        f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else ""
+                    )
                    full_text += f"{speaker}{segment['text']}\n\n"

        if not full_text.strip():
            full_text = transcript.get("text", "")
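The rewrapped `speaker` conditional keeps the original semantics, which a standalone sketch (illustrative names, not from the module) makes easy to check. Note that the `'Speaker'` default inside `.get()` can never fire: the prefix is only built when the `speaker` key is present, in which case `.get()` returns its value.

```python
from typing import Dict


def format_segment(segment: Dict) -> str:
    """Prefix text with 'Name: ' only when the segment carries a speaker key."""
    speaker = f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else ""
    return f"{speaker}{segment['text']}\n\n"


# With a speaker key the prefix is applied; without one it is silently dropped.
assert format_segment({"speaker": "Alice", "text": "Ship it."}) == "Alice: Ship it.\n\n"
assert format_segment({"text": "No speaker here."}) == "No speaker here.\n\n"
```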
--- video_processor/models.py
+++ video_processor/models.py
@@ -1,17 +1,17 @@
"""Pydantic data models for PlanOpticon output."""

from datetime import datetime
from enum import Enum
-from pathlib import Path
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class DiagramType(str, Enum):
    """Types of visual content detected in video frames."""
+
    flowchart = "flowchart"
    sequence = "sequence"
    architecture = "architecture"
    whiteboard = "whiteboard"
    chart = "chart"
@@ -21,10 +21,11 @@
    unknown = "unknown"


class OutputFormat(str, Enum):
    """Available output formats."""
+
    markdown = "markdown"
    json = "json"
    html = "html"
    pdf = "pdf"
    svg = "svg"
@@ -31,39 +32,47 @@
    png = "png"


class TranscriptSegment(BaseModel):
    """A single segment of transcribed audio."""
+
    start: float = Field(description="Start time in seconds")
    end: float = Field(description="End time in seconds")
    text: str = Field(description="Transcribed text")
    speaker: Optional[str] = Field(default=None, description="Speaker identifier")
    confidence: Optional[float] = Field(default=None, description="Transcription confidence 0-1")


class ActionItem(BaseModel):
    """An action item extracted from content."""
+
    action: str = Field(description="The action to be taken")
    assignee: Optional[str] = Field(default=None, description="Person responsible")
    deadline: Optional[str] = Field(default=None, description="Deadline or timeframe")
    priority: Optional[str] = Field(default=None, description="Priority level")
    context: Optional[str] = Field(default=None, description="Additional context")
-    source: Optional[str] = Field(default=None, description="Where this was found (transcript/diagram)")
+    source: Optional[str] = Field(
+        default=None, description="Where this was found (transcript/diagram)"
+    )


class KeyPoint(BaseModel):
    """A key point extracted from content."""
+
    point: str = Field(description="The key point")
    topic: Optional[str] = Field(default=None, description="Topic or category")
    details: Optional[str] = Field(default=None, description="Supporting details")
    timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
    source: Optional[str] = Field(default=None, description="Where this was found")
-    related_diagrams: List[int] = Field(default_factory=list, description="Indices of related diagrams")
+    related_diagrams: List[int] = Field(
+        default_factory=list, description="Indices of related diagrams"
+    )


class DiagramResult(BaseModel):
    """Result from diagram extraction and analysis."""
+
    frame_index: int = Field(description="Index of the source frame")
    timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
    diagram_type: DiagramType = Field(default=DiagramType.unknown, description="Type of diagram")
    confidence: float = Field(default=0.0, description="Detection confidence 0-1")
    description: Optional[str] = Field(default=None, description="Description of the diagram")
@@ -70,85 +79,95 @@
    text_content: Optional[str] = Field(default=None, description="Text visible in the diagram")
    elements: List[str] = Field(default_factory=list, description="Identified elements")
    relationships: List[str] = Field(default_factory=list, description="Identified relationships")
    mermaid: Optional[str] = Field(default=None, description="Mermaid syntax representation")
    chart_data: Optional[Dict[str, Any]] = Field(
-        default=None,
-        description="Chart data for reproduction (labels, values, chart_type)"
+        default=None, description="Chart data for reproduction (labels, values, chart_type)"
    )
    image_path: Optional[str] = Field(default=None, description="Relative path to original frame")
    svg_path: Optional[str] = Field(default=None, description="Relative path to rendered SVG")
    png_path: Optional[str] = Field(default=None, description="Relative path to rendered PNG")
    mermaid_path: Optional[str] = Field(default=None, description="Relative path to mermaid source")


class ScreenCapture(BaseModel):
    """A screengrab fallback when diagram extraction fails or is uncertain."""
+
    frame_index: int = Field(description="Index of the source frame")
    timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
    caption: Optional[str] = Field(default=None, description="Brief description of the content")
    image_path: Optional[str] = Field(default=None, description="Relative path to screenshot")
-    confidence: float = Field(default=0.0, description="Detection confidence that triggered fallback")
+    confidence: float = Field(
+        default=0.0, description="Detection confidence that triggered fallback"
+    )


class Entity(BaseModel):
    """An entity in the knowledge graph."""
+
    name: str = Field(description="Entity name")
    type: str = Field(default="concept", description="Entity type (person, concept, time, diagram)")
    descriptions: List[str] = Field(default_factory=list, description="Descriptions of this entity")
-    source: Optional[str] = Field(default=None, description="Source attribution (transcript/diagram/both)")
+    source: Optional[str] = Field(
+        default=None, description="Source attribution (transcript/diagram/both)"
+    )
    occurrences: List[Dict[str, Any]] = Field(
-        default_factory=list,
-        description="List of occurrences with source, timestamp, text"
+        default_factory=list, description="List of occurrences with source, timestamp, text"
    )


class Relationship(BaseModel):
    """A relationship between entities in the knowledge graph."""
+
    source: str = Field(description="Source entity name")
    target: str = Field(description="Target entity name")
    type: str = Field(default="related_to", description="Relationship type")
    content_source: Optional[str] = Field(default=None, description="Content source identifier")
    timestamp: Optional[float] = Field(default=None, description="Timestamp in seconds")


class KnowledgeGraphData(BaseModel):
    """Serializable knowledge graph data."""
+
    nodes: List[Entity] = Field(default_factory=list, description="Graph nodes/entities")
-    relationships: List[Relationship] = Field(default_factory=list, description="Graph relationships")
+    relationships: List[Relationship] = Field(
+        default_factory=list, description="Graph relationships"
+    )


class ProcessingStats(BaseModel):
    """Statistics about a processing run."""
+
    start_time: Optional[str] = Field(default=None, description="ISO format start time")
    end_time: Optional[str] = Field(default=None, description="ISO format end time")
    duration_seconds: Optional[float] = Field(default=None, description="Total processing time")
    frames_extracted: int = Field(default=0)
    people_frames_filtered: int = Field(default=0)
127145
diagrams_detected: int = Field(default=0)
128146
screen_captures: int = Field(default=0)
129147
transcript_duration_seconds: Optional[float] = Field(default=None)
130148
models_used: Dict[str, str] = Field(
131
- default_factory=dict,
132
- description="Map of task to model used (e.g. vision: gpt-4o)"
149
+ default_factory=dict, description="Map of task to model used (e.g. vision: gpt-4o)"
133150
)
134151
135152
136153
class VideoMetadata(BaseModel):
137154
"""Metadata about the source video."""
155
+
138156
title: str = Field(description="Video title")
139157
source_path: Optional[str] = Field(default=None, description="Original video file path")
140158
duration_seconds: Optional[float] = Field(default=None, description="Video duration")
141159
resolution: Optional[str] = Field(default=None, description="Video resolution (e.g. 1920x1080)")
142160
processed_at: str = Field(
143161
default_factory=lambda: datetime.now().isoformat(),
144
- description="ISO format processing timestamp"
162
+ description="ISO format processing timestamp",
145163
)
146164
147165
148166
class VideoManifest(BaseModel):
149167
"""Manifest for a single video processing run - the single source of truth."""
168
+
150169
version: str = Field(default="1.0", description="Manifest schema version")
151170
video: VideoMetadata = Field(description="Source video metadata")
152171
stats: ProcessingStats = Field(default_factory=ProcessingStats)
153172
154173
# Relative paths to output files
@@ -167,15 +186,18 @@
167186
action_items: List[ActionItem] = Field(default_factory=list)
168187
diagrams: List[DiagramResult] = Field(default_factory=list)
169188
screen_captures: List[ScreenCapture] = Field(default_factory=list)
170189
171190
# Frame paths
172
- frame_paths: List[str] = Field(default_factory=list, description="Relative paths to extracted frames")
191
+ frame_paths: List[str] = Field(
192
+ default_factory=list, description="Relative paths to extracted frames"
193
+ )
173194
174195
175196
class BatchVideoEntry(BaseModel):
176197
"""Summary of a single video within a batch."""
198
+
177199
video_name: str
178200
manifest_path: str = Field(description="Relative path to video manifest")
179201
status: str = Field(default="pending", description="pending/completed/failed")
180202
error: Optional[str] = Field(default=None, description="Error message if failed")
181203
diagrams_count: int = Field(default=0)
@@ -184,15 +206,14 @@
184206
duration_seconds: Optional[float] = Field(default=None)
185207
186208
187209
class BatchManifest(BaseModel):
188210
"""Manifest for a batch processing run."""
211
+
189212
version: str = Field(default="1.0")
190213
title: str = Field(default="Batch Processing Results")
191
- processed_at: str = Field(
192
- default_factory=lambda: datetime.now().isoformat()
193
- )
214
+ processed_at: str = Field(default_factory=lambda: datetime.now().isoformat())
194215
stats: ProcessingStats = Field(default_factory=ProcessingStats)
195216
196217
videos: List[BatchVideoEntry] = Field(default_factory=list)
197218
198219
# Aggregated counts
199220
--- video_processor/models.py
+++ video_processor/models.py
@@ -1,17 +1,17 @@
1 """Pydantic data models for PlanOpticon output."""
2
3 from datetime import datetime
4 from enum import Enum
5 from pathlib import Path
6 from typing import Any, Dict, List, Optional
7
8 from pydantic import BaseModel, Field
9
10
11 class DiagramType(str, Enum):
12 """Types of visual content detected in video frames."""
 
13 flowchart = "flowchart"
14 sequence = "sequence"
15 architecture = "architecture"
16 whiteboard = "whiteboard"
17 chart = "chart"
@@ -21,10 +21,11 @@
21 unknown = "unknown"
22
23
24 class OutputFormat(str, Enum):
25 """Available output formats."""
 
26 markdown = "markdown"
27 json = "json"
28 html = "html"
29 pdf = "pdf"
30 svg = "svg"
@@ -31,39 +32,47 @@
31 png = "png"
32
33
34 class TranscriptSegment(BaseModel):
35 """A single segment of transcribed audio."""
 
36 start: float = Field(description="Start time in seconds")
37 end: float = Field(description="End time in seconds")
38 text: str = Field(description="Transcribed text")
39 speaker: Optional[str] = Field(default=None, description="Speaker identifier")
40 confidence: Optional[float] = Field(default=None, description="Transcription confidence 0-1")
41
42
43 class ActionItem(BaseModel):
44 """An action item extracted from content."""
 
45 action: str = Field(description="The action to be taken")
46 assignee: Optional[str] = Field(default=None, description="Person responsible")
47 deadline: Optional[str] = Field(default=None, description="Deadline or timeframe")
48 priority: Optional[str] = Field(default=None, description="Priority level")
49 context: Optional[str] = Field(default=None, description="Additional context")
50 source: Optional[str] = Field(default=None, description="Where this was found (transcript/diagram)")
 
 
51
52
53 class KeyPoint(BaseModel):
54 """A key point extracted from content."""
 
55 point: str = Field(description="The key point")
56 topic: Optional[str] = Field(default=None, description="Topic or category")
57 details: Optional[str] = Field(default=None, description="Supporting details")
58 timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
59 source: Optional[str] = Field(default=None, description="Where this was found")
60 related_diagrams: List[int] = Field(default_factory=list, description="Indices of related diagrams")
 
 
61
62
63 class DiagramResult(BaseModel):
64 """Result from diagram extraction and analysis."""
 
65 frame_index: int = Field(description="Index of the source frame")
66 timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
67 diagram_type: DiagramType = Field(default=DiagramType.unknown, description="Type of diagram")
68 confidence: float = Field(default=0.0, description="Detection confidence 0-1")
69 description: Optional[str] = Field(default=None, description="Description of the diagram")
@@ -70,85 +79,95 @@
70 text_content: Optional[str] = Field(default=None, description="Text visible in the diagram")
71 elements: List[str] = Field(default_factory=list, description="Identified elements")
72 relationships: List[str] = Field(default_factory=list, description="Identified relationships")
73 mermaid: Optional[str] = Field(default=None, description="Mermaid syntax representation")
74 chart_data: Optional[Dict[str, Any]] = Field(
75 default=None,
76 description="Chart data for reproduction (labels, values, chart_type)"
77 )
78 image_path: Optional[str] = Field(default=None, description="Relative path to original frame")
79 svg_path: Optional[str] = Field(default=None, description="Relative path to rendered SVG")
80 png_path: Optional[str] = Field(default=None, description="Relative path to rendered PNG")
81 mermaid_path: Optional[str] = Field(default=None, description="Relative path to mermaid source")
82
83
84 class ScreenCapture(BaseModel):
85 """A screengrab fallback when diagram extraction fails or is uncertain."""
 
86 frame_index: int = Field(description="Index of the source frame")
87 timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
88 caption: Optional[str] = Field(default=None, description="Brief description of the content")
89 image_path: Optional[str] = Field(default=None, description="Relative path to screenshot")
90 confidence: float = Field(default=0.0, description="Detection confidence that triggered fallback")
 
 
91
92
93 class Entity(BaseModel):
94 """An entity in the knowledge graph."""
 
95 name: str = Field(description="Entity name")
96 type: str = Field(default="concept", description="Entity type (person, concept, time, diagram)")
97 descriptions: List[str] = Field(default_factory=list, description="Descriptions of this entity")
98 source: Optional[str] = Field(default=None, description="Source attribution (transcript/diagram/both)")
 
 
99 occurrences: List[Dict[str, Any]] = Field(
100 default_factory=list,
101 description="List of occurrences with source, timestamp, text"
102 )
103
104
105 class Relationship(BaseModel):
106 """A relationship between entities in the knowledge graph."""
 
107 source: str = Field(description="Source entity name")
108 target: str = Field(description="Target entity name")
109 type: str = Field(default="related_to", description="Relationship type")
110 content_source: Optional[str] = Field(default=None, description="Content source identifier")
111 timestamp: Optional[float] = Field(default=None, description="Timestamp in seconds")
112
113
114 class KnowledgeGraphData(BaseModel):
115 """Serializable knowledge graph data."""
 
116 nodes: List[Entity] = Field(default_factory=list, description="Graph nodes/entities")
117 relationships: List[Relationship] = Field(default_factory=list, description="Graph relationships")
 
 
118
119
120 class ProcessingStats(BaseModel):
121 """Statistics about a processing run."""
 
122 start_time: Optional[str] = Field(default=None, description="ISO format start time")
123 end_time: Optional[str] = Field(default=None, description="ISO format end time")
124 duration_seconds: Optional[float] = Field(default=None, description="Total processing time")
125 frames_extracted: int = Field(default=0)
126 people_frames_filtered: int = Field(default=0)
127 diagrams_detected: int = Field(default=0)
128 screen_captures: int = Field(default=0)
129 transcript_duration_seconds: Optional[float] = Field(default=None)
130 models_used: Dict[str, str] = Field(
131 default_factory=dict,
132 description="Map of task to model used (e.g. vision: gpt-4o)"
133 )
134
135
136 class VideoMetadata(BaseModel):
137 """Metadata about the source video."""
 
138 title: str = Field(description="Video title")
139 source_path: Optional[str] = Field(default=None, description="Original video file path")
140 duration_seconds: Optional[float] = Field(default=None, description="Video duration")
141 resolution: Optional[str] = Field(default=None, description="Video resolution (e.g. 1920x1080)")
142 processed_at: str = Field(
143 default_factory=lambda: datetime.now().isoformat(),
144 description="ISO format processing timestamp"
145 )
146
147
148 class VideoManifest(BaseModel):
149 """Manifest for a single video processing run - the single source of truth."""
 
150 version: str = Field(default="1.0", description="Manifest schema version")
151 video: VideoMetadata = Field(description="Source video metadata")
152 stats: ProcessingStats = Field(default_factory=ProcessingStats)
153
154 # Relative paths to output files
@@ -167,15 +186,18 @@
167 action_items: List[ActionItem] = Field(default_factory=list)
168 diagrams: List[DiagramResult] = Field(default_factory=list)
169 screen_captures: List[ScreenCapture] = Field(default_factory=list)
170
171 # Frame paths
172 frame_paths: List[str] = Field(default_factory=list, description="Relative paths to extracted frames")
 
 
173
174
175 class BatchVideoEntry(BaseModel):
176 """Summary of a single video within a batch."""
 
177 video_name: str
178 manifest_path: str = Field(description="Relative path to video manifest")
179 status: str = Field(default="pending", description="pending/completed/failed")
180 error: Optional[str] = Field(default=None, description="Error message if failed")
181 diagrams_count: int = Field(default=0)
@@ -184,15 +206,14 @@
184 duration_seconds: Optional[float] = Field(default=None)
185
186
187 class BatchManifest(BaseModel):
188 """Manifest for a batch processing run."""
 
189 version: str = Field(default="1.0")
190 title: str = Field(default="Batch Processing Results")
191 processed_at: str = Field(
192 default_factory=lambda: datetime.now().isoformat()
193 )
194 stats: ProcessingStats = Field(default_factory=ProcessingStats)
195
196 videos: List[BatchVideoEntry] = Field(default_factory=list)
197
198 # Aggregated counts
199
--- video_processor/models.py
+++ video_processor/models.py
@@ -1,17 +1,17 @@
1 """Pydantic data models for PlanOpticon output."""
2
3 from datetime import datetime
4 from enum import Enum
 
5 from typing import Any, Dict, List, Optional
6
7 from pydantic import BaseModel, Field
8
9
10 class DiagramType(str, Enum):
11 """Types of visual content detected in video frames."""
12
13 flowchart = "flowchart"
14 sequence = "sequence"
15 architecture = "architecture"
16 whiteboard = "whiteboard"
17 chart = "chart"
@@ -21,10 +21,11 @@
21 unknown = "unknown"
22
23
24 class OutputFormat(str, Enum):
25 """Available output formats."""
26
27 markdown = "markdown"
28 json = "json"
29 html = "html"
30 pdf = "pdf"
31 svg = "svg"
@@ -31,39 +32,47 @@
32 png = "png"
33
34
35 class TranscriptSegment(BaseModel):
36 """A single segment of transcribed audio."""
37
38 start: float = Field(description="Start time in seconds")
39 end: float = Field(description="End time in seconds")
40 text: str = Field(description="Transcribed text")
41 speaker: Optional[str] = Field(default=None, description="Speaker identifier")
42 confidence: Optional[float] = Field(default=None, description="Transcription confidence 0-1")
43
44
45 class ActionItem(BaseModel):
46 """An action item extracted from content."""
47
48 action: str = Field(description="The action to be taken")
49 assignee: Optional[str] = Field(default=None, description="Person responsible")
50 deadline: Optional[str] = Field(default=None, description="Deadline or timeframe")
51 priority: Optional[str] = Field(default=None, description="Priority level")
52 context: Optional[str] = Field(default=None, description="Additional context")
53 source: Optional[str] = Field(
54 default=None, description="Where this was found (transcript/diagram)"
55 )
56
57
58 class KeyPoint(BaseModel):
59 """A key point extracted from content."""
60
61 point: str = Field(description="The key point")
62 topic: Optional[str] = Field(default=None, description="Topic or category")
63 details: Optional[str] = Field(default=None, description="Supporting details")
64 timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
65 source: Optional[str] = Field(default=None, description="Where this was found")
66 related_diagrams: List[int] = Field(
67 default_factory=list, description="Indices of related diagrams"
68 )
69
70
71 class DiagramResult(BaseModel):
72 """Result from diagram extraction and analysis."""
73
74 frame_index: int = Field(description="Index of the source frame")
75 timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
76 diagram_type: DiagramType = Field(default=DiagramType.unknown, description="Type of diagram")
77 confidence: float = Field(default=0.0, description="Detection confidence 0-1")
78 description: Optional[str] = Field(default=None, description="Description of the diagram")
@@ -70,85 +79,95 @@
79 text_content: Optional[str] = Field(default=None, description="Text visible in the diagram")
80 elements: List[str] = Field(default_factory=list, description="Identified elements")
81 relationships: List[str] = Field(default_factory=list, description="Identified relationships")
82 mermaid: Optional[str] = Field(default=None, description="Mermaid syntax representation")
83 chart_data: Optional[Dict[str, Any]] = Field(
84 default=None, description="Chart data for reproduction (labels, values, chart_type)"
 
85 )
86 image_path: Optional[str] = Field(default=None, description="Relative path to original frame")
87 svg_path: Optional[str] = Field(default=None, description="Relative path to rendered SVG")
88 png_path: Optional[str] = Field(default=None, description="Relative path to rendered PNG")
89 mermaid_path: Optional[str] = Field(default=None, description="Relative path to mermaid source")
90
91
92 class ScreenCapture(BaseModel):
93 """A screengrab fallback when diagram extraction fails or is uncertain."""
94
95 frame_index: int = Field(description="Index of the source frame")
96 timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)")
97 caption: Optional[str] = Field(default=None, description="Brief description of the content")
98 image_path: Optional[str] = Field(default=None, description="Relative path to screenshot")
99 confidence: float = Field(
100 default=0.0, description="Detection confidence that triggered fallback"
101 )
102
103
104 class Entity(BaseModel):
105 """An entity in the knowledge graph."""
106
107 name: str = Field(description="Entity name")
108 type: str = Field(default="concept", description="Entity type (person, concept, time, diagram)")
109 descriptions: List[str] = Field(default_factory=list, description="Descriptions of this entity")
110 source: Optional[str] = Field(
111 default=None, description="Source attribution (transcript/diagram/both)"
112 )
113 occurrences: List[Dict[str, Any]] = Field(
114 default_factory=list, description="List of occurrences with source, timestamp, text"
 
115 )
116
117
118 class Relationship(BaseModel):
119 """A relationship between entities in the knowledge graph."""
120
121 source: str = Field(description="Source entity name")
122 target: str = Field(description="Target entity name")
123 type: str = Field(default="related_to", description="Relationship type")
124 content_source: Optional[str] = Field(default=None, description="Content source identifier")
125 timestamp: Optional[float] = Field(default=None, description="Timestamp in seconds")
126
127
128 class KnowledgeGraphData(BaseModel):
129 """Serializable knowledge graph data."""
130
131 nodes: List[Entity] = Field(default_factory=list, description="Graph nodes/entities")
132 relationships: List[Relationship] = Field(
133 default_factory=list, description="Graph relationships"
134 )
135
136
137 class ProcessingStats(BaseModel):
138 """Statistics about a processing run."""
139
140 start_time: Optional[str] = Field(default=None, description="ISO format start time")
141 end_time: Optional[str] = Field(default=None, description="ISO format end time")
142 duration_seconds: Optional[float] = Field(default=None, description="Total processing time")
143 frames_extracted: int = Field(default=0)
144 people_frames_filtered: int = Field(default=0)
145 diagrams_detected: int = Field(default=0)
146 screen_captures: int = Field(default=0)
147 transcript_duration_seconds: Optional[float] = Field(default=None)
148 models_used: Dict[str, str] = Field(
149 default_factory=dict, description="Map of task to model used (e.g. vision: gpt-4o)"
 
150 )
151
152
153 class VideoMetadata(BaseModel):
154 """Metadata about the source video."""
155
156 title: str = Field(description="Video title")
157 source_path: Optional[str] = Field(default=None, description="Original video file path")
158 duration_seconds: Optional[float] = Field(default=None, description="Video duration")
159 resolution: Optional[str] = Field(default=None, description="Video resolution (e.g. 1920x1080)")
160 processed_at: str = Field(
161 default_factory=lambda: datetime.now().isoformat(),
162 description="ISO format processing timestamp",
163 )
164
165
166 class VideoManifest(BaseModel):
167 """Manifest for a single video processing run - the single source of truth."""
168
169 version: str = Field(default="1.0", description="Manifest schema version")
170 video: VideoMetadata = Field(description="Source video metadata")
171 stats: ProcessingStats = Field(default_factory=ProcessingStats)
172
173 # Relative paths to output files
@@ -167,15 +186,18 @@
186 action_items: List[ActionItem] = Field(default_factory=list)
187 diagrams: List[DiagramResult] = Field(default_factory=list)
188 screen_captures: List[ScreenCapture] = Field(default_factory=list)
189
190 # Frame paths
191 frame_paths: List[str] = Field(
192 default_factory=list, description="Relative paths to extracted frames"
193 )
194
195
196 class BatchVideoEntry(BaseModel):
197 """Summary of a single video within a batch."""
198
199 video_name: str
200 manifest_path: str = Field(description="Relative path to video manifest")
201 status: str = Field(default="pending", description="pending/completed/failed")
202 error: Optional[str] = Field(default=None, description="Error message if failed")
203 diagrams_count: int = Field(default=0)
@@ -184,15 +206,14 @@
206 duration_seconds: Optional[float] = Field(default=None)
207
208
209 class BatchManifest(BaseModel):
210 """Manifest for a batch processing run."""
211
212 version: str = Field(default="1.0")
213 title: str = Field(default="Batch Processing Results")
214 processed_at: str = Field(default_factory=lambda: datetime.now().isoformat())
 
 
215 stats: ProcessingStats = Field(default_factory=ProcessingStats)
216
217 videos: List[BatchVideoEntry] = Field(default_factory=list)
218
219 # Aggregated counts
220
--- video_processor/output_structure.py
+++ video_processor/output_structure.py
@@ -1,8 +1,7 @@
 """Standardized output directory structure and manifest I/O for PlanOpticon."""
 
-import json
 import logging
 from pathlib import Path
 from typing import Dict
 
 from video_processor.models import BatchManifest, VideoManifest
--- video_processor/pipeline.py
+++ video_processor/pipeline.py
@@ -9,11 +9,15 @@
99
1010
from tqdm import tqdm
1111
1212
from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
1313
from video_processor.extractors.audio_extractor import AudioExtractor
14
-from video_processor.extractors.frame_extractor import extract_frames, filter_people_frames, save_frames
14
+from video_processor.extractors.frame_extractor import (
15
+ extract_frames,
16
+ filter_people_frames,
17
+ save_frames,
18
+)
1519
from video_processor.integrators.knowledge_graph import KnowledgeGraph
1620
from video_processor.integrators.plan_generator import PlanGenerator
1721
from video_processor.models import (
1822
ActionItem,
1923
KeyPoint,
@@ -145,13 +149,11 @@
145149
srt_lines = []
146150
for i, seg in enumerate(segments):
147151
start = seg.get("start", 0)
148152
end = seg.get("end", 0)
149153
srt_lines.append(str(i + 1))
150
- srt_lines.append(
151
- f"{_format_srt_time(start)} --> {_format_srt_time(end)}"
152
- )
154
+ srt_lines.append(f"{_format_srt_time(start)} --> {_format_srt_time(end)}")
153155
srt_lines.append(seg.get("text", "").strip())
154156
srt_lines.append("")
155157
transcript_srt.write_text("\n".join(srt_lines))
156158
pipeline_bar.update(1)
157159
@@ -158,14 +160,17 @@
158160
# --- Step 4: Diagram extraction ---
159161
pm.usage.start_step("Visual analysis")
160162
pipeline_bar.set_description("Pipeline: analyzing visuals")
161163
diagrams = []
162164
screen_captures = []
163
- existing_diagrams = sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else []
165
+ existing_diagrams = (
166
+ sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else []
167
+ )
164168
if existing_diagrams:
165169
logger.info(f"Resuming: found {len(existing_diagrams)} diagrams on disk, skipping analysis")
166170
from video_processor.models import DiagramResult
171
+
167172
for dj in existing_diagrams:
168173
try:
169174
diagrams.append(DiagramResult.model_validate_json(dj.read_text()))
170175
except Exception as e:
171176
logger.warning(f"Failed to load diagram {dj}: {e}")
@@ -208,16 +213,12 @@
208213
pipeline_bar.set_description("Pipeline: extracting key points")
209214
kp_path = dirs["results"] / "key_points.json"
210215
ai_path = dirs["results"] / "action_items.json"
211216
if kp_path.exists() and ai_path.exists():
212217
logger.info("Resuming: found key points and action items on disk")
213
- key_points = [
214
- KeyPoint(**item) for item in json.loads(kp_path.read_text())
215
- ]
216
- action_items = [
217
- ActionItem(**item) for item in json.loads(ai_path.read_text())
218
- ]
218
+ key_points = [KeyPoint(**item) for item in json.loads(kp_path.read_text())]
219
+ action_items = [ActionItem(**item) for item in json.loads(ai_path.read_text())]
219220
else:
220221
key_points = _extract_key_points(pm, transcript_text)
221222
action_items = _extract_action_items(pm, transcript_text)
222223
223224
kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2))
@@ -286,13 +287,15 @@
286287
pipeline_bar.close()
287288
288289
# Write manifest
289290
write_video_manifest(manifest, output_dir)
290291
291
- logger.info(f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, "
292
- f"{len(screen_captures)} captures, {len(key_points)} key points, "
293
- f"{len(action_items)} action items")
292
+ logger.info(
293
+ f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, "
294
+ f"{len(screen_captures)} captures, {len(key_points)} key points, "
295
+ f"{len(action_items)} action items"
296
+ )
294297
295298
return manifest
296299
297300
298301
def _extract_key_points(pm: ProviderManager, text: str) -> list[KeyPoint]:
299302
--- video_processor/pipeline.py
+++ video_processor/pipeline.py
@@ -9,11 +9,15 @@
9
10 from tqdm import tqdm
11
12 from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
13 from video_processor.extractors.audio_extractor import AudioExtractor
14 from video_processor.extractors.frame_extractor import extract_frames, filter_people_frames, save_frames
 
 
 
 
15 from video_processor.integrators.knowledge_graph import KnowledgeGraph
16 from video_processor.integrators.plan_generator import PlanGenerator
17 from video_processor.models import (
18 ActionItem,
19 KeyPoint,
@@ -145,13 +149,11 @@
145 srt_lines = []
146 for i, seg in enumerate(segments):
147 start = seg.get("start", 0)
148 end = seg.get("end", 0)
149 srt_lines.append(str(i + 1))
150 srt_lines.append(
151 f"{_format_srt_time(start)} --> {_format_srt_time(end)}"
152 )
153 srt_lines.append(seg.get("text", "").strip())
154 srt_lines.append("")
155 transcript_srt.write_text("\n".join(srt_lines))
156 pipeline_bar.update(1)
157
@@ -158,14 +160,17 @@
158 # --- Step 4: Diagram extraction ---
159 pm.usage.start_step("Visual analysis")
160 pipeline_bar.set_description("Pipeline: analyzing visuals")
161 diagrams = []
162 screen_captures = []
163 existing_diagrams = sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else []
 
 
164 if existing_diagrams:
165 logger.info(f"Resuming: found {len(existing_diagrams)} diagrams on disk, skipping analysis")
166 from video_processor.models import DiagramResult
 
167 for dj in existing_diagrams:
168 try:
169 diagrams.append(DiagramResult.model_validate_json(dj.read_text()))
170 except Exception as e:
171 logger.warning(f"Failed to load diagram {dj}: {e}")
@@ -208,16 +213,12 @@
208 pipeline_bar.set_description("Pipeline: extracting key points")
209 kp_path = dirs["results"] / "key_points.json"
210 ai_path = dirs["results"] / "action_items.json"
211 if kp_path.exists() and ai_path.exists():
212 logger.info("Resuming: found key points and action items on disk")
213 key_points = [
214 KeyPoint(**item) for item in json.loads(kp_path.read_text())
215 ]
216 action_items = [
217 ActionItem(**item) for item in json.loads(ai_path.read_text())
218 ]
219 else:
220 key_points = _extract_key_points(pm, transcript_text)
221 action_items = _extract_action_items(pm, transcript_text)
222
223 kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2))
@@ -286,13 +287,15 @@
286 pipeline_bar.close()
287
288 # Write manifest
289 write_video_manifest(manifest, output_dir)
290
291 logger.info(f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, "
292 f"{len(screen_captures)} captures, {len(key_points)} key points, "
293 f"{len(action_items)} action items")
 
 
294
295 return manifest
296
297
298 def _extract_key_points(pm: ProviderManager, text: str) -> list[KeyPoint]:
299
--- video_processor/pipeline.py
+++ video_processor/pipeline.py
@@ -9,11 +9,15 @@
9
10 from tqdm import tqdm
11
12 from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
13 from video_processor.extractors.audio_extractor import AudioExtractor
14 from video_processor.extractors.frame_extractor import (
15 extract_frames,
16 filter_people_frames,
17 save_frames,
18 )
19 from video_processor.integrators.knowledge_graph import KnowledgeGraph
20 from video_processor.integrators.plan_generator import PlanGenerator
21 from video_processor.models import (
22 ActionItem,
23 KeyPoint,
@@ -145,13 +149,11 @@
149 srt_lines = []
150 for i, seg in enumerate(segments):
151 start = seg.get("start", 0)
152 end = seg.get("end", 0)
153 srt_lines.append(str(i + 1))
154 srt_lines.append(f"{_format_srt_time(start)} --> {_format_srt_time(end)}")
 
 
155 srt_lines.append(seg.get("text", "").strip())
156 srt_lines.append("")
157 transcript_srt.write_text("\n".join(srt_lines))
158 pipeline_bar.update(1)
159
@@ -158,14 +160,17 @@
160 # --- Step 4: Diagram extraction ---
161 pm.usage.start_step("Visual analysis")
162 pipeline_bar.set_description("Pipeline: analyzing visuals")
163 diagrams = []
164 screen_captures = []
165 existing_diagrams = (
166 sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else []
167 )
168 if existing_diagrams:
169 logger.info(f"Resuming: found {len(existing_diagrams)} diagrams on disk, skipping analysis")
170 from video_processor.models import DiagramResult
171
172 for dj in existing_diagrams:
173 try:
174 diagrams.append(DiagramResult.model_validate_json(dj.read_text()))
175 except Exception as e:
176 logger.warning(f"Failed to load diagram {dj}: {e}")
@@ -208,16 +213,12 @@
213 pipeline_bar.set_description("Pipeline: extracting key points")
214 kp_path = dirs["results"] / "key_points.json"
215 ai_path = dirs["results"] / "action_items.json"
216 if kp_path.exists() and ai_path.exists():
217 logger.info("Resuming: found key points and action items on disk")
218 key_points = [KeyPoint(**item) for item in json.loads(kp_path.read_text())]
219 action_items = [ActionItem(**item) for item in json.loads(ai_path.read_text())]
220 else:
221 key_points = _extract_key_points(pm, transcript_text)
222 action_items = _extract_action_items(pm, transcript_text)
223
224 kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2))
@@ -286,13 +287,15 @@
287 pipeline_bar.close()
288
289 # Write manifest
290 write_video_manifest(manifest, output_dir)
291
292 logger.info(
293 f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, "
294 f"{len(screen_captures)} captures, {len(key_points)} key points, "
295 f"{len(action_items)} action items"
296 )
297
298 return manifest
299
300
301 def _extract_key_points(pm: ProviderManager, text: str) -> list[KeyPoint]:
302
--- video_processor/providers/anthropic_provider.py
+++ video_processor/providers/anthropic_provider.py
@@ -97,14 +97,16 @@
         try:
             page = self.client.models.list(limit=100)
             for m in page.data:
                 mid = m.id
                 caps = ["chat", "vision"]  # All Claude models support chat + vision
-                models.append(ModelInfo(
-                    id=mid,
-                    provider="anthropic",
-                    display_name=getattr(m, "display_name", mid),
-                    capabilities=caps,
-                ))
+                models.append(
+                    ModelInfo(
+                        id=mid,
+                        provider="anthropic",
+                        display_name=getattr(m, "display_name", mid),
+                        capabilities=caps,
+                    )
+                )
         except Exception as e:
             logger.warning(f"Failed to list Anthropic models: {e}")
         return sorted(models, key=lambda m: m.id)
--- video_processor/providers/base.py
+++ video_processor/providers/base.py
@@ -7,16 +7,16 @@
 from pydantic import BaseModel, Field
 
 
 class ModelInfo(BaseModel):
     """Information about an available model."""
+
     id: str = Field(description="Model identifier (e.g. gpt-4o)")
     provider: str = Field(description="Provider name (openai, anthropic, gemini)")
     display_name: str = Field(default="", description="Human-readable name")
     capabilities: List[str] = Field(
-        default_factory=list,
-        description="Model capabilities: chat, vision, audio, embedding"
+        default_factory=list, description="Model capabilities: chat, vision, audio, embedding"
     )
 
 
 class BaseProvider(ABC):
     """Abstract base for all provider implementations."""
--- video_processor/providers/discovery.py
+++ video_processor/providers/discovery.py
@@ -38,10 +38,11 @@
 
     # OpenAI
     if keys.get("openai"):
        try:
             from video_processor.providers.openai_provider import OpenAIProvider
+
             provider = OpenAIProvider(api_key=keys["openai"])
             models = provider.list_models()
             logger.info(f"Discovered {len(models)} OpenAI models")
             all_models.extend(models)
         except Exception as e:
@@ -49,10 +50,11 @@
 
     # Anthropic
     if keys.get("anthropic"):
         try:
             from video_processor.providers.anthropic_provider import AnthropicProvider
+
             provider = AnthropicProvider(api_key=keys["anthropic"])
             models = provider.list_models()
             logger.info(f"Discovered {len(models)} Anthropic models")
             all_models.extend(models)
         except Exception as e:
@@ -62,10 +64,11 @@
     gemini_key = keys.get("gemini")
     gemini_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS", "")
     if gemini_key or gemini_creds:
         try:
             from video_processor.providers.gemini_provider import GeminiProvider
+
             provider = GeminiProvider(
                 api_key=gemini_key or None,
                 credentials_path=gemini_creds or None,
             )
             models = provider.list_models()
--- video_processor/providers/gemini_provider.py
+++ video_processor/providers/gemini_provider.py
@@ -29,16 +29,15 @@
     ):
         self.api_key = api_key or os.getenv("GEMINI_API_KEY")
         self.credentials_path = credentials_path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
 
         if not self.api_key and not self.credentials_path:
-            raise ValueError(
-                "Neither GEMINI_API_KEY nor GOOGLE_APPLICATION_CREDENTIALS is set"
-            )
+            raise ValueError("Neither GEMINI_API_KEY nor GOOGLE_APPLICATION_CREDENTIALS is set")
 
         try:
             from google import genai
+
             self._genai = genai
 
             if self.api_key:
                 self.client = genai.Client(api_key=self.api_key)
             else:
@@ -55,12 +54,11 @@
                     project=project,
                     location=location,
                 )
         except ImportError:
             raise ImportError(
-                "google-genai package not installed. "
-                "Install with: pip install google-genai"
+                "google-genai package not installed. Install with: pip install google-genai"
             )
 
     def chat(
         self,
         messages: list[dict],
@@ -73,14 +71,16 @@
         model = model or "gemini-2.5-flash"
         # Convert OpenAI-style messages to Gemini contents
         contents = []
         for msg in messages:
             role = "user" if msg["role"] == "user" else "model"
-            contents.append(types.Content(
-                role=role,
-                parts=[types.Part.from_text(text=msg["content"])],
-            ))
+            contents.append(
+                types.Content(
+                    role=role,
+                    parts=[types.Part.from_text(text=msg["content"])],
+                )
+            )
 
         response = self.client.models.generate_content(
             model=model,
             contents=contents,
             config=types.GenerateContentConfig(
@@ -168,10 +168,11 @@
             ),
         )
 
         # Parse JSON response
         import json
+
         try:
             data = json.loads(response.text)
         except (json.JSONDecodeError, TypeError):
             data = {"text": response.text or "", "segments": []}
 
@@ -190,11 +191,11 @@
             for m in self.client.models.list():
                 mid = m.name or ""
                 # Strip prefix variants from different API modes
                 for prefix in ("models/", "publishers/google/models/"):
                     if mid.startswith(prefix):
-                        mid = mid[len(prefix):]
+                        mid = mid[len(prefix) :]
                         break
                 display = getattr(m, "display_name", mid) or mid
 
                 caps = []
                 mid_lower = mid.lower()
@@ -206,14 +207,16 @@
                     caps.append("audio")
                 if "embedding" in mid_lower:
                     caps.append("embedding")
 
                 if caps:
-                    models.append(ModelInfo(
-                        id=mid,
-                        provider="gemini",
-                        display_name=display,
-                        capabilities=caps,
-                    ))
+                    models.append(
+                        ModelInfo(
+                            id=mid,
+                            provider="gemini",
+                            display_name=display,
+                            capabilities=caps,
+                        )
+                    )
         except Exception as e:
             logger.warning(f"Failed to list Gemini models: {e}")
         return sorted(models, key=lambda m: m.id)
--- video_processor/providers/manager.py
+++ video_processor/providers/manager.py
@@ -1,9 +1,8 @@
 """ProviderManager - unified interface for routing API calls to the best available provider."""
 
 import logging
-import os
 from pathlib import Path
 from typing import Optional
 
 from dotenv import load_dotenv
 
@@ -67,11 +66,13 @@
 
         # If a single provider is forced, apply it
         if provider:
             self.vision_model = vision_model or self._default_for_provider(provider, "vision")
             self.chat_model = chat_model or self._default_for_provider(provider, "chat")
-            self.transcription_model = transcription_model or self._default_for_provider(provider, "audio")
+            self.transcription_model = transcription_model or self._default_for_provider(
+                provider, "audio"
+            )
         else:
             self.vision_model = vision_model
             self.chat_model = chat_model
             self.transcription_model = transcription_model
 
@@ -80,34 +81,51 @@
     @staticmethod
     def _default_for_provider(provider: str, capability: str) -> str:
         """Return the default model for a provider/capability combo."""
         defaults = {
             "openai": {"chat": "gpt-4o", "vision": "gpt-4o", "audio": "whisper-1"},
-            "anthropic": {"chat": "claude-sonnet-4-5-20250929", "vision": "claude-sonnet-4-5-20250929", "audio": ""},
-            "gemini": {"chat": "gemini-2.5-flash", "vision": "gemini-2.5-flash", "audio": "gemini-2.5-flash"},
+            "anthropic": {
+                "chat": "claude-sonnet-4-5-20250929",
+                "vision": "claude-sonnet-4-5-20250929",
+                "audio": "",
+            },
+            "gemini": {
+                "chat": "gemini-2.5-flash",
+                "vision": "gemini-2.5-flash",
+                "audio": "gemini-2.5-flash",
+            },
         }
         return defaults.get(provider, {}).get(capability, "")
 
     def _get_provider(self, provider_name: str) -> BaseProvider:
         """Lazily initialize and cache a provider instance."""
         if provider_name not in self._providers:
             if provider_name == "openai":
                 from video_processor.providers.openai_provider import OpenAIProvider
+
                 self._providers[provider_name] = OpenAIProvider()
             elif provider_name == "anthropic":
                 from video_processor.providers.anthropic_provider import AnthropicProvider
+
                 self._providers[provider_name] = AnthropicProvider()
             elif provider_name == "gemini":
                 from video_processor.providers.gemini_provider import GeminiProvider
+
                 self._providers[provider_name] = GeminiProvider()
             else:
                 raise ValueError(f"Unknown provider: {provider_name}")
         return self._providers[provider_name]
 
     def _provider_for_model(self, model_id: str) -> str:
         """Infer the provider from a model id."""
-        if model_id.startswith("gpt-") or model_id.startswith("o1") or model_id.startswith("o3") or model_id.startswith("o4") or model_id.startswith("whisper"):
+        if (
+            model_id.startswith("gpt-")
+            or model_id.startswith("o1")
+            or model_id.startswith("o3")
+            or model_id.startswith("o4")
+            or model_id.startswith("whisper")
+        ):
             return "openai"
         if model_id.startswith("claude-"):
             return "anthropic"
         if model_id.startswith("gemini-"):
             return "gemini"
@@ -121,11 +139,13 @@
     def _get_available_models(self) -> list[ModelInfo]:
         if self._available_models is None:
             self._available_models = discover_available_models()
         return self._available_models
 
-    def _resolve_model(self, explicit: Optional[str], capability: str, preferences: list[tuple[str, str]]) -> tuple[str, str]:
+    def _resolve_model(
+        self, explicit: Optional[str], capability: str, preferences: list[tuple[str, str]]
+    ) -> tuple[str, str]:
         """
         Resolve which (provider, model) to use for a capability.
 
         Returns (provider_name, model_id).
         """
@@ -169,11 +189,13 @@
     ) -> str:
         """Send a chat completion to the best available provider."""
         prov_name, model = self._resolve_model(self.chat_model, "chat", _CHAT_PREFERENCES)
         logger.info(f"Chat: using {prov_name}/{model}")
         provider = self._get_provider(prov_name)
-        result = provider.chat(messages, max_tokens=max_tokens, temperature=temperature, model=model)
+        result = provider.chat(
+            messages, max_tokens=max_tokens, temperature=temperature, model=model
+        )
         self._track(provider, prov_name, model)
         return result
 
     def analyze_image(
         self,
--- video_processor/providers/openai_provider.py
+++ video_processor/providers/openai_provider.py
@@ -13,11 +13,22 @@
1313
1414
load_dotenv()
1515
logger = logging.getLogger(__name__)
1616
1717
# Models known to have vision capability
18
-_VISION_MODELS = {"gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "o1", "o3", "o3-mini", "o4-mini"}
18
+_VISION_MODELS = {
19
+ "gpt-4o",
20
+ "gpt-4o-mini",
21
+ "gpt-4-turbo",
22
+ "gpt-4.1",
23
+ "gpt-4.1-mini",
24
+ "gpt-4.1-nano",
25
+ "o1",
26
+ "o3",
27
+ "o3-mini",
28
+ "o4-mini",
29
+}
1930
_AUDIO_MODELS = {"whisper-1"}
2031
2132
2233
class OpenAIProvider(BaseProvider):
2334
"""OpenAI API provider."""
@@ -44,11 +55,13 @@
4455
max_tokens=max_tokens,
4556
temperature=temperature,
4657
)
4758
self._last_usage = {
4859
"input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0,
49
- "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0,
60
+ "output_tokens": getattr(response.usage, "completion_tokens", 0)
61
+ if response.usage
62
+ else 0,
5063
}
5164
return response.choices[0].message.content or ""
5265
5366
def analyze_image(
5467
self,
@@ -75,11 +88,13 @@
7588
],
7689
max_tokens=max_tokens,
7790
)
7891
self._last_usage = {
7992
"input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0,
80
- "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0,
93
+ "output_tokens": getattr(response.usage, "completion_tokens", 0)
94
+ if response.usage
95
+ else 0,
8196
}
8297
return response.choices[0].message.content or ""
8398
8499
# Whisper API limit is 25MB
85100
_MAX_FILE_SIZE = 25 * 1024 * 1024
@@ -101,13 +116,11 @@
101116
logger.info(
102117
f"Audio file {file_size / 1024 / 1024:.1f}MB exceeds Whisper 25MB limit, chunking..."
103118
)
104119
return self._transcribe_chunked(audio_path, language, model)
105120
106
- def _transcribe_single(
107
- self, audio_path: Path, language: Optional[str], model: str
108
- ) -> dict:
121
+    def _transcribe_single(self, audio_path: Path, language: Optional[str], model: str) -> dict:
         """Transcribe a single audio file."""
         with open(audio_path, "rb") as f:
             kwargs = {"model": model, "file": f}
             if language:
                 kwargs["language"] = language
@@ -128,15 +141,14 @@
             "duration": getattr(response, "duration", None),
             "provider": "openai",
             "model": model,
         }
 
-    def _transcribe_chunked(
-        self, audio_path: Path, language: Optional[str], model: str
-    ) -> dict:
+    def _transcribe_chunked(self, audio_path: Path, language: Optional[str], model: str) -> dict:
         """Split audio into chunks under 25MB and transcribe each."""
         import tempfile
+
         from video_processor.extractors.audio_extractor import AudioExtractor
 
         extractor = AudioExtractor()
         audio_data, sr = extractor.load_audio(audio_path)
         total_duration = len(audio_data) / sr
@@ -164,15 +176,17 @@
             logger.info(f"Transcribing chunk {i + 1}/{len(segments_data)}...")
             result = self._transcribe_single(chunk_path, language, model)
 
             all_text.append(result["text"])
             for seg in result.get("segments", []):
-                all_segments.append({
-                    "start": seg["start"] + time_offset,
-                    "end": seg["end"] + time_offset,
-                    "text": seg["text"],
-                })
+                all_segments.append(
+                    {
+                        "start": seg["start"] + time_offset,
+                        "end": seg["end"] + time_offset,
+                        "text": seg["text"],
+                    }
+                )
 
             if not detected_language and result.get("language"):
                 detected_language = result["language"]
 
             time_offset += len(chunk) / sr
@@ -200,14 +214,16 @@
             if mid in _AUDIO_MODELS or mid.startswith("whisper"):
                 caps.append("audio")
             if "embedding" in mid:
                 caps.append("embedding")
             if caps:
-                models.append(ModelInfo(
-                    id=mid,
-                    provider="openai",
-                    display_name=mid,
-                    capabilities=caps,
-                ))
+                models.append(
+                    ModelInfo(
+                        id=mid,
+                        provider="openai",
+                        display_name=mid,
+                        capabilities=caps,
+                    )
+                )
         except Exception as e:
             logger.warning(f"Failed to list OpenAI models: {e}")
         return sorted(models, key=lambda m: m.id)
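The chunked-transcription hunk above shifts each chunk's local segment timestamps by the accumulated duration of the chunks before it. A minimal standalone sketch of that bookkeeping — the helper name and the chunk structure here are illustrative, not from the codebase:

```python
def merge_chunk_segments(chunks: list[dict], sample_rate: int) -> list[dict]:
    """Shift each chunk's segments by the total duration of the chunks before it."""
    merged = []
    time_offset = 0.0
    for chunk in chunks:
        for seg in chunk["segments"]:
            merged.append(
                {
                    "start": seg["start"] + time_offset,
                    "end": seg["end"] + time_offset,
                    "text": seg["text"],
                }
            )
        # Advance the offset by this chunk's duration in seconds.
        time_offset += chunk["num_samples"] / sample_rate
    return merged


# Two hypothetical 10-second chunks at 16 kHz: the second chunk's local
# timestamps land at 10 s onward in the merged transcript.
chunks = [
    {"num_samples": 16000 * 10, "segments": [{"start": 0.0, "end": 9.5, "text": "part one"}]},
    {"num_samples": 16000 * 10, "segments": [{"start": 0.0, "end": 4.0, "text": "part two"}]},
]
merged = merge_chunk_segments(chunks, 16000)
```

The real code computes the offset from the raw audio array (`len(chunk) / sr`), which is the same samples-over-rate arithmetic as `num_samples / sample_rate` here.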
--- video_processor/providers/whisper_local.py
+++ video_processor/providers/whisper_local.py
@@ -69,13 +69,11 @@
             return
 
         try:
             import whisper
         except ImportError:
-            raise ImportError(
-                "openai-whisper not installed. Run: pip install openai-whisper torch"
-            )
+            raise ImportError("openai-whisper not installed. Run: pip install openai-whisper torch")
 
         logger.info(f"Loading Whisper {self.model_size} model on {self.device}...")
         self._model = whisper.load_model(self.model_size, device=self.device)
         logger.info("Whisper model loaded")
 
@@ -125,10 +123,11 @@
 
     @staticmethod
     def is_available() -> bool:
         """Check if local Whisper is installed and usable."""
         try:
-            import whisper
-            import torch
+            import torch  # noqa: F401
+            import whisper  # noqa: F401
+
             return True
         except ImportError:
             return False
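The `is_available` fix keeps the import-and-discard probe but silences ruff's unused-import rule with `# noqa: F401`. An alternative pattern is `importlib.util.find_spec`, which only locates the module on disk instead of importing it; a sketch, with a hypothetical helper name:

```python
import importlib.util


def deps_available(*modules: str) -> bool:
    """True if every named top-level module can be located without importing it."""
    return all(importlib.util.find_spec(m) is not None for m in modules)
```

The trade-off: `find_spec` avoids paying the import cost of a heavy package like torch just to answer the question, but unlike the `try`/`except ImportError` in the diff it will not catch a package that is installed yet broken at import time.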
--- video_processor/sources/base.py
+++ video_processor/sources/base.py
@@ -10,10 +10,11 @@
 logger = logging.getLogger(__name__)
 
 
 class SourceFile(BaseModel):
     """A file available in a cloud source."""
+
     name: str = Field(description="File name")
     id: str = Field(description="Provider-specific file identifier")
     size_bytes: Optional[int] = Field(default=None, description="File size in bytes")
     mime_type: Optional[str] = Field(default=None, description="MIME type")
     modified_at: Optional[str] = Field(default=None, description="Last modified timestamp")
--- video_processor/sources/dropbox_source.py
+++ video_processor/sources/dropbox_source.py
@@ -56,13 +56,11 @@
     def authenticate(self) -> bool:
         """Authenticate with Dropbox API."""
         try:
             import dropbox
         except ImportError:
-            logger.error(
-                "Dropbox SDK not installed. Run: pip install planopticon[dropbox]"
-            )
+            logger.error("Dropbox SDK not installed. Run: pip install planopticon[dropbox]")
             return False
 
         # Try direct access token first
         if self.access_token:
             return self._auth_token(dropbox)
@@ -109,13 +107,11 @@
         return False
 
     def _auth_oauth(self, dropbox) -> bool:
         """Run OAuth2 PKCE flow."""
         if not self.app_key:
-            logger.error(
-                "Dropbox app key not configured. Set DROPBOX_APP_KEY env var."
-            )
+            logger.error("Dropbox app key not configured. Set DROPBOX_APP_KEY env var.")
             return False
 
         try:
             flow = dropbox.DropboxOAuth2FlowNoRedirect(
                 consumer_key=self.app_key,
@@ -187,13 +183,11 @@
             ext = Path(entry.name).suffix.lower()
             if ext not in VIDEO_EXTENSIONS:
                 continue
 
             if patterns:
-                if not any(
-                    entry.name.endswith(p.replace("*", "")) for p in patterns
-                ):
+                if not any(entry.name.endswith(p.replace("*", "")) for p in patterns):
                     continue
 
             files.append(
                 SourceFile(
                     name=entry.name,
--- video_processor/sources/google_drive.py
+++ video_processor/sources/google_drive.py
@@ -65,27 +65,23 @@
             If True, force service account auth. If False, force OAuth.
             If None, auto-detect from credentials file.
         token_path : Path, optional
             Where to store/load OAuth tokens. Defaults to ~/.planopticon/google_drive_token.json
         """
-        self.credentials_path = credentials_path or os.environ.get(
-            "GOOGLE_APPLICATION_CREDENTIALS"
-        )
+        self.credentials_path = credentials_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
         self.use_service_account = use_service_account
         self.token_path = token_path or _TOKEN_PATH
         self.service = None
         self._creds = None
 
     def authenticate(self) -> bool:
         """Authenticate with Google Drive API."""
         try:
-            from google.oauth2 import service_account as sa_module
+            from google.oauth2 import service_account as sa_module  # noqa: F401
             from googleapiclient.discovery import build
         except ImportError:
-            logger.error(
-                "Google API client not installed. Run: pip install planopticon[gdrive]"
-            )
+            logger.error("Google API client not installed. Run: pip install planopticon[gdrive]")
             return False
 
         # Determine auth method
         if self.use_service_account is True or (
             self.use_service_account is None and self._is_service_account()
@@ -130,23 +126,19 @@
         try:
             from google.auth.transport.requests import Request
             from google.oauth2.credentials import Credentials
             from google_auth_oauthlib.flow import InstalledAppFlow
         except ImportError:
-            logger.error(
-                "OAuth libraries not installed. Run: pip install planopticon[gdrive]"
-            )
+            logger.error("OAuth libraries not installed. Run: pip install planopticon[gdrive]")
             return False
 
         creds = None
 
         # Load existing token
         if self.token_path.exists():
             try:
-                creds = Credentials.from_authorized_user_file(
-                    str(self.token_path), SCOPES
-                )
+                creds = Credentials.from_authorized_user_file(str(self.token_path), SCOPES)
             except Exception:
                 pass
 
         # Refresh or run new flow
         if creds and creds.expired and creds.refresh_token:
@@ -251,13 +243,11 @@
         query_parts = []
 
         if folder_id:
             query_parts.append(f"'{folder_id}' in parents")
 
-        mime_conditions = " or ".join(
-            f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES
-        )
+        mime_conditions = " or ".join(f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES)
         query_parts.append(f"({mime_conditions})")
         query_parts.append("trashed=false")
 
         query = " and ".join(query_parts)
         page_token = None
@@ -275,13 +265,11 @@
                 .execute()
             )
 
             for f in response.get("files", []):
                 name = f.get("name", "")
-                if patterns and not any(
-                    name.endswith(p.replace("*", "")) for p in patterns
-                ):
+                if patterns and not any(name.endswith(p.replace("*", "")) for p in patterns):
                     continue
 
                 out.append(
                     SourceFile(
                         name=name,
@@ -336,11 +324,10 @@
         """Download a file from Google Drive."""
         if not self.service:
             raise RuntimeError("Not authenticated. Call authenticate() first.")
 
         from googleapiclient.http import MediaIoBaseDownload
-        import io
 
         destination = Path(destination)
         destination.parent.mkdir(parents=True, exist_ok=True)
 
         request = self.service.files().get_media(fileId=file.id)
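The `mime_conditions` hunk above assembles the query string passed to the Drive `files().list` API: parent-folder filter, OR-joined MIME-type conditions, and a `trashed=false` guard, AND-joined together. A standalone sketch of that assembly, using an illustrative two-entry subset of `VIDEO_MIME_TYPES`:

```python
from typing import Optional

# Illustrative subset; the real VIDEO_MIME_TYPES constant lives in the module.
VIDEO_MIME_TYPES = ["video/mp4", "video/quicktime"]


def build_drive_query(folder_id: Optional[str] = None) -> str:
    """Assemble a Drive files().list query the way the diff above does."""
    query_parts = []
    if folder_id:
        query_parts.append(f"'{folder_id}' in parents")
    # One parenthesized OR-group so it binds correctly against the other AND terms.
    mime_conditions = " or ".join(f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES)
    query_parts.append(f"({mime_conditions})")
    query_parts.append("trashed=false")
    return " and ".join(query_parts)
```

Wrapping the MIME disjunction in parentheses matters: without it, Drive's query grammar would bind `and trashed=false` to only the last MIME condition.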
--- video_processor/utils/api_cache.py
+++ video_processor/utils/api_cache.py
@@ -1,28 +1,30 @@
 """Caching system for API responses to reduce API calls and costs."""
+
+import hashlib
 import json
 import logging
 import os
 import time
-import hashlib
 from pathlib import Path
 from typing import Any, Dict, Optional, Union
 
 logger = logging.getLogger(__name__)
+
 
 class ApiCache:
     """Disk-based API response cache."""
-
+
     def __init__(
-        self,
-        cache_dir: Union[str, Path],
+        self,
+        cache_dir: Union[str, Path],
         namespace: str = "default",
-        ttl: int = 86400  # 24 hours in seconds
+        ttl: int = 86400,  # 24 hours in seconds
     ):
         """
         Initialize API cache.
-
+
         Parameters
         ----------
         cache_dir : str or Path
             Directory for cache files
         namespace : str
@@ -31,206 +33,198 @@
             Time-to-live for cache entries in seconds
         """
         self.cache_dir = Path(cache_dir)
         self.namespace = namespace
         self.ttl = ttl
-
+
         # Ensure namespace directory exists
         self.namespace_dir = self.cache_dir / namespace
         self.namespace_dir.mkdir(parents=True, exist_ok=True)
-
+
         logger.debug(f"Initialized API cache in {self.namespace_dir}")
-
+
     def get_cache_path(self, key: str) -> Path:
         """
         Get path to cache file for key.
-
+
         Parameters
         ----------
         key : str
             Cache key
-
+
         Returns
         -------
         Path
             Path to cache file
         """
         # Hash the key to ensure valid filename
         hashed_key = hashlib.md5(key.encode()).hexdigest()
         return self.namespace_dir / f"{hashed_key}.json"
-
+
     def get(self, key: str) -> Optional[Any]:
         """
         Get value from cache.
-
+
         Parameters
         ----------
         key : str
             Cache key
-
+
         Returns
         -------
         object or None
             Cached value if available and not expired, None otherwise
         """
         cache_path = self.get_cache_path(key)
-
+
         # Check if cache file exists
         if not cache_path.exists():
             return None
-
+
         try:
             # Read cache file
             with open(cache_path, "r", encoding="utf-8") as f:
                 cache_data = json.load(f)
-
+
             # Check if cache entry is expired
             timestamp = cache_data.get("timestamp", 0)
             now = time.time()
-
+
             if now - timestamp > self.ttl:
                 logger.debug(f"Cache entry expired for {key}")
                 return None
-
+
             logger.debug(f"Cache hit for {key}")
             return cache_data.get("value")
-
+
         except Exception as e:
             logger.warning(f"Error reading cache: {str(e)}")
             return None
-
+
     def set(self, key: str, value: Any) -> bool:
         """
         Set value in cache.
-
+
         Parameters
         ----------
         key : str
             Cache key
         value : object
             Value to cache (must be JSON serializable)
-
+
         Returns
        -------
         bool
             True if successful, False otherwise
         """
         cache_path = self.get_cache_path(key)
-
+
         try:
             # Prepare cache data
-            cache_data = {
-                "timestamp": time.time(),
-                "value": value
-            }
-
+            cache_data = {"timestamp": time.time(), "value": value}
+
             # Write to cache file
             with open(cache_path, "w", encoding="utf-8") as f:
                 json.dump(cache_data, f, ensure_ascii=False)
-
+
             logger.debug(f"Cached value for {key}")
             return True
-
+
         except Exception as e:
             logger.warning(f"Error writing to cache: {str(e)}")
             return False
-
+
     def invalidate(self, key: str) -> bool:
         """
         Invalidate cache entry.
-
+
         Parameters
         ----------
         key : str
             Cache key
-
+
         Returns
         -------
         bool
             True if entry was removed, False otherwise
         """
         cache_path = self.get_cache_path(key)
-
+
         if cache_path.exists():
             try:
                 os.remove(cache_path)
                 logger.debug(f"Invalidated cache for {key}")
                 return True
             except Exception as e:
                 logger.warning(f"Error invalidating cache: {str(e)}")
-
+
         return False
-
+
     def clear(self, older_than: Optional[int] = None) -> int:
         """
         Clear all cache entries or entries older than specified time.
-
+
         Parameters
         ----------
         older_than : int, optional
             Clear entries older than this many seconds
-
+
         Returns
         -------
         int
             Number of entries cleared
         """
         count = 0
         now = time.time()
-
+
         for cache_file in self.namespace_dir.glob("*.json"):
             try:
                 # Check file age if criteria provided
                 if older_than is not None:
                     file_age = now - os.path.getmtime(cache_file)
                     if file_age <= older_than:
                         continue
-
+
                 # Remove file
                 os.remove(cache_file)
                 count += 1
-
+
             except Exception as e:
                 logger.warning(f"Error clearing cache file {cache_file}: {str(e)}")
-
+
         logger.info(f"Cleared {count} cache entries from {self.namespace}")
         return count
-
+
     def get_stats(self) -> Dict:
         """
         Get cache statistics.
-
+
         Returns
         -------
         dict
             Cache statistics
         """
         cache_files = list(self.namespace_dir.glob("*.json"))
         total_size = sum(os.path.getsize(f) for f in cache_files)
-
+
         # Analyze age distribution
         now = time.time()
-        age_distribution = {
-            "1h": 0,
-            "6h": 0,
-            "24h": 0,
-            "older": 0
-        }
-
+        age_distribution = {"1h": 0, "6h": 0, "24h": 0, "older": 0}
+
         for cache_file in cache_files:
             file_age = now - os.path.getmtime(cache_file)
-
+
             if file_age <= 3600:  # 1 hour
                 age_distribution["1h"] += 1
             elif file_age <= 21600:  # 6 hours
                 age_distribution["6h"] += 1
             elif file_age <= 86400:  # 24 hours
                 age_distribution["24h"] += 1
             else:
                 age_distribution["older"] += 1
-
+
         return {
             "namespace": self.namespace,
             "entry_count": len(cache_files),
             "total_size_bytes": total_size,
-            "age_distribution": age_distribution
+            "age_distribution": age_distribution,
         }
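Most of the `ApiCache` diff is whitespace churn, but it also exposes the cache's core mechanics: MD5-hashed keys so any string maps to a valid filename, JSON entries carrying a write timestamp, and TTL-based expiry checked on read. A self-contained sketch of those mechanics — simplified free functions for illustration, not the actual `ApiCache` API:

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def cache_path(cache_dir: Path, key: str) -> Path:
    # Hash the key so arbitrary strings (URLs, prompts) yield valid filenames.
    return cache_dir / f"{hashlib.md5(key.encode()).hexdigest()}.json"


def cache_set(cache_dir: Path, key: str, value) -> None:
    # Store the value alongside its write time, as ApiCache.set does.
    entry = {"timestamp": time.time(), "value": value}
    cache_path(cache_dir, key).write_text(json.dumps(entry, ensure_ascii=False))


def cache_get(cache_dir: Path, key: str, ttl: float = 86400):
    # Return None on a miss or when the entry is older than ttl seconds.
    path = cache_path(cache_dir, key)
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    if time.time() - entry["timestamp"] > ttl:
        return None
    return entry["value"]


with tempfile.TemporaryDirectory() as d:
    cache_dir = Path(d)
    cache_set(cache_dir, "gpt-4o:some-prompt", {"tokens": 42})
    fresh = cache_get(cache_dir, "gpt-4o:some-prompt")                   # hit
    expired = cache_get(cache_dir, "gpt-4o:some-prompt", ttl=-1.0)       # negative TTL forces expiry
```

Expiry is lazy: stale entries stay on disk until `get` rejects them or `clear(older_than=...)` sweeps them, which keeps both read and write paths a single file operation.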
183 if older_than is not None:
184 file_age = now - os.path.getmtime(cache_file)
185 if file_age <= older_than:
186 continue
187
188 # Remove file
189 os.remove(cache_file)
190 count += 1
191
192 except Exception as e:
193 logger.warning(f"Error clearing cache file {cache_file}: {str(e)}")
194
195 logger.info(f"Cleared {count} cache entries from {self.namespace}")
196 return count
197
198 def get_stats(self) -> Dict:
199 """
200 Get cache statistics.
201
202 Returns
203 -------
204 dict
205 Cache statistics
206 """
207 cache_files = list(self.namespace_dir.glob("*.json"))
208 total_size = sum(os.path.getsize(f) for f in cache_files)
209
210 # Analyze age distribution
211 now = time.time()
212 age_distribution = {
213 "1h": 0,
214 "6h": 0,
215 "24h": 0,
216 "older": 0
217 }
218
219 for cache_file in cache_files:
220 file_age = now - os.path.getmtime(cache_file)
221
222 if file_age <= 3600: # 1 hour
223 age_distribution["1h"] += 1
224 elif file_age <= 21600: # 6 hours
225 age_distribution["6h"] += 1
226 elif file_age <= 86400: # 24 hours
227 age_distribution["24h"] += 1
228 else:
229 age_distribution["older"] += 1
230
231 return {
232 "namespace": self.namespace,
233 "entry_count": len(cache_files),
234 "total_size_bytes": total_size,
235 "age_distribution": age_distribution
236 }
237
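The `clear(older_than=...)` method above prunes entries by file modification time, so anything younger than the cutoff survives. A standalone sketch of that age filter, using only the stdlib (the function name and the temp-dir layout are illustrative, not the package's API):

```python
import os
import tempfile
import time
from pathlib import Path

# Sketch of the age filter in ApiCache.clear(): entries are pruned by mtime,
# and files younger than `older_than` seconds are left in place.
def clear_older_than(namespace_dir: Path, older_than: float) -> int:
    count = 0
    now = time.time()
    for cache_file in namespace_dir.glob("*.json"):
        if now - os.path.getmtime(cache_file) <= older_than:
            continue  # still fresh, keep it
        cache_file.unlink()
        count += 1
    return count

namespace_dir = Path(tempfile.mkdtemp())
fresh = namespace_dir / "fresh.json"
stale = namespace_dir / "stale.json"
fresh.write_text("{}")
stale.write_text("{}")
one_hour_ago = time.time() - 3600
os.utime(stale, (one_hour_ago, one_hour_ago))  # backdate mtime by an hour

assert clear_older_than(namespace_dir, older_than=60) == 1
assert fresh.exists() and not stale.exists()
```

Because the filter keys off mtime rather than the `timestamp` field inside each JSON file, a `set()` on an existing key also refreshes its age for `clear()`.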
--- video_processor/utils/api_cache.py
+++ video_processor/utils/api_cache.py
@@ -1,28 +1,30 @@
1 """Caching system for API responses to reduce API calls and costs."""
2
3 import hashlib
4 import json
5 import logging
6 import os
7 import time
8 from pathlib import Path
9 from typing import Any, Dict, Optional, Union
10
11 logger = logging.getLogger(__name__)
12
13
14 class ApiCache:
15 """Disk-based API response cache."""
16
17 def __init__(
18 self,
19 cache_dir: Union[str, Path],
20 namespace: str = "default",
21 ttl: int = 86400, # 24 hours in seconds
22 ):
23 """
24 Initialize API cache.
25
26 Parameters
27 ----------
28 cache_dir : str or Path
29 Directory for cache files
30 namespace : str
@@ -31,206 +33,198 @@
33 Time-to-live for cache entries in seconds
34 """
35 self.cache_dir = Path(cache_dir)
36 self.namespace = namespace
37 self.ttl = ttl
38
39 # Ensure namespace directory exists
40 self.namespace_dir = self.cache_dir / namespace
41 self.namespace_dir.mkdir(parents=True, exist_ok=True)
42
43 logger.debug(f"Initialized API cache in {self.namespace_dir}")
44
45 def get_cache_path(self, key: str) -> Path:
46 """
47 Get path to cache file for key.
48
49 Parameters
50 ----------
51 key : str
52 Cache key
53
54 Returns
55 -------
56 Path
57 Path to cache file
58 """
59 # Hash the key to ensure valid filename
60 hashed_key = hashlib.md5(key.encode()).hexdigest()
61 return self.namespace_dir / f"{hashed_key}.json"
62
63 def get(self, key: str) -> Optional[Any]:
64 """
65 Get value from cache.
66
67 Parameters
68 ----------
69 key : str
70 Cache key
71
72 Returns
73 -------
74 object or None
75 Cached value if available and not expired, None otherwise
76 """
77 cache_path = self.get_cache_path(key)
78
79 # Check if cache file exists
80 if not cache_path.exists():
81 return None
82
83 try:
84 # Read cache file
85 with open(cache_path, "r", encoding="utf-8") as f:
86 cache_data = json.load(f)
87
88 # Check if cache entry is expired
89 timestamp = cache_data.get("timestamp", 0)
90 now = time.time()
91
92 if now - timestamp > self.ttl:
93 logger.debug(f"Cache entry expired for {key}")
94 return None
95
96 logger.debug(f"Cache hit for {key}")
97 return cache_data.get("value")
98
99 except Exception as e:
100 logger.warning(f"Error reading cache: {str(e)}")
101 return None
102
103 def set(self, key: str, value: Any) -> bool:
104 """
105 Set value in cache.
106
107 Parameters
108 ----------
109 key : str
110 Cache key
111 value : object
112 Value to cache (must be JSON serializable)
113
114 Returns
115 -------
116 bool
117 True if successful, False otherwise
118 """
119 cache_path = self.get_cache_path(key)
120
121 try:
122 # Prepare cache data
123 cache_data = {"timestamp": time.time(), "value": value}
124
125 # Write to cache file
126 with open(cache_path, "w", encoding="utf-8") as f:
127 json.dump(cache_data, f, ensure_ascii=False)
128
129 logger.debug(f"Cached value for {key}")
130 return True
131
132 except Exception as e:
133 logger.warning(f"Error writing to cache: {str(e)}")
134 return False
135
136 def invalidate(self, key: str) -> bool:
137 """
138 Invalidate cache entry.
139
140 Parameters
141 ----------
142 key : str
143 Cache key
144
145 Returns
146 -------
147 bool
148 True if entry was removed, False otherwise
149 """
150 cache_path = self.get_cache_path(key)
151
152 if cache_path.exists():
153 try:
154 os.remove(cache_path)
155 logger.debug(f"Invalidated cache for {key}")
156 return True
157 except Exception as e:
158 logger.warning(f"Error invalidating cache: {str(e)}")
159
160 return False
161
162 def clear(self, older_than: Optional[int] = None) -> int:
163 """
164 Clear all cache entries or entries older than specified time.
165
166 Parameters
167 ----------
168 older_than : int, optional
169 Clear entries older than this many seconds
170
171 Returns
172 -------
173 int
174 Number of entries cleared
175 """
176 count = 0
177 now = time.time()
178
179 for cache_file in self.namespace_dir.glob("*.json"):
180 try:
181 # Check file age if criteria provided
182 if older_than is not None:
183 file_age = now - os.path.getmtime(cache_file)
184 if file_age <= older_than:
185 continue
186
187 # Remove file
188 os.remove(cache_file)
189 count += 1
190
191 except Exception as e:
192 logger.warning(f"Error clearing cache file {cache_file}: {str(e)}")
193
194 logger.info(f"Cleared {count} cache entries from {self.namespace}")
195 return count
196
197 def get_stats(self) -> Dict:
198 """
199 Get cache statistics.
200
201 Returns
202 -------
203 dict
204 Cache statistics
205 """
206 cache_files = list(self.namespace_dir.glob("*.json"))
207 total_size = sum(os.path.getsize(f) for f in cache_files)
208
209 # Analyze age distribution
210 now = time.time()
211 age_distribution = {"1h": 0, "6h": 0, "24h": 0, "older": 0}
212
213 for cache_file in cache_files:
214 file_age = now - os.path.getmtime(cache_file)
215
216 if file_age <= 3600: # 1 hour
217 age_distribution["1h"] += 1
218 elif file_age <= 21600: # 6 hours
219 age_distribution["6h"] += 1
220 elif file_age <= 86400: # 24 hours
221 age_distribution["24h"] += 1
222 else:
223 age_distribution["older"] += 1
224
225 return {
226 "namespace": self.namespace,
227 "entry_count": len(cache_files),
228 "total_size_bytes": total_size,
229 "age_distribution": age_distribution,
230 }
231
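The read/write cycle above hashes each key to a stable filename, wraps the value in a timestamp envelope, and treats an expired entry as a miss. A minimal standalone sketch of that cycle (function names here are illustrative, not the package's API):

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path

def cache_path(cache_dir: Path, key: str) -> Path:
    # md5 of the key gives a filesystem-safe, deterministic filename
    return cache_dir / f"{hashlib.md5(key.encode()).hexdigest()}.json"

def cache_set(cache_dir: Path, key: str, value) -> None:
    # entries carry a write timestamp so reads can check age later
    cache_path(cache_dir, key).write_text(
        json.dumps({"timestamp": time.time(), "value": value}, ensure_ascii=False),
        encoding="utf-8",
    )

def cache_get(cache_dir: Path, key: str, ttl: int = 86400):
    path = cache_path(cache_dir, key)
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    if time.time() - data.get("timestamp", 0) > ttl:
        return None  # expired entry behaves exactly like a miss
    return data.get("value")

cache_dir = Path(tempfile.mkdtemp())
cache_set(cache_dir, "transcript:abc", {"text": "hello"})
assert cache_get(cache_dir, "transcript:abc") == {"text": "hello"}
assert cache_get(cache_dir, "transcript:abc", ttl=-1) is None  # forced expiry
assert cache_get(cache_dir, "missing-key") is None
```

Note that expiry is lazy: an expired file stays on disk until `clear()` removes it; `get()` simply stops returning it.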
--- video_processor/utils/export.py
+++ video_processor/utils/export.py
@@ -1,15 +1,14 @@
 """Multi-format output orchestration."""
 
-import json
 import logging
 from pathlib import Path
 from typing import Optional
 
 from tqdm import tqdm
 
-from video_processor.models import DiagramResult, VideoManifest
+from video_processor.models import VideoManifest
 from video_processor.utils.rendering import render_mermaid, reproduce_chart
 
 logger = logging.getLogger(__name__)
 
 
@@ -79,11 +78,13 @@
 svg_path = output_dir / d.svg_path if d.svg_path else None
 if svg_path and svg_path.exists():
 svg_content = svg_path.read_text()
 diag_html += f'<div class="diagram">{svg_content}</div>'
 elif d.image_path:
-diag_html += f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">'
+diag_html += (
+f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">'
+)
 if d.mermaid:
 diag_html += f'<pre class="mermaid">{d.mermaid}</pre>'
 sections.append(diag_html)
 
 title = manifest.video.title or "PlanOpticon Analysis"
@@ -155,11 +156,13 @@
 Updates manifest with output file paths and returns it.
 """
 output_dir = Path(output_dir)
 
 # Render mermaid diagrams to SVG/PNG
-for i, diagram in enumerate(tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else []):
+for i, diagram in enumerate(
+tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else []
+):
 if diagram.mermaid:
 diagrams_dir = output_dir / "diagrams"
 prefix = f"diagram_{i}"
 paths = render_mermaid(diagram.mermaid, diagrams_dir, prefix)
 if "svg" in paths:
--- video_processor/utils/export.py
+++ video_processor/utils/export.py
@@ -1,15 +1,14 @@
1 """Multi-format output orchestration."""
2
3 import json
4 import logging
5 from pathlib import Path
6 from typing import Optional
7
8 from tqdm import tqdm
9
10 from video_processor.models import DiagramResult, VideoManifest
11 from video_processor.utils.rendering import render_mermaid, reproduce_chart
12
13 logger = logging.getLogger(__name__)
14
15
@@ -79,11 +78,13 @@
79 svg_path = output_dir / d.svg_path if d.svg_path else None
80 if svg_path and svg_path.exists():
81 svg_content = svg_path.read_text()
82 diag_html += f'<div class="diagram">{svg_content}</div>'
83 elif d.image_path:
84 diag_html += f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">'
85 if d.mermaid:
86 diag_html += f'<pre class="mermaid">{d.mermaid}</pre>'
87 sections.append(diag_html)
88
89 title = manifest.video.title or "PlanOpticon Analysis"
@@ -155,11 +156,13 @@
155 Updates manifest with output file paths and returns it.
156 """
157 output_dir = Path(output_dir)
158
159 # Render mermaid diagrams to SVG/PNG
160 for i, diagram in enumerate(tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else []):
161 if diagram.mermaid:
162 diagrams_dir = output_dir / "diagrams"
163 prefix = f"diagram_{i}"
164 paths = render_mermaid(diagram.mermaid, diagrams_dir, prefix)
165 if "svg" in paths:
166
--- video_processor/utils/export.py
+++ video_processor/utils/export.py
@@ -1,15 +1,14 @@
1 """Multi-format output orchestration."""
2
3 import logging
4 from pathlib import Path
5 from typing import Optional
6
7 from tqdm import tqdm
8
9 from video_processor.models import VideoManifest
10 from video_processor.utils.rendering import render_mermaid, reproduce_chart
11
12 logger = logging.getLogger(__name__)
13
14
@@ -79,11 +78,13 @@
78 svg_path = output_dir / d.svg_path if d.svg_path else None
79 if svg_path and svg_path.exists():
80 svg_content = svg_path.read_text()
81 diag_html += f'<div class="diagram">{svg_content}</div>'
82 elif d.image_path:
83 diag_html += (
84 f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">'
85 )
86 if d.mermaid:
87 diag_html += f'<pre class="mermaid">{d.mermaid}</pre>'
88 sections.append(diag_html)
89
90 title = manifest.video.title or "PlanOpticon Analysis"
@@ -155,11 +156,13 @@
156 Updates manifest with output file paths and returns it.
157 """
158 output_dir = Path(output_dir)
159
160 # Render mermaid diagrams to SVG/PNG
161 for i, diagram in enumerate(
162 tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else []
163 ):
164 if diagram.mermaid:
165 diagrams_dir = output_dir / "diagrams"
166 prefix = f"diagram_{i}"
167 paths = render_mermaid(diagram.mermaid, diagrams_dir, prefix)
168 if "svg" in paths:
169
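The E501 fixes above wrap an over-long f-string in parentheses rather than splitting the string itself. Parentheses only group the expression, so the rendered value is byte-for-byte unchanged. A quick sketch (the variable values are stand-ins, not from the codebase):

```python
# Stand-in values for the loop variables used in export.py's HTML builder.
image_path = "frames/frame_0012.png"
i = 0

single_line = f'<img src="{image_path}" alt="Diagram {i + 1}" style="max-width:100%">'

# The E501 fix: same expression, wrapped in parentheses to satisfy the
# line-length limit without changing the output.
wrapped = (
    f'<img src="{image_path}" alt="Diagram {i + 1}" style="max-width:100%">'
)

assert wrapped == single_line
```

Splitting the literal into two adjacent strings inside the parentheses would also satisfy the line limit via implicit concatenation, but a missed space at the joint is an easy bug; ruff's implicit-string-concatenation (ISC) rules exist to catch that variant.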
--- video_processor/utils/prompt_templates.py
+++ video_processor/utils/prompt_templates.py
@@ -1,152 +1,153 @@
 """Prompt templates for LLM-based content analysis."""
-import json
+
 import logging
-import os
 from pathlib import Path
 from string import Template
-from typing import Any, Dict, List, Optional, Union
+from typing import Dict, Optional, Union
 
 logger = logging.getLogger(__name__)
+
 
 class PromptTemplate:
 """Template manager for LLM prompts."""
-
+
 def __init__(
-self,
+self,
 templates_dir: Optional[Union[str, Path]] = None,
-default_templates: Optional[Dict[str, str]] = None
+default_templates: Optional[Dict[str, str]] = None,
 ):
 """
 Initialize prompt template manager.
-
+
 Parameters
 ----------
 templates_dir : str or Path, optional
 Directory containing template files
 default_templates : dict, optional
 Default templates to use
 """
 self.templates_dir = Path(templates_dir) if templates_dir else None
 self.templates = {}
-
+
 # Load default templates
 if default_templates:
 self.templates.update(default_templates)
-
+
 # Load templates from directory if provided
 if self.templates_dir and self.templates_dir.exists():
 self._load_templates_from_dir()
-
+
 def _load_templates_from_dir(self) -> None:
 """Load templates from template directory."""
 if not self.templates_dir:
 return
-
+
 for template_file in self.templates_dir.glob("*.txt"):
 template_name = template_file.stem
 try:
 with open(template_file, "r", encoding="utf-8") as f:
 template_content = f.read()
 self.templates[template_name] = template_content
 logger.debug(f"Loaded template: {template_name}")
 except Exception as e:
 logger.warning(f"Error loading template {template_name}: {str(e)}")
-
+
 def get_template(self, template_name: str) -> Optional[Template]:
 """
 Get template by name.
-
+
 Parameters
 ----------
 template_name : str
 Template name
-
+
 Returns
 -------
 Template or None
 Template object if found, None otherwise
 """
 if template_name not in self.templates:
 logger.warning(f"Template not found: {template_name}")
 return None
-
+
 return Template(self.templates[template_name])
-
+
 def format_prompt(self, template_name: str, **kwargs) -> Optional[str]:
 """
 Format prompt with provided parameters.
-
+
 Parameters
 ----------
 template_name : str
 Template name
 **kwargs : dict
 Template parameters
-
+
 Returns
 -------
 str or None
 Formatted prompt if template exists, None otherwise
 """
 template = self.get_template(template_name)
 if not template:
 return None
-
+
 try:
 return template.safe_substitute(**kwargs)
 except Exception as e:
 logger.error(f"Error formatting template {template_name}: {str(e)}")
 return None
-
+
 def add_template(self, template_name: str, template_content: str) -> None:
 """
 Add or update template.
-
+
 Parameters
 ----------
 template_name : str
 Template name
 template_content : str
 Template content
 """
 self.templates[template_name] = template_content
-
+
 def save_template(self, template_name: str) -> bool:
 """
 Save template to file.
-
+
 Parameters
 ----------
 template_name : str
 Template name
-
+
 Returns
 -------
 bool
 True if successful, False otherwise
 """
 if not self.templates_dir:
 logger.error("Templates directory not set")
 return False
-
+
 if template_name not in self.templates:
 logger.warning(f"Template not found: {template_name}")
 return False
-
+
 try:
 self.templates_dir.mkdir(parents=True, exist_ok=True)
 template_path = self.templates_dir / f"{template_name}.txt"
-
+
 with open(template_path, "w", encoding="utf-8") as f:
 f.write(self.templates[template_name])
-
+
 logger.debug(f"Saved template: {template_name}")
 return True
 except Exception as e:
 logger.error(f"Error saving template {template_name}: {str(e)}")
 return False
+
 
 # Default prompt templates
 DEFAULT_TEMPLATES = {
 "content_analysis": """
 Analyze the provided video content and extract key information:
@@ -161,50 +162,48 @@
 - Main topics and themes
 - Key points for each topic
 - Important details or facts
 - Action items or follow-ups
 - Relationships between concepts
-
+
 Format the output as structured markdown.
 """,
-
 "diagram_extraction": """
-Analyze the following image that contains a diagram, whiteboard content, or other visual information.
-
+Analyze the following image that contains a diagram, whiteboard content,
+or other visual information.
+
 Extract and convert this visual information into a structured representation.
-
+
 If it's a flowchart, process diagram, or similar structured visual:
 - Identify the components and their relationships
 - Preserve the logical flow and structure
 - Convert it to mermaid diagram syntax
-
+
 If it's a whiteboard with text, bullet points, or unstructured content:
 - Extract all text elements
 - Preserve hierarchical organization if present
 - Maintain any emphasized or highlighted elements
-
+
 Image context: $image_context
-
+
 Return the results as markdown with appropriate structure.
 """,
-
 "action_item_detection": """
 Review the following transcript and identify all action items, commitments, or follow-up tasks.
-
+
 TRANSCRIPT:
 $transcript
-
+
 For each action item, extract:
 - The specific action to be taken
 - Who is responsible (if mentioned)
 - Any deadlines or timeframes
 - Priority level (if indicated)
 - Context or additional details
-
+
 Format the results as a structured list of action items.
 """,
-
 "content_summary": """
 Provide a concise summary of the following content:
 
 $content
 
@@ -214,11 +213,10 @@
 - Focus on the most important information
 - Maintain a neutral, objective tone
 
 Format the summary as clear, readable text.
 """,
-
 "summary_generation": """
 Generate a comprehensive summary of the following transcript content.
 
 CONTENT:
 $content
@@ -229,11 +227,10 @@
 - Notes any important context or background
 - Is 3-5 paragraphs long
 
 Write in clear, professional prose.
 """,
-
 "key_points_extraction": """
 Extract the key points from the following content.
 
 CONTENT:
 $content
@@ -243,31 +240,30 @@
 - "topic": category or topic area (optional)
 - "details": supporting details (optional)
 
 Example format:
 [
-{"point": "The system uses microservices architecture", "topic": "Architecture", "details": "Each service handles a specific domain"},
-{"point": "Migration is planned for Q2", "topic": "Timeline", "details": null}
+{"point": "The system uses microservices architecture",
+"topic": "Architecture", "details": "Each service handles a specific domain"},
 ]
 
 Return ONLY the JSON array, no additional text.
 """,
-
 "entity_extraction": """
-Extract all notable entities (people, concepts, technologies, organizations, time references) from the following content.
-
+Extract all notable entities (people, concepts, technologies, organizations,
+time references) from the following content.
 CONTENT:
 $content
 
 Return a JSON array of entity objects:
 [
-{"name": "entity name", "type": "person|concept|technology|organization|time", "description": "brief description"}
-]
+{"name": "entity name",
+"type": "person|concept|technology|organization|time",
+"description": "brief description"}
 
 Return ONLY the JSON array, no additional text.
 """,
-
 "relationship_extraction": """
 Given the following content and entities, identify relationships between them.
 
 CONTENT:
 $content
@@ -275,16 +271,15 @@
 KNOWN ENTITIES:
 $entities
 
 Return a JSON array of relationship objects:
 [
-{"source": "entity A", "target": "entity B", "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"}
-]
+{"source": "entity A", "target": "entity B",
+"type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"}
 
 Return ONLY the JSON array, no additional text.
 """,
-
 "diagram_analysis": """
 Analyze the following text extracted from a diagram or visual element.
 
 DIAGRAM TEXT:
 $diagram_text
@@ -303,11 +298,10 @@
 "summary": "brief description of what the diagram shows"
 }
 
 Return ONLY the JSON object, no additional text.
 """,
-
 "mermaid_generation": """
 Convert the following diagram information into valid Mermaid diagram syntax.
 
 Diagram Type: $diagram_type
 Text Content: $text_content
@@ -315,10 +309,10 @@
 
 Generate a Mermaid diagram that accurately represents the visual structure.
 Use the appropriate Mermaid diagram type (graph, sequenceDiagram, classDiagram, etc.).
 
 Return ONLY the Mermaid code, no markdown fences or explanations.
-"""
+""",
 }
 
 # Create default prompt template manager
 default_prompt_manager = PromptTemplate(default_templates=DEFAULT_TEMPLATES)
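The `format_prompt` method above relies on `string.Template.safe_substitute`, which leaves unknown `$placeholders` intact instead of raising. A standalone sketch of the failure mode that choice avoids (`$output_format` is an invented placeholder for illustration, not one of the real template parameters):

```python
from string import Template

# Two placeholders, only one of which will be supplied.
template = Template(
    "Image context: $image_context\nReturn results as $output_format."
)

# safe_substitute: missing parameters survive untouched in the output.
filled = template.safe_substitute(image_context="whiteboard at 00:12:30")
assert "whiteboard at 00:12:30" in filled
assert "$output_format" in filled  # unresolved placeholder left as-is

# substitute(), by contrast, raises KeyError on the missing parameter.
try:
    template.substitute(image_context="whiteboard at 00:12:30")
except KeyError:
    pass
else:
    raise AssertionError("expected KeyError")
```

For LLM prompts the lenient behavior is a deliberate trade-off: a forgotten parameter degrades the prompt instead of crashing the pipeline, at the cost of possibly sending a literal `$name` to the model.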
--- video_processor/utils/prompt_templates.py
+++ video_processor/utils/prompt_templates.py
@@ -1,152 +1,153 @@
1 """Prompt templates for LLM-based content analysis."""
2 import json
3 import logging
4 import os
5 from pathlib import Path
6 from string import Template
7 from typing import Any, Dict, List, Optional, Union
8
9 logger = logging.getLogger(__name__)
10
11 class PromptTemplate:
12 """Template manager for LLM prompts."""
13
14 def __init__(
15 self,
16 templates_dir: Optional[Union[str, Path]] = None,
17 default_templates: Optional[Dict[str, str]] = None
18 ):
19 """
20 Initialize prompt template manager.
21
22 Parameters
23 ----------
24 templates_dir : str or Path, optional
25 Directory containing template files
26 default_templates : dict, optional
27 Default templates to use
28 """
29 self.templates_dir = Path(templates_dir) if templates_dir else None
30 self.templates = {}
31
32 # Load default templates
33 if default_templates:
34 self.templates.update(default_templates)
35
36 # Load templates from directory if provided
37 if self.templates_dir and self.templates_dir.exists():
38 self._load_templates_from_dir()
39
40 def _load_templates_from_dir(self) -> None:
41 """Load templates from template directory."""
42 if not self.templates_dir:
43 return
44
45 for template_file in self.templates_dir.glob("*.txt"):
46 template_name = template_file.stem
47 try:
48 with open(template_file, "r", encoding="utf-8") as f:
49 template_content = f.read()
50 self.templates[template_name] = template_content
51 logger.debug(f"Loaded template: {template_name}")
52 except Exception as e:
53 logger.warning(f"Error loading template {template_name}: {str(e)}")
54
55 def get_template(self, template_name: str) -> Optional[Template]:
56 """
57 Get template by name.
58
59 Parameters
60 ----------
61 template_name : str
62 Template name
63
64 Returns
65 -------
66 Template or None
67 Template object if found, None otherwise
68 """
69 if template_name not in self.templates:
70 logger.warning(f"Template not found: {template_name}")
71 return None
72
73 return Template(self.templates[template_name])
74
75 def format_prompt(self, template_name: str, **kwargs) -> Optional[str]:
76 """
77 Format prompt with provided parameters.
78
79 Parameters
80 ----------
81 template_name : str
82 Template name
83 **kwargs : dict
84 Template parameters
85
86 Returns
87 -------
88 str or None
89 Formatted prompt if template exists, None otherwise
90 """
91 template = self.get_template(template_name)
92 if not template:
93 return None
94
95 try:
96 return template.safe_substitute(**kwargs)
97 except Exception as e:
98 logger.error(f"Error formatting template {template_name}: {str(e)}")
99 return None
100
101 def add_template(self, template_name: str, template_content: str) -> None:
102 """
103 Add or update template.
104
105 Parameters
106 ----------
107 template_name : str
108 Template name
109 template_content : str
110 Template content
111 """
112 self.templates[template_name] = template_content
113
114 def save_template(self, template_name: str) -> bool:
115 """
116 Save template to file.
117
118 Parameters
119 ----------
120 template_name : str
121 Template name
122
123 Returns
124 -------
125 bool
126 True if successful, False otherwise
127 """
128 if not self.templates_dir:
129 logger.error("Templates directory not set")
130 return False
131
132 if template_name not in self.templates:
133 logger.warning(f"Template not found: {template_name}")
134 return False
135
136 try:
137 self.templates_dir.mkdir(parents=True, exist_ok=True)
138 template_path = self.templates_dir / f"{template_name}.txt"
139
140 with open(template_path, "w", encoding="utf-8") as f:
141 f.write(self.templates[template_name])
142
143 logger.debug(f"Saved template: {template_name}")
144 return True
145 except Exception as e:
146 logger.error(f"Error saving template {template_name}: {str(e)}")
147 return False
148
149 # Default prompt templates
150 DEFAULT_TEMPLATES = {
151 "content_analysis": """
152 Analyze the provided video content and extract key information:
@@ -161,50 +162,48 @@
161 - Main topics and themes
162 - Key points for each topic
163 - Important details or facts
164 - Action items or follow-ups
165 - Relationships between concepts
166
167 Format the output as structured markdown.
168 """,
169
170 "diagram_extraction": """
171 Analyze the following image that contains a diagram, whiteboard content, or other visual information.
172
173 Extract and convert this visual information into a structured representation.
174
175 If it's a flowchart, process diagram, or similar structured visual:
176 - Identify the components and their relationships
177 - Preserve the logical flow and structure
178 - Convert it to mermaid diagram syntax
179
180 If it's a whiteboard with text, bullet points, or unstructured content:
181 - Extract all text elements
182 - Preserve hierarchical organization if present
183 - Maintain any emphasized or highlighted elements
184
185 Image context: $image_context
186
187 Return the results as markdown with appropriate structure.
188 """,
189
190 "action_item_detection": """
191 Review the following transcript and identify all action items, commitments, or follow-up tasks.
192
193 TRANSCRIPT:
194 $transcript
195
196 For each action item, extract:
197 - The specific action to be taken
198 - Who is responsible (if mentioned)
199 - Any deadlines or timeframes
200 - Priority level (if indicated)
201 - Context or additional details
202
203 Format the results as a structured list of action items.
204 """,
205
206 "content_summary": """
207 Provide a concise summary of the following content:
208
209 $content
210
@@ -214,11 +213,10 @@
214 - Focus on the most important information
215 - Maintain a neutral, objective tone
216
217 Format the summary as clear, readable text.
218 """,
219
220 "summary_generation": """
221 Generate a comprehensive summary of the following transcript content.
222
223 CONTENT:
224 $content
@@ -229,11 +227,10 @@
229 - Notes any important context or background
230 - Is 3-5 paragraphs long
231
232 Write in clear, professional prose.
233 """,
234
235 "key_points_extraction": """
236 Extract the key points from the following content.
237
238 CONTENT:
239 $content
@@ -243,31 +240,30 @@
243 - "topic": category or topic area (optional)
244 - "details": supporting details (optional)
245
246 Example format:
247 [
248 {"point": "The system uses microservices architecture", "topic": "Architecture", "details": "Each service handles a specific domain"},
249 {"point": "Migration is planned for Q2", "topic": "Timeline", "details": null}
250 ]
251
252 Return ONLY the JSON array, no additional text.
253 """,
254
255 "entity_extraction": """
256 Extract all notable entities (people, concepts, technologies, organizations, time references) from the following content.
257
258 CONTENT:
259 $content
260
261 Return a JSON array of entity objects:
262 [
263 {"name": "entity name", "type": "person|concept|technology|organization|time", "description": "brief description"}
264 ]
265
266 Return ONLY the JSON array, no additional text.
267 """,
268
269 "relationship_extraction": """
270 Given the following content and entities, identify relationships between them.
271
272 CONTENT:
273 $content
@@ -275,16 +271,15 @@
275 KNOWN ENTITIES:
276 $entities
277
278 Return a JSON array of relationship objects:
279 [
280 {"source": "entity A", "target": "entity B", "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"}
281 ]
282
283 Return ONLY the JSON array, no additional text.
284 """,
285
286 "diagram_analysis": """
287 Analyze the following text extracted from a diagram or visual element.
288
289 DIAGRAM TEXT:
290 $diagram_text
@@ -303,11 +298,10 @@
303 "summary": "brief description of what the diagram shows"
304 }
305
306 Return ONLY the JSON object, no additional text.
307 """,
308
309 "mermaid_generation": """
310 Convert the following diagram information into valid Mermaid diagram syntax.
311
312 Diagram Type: $diagram_type
313 Text Content: $text_content
@@ -315,10 +309,10 @@
315
316 Generate a Mermaid diagram that accurately represents the visual structure.
317 Use the appropriate Mermaid diagram type (graph, sequenceDiagram, classDiagram, etc.).
318
319 Return ONLY the Mermaid code, no markdown fences or explanations.
320 """
321 }
322
323 # Create default prompt template manager
324 default_prompt_manager = PromptTemplate(default_templates=DEFAULT_TEMPLATES)
325
--- video_processor/utils/prompt_templates.py
+++ video_processor/utils/prompt_templates.py
@@ -1,152 +1,153 @@
1 """Prompt templates for LLM-based content analysis."""
2
3 import logging
 
4 from pathlib import Path
5 from string import Template
6 from typing import Dict, Optional, Union
7
8 logger = logging.getLogger(__name__)
9
10
11 class PromptTemplate:
12 """Template manager for LLM prompts."""
13
14 def __init__(
15 self,
16 templates_dir: Optional[Union[str, Path]] = None,
17 default_templates: Optional[Dict[str, str]] = None,
18 ):
19 """
20 Initialize prompt template manager.
21
22 Parameters
23 ----------
24 templates_dir : str or Path, optional
25 Directory containing template files
26 default_templates : dict, optional
27 Default templates to use
28 """
29 self.templates_dir = Path(templates_dir) if templates_dir else None
30 self.templates = {}
31
32 # Load default templates
33 if default_templates:
34 self.templates.update(default_templates)
35
36 # Load templates from directory if provided
37 if self.templates_dir and self.templates_dir.exists():
38 self._load_templates_from_dir()
39
40 def _load_templates_from_dir(self) -> None:
41 """Load templates from template directory."""
42 if not self.templates_dir:
43 return
44
45 for template_file in self.templates_dir.glob("*.txt"):
46 template_name = template_file.stem
47 try:
48 with open(template_file, "r", encoding="utf-8") as f:
49 template_content = f.read()
50 self.templates[template_name] = template_content
51 logger.debug(f"Loaded template: {template_name}")
52 except Exception as e:
53 logger.warning(f"Error loading template {template_name}: {str(e)}")
54
55 def get_template(self, template_name: str) -> Optional[Template]:
56 """
57 Get template by name.
58
59 Parameters
60 ----------
61 template_name : str
62 Template name
63
64 Returns
65 -------
66 Template or None
67 Template object if found, None otherwise
68 """
69 if template_name not in self.templates:
70 logger.warning(f"Template not found: {template_name}")
71 return None
72
73 return Template(self.templates[template_name])
74
75 def format_prompt(self, template_name: str, **kwargs) -> Optional[str]:
76 """
77 Format prompt with provided parameters.
78
79 Parameters
80 ----------
81 template_name : str
82 Template name
83 **kwargs : dict
84 Template parameters
85
86 Returns
87 -------
88 str or None
89 Formatted prompt if template exists, None otherwise
90 """
91 template = self.get_template(template_name)
92 if not template:
93 return None
94
95 try:
96 return template.safe_substitute(**kwargs)
97 except Exception as e:
98 logger.error(f"Error formatting template {template_name}: {str(e)}")
99 return None
100
101 def add_template(self, template_name: str, template_content: str) -> None:
102 """
103 Add or update template.
104
105 Parameters
106 ----------
107 template_name : str
108 Template name
109 template_content : str
110 Template content
111 """
112 self.templates[template_name] = template_content
113
114 def save_template(self, template_name: str) -> bool:
115 """
116 Save template to file.
117
118 Parameters
119 ----------
120 template_name : str
121 Template name
122
123 Returns
124 -------
125 bool
126 True if successful, False otherwise
127 """
128 if not self.templates_dir:
129 logger.error("Templates directory not set")
130 return False
131
132 if template_name not in self.templates:
133 logger.warning(f"Template not found: {template_name}")
134 return False
135
136 try:
137 self.templates_dir.mkdir(parents=True, exist_ok=True)
138 template_path = self.templates_dir / f"{template_name}.txt"
139
140 with open(template_path, "w", encoding="utf-8") as f:
141 f.write(self.templates[template_name])
142
143 logger.debug(f"Saved template: {template_name}")
144 return True
145 except Exception as e:
146 logger.error(f"Error saving template {template_name}: {str(e)}")
147 return False
148
149
150 # Default prompt templates
151 DEFAULT_TEMPLATES = {
152 "content_analysis": """
153 Analyze the provided video content and extract key information:
@@ -161,50 +162,48 @@
162 - Main topics and themes
163 - Key points for each topic
164 - Important details or facts
165 - Action items or follow-ups
166 - Relationships between concepts
167
168 Format the output as structured markdown.
169 """,
 
170 "diagram_extraction": """
171 Analyze the following image that contains a diagram, whiteboard content,
172 or other visual information.
173
174 Extract and convert this visual information into a structured representation.
175
176 If it's a flowchart, process diagram, or similar structured visual:
177 - Identify the components and their relationships
178 - Preserve the logical flow and structure
179 - Convert it to mermaid diagram syntax
180
181 If it's a whiteboard with text, bullet points, or unstructured content:
182 - Extract all text elements
183 - Preserve hierarchical organization if present
184 - Maintain any emphasized or highlighted elements
185
186 Image context: $image_context
187
188 Return the results as markdown with appropriate structure.
189 """,
 
190 "action_item_detection": """
191 Review the following transcript and identify all action items, commitments, or follow-up tasks.
192
193 TRANSCRIPT:
194 $transcript
195
196 For each action item, extract:
197 - The specific action to be taken
198 - Who is responsible (if mentioned)
199 - Any deadlines or timeframes
200 - Priority level (if indicated)
201 - Context or additional details
202
203 Format the results as a structured list of action items.
204 """,
 
205 "content_summary": """
206 Provide a concise summary of the following content:
207
208 $content
209
@@ -214,11 +213,10 @@
213 - Focus on the most important information
214 - Maintain a neutral, objective tone
215
216 Format the summary as clear, readable text.
217 """,
 
218 "summary_generation": """
219 Generate a comprehensive summary of the following transcript content.
220
221 CONTENT:
222 $content
@@ -229,11 +227,10 @@
227 - Notes any important context or background
228 - Is 3-5 paragraphs long
229
230 Write in clear, professional prose.
231 """,
 
232 "key_points_extraction": """
233 Extract the key points from the following content.
234
235 CONTENT:
236 $content
@@ -243,31 +240,30 @@
240 - "topic": category or topic area (optional)
241 - "details": supporting details (optional)
242
243 Example format:
244 [
245 {"point": "The system uses microservices architecture",
246 "topic": "Architecture", "details": "Each service handles a specific domain"},
247 ]
248
249 Return ONLY the JSON array, no additional text.
250 """,
 
251 "entity_extraction": """
252 Extract all notable entities (people, concepts, technologies, organizations,
253 time references) from the following content.
254 CONTENT:
255 $content
256
257 Return a JSON array of entity objects:
258 [
259 {"name": "entity name",
260 "type": "person|concept|technology|organization|time",
261 "description": "brief description"}
262 ]
263 Return ONLY the JSON array, no additional text.
264 """,
 
265 "relationship_extraction": """
266 Given the following content and entities, identify relationships between them.
267
268 CONTENT:
269 $content
@@ -275,16 +271,15 @@
271 KNOWN ENTITIES:
272 $entities
273
274 Return a JSON array of relationship objects:
275 [
276 {"source": "entity A", "target": "entity B",
277 "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"}
278 ]
279 Return ONLY the JSON array, no additional text.
280 """,
 
281 "diagram_analysis": """
282 Analyze the following text extracted from a diagram or visual element.
283
284 DIAGRAM TEXT:
285 $diagram_text
@@ -303,11 +298,10 @@
298 "summary": "brief description of what the diagram shows"
299 }
300
301 Return ONLY the JSON object, no additional text.
302 """,
 
303 "mermaid_generation": """
304 Convert the following diagram information into valid Mermaid diagram syntax.
305
306 Diagram Type: $diagram_type
307 Text Content: $text_content
@@ -315,10 +309,10 @@
309
310 Generate a Mermaid diagram that accurately represents the visual structure.
311 Use the appropriate Mermaid diagram type (graph, sequenceDiagram, classDiagram, etc.).
312
313 Return ONLY the Mermaid code, no markdown fences or explanations.
314 """,
315 }
316
317 # Create default prompt template manager
318 default_prompt_manager = PromptTemplate(default_templates=DEFAULT_TEMPLATES)
319
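As a side note on the `format_prompt` implementation above (a sketch, not part of this commit): it relies on `string.Template.safe_substitute`, which fills the placeholders it is given and leaves unknown ones intact instead of raising, so a template can be partially filled.

```python
from string import Template

# Sketch of the behavior behind PromptTemplate.format_prompt:
# safe_substitute fills $content but leaves the unknown $entities
# placeholder untouched rather than raising KeyError.
prompt = Template("CONTENT:\n$content\n\nKNOWN ENTITIES:\n$entities")
text = prompt.safe_substitute(content="Service A calls service B")
print(text)
```

This is why a missing keyword argument degrades gracefully to a prompt with a visible `$placeholder` rather than an exception.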
--- video_processor/utils/rendering.py
+++ video_processor/utils/rendering.py
@@ -1,10 +1,10 @@
1 """Mermaid rendering and chart reproduction utilities."""
2
3 import logging
4 from pathlib import Path
5 from typing import Dict, Optional
6
7 logger = logging.getLogger(__name__)
8
9
10 def render_mermaid(mermaid_code: str, output_dir: str | Path, name: str) -> Dict[str, Path]:
@@ -47,15 +47,20 @@
47 png_content = rendered.img_response
48 if png_content:
49 if isinstance(png_content, bytes):
50 png_path.write_bytes(png_content)
51 else:
52 png_path.write_bytes(png_content.encode() if isinstance(png_content, str) else png_content)
 
 
53 result["png"] = png_path
54
55 except ImportError:
56 logger.warning("mermaid-py not installed, skipping SVG/PNG rendering. Install with: pip install mermaid-py")
 
 
 
57 except Exception as e:
58 logger.warning(f"Mermaid rendering failed for '{name}': {e}")
59
60 return result
61
62
--- video_processor/utils/rendering.py
+++ video_processor/utils/rendering.py
@@ -1,10 +1,10 @@
1 """Mermaid rendering and chart reproduction utilities."""
2
3 import logging
4 from pathlib import Path
5 from typing import Dict
6
7 logger = logging.getLogger(__name__)
8
9
10 def render_mermaid(mermaid_code: str, output_dir: str | Path, name: str) -> Dict[str, Path]:
@@ -47,15 +47,20 @@
47 png_content = rendered.img_response
48 if png_content:
49 if isinstance(png_content, bytes):
50 png_path.write_bytes(png_content)
51 else:
52 png_path.write_bytes(
53 png_content.encode() if isinstance(png_content, str) else png_content
54 )
55 result["png"] = png_path
56
57 except ImportError:
58 logger.warning(
59 "mermaid-py not installed, skipping SVG/PNG rendering. "
60 "Install with: pip install mermaid-py"
61 )
62 except Exception as e:
63 logger.warning(f"Mermaid rendering failed for '{name}': {e}")
64
65 return result
66
67
--- video_processor/utils/usage_tracker.py
+++ video_processor/utils/usage_tracker.py
@@ -2,11 +2,10 @@
2
3 import time
4 from dataclasses import dataclass, field
5 from typing import Optional
6
7
8 # Cost per million tokens (USD) — updated Feb 2025
9 _MODEL_PRICING = {
10 # Anthropic
11 "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
12 "claude-haiku-3-5-20241022": {"input": 0.80, "output": 4.00},
@@ -26,10 +25,11 @@
26
27
28 @dataclass
29 class ModelUsage:
30 """Accumulated usage for a single model."""
 
31 provider: str = ""
32 model: str = ""
33 calls: int = 0
34 input_tokens: int = 0
35 output_tokens: int = 0
@@ -59,10 +59,11 @@
59
60
61 @dataclass
62 class StepTiming:
63 """Timing for a single pipeline step."""
 
64 name: str
65 start_time: float = 0.0
66 end_time: float = 0.0
67
68 @property
@@ -73,10 +74,11 @@
73
74
75 @dataclass
76 class UsageTracker:
77 """Tracks API usage, costs, and timing across a pipeline run."""
 
78 _models: dict = field(default_factory=dict)
79 _steps: list = field(default_factory=list)
80 _current_step: Optional[StepTiming] = field(default=None)
81 _start_time: float = field(default_factory=time.time)
82
@@ -160,25 +162,28 @@
160 )
161
162 # API usage
163 if self._models:
164 lines.append(f"\n API Calls: {self.total_api_calls}")
165 lines.append(f" Tokens: {self.total_tokens:,} "
166 f"({self.total_input_tokens:,} in / {self.total_output_tokens:,} out)")
 
 
167 lines.append("")
168 lines.append(f" {'Model':<35} {'Calls':>6} {'In Tok':>8} {'Out Tok':>8} {'Cost':>8}")
169 lines.append(f" {'-'*35} {'-'*6} {'-'*8} {'-'*8} {'-'*8}")
170 for key in sorted(self._models.keys()):
171 u = self._models[key]
172 cost_str = f"${u.estimated_cost:.4f}" if u.estimated_cost > 0 else "free"
173 if u.audio_minutes > 0:
174 lines.append(
175 f" {key:<35} {u.calls:>6} {u.audio_minutes:>7.1f}m {'-':>8} {cost_str:>8}"
176 )
177 else:
178 lines.append(
179 f" {key:<35} {u.calls:>6} {u.input_tokens:>8,} {u.output_tokens:>8,} {cost_str:>8}"
 
180 )
181
182 lines.append(f"\n Estimated total cost: ${self.total_cost:.4f}")
183
184 lines.append("=" * 60)
185
186
--- video_processor/utils/usage_tracker.py
+++ video_processor/utils/usage_tracker.py
@@ -2,11 +2,10 @@
2
3 import time
4 from dataclasses import dataclass, field
5 from typing import Optional
6
 
7 # Cost per million tokens (USD) — updated Feb 2025
8 _MODEL_PRICING = {
9 # Anthropic
10 "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
11 "claude-haiku-3-5-20241022": {"input": 0.80, "output": 4.00},
@@ -26,10 +25,11 @@
25
26
27 @dataclass
28 class ModelUsage:
29 """Accumulated usage for a single model."""
30
31 provider: str = ""
32 model: str = ""
33 calls: int = 0
34 input_tokens: int = 0
35 output_tokens: int = 0
@@ -59,10 +59,11 @@
59
60
61 @dataclass
62 class StepTiming:
63 """Timing for a single pipeline step."""
64
65 name: str
66 start_time: float = 0.0
67 end_time: float = 0.0
68
69 @property
@@ -73,10 +74,11 @@
74
75
76 @dataclass
77 class UsageTracker:
78 """Tracks API usage, costs, and timing across a pipeline run."""
79
80 _models: dict = field(default_factory=dict)
81 _steps: list = field(default_factory=list)
82 _current_step: Optional[StepTiming] = field(default=None)
83 _start_time: float = field(default_factory=time.time)
84
@@ -160,25 +162,28 @@
162 )
163
164 # API usage
165 if self._models:
166 lines.append(f"\n API Calls: {self.total_api_calls}")
167 lines.append(
168 f" Tokens: {self.total_tokens:,} "
169 f"({self.total_input_tokens:,} in / {self.total_output_tokens:,} out)"
170 )
171 lines.append("")
172 lines.append(f" {'Model':<35} {'Calls':>6} {'In Tok':>8} {'Out Tok':>8} {'Cost':>8}")
173 lines.append(f" {'-' * 35} {'-' * 6} {'-' * 8} {'-' * 8} {'-' * 8}")
174 for key in sorted(self._models.keys()):
175 u = self._models[key]
176 cost_str = f"${u.estimated_cost:.4f}" if u.estimated_cost > 0 else "free"
177 if u.audio_minutes > 0:
178 lines.append(
179 f" {key:<35} {u.calls:>6} {u.audio_minutes:>7.1f}m {'-':>8} {cost_str:>8}"
180 )
181 else:
182 lines.append(
183 f" {key:<35} {u.calls:>6} "
184 f"{u.input_tokens:>8,} {u.output_tokens:>8,} {cost_str:>8}"
185 )
186
187 lines.append(f"\n Estimated total cost: ${self.total_cost:.4f}")
188
189 lines.append("=" * 60)
190
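The `_MODEL_PRICING` table above gives cost per million tokens; the `estimated_cost` property itself falls outside this hunk, but a minimal sketch of the arithmetic it implies (hypothetical helper, not the project's actual code) is:

```python
# Cost per million tokens (USD), matching the _MODEL_PRICING entries above.
PRICING = {
    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
    "claude-haiku-3-5-20241022": {"input": 0.80, "output": 4.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Hypothetical cost estimator: token counts times the per-million rate."""
    rates = PRICING.get(model)
    if rates is None:
        # Unknown models cost nothing, matching the "free" label in the summary.
        return 0.0
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
```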
191
D work_plan.md
-188
--- a/work_plan.md
+++ b/work_plan.md
@@ -1,188 +0,0 @@
1 PlanOpticon Development Roadmap
2 This document outlines the development milestones and actionable tasks for implementing the PlanOpticon video analysis system, prioritizing rapid delivery of useful outputs.
3 Milestone 1: Core Video Processing & Markdown Output
4 Goal: Process a video and produce markdown notes and mermaid diagrams
5 Infrastructure Setup
6
7 Initialize project repository structure
8 Implement basic CLI with argparse
9 Create configuration management system
10 Set up logging framework
11
12 Video & Audio Processing
13
14 Implement video frame extraction
15 Create audio extraction pipeline
16 Build frame sampling strategy based on visual changes
17 Implement basic scene detection using cloud APIs
18
19 Transcription & Analysis
20
21 Integrate with cloud speech-to-text APIs (e.g., OpenAI Whisper API, Google Speech-to-Text)
22 Implement text analysis using LLM APIs (e.g., Claude API, GPT-4 API)
23 Build keyword and key point extraction via API integration
24 Create prompt templates for effective LLM content analysis
25
26 Diagram Generation
27
28 Create flow visualization module using mermaid syntax
29 Implement relationship mapping for detected topics
30 Build timeline representation generator
31 Leverage computer vision APIs (e.g., GPT-4 Vision, Google Cloud Vision) for diagram extraction from slides/whiteboards
32
33 Markdown Output Generation
34
35 Implement structured markdown generator
36 Create templating system for output
37 Build mermaid diagram integration
38 Develop table of contents generator
39
40 Testing & Validation
41
42 Set up basic testing infrastructure
43 Create sample videos for testing
44 Implement quality checks for outputs
45 Build simple validation metrics
46
47 Success Criteria:
48
49 Run script with a video input and receive markdown output with embedded mermaid diagrams
50 Content correctly captures main topics and relationships
51 Basic structure includes headings, bullet points, and at least one diagram
52
53 Milestone 2: Advanced Content Analysis
54 Goal: Enhance extraction quality and content organization
55 Improved Speech Processing
56
57 Integrate specialized speaker diarization APIs
58 Create transcript segmentation via LLM prompting
59 Build timestamp synchronization with content
60 Implement API-based vocabulary detection and handling
61
62 Enhanced Visual Analysis
63
64 Optimize prompts for vision APIs to detect diagrams and charts
65 Create efficient frame selection for API cost management
66 Build structured prompt chains for detailed visual analysis
67 Implement caching mechanism for API responses
68
69 Content Organization
70
71 Implement hierarchical topic modeling
72 Create concept relationship mapping
73 Build content categorization
74 Develop importance scoring for extracted points
75
76 Quality Improvements
77
78 Implement noise filtering for audio
79 Create redundancy reduction in notes
80 Build context preservation mechanisms
81 Develop content verification systems
82
83 Milestone 3: Action Item & Knowledge Extraction
84 Goal: Identify action items and build knowledge structures
85 Action Item Detection
86
87 Implement commitment language recognition
88 Create deadline and timeframe extraction
89 Build responsibility attribution
90 Develop priority estimation
91
92 Knowledge Organization
93
94 Implement knowledge graph construction
95 Create entity recognition and linking
96 Build cross-reference system
97 Develop temporal relationship tracking
98
99 Enhanced Output Options
100
101 Implement JSON structured data output
102 Create SVG diagram generation
103 Build interactive HTML output option
104 Develop customizable templates
105
106 Integration Components
107
108 Implement unified data model
109 Create serialization framework
110 Build persistence layer for results
111 Develop query interface for extracted knowledge
112
113 Milestone 4: Optimization & Deployment
114 Goal: Enhance performance and create deployment package
115 Performance Optimization
116
117 Implement GPU acceleration for core algorithms
118 Create ARM-specific optimizations
119 Build memory usage optimization
120 Develop parallel processing capabilities
121
122 System Packaging
123
124 Implement dependency management
125 Create installation scripts
126 Build comprehensive documentation
127 Develop container deployment option
128
129 Advanced Features
130
131 Implement custom domain adaptation
132 Create multi-video correlation
133 Build confidence scoring for extraction
134 Develop automated quality assessment
135
136 User Experience
137
138 Implement progress reporting
139 Create error handling and recovery
140 Build output customization options
141 Develop feedback collection mechanism
142
143 Priority Matrix
144 | Feature | Importance | Technical Complexity | Dependencies | Priority |
|---|---|---|---|---|
| Video Frame Extraction | High | Low | None | P0 |
| Audio Transcription | High | Medium | Audio Extraction | P0 |
| Markdown Generation | High | Low | Content Analysis | P0 |
| Mermaid Diagram Creation | High | Medium | Content Analysis | P0 |
| Topic Extraction | High | Medium | Transcription | P0 |
| Basic CLI | High | Low | None | P0 |
| Speaker Diarization | Medium | High | Audio Extraction | P2 |
| Visual Element Detection | High | High | Frame Extraction | P1 |
| Action Item Detection | Medium | Medium | Transcription | P1 |
| GPU Acceleration | Low | Medium | Core Processing | P3 |
| ARM Optimization | Medium | Medium | Core Processing | P2 |
| Installation Package | Medium | Low | Working System | P2 |
145 Implementation Approach
146 To achieve the first milestone efficiently:
147
148 Leverage Existing Cloud APIs
149
150 Integrate with cloud speech-to-text services rather than building models
151 Use vision APIs for image/slide/whiteboard analysis
152 Employ LLM APIs (OpenAI, Anthropic, etc.) for content analysis and summarization
153 Implement API fallbacks and retries for robustness
154
155
156 Focus on Pipeline Integration
157
158 Build connectors between components
159 Ensure data flows properly through the system
160 Create uniform data structures for interoperability
161
162
163 Build for Extensibility
164
165 Design plugin architecture from the beginning
166 Use configuration-driven approach where possible
167 Create clear interfaces between components
168
169
170 Iterative Refinement
171
172 Implement basic functionality first
173 Add sophistication in subsequent iterations
174 Collect feedback after each milestone
175
176
177
178 Next Steps
179 After completing this roadmap, potential future enhancements include:
180
181 Real-time processing capabilities
182 Integration with video conferencing platforms
183 Collaborative annotation and editing features
184 Domain-specific model fine-tuning
185 Multi-language support
186 Customizable output formats
187
188 This roadmap provides a clear path to developing PlanOpticon with a focus on delivering value quickly through a milestone-based approach, prioritizing the generation of markdown notes and mermaid diagrams as the first outcome.