PlanOpticon
Prepare repo for open-source publishing

- Fix all ruff lint errors (400 -> 0), auto-format codebase
- Add GitHub community files: issue templates (bug report, feature request), PR template, CONTRIBUTING.md, SECURITY.md, FUNDING.yml
- Fix Windows binary build (add `shell: bash` to PyInstaller step)
- Remove internal planning docs (implementation.md, work_plan.md)
- Remove setup script (scripts/setup.sh)
- Update .gitignore: add AI tools, cloud CLI dirs, .venv, site/
- Exclude prompt_templates.py from E501 (LLM prompt strings)
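The E501 exclusion for the prompt-template module would typically be expressed through Ruff's per-file-ignores table. A hypothetical `pyproject.toml` fragment consistent with this commit (the repo's actual section contents may differ; line length and Python target are taken from CONTRIBUTING.md):

```toml
[tool.ruff]
line-length = 100
target-version = "py310"

[tool.ruff.lint.per-file-ignores]
# LLM prompt strings are intentionally long; skip line-length checks here.
"video_processor/utils/prompt_templates.py" = ["E501"]
```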
Commit: 829e24abdf9a5ae4b33462ec759c18bf0128503bda0cebc7759347b15fd7a32d
Parent: 67de7139b0d6d95…
59 files changed
+ .github/CONTRIBUTING.md
+ .github/FUNDING.yml
+ .github/ISSUE_TEMPLATE/bug_report.yml
+ .github/ISSUE_TEMPLATE/config.yml
+ .github/ISSUE_TEMPLATE/feature_request.yml
+ .github/PULL_REQUEST_TEMPLATE.md
+ .github/SECURITY.md
~ .github/workflows/release-binaries.yml
- implementation.md
~ pyproject.toml
- scripts/setup.sh
~ setup.py
~ tests/test_action_detector.py
~ tests/test_agent.py
~ tests/test_api_cache.py
~ tests/test_audio_extractor.py
~ tests/test_batch.py
~ tests/test_cloud_sources.py
~ tests/test_content_analyzer.py
~ tests/test_diagram_analyzer.py
~ tests/test_frame_extractor.py
~ tests/test_json_parsing.py
~ tests/test_models.py
~ tests/test_output_structure.py
~ tests/test_pipeline.py
~ tests/test_prompt_templates.py
~ tests/test_providers.py
~ tests/test_rendering.py
~ video_processor/agent/orchestrator.py
~ video_processor/analyzers/action_detector.py
~ video_processor/analyzers/content_analyzer.py
~ video_processor/analyzers/diagram_analyzer.py
~ video_processor/cli/commands.py
~ video_processor/cli/output_formatter.py
~ video_processor/extractors/__init__.py
~ video_processor/extractors/audio_extractor.py
~ video_processor/extractors/frame_extractor.py
~ video_processor/extractors/text_extractor.py
~ video_processor/integrators/knowledge_graph.py
~ video_processor/integrators/plan_generator.py
~ video_processor/models.py
~ video_processor/output_structure.py
~ video_processor/pipeline.py
~ video_processor/providers/anthropic_provider.py
~ video_processor/providers/base.py
~ video_processor/providers/discovery.py
~ video_processor/providers/gemini_provider.py
~ video_processor/providers/manager.py
~ video_processor/providers/openai_provider.py
~ video_processor/providers/whisper_local.py
~ video_processor/sources/base.py
~ video_processor/sources/dropbox_source.py
~ video_processor/sources/google_drive.py
~ video_processor/utils/api_cache.py
~ video_processor/utils/export.py
~ video_processor/utils/prompt_templates.py
~ video_processor/utils/rendering.py
~ video_processor/utils/usage_tracker.py
- work_plan.md
.github/CONTRIBUTING.md (new, +79)

--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -0,0 +1,79 @@
+# Contributing to PlanOpticon
+
+Thank you for your interest in contributing to PlanOpticon! This guide will help you get started.
+
+## Development Setup
+
+1. **Fork and clone the repository:**
+
+   ```bash
+   git clone https://github.com/<your-username>/PlanOpticon.git
+   cd PlanOpticon
+   ```
+
+2. **Create a virtual environment:**
+
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+
+3. **Install in editable mode with dev dependencies:**
+
+   ```bash
+   pip install -e ".[dev]"
+   ```
+
+4. **Install FFmpeg** (required for video processing):
+
+   ```bash
+   # macOS
+   brew install ffmpeg
+
+   # Ubuntu/Debian
+   sudo apt install ffmpeg
+   ```
+
+5. **Set up at least one AI provider API key:**
+
+   ```bash
+   export OPENAI_API_KEY="sk-..."
+   # or
+   export ANTHROPIC_API_KEY="sk-ant-..."
+   # or
+   export GEMINI_API_KEY="..."
+   ```
+
+## Running Tests
+
+```bash
+pytest tests/
+```
+
+To run tests with coverage:
+
+```bash
+pytest tests/ --cov=video_processor
+```
+
+## Code Style
+
+This project uses [Ruff](https://docs.astral.sh/ruff/) for linting and formatting.
+
+**Check for lint issues:**
+
+```bash
+ruff check .
+```
+
+**Check formatting (without modifying files):**
+
+```bash
+ruff format --check .
+```
+
+The project targets a line length of 100 characters and Python 3.10+. See `pyproject.toml` for the full Ruff configuration.
+
+## Commit Conventions
+
+Write clear, descriptive commit messages. Use the imperative mood in the subject line.
.github/FUNDING.yml (new, +1)

--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -0,0 +1 @@
+github: ConflictHQ
.github/ISSUE_TEMPLATE/bug_report.yml (new, +106)

--- a/.github/ISSUE_TEMPLATE/bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,106 @@
+name: Bug Report
+description: Report a bug in PlanOpticon
+title: "[Bug]: "
+labels: ["bug", "triage"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thank you for taking the time to report a bug. Please fill out the fields below so we can diagnose and fix the issue as quickly as possible.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: A clear and concise description of the bug.
+      placeholder: Describe the bug...
+    validations:
+      required: true
+
+  - type: textarea
+    id: steps-to-reproduce
+    attributes:
+      label: Steps to Reproduce
+      description: The exact steps to reproduce the behavior.
+      placeholder: |
+        1. Run `planopticon analyze -i video.mp4 -o ./output`
+        2. Wait for frame extraction to complete
+        3. Observe error in diagram extraction step
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected-behavior
+    attributes:
+      label: Expected Behavior
+      description: What you expected to happen.
+      placeholder: Describe what you expected...
+    validations:
+      required: true
+
+  - type: textarea
+    id: actual-behavior
+    attributes:
+      label: Actual Behavior
+      description: What actually happened.
+      placeholder: Describe what actually happened...
+    validations:
+      required: true
+
+  - type: dropdown
+    id: os
+    attributes:
+      label: Operating System
+      options:
+        - macOS
+        - Linux (Ubuntu/Debian)
+        - Linux (Fedora/RHEL)
+        - Linux (other)
+        - Windows
+        - Other
+    validations:
+      required: true
+
+  - type: dropdown
+    id: python-version
+    attributes:
+      label: Python Version
+      options:
+        - "3.13"
+        - "3.12"
+        - "3.11"
+        - "3.10"
+    validations:
+      required: true
+
+  - type: input
+    id: planopticon-version
+    attributes:
+      label: PlanOpticon Version
+      description: Run `planopticon --version` or `pip show planopticon` to find this.
+      placeholder: "e.g. 0.2.0"
+    validations:
+      required: true
+
+  - type: dropdown
+    id: provider
+    attributes:
+      label: AI Provider
+      description: Which AI provider were you using when the bug occurred?
+      options:
+        - OpenAI
+        - Anthropic
+        - Google Gemini
+        - Multiple providers
+        - Not applicable
+    validations:
+      required: true
+
+  - type: textarea
+    id: logs
+    attributes:
+      label: Logs
+      description: Paste any relevant log output. This will be automatically formatted as code.
+      render: shell
+    validations:
+      required: false
.github/ISSUE_TEMPLATE/config.yml (new, +5)

--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Discussions
+    url: https://github.com/ConflictHQ/PlanOpticon/discussions
+    about: Ask questions, share ideas, or discuss PlanOpticon with the community.
.github/ISSUE_TEMPLATE/feature_request.yml (new, +37)

--- a/.github/ISSUE_TEMPLATE/feature_request.yml
+++ b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,37 @@
+name: Feature Request
+description: Suggest a new feature or improvement for PlanOpticon
+title: "[Feature]: "
+labels: ["enhancement"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        We appreciate your ideas for improving PlanOpticon. Please describe your feature request in detail so we can evaluate and prioritize it.
+
+  - type: textarea
+    id: description
+    attributes:
+      label: Description
+      description: A clear and concise description of the feature you would like to see.
+      placeholder: Describe the feature...
+    validations:
+      required: true
+
+  - type: textarea
+    id: use-case
+    attributes:
+      label: Use Case
+      description: Explain the problem this feature would solve or the workflow it would improve. Why is this feature important to you?
+      placeholder: |
+        As a user who processes large batches of meeting recordings, I need...
+    validations:
+      required: true
+
+  - type: textarea
+    id: proposed-solution
+    attributes:
+      label: Proposed Solution
+      description: If you have ideas on how this could be implemented, describe them here. This is optional -- we welcome feature requests even without a proposed solution.
+      placeholder: Describe a possible implementation approach...
+    validations:
+      required: false
.github/PULL_REQUEST_TEMPLATE.md (new, +25)

--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,25 @@
+## Summary of Changes
+
+<!-- Briefly describe what this PR does and why. -->
+
+## Type of Change
+
+<!-- Check the one that applies. -->
+
+- [ ] Bug fix (non-breaking change that fixes an issue)
+- [ ] New feature (non-breaking change that adds functionality)
+- [ ] Documentation update
+- [ ] Refactor (no functional changes)
+- [ ] Breaking change (fix or feature that would cause existing functionality to change)
+
+## Test Plan
+
+<!-- Describe how you tested these changes. Include commands, scenarios, or links to CI runs. -->
+
+## Checklist
+
+- [ ] Tests pass locally (`pytest tests/`)
+- [ ] Lint is clean (`ruff check .` and `ruff format --check .`)
+- [ ] Documentation has been updated (if applicable)
+- [ ] Any new dependencies are added to `pyproject.toml`
+- [ ] Commit messages follow the project's conventions
.github/SECURITY.md (new, +40)

--- a/.github/SECURITY.md
+++ b/.github/SECURITY.md
@@ -0,0 +1,40 @@
+# Security Policy
+
+## Reporting a Vulnerability
+
+If you discover a security vulnerability in PlanOpticon, we ask that you report it responsibly. **Please do not open a public GitHub issue for security vulnerabilities.**
+
+Instead, send an email to:
+
+**[email protected]**
+
+Include as much of the following information as possible:
+
+- A description of the vulnerability and its potential impact
+- Steps to reproduce the issue
+- Any relevant logs, screenshots, or proof-of-concept code
+- Your recommended fix, if you have one
+
+## What to Expect
+
+- **Acknowledgment:** We will acknowledge receipt of your report within 2 business days.
+- **Assessment:** We will investigate and assess the severity of the issue. We may reach out to you for additional details.
+- **Resolution:** We will work on a fix and coordinate disclosure with you. We aim to resolve critical issues within 14 days.
+- **Credit:** With your permission, we will credit you in the release notes for the fix.
+
+## Supported Versions
+
+We provide security updates for the latest minor release of PlanOpticon. We recommend always running the most recent version.
+
+| Version | Supported |
+|---------|-----------|
+| Latest  | Yes       |
+| Older   | No        |
+
+## Scope
+
+This security policy covers the PlanOpticon application and its first-party code. Vulnerabilities in third-party dependencies should be reported to the respective upstream projects, though we appreciate being notified so we can update our dependencies promptly.
+
+## Thank You
+
+We value the security research community and appreciate the effort that goes into finding and responsibly disclosing vulnerabilities. Thank you for helping keep PlanOpticon and its users safe.
.github/workflows/release-binaries.yml (modified)

--- a/.github/workflows/release-binaries.yml
+++ b/.github/workflows/release-binaries.yml
@@ -47,10 +47,11 @@
         run: |
           pip install -e ".[all]"
           pip install pyinstaller

       - name: Build binary
+        shell: bash
         run: |
           pyinstaller \
             --name planopticon-${{ matrix.target }} \
             --onefile \
             --console \
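The `shell: bash` line is the Windows build fix from the commit message: on Windows runners, GitHub Actions defaults `run:` steps to PowerShell, where the backslash line continuations in the multi-line `pyinstaller \` command fail. Forcing bash makes the step behave the same on all three platforms. A minimal sketch of the pattern (the entry-point filename here is illustrative, not from this repo):

```yaml
# Illustrative step: force bash so "\" line continuations work on Windows too.
- name: Build binary
  shell: bash
  run: |
    pyinstaller \
      --onefile \
      --console \
      cli.py
```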
implementation.md (deleted, -272)
| --- a/implementation.md | ||
| +++ b/implementation.md | ||
| @@ -1,272 +0,0 @@ | ||
| 1 | -# PlanOpticon Implementation Guide | |
| 2 | -This document provides detailed technical guidance for implementing the PlanOpticon system architecture. The suggested approach balances code quality, performance optimization, and architecture best practices. | |
| 3 | -## System Architecture | |
| 4 | -PlanOpticon follows a modular pipeline architecture with these core components: | |
| 5 | -``` | |
| 6 | -video_processor/ | |
| 7 | -├── extractors/ | |
| 8 | -│ ├── frame_extractor.py | |
| 9 | -│ ├── audio_extractor.py | |
| 10 | -│ └── text_extractor.py | |
| 11 | -├── api/ | |
| 12 | -│ ├── transcription_api.py | |
| 13 | -│ ├── vision_api.py | |
| 14 | -│ ├── llm_api.py | |
| 15 | -│ └── api_manager.py | |
| 16 | -├── analyzers/ | |
| 17 | -│ ├── content_analyzer.py | |
| 18 | -│ ├── diagram_analyzer.py | |
| 19 | -│ └── action_detector.py | |
| 20 | -├── integrators/ | |
| 21 | -│ ├── knowledge_graph.py | |
| 22 | -│ └── plan_generator.py | |
| 23 | -├── utils/ | |
| 24 | -│ ├── api_cache.py | |
| 25 | -│ ├── prompt_templates.py | |
| 26 | -│ └── visualization.py | |
| 27 | -└── cli/ | |
| 28 | - ├── commands.py | |
| 29 | - └── output_formatter.py | |
| 30 | -``` | |
| 31 | -## Implementation Approach | |
| 32 | -When building complex systems like PlanOpticon, it's critical to develop each component with clear boundaries and interfaces. The following approach provides a framework for high-quality implementation: | |
| 33 | -### Video and Audio Processing | |
| 34 | -Video frame extraction should be implemented with performance in mind: | |
| 35 | -``` | |
| 36 | -pythondef extract_frames(video_path, sampling_rate=1.0, change_threshold=0.15): | |
| 37 | - """ | |
| 38 | - Extract frames from video based on sampling rate and visual change detection. | |
| 39 | - | |
| 40 | - Parameters | |
| 41 | - ---------- | |
| 42 | - video_path : str | |
| 43 | - Path to video file | |
| 44 | - sampling_rate : float | |
| 45 | - Frame sampling rate (1.0 = every frame) | |
| 46 | - change_threshold : float | |
| 47 | - Threshold for detecting significant visual changes | |
| 48 | - | |
| 49 | - Returns | |
| 50 | - ------- | |
| 51 | - list | |
| 52 | - List of extracted frames as numpy arrays | |
| 53 | - """ | |
| 54 | - # Implementation details here | |
| 55 | - pass | |
| 56 | -``` | |
| 57 | -Consider using a decorator pattern for GPU acceleration when available: | |
| 58 | -``` | |
| 59 | -pythondef gpu_accelerated(func): | |
| 60 | - """Decorator to use GPU implementation when available.""" | |
| 61 | - @functools.wraps(func) | |
| 62 | - def wrapper(*args, **kwargs): | |
| 63 | - if is_gpu_available() and not kwargs.get('disable_gpu'): | |
| 64 | - return func_gpu(*args, **kwargs) | |
| 65 | - return func(*args, **kwargs) | |
| 66 | - return wrapper | |
| 67 | -``` | |
| 68 | -### Computer Vision Components | |
| 69 | -When implementing diagram detection, consider using a progressive refinement approach: | |
| 70 | -``` | |
| 71 | -pythonclass DiagramDetector: | |
| 72 | - """Detects and extracts diagrams from video frames.""" | |
| 73 | - | |
| 74 | - def __init__(self, model_path, confidence_threshold=0.7): | |
| 75 | - """Initialize detector with pre-trained model.""" | |
| 76 | - # Implementation details | |
| 77 | - | |
| 78 | - def detect(self, frame): | |
| 79 | - """ | |
| 80 | - Detect diagrams in a single frame. | |
| 81 | - | |
| 82 | - Parameters | |
| 83 | - ---------- | |
| 84 | - frame : numpy.ndarray | |
| 85 | - Video frame as numpy array | |
| 86 | - | |
| 87 | - Returns | |
| 88 | - ------- | |
| 89 | - list | |
| 90 | - List of detected diagram regions as bounding boxes | |
| 91 | - """ | |
| 92 | - # 1. Initial region proposal | |
| 93 | - # 2. Feature extraction | |
| 94 | - # A well-designed detection pipeline would incorporate multiple stages | |
| 95 | - # of increasingly refined detection to balance performance and accuracy | |
| 96 | - pass | |
| 97 | - | |
| 98 | - def extract_and_normalize(self, frame, regions): | |
| 99 | - """Extract and normalize detected diagrams.""" | |
| 100 | - # Implementation details | |
| 101 | - pass | |
| 102 | -``` | |
| 103 | -### Speech Processing Pipeline | |
| 104 | -The speech recognition and diarization system should be implemented with careful attention to context: | |
| 105 | -class SpeechProcessor: | |
| 106 | - """Process speech from audio extraction.""" | |
| 107 | - | |
| 108 | - def __init__(self, models_dir, device='auto'): | |
| 109 | - """ | |
| 110 | - Initialize speech processor. | |
| 111 | - | |
| 112 | - Parameters | |
| 113 | - ---------- | |
| 114 | - models_dir : str | |
| 115 | - Directory containing pre-trained models | |
| 116 | - device : str | |
| 117 | - Computing device ('cpu', 'cuda', 'auto') | |
| 118 | - """ | |
| 119 | - # Implementation details | |
| 120 | - | |
| 121 | - def process_audio(self, audio_path): | |
| 122 | - """ | |
| 123 | - Process audio file for transcription and speaker diarization. | |
| 124 | - | |
| 125 | - Parameters | |
| 126 | - ---------- | |
| 127 | - audio_path : str | |
| 128 | - Path to audio file | |
| 129 | - | |
| 130 | - Returns | |
| 131 | - ------- | |
| 132 | - dict | |
| 133 | - Processed speech segments with speaker attribution | |
| 134 | - """ | |
| 135 | - # The key to effective speech processing is maintaining temporal context | |
| 136 | - # throughout the pipeline and handling speaker transitions gracefully | |
| 137 | - pass | |
| 138 | -### Action Item Detection | |
| 139 | -Action item detection requires sophisticated NLP techniques: | |
| 140 | -class ActionItemDetector: | |
| 141 | - """Detect action items from transcript.""" | |
| 142 | - | |
| 143 | - def detect_action_items(self, transcript): | |
| 144 | - """ | |
| 145 | - Detect action items from transcript. | |
| 146 | - | |
| 147 | - Parameters | |
| 148 | - ---------- | |
| 149 | - transcript : list | |
| 150 | - List of transcript segments | |
| 151 | - | |
| 152 | - Returns | |
| 153 | - ------- | |
| 154 | - list | |
| 155 | - Detected action items with metadata | |
| 156 | - """ | |
| 157 | - # A well-designed action item detector would incorporate: | |
| 158 | - # 1. Intent recognition | |
| 159 | - # 2. Commitment language detection | |
| 160 | - # 3. Responsibility attribution | |
| 161 | - # 4. Deadline extraction | |
| 162 | - # 5. Priority estimation | |
| 163 | - pass | |
| 164 | -## Performance Optimization | |
| 165 | -For optimal performance across different hardware targets: | |
| 166 | - | |
| 167 | -ARM Optimization | |
| 168 | - | |
| 169 | -Use vectorized operations with NumPy/SciPy where possible | |
| 170 | -Implement conditional paths for ARM-specific optimizations | |
| 171 | -Consider using PyTorch's mobile optimized models | |
| 172 | - | |
| 173 | - | |
| 174 | -## Memory Management | |
| 175 | - | |
| 176 | -Implement progressive loading for large videos | |
| 177 | -Use memory-mapped file access for large datasets | |
| 178 | -Release resources explicitly when no longer needed | |
| 179 | - | |
| 180 | - | |
| 181 | -## GPU Acceleration | |
| 182 | - | |
| 183 | -Design compute-intensive operations to work in batches | |
| 184 | -Minimize CPU-GPU memory transfers | |
| 185 | -Implement fallback paths for CPU-only environments | |
| 186 | - | |
| 187 | - | |
| 188 | - | |
| 189 | -## Code Quality Guidelines | |
| 190 | -Maintain high code quality through these practices: | |
| 191 | - | |
| 192 | -### PEP 8 Compliance | |
| 193 | - | |
| 194 | -Consistent 4-space indentation | |
| 195 | -Maximum line length of 88 characters (Black formatter standard) | |
| 196 | -Descriptive variable names with snake_case convention | |
| 197 | -Comprehensive docstrings for all public functions and classes | |
| 198 | - | |
| 199 | - | |
| 200 | -### Type Annotations | |
| 201 | - | |
| 202 | -Use Python's type hints consistently throughout codebase | |
| 203 | -Define custom types for complex data structures | |
| 204 | -Validate with mypy during development | |
| 205 | - | |
| 206 | - | |
| 207 | -### Testing Strategy | |
| 208 | - | |
| 209 | -Write unit tests for each module with minimum 80% coverage | |
| 210 | -Create integration tests for component interactions | |
| 211 | -Implement performance benchmarks for critical paths | |
| 212 | - | |
| 213 | - | |
| 214 | - | |
| 215 | -# API Integration Considerations | |
| 216 | -When implementing cloud API components, consider: | |
| 217 | - | |
| 218 | -## API Selection | |
| 219 | - | |
| 220 | -Balance capabilities, cost, and performance requirements | |
| 221 | -Implement appropriate rate limiting and quota management | |
| 222 | -Design with graceful fallbacks between different API providers | |
| 223 | - | |
| 224 | - | |
| 225 | -### Efficient API Usage | |
| 226 | - | |
| 227 | -Create optimized prompts for different content types | |
| 228 | -Batch requests where possible to minimize API calls | |
| 229 | -Implement caching to avoid redundant API calls | |
| 230 | - | |
| 231 | - | |
| 232 | -### Prompt Engineering | |
| 233 | - | |
| 234 | -Design effective prompt templates for consistent results | |
| 235 | -Implement few-shot examples for specialized content understanding | |
| 236 | -Create chain-of-thought prompting for complex analysis tasks | |
| 237 | - | |
| 238 | - | |
| 239 | - | |
| 240 | -## Prompting Guidelines | |
| 241 | -When developing complex AI systems, clear guidance helps ensure effective implementation. Consider these approaches: | |
| 242 | - | |
| 243 | -### Component Breakdown | |
| 244 | - | |
| 245 | -Begin by dividing the system into well-defined modules | |
| 246 | -Define clear interfaces between components | |
| 247 | -Specify expected inputs and outputs for each function | |
| 248 | - | |
| 249 | - | |
| 250 | -### Progressive Development | |
| 251 | - | |
| 252 | -Start with skeleton implementation of core functionality | |
| 253 | -Add refinements iteratively | |
| 254 | -Implement error handling after core functionality works | |
| 255 | - | |
| 256 | - | |
| 257 | -### Example-Driven Design | |
| 258 | - | |
| 259 | -Provide clear examples of expected behaviors | |
| 260 | -Include sample inputs and outputs | |
| 261 | -Demonstrate error cases and handling | |
| 262 | - | |
| 263 | - | |
| 264 | -### Architecture Patterns | |
| 265 | - | |
| 266 | -Use factory patterns for flexible component creation | |
| 267 | -Implement strategy patterns for algorithm selection | |
| 268 | -Apply decorator patterns for cross-cutting concerns | |
| 269 | - | |
| 270 | -Remember that the best implementations come from clear understanding of the problem domain and careful consideration of edge cases. | |
| 271 | - | |
| 272 | -PlanOpticon's implementation requires attention to both high-level architecture and low-level optimization. By following these guidelines, developers can create a robust, performant system that effectively extracts valuable information from video content. |
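The GPU-fallback decorator sketched in the deleted guide (lines 59–66 above) references an undefined `func_gpu`. For reference, a self-contained variant that takes the GPU implementation as an explicit argument; every name below is illustrative and not part of the repo:

```python
import functools


def gpu_accelerated(gpu_impl, is_gpu_available=lambda: False):
    """Return a decorator that dispatches to gpu_impl when a GPU is available.

    Unlike the deleted guide's sketch, the GPU path is passed in explicitly
    instead of being referenced as an undefined func_gpu.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, disable_gpu=False, **kwargs):
            if is_gpu_available() and not disable_gpu:
                return gpu_impl(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorator


def _blur_gpu(frame):  # hypothetical GPU implementation
    return ("gpu", frame)


@gpu_accelerated(_blur_gpu, is_gpu_available=lambda: True)
def blur(frame):  # CPU fallback
    return ("cpu", frame)


print(blur("f1"))                    # ('gpu', 'f1')
print(blur("f1", disable_gpu=True))  # ('cpu', 'f1')
```

Passing the GPU implementation in (rather than closing over a global) keeps the CPU and GPU paths independently testable.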
M
pyproject.toml
+3
| --- pyproject.toml | ||
| +++ pyproject.toml | ||
| @@ -99,10 +99,13 @@ | ||
| 99 | 99 | target-version = "py310" |
| 100 | 100 | |
| 101 | 101 | [tool.ruff.lint] |
| 102 | 102 | select = ["E", "F", "W", "I"] |
| 103 | 103 | |
| 104 | +[tool.ruff.lint.per-file-ignores] | |
| 105 | +"video_processor/utils/prompt_templates.py" = ["E501"] | |
| 106 | + | |
| 104 | 107 | [tool.mypy] |
| 105 | 108 | python_version = "3.10" |
| 106 | 109 | warn_return_any = true |
| 107 | 110 | warn_unused_configs = true |
| 108 | 111 | |
| 109 | 112 | |
| 110 | 113 | |
D
scripts/setup.sh
-120
| --- a/scripts/setup.sh | ||
| +++ b/scripts/setup.sh | ||
| @@ -1,120 +0,0 @@ | ||
| 1 | -#!/bin/bash | |
| 2 | -# PlanOpticon setup script | |
| 3 | -set -e | |
| 4 | - | |
| 5 | -# Detect operating system | |
| 6 | -if [[ "$OSTYPE" == "darwin"* ]]; then | |
| 7 | - OS="macos" | |
| 8 | -elif [[ "$OSTYPE" == "linux-gnu"* ]]; then | |
| 9 | - OS="linux" | |
| 10 | -else | |
| 11 | - echo "Unsupported operating system: $OSTYPE" | |
| 12 | - exit 1 | |
| 13 | -fi | |
| 14 | - | |
| 15 | -# Detect architecture | |
| 16 | -ARCH=$(uname -m) | |
| 17 | -if [[ "$ARCH" == "arm64" ]] || [[ "$ARCH" == "aarch64" ]]; then | |
| 18 | - ARCH="arm64" | |
| 19 | -elif [[ "$ARCH" == "x86_64" ]]; then | |
| 20 | - ARCH="x86_64" | |
| 21 | -else | |
| 22 | - echo "Unsupported architecture: $ARCH" | |
| 23 | - exit 1 | |
| 24 | -fi | |
| 25 | - | |
| 26 | -echo "Setting up PlanOpticon on $OS ($ARCH)..." | |
| 27 | - | |
| 28 | -# Check for Python | |
| 29 | -if ! command -v python3 &> /dev/null; then | |
| 30 | - echo "Python 3 is required but not found." | |
| 31 | - if [[ "$OS" == "macos" ]]; then | |
| 32 | - echo "Please install Python 3 using Homebrew or from python.org." | |
| 33 | - echo " brew install python" | |
| 34 | - elif [[ "$OS" == "linux" ]]; then | |
| 35 | - echo "Please install Python 3 using your package manager." | |
| 36 | - echo " Ubuntu/Debian: sudo apt install python3 python3-pip python3-venv" | |
| 37 | - echo " Fedora: sudo dnf install python3 python3-pip" | |
| 38 | - fi | |
| 39 | - exit 1 | |
| 40 | -fi | |
| 41 | - | |
| 42 | -# Check Python version | |
| 43 | -PY_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")') | |
| 44 | -PY_MAJOR=$(echo $PY_VERSION | cut -d. -f1) | |
| 45 | -PY_MINOR=$(echo $PY_VERSION | cut -d. -f2) | |
| 46 | - | |
| 47 | -if [[ "$PY_MAJOR" -lt 3 ]] || [[ "$PY_MAJOR" -eq 3 && "$PY_MINOR" -lt 9 ]]; then | |
| 48 | - echo "Python 3.9 or higher is required, but found $PY_VERSION." | |
| 49 | - echo "Please upgrade your Python installation." | |
| 50 | - exit 1 | |
| 51 | -fi | |
| 52 | - | |
| 53 | -echo "Using Python $PY_VERSION" | |
| 54 | - | |
| 55 | -# Check for FFmpeg | |
| 56 | -if ! command -v ffmpeg &> /dev/null; then | |
| 57 | - echo "FFmpeg is required but not found." | |
| 58 | - if [[ "$OS" == "macos" ]]; then | |
| 59 | - echo "Please install FFmpeg using Homebrew:" | |
| 60 | - echo " brew install ffmpeg" | |
| 61 | - elif [[ "$OS" == "linux" ]]; then | |
| 62 | - echo "Please install FFmpeg using your package manager:" | |
| 63 | - echo " Ubuntu/Debian: sudo apt install ffmpeg" | |
| 64 | - echo " Fedora: sudo dnf install ffmpeg" | |
| 65 | - fi | |
| 66 | - exit 1 | |
| 67 | -fi | |
| 68 | - | |
| 69 | -echo "FFmpeg found" | |
| 70 | - | |
| 71 | -# Create and activate virtual environment | |
| 72 | -if [[ -d "venv" ]]; then | |
| 73 | - echo "Virtual environment already exists" | |
| 74 | -else | |
| 75 | - echo "Creating virtual environment..." | |
| 76 | - python3 -m venv venv | |
| 77 | -fi | |
| 78 | - | |
| 79 | -# Determine activate script path | |
| 80 | -if [[ "$OS" == "macos" ]] || [[ "$OS" == "linux" ]]; then | |
| 81 | - ACTIVATE="venv/bin/activate" | |
| 82 | -fi | |
| 83 | - | |
| 84 | -echo "Activating virtual environment..." | |
| 85 | -source "$ACTIVATE" | |
| 86 | - | |
| 87 | -# Upgrade pip | |
| 88 | -echo "Upgrading pip..." | |
| 89 | -pip install --upgrade pip | |
| 90 | - | |
| 91 | -# Install dependencies | |
| 92 | -echo "Installing dependencies..." | |
| 93 | -pip install -e . | |
| 94 | - | |
| 95 | -# Install optional GPU dependencies if available | |
| 96 | -if [[ "$OS" == "macos" && "$ARCH" == "arm64" ]]; then | |
| 97 | - echo "Installing optional ARM-specific packages for macOS..." | |
| 98 | - pip install -r requirements-apple.txt 2>/dev/null || echo "No ARM-specific packages found or could not install them." | |
| 99 | -elif [[ "$ARCH" == "x86_64" ]]; then | |
| 100 | - # Check for NVIDIA GPU | |
| 101 | - if [[ "$OS" == "linux" ]] && command -v nvidia-smi &> /dev/null; then | |
| 102 | - echo "NVIDIA GPU detected, installing GPU dependencies..." | |
| 103 | - pip install -r requirements-gpu.txt 2>/dev/null || echo "Could not install GPU packages." | |
| 104 | - fi | |
| 105 | -fi | |
| 106 | - | |
| 107 | -# Create example .env file if it doesn't exist | |
| 108 | -if [[ ! -f ".env" ]]; then | |
| 109 | - echo "Creating example .env file..." | |
| 110 | - cp .env.example .env | |
| 111 | - echo "Please edit the .env file to add your API keys." | |
| 112 | -fi | |
| 113 | - | |
| 114 | -echo "Setup complete! PlanOpticon is ready to use." | |
| 115 | -echo "" | |
| 116 | -echo "To activate the virtual environment, run:" | |
| 117 | -echo " source \"$ACTIVATE\"" | |
| 118 | -echo "" | |
| 119 | -echo "To run PlanOpticon, use:" | |
| 120 | -echo " planopticon --help" |
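The removed script's Python-version gate (lines 43–51) is easy to keep in pure Python if a project still wants it at startup. A sketch, not shipped with the repo:

```python
import sys


def check_python(min_version=(3, 9)):
    """Mirror the deleted setup.sh gate: require Python >= 3.9."""
    if sys.version_info[:2] < min_version:
        raise SystemExit(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}."
        )
    return True


print(check_python())
```

In practice the `requires-python` field in `pyproject.toml` enforces the same constraint at install time, which is why the shell script could be dropped.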
M
setup.py
+1
| --- setup.py | ||
| +++ setup.py | ||
| @@ -1,4 +1,5 @@ | ||
| 1 | 1 | """Backwards-compatible setup.py — all config lives in pyproject.toml.""" |
| 2 | + | |
| 2 | 3 | from setuptools import setup |
| 3 | 4 | |
| 4 | 5 | setup() |
| 5 | 6 |
M
tests/test_action_detector.py
+56
-32
| --- tests/test_action_detector.py | ||
| +++ tests/test_action_detector.py | ||
| @@ -1,20 +1,20 @@ | ||
| 1 | 1 | """Tests for enhanced action item detection.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | 4 | from unittest.mock import MagicMock |
| 5 | 5 | |
| 6 | -import pytest | |
| 7 | - | |
| 8 | 6 | from video_processor.analyzers.action_detector import ActionDetector |
| 9 | 7 | from video_processor.models import ActionItem, TranscriptSegment |
| 10 | 8 | |
| 11 | 9 | |
| 12 | 10 | class TestPatternExtract: |
| 13 | 11 | def test_detects_need_to(self): |
| 14 | 12 | detector = ActionDetector() |
| 15 | - items = detector.detect_from_transcript("We need to update the database schema before release.") | |
| 13 | + items = detector.detect_from_transcript( | |
| 14 | + "We need to update the database schema before release." | |
| 15 | + ) | |
| 16 | 16 | assert len(items) >= 1 |
| 17 | 17 | assert any("database" in i.action.lower() for i in items) |
| 18 | 18 | |
| 19 | 19 | def test_detects_should(self): |
| 20 | 20 | detector = ActionDetector() |
| @@ -21,11 +21,13 @@ | ||
| 21 | 21 | items = detector.detect_from_transcript("Alice should review the pull request by Friday.") |
| 22 | 22 | assert len(items) >= 1 |
| 23 | 23 | |
| 24 | 24 | def test_detects_action_item_keyword(self): |
| 25 | 25 | detector = ActionDetector() |
| 26 | - items = detector.detect_from_transcript("Action item: set up monitoring for the new service.") | |
| 26 | + items = detector.detect_from_transcript( | |
| 27 | + "Action item: set up monitoring for the new service." | |
| 28 | + ) | |
| 27 | 29 | assert len(items) >= 1 |
| 28 | 30 | |
| 29 | 31 | def test_detects_follow_up(self): |
| 30 | 32 | detector = ActionDetector() |
| 31 | 33 | items = detector.detect_from_transcript("Follow up with the client about requirements.") |
| @@ -41,22 +43,16 @@ | ||
| 41 | 43 | items = detector.detect_from_transcript("Do it.") |
| 42 | 44 | assert len(items) == 0 |
| 43 | 45 | |
| 44 | 46 | def test_no_action_patterns(self): |
| 45 | 47 | detector = ActionDetector() |
| 46 | - items = detector.detect_from_transcript( | |
| 47 | - "The weather was nice today. We had lunch at noon." | |
| 48 | - ) | |
| 48 | + items = detector.detect_from_transcript("The weather was nice today. We had lunch at noon.") | |
| 49 | 49 | assert len(items) == 0 |
| 50 | 50 | |
| 51 | 51 | def test_multiple_sentences(self): |
| 52 | 52 | detector = ActionDetector() |
| 53 | - text = ( | |
| 54 | - "We need to deploy the fix. " | |
| 55 | - "Alice should test it first. " | |
| 56 | - "The sky is blue." | |
| 57 | - ) | |
| 53 | + text = "We need to deploy the fix. Alice should test it first. The sky is blue." | |
| 58 | 54 | items = detector.detect_from_transcript(text) |
| 59 | 55 | assert len(items) == 2 |
| 60 | 56 | |
| 61 | 57 | def test_source_is_transcript(self): |
| 62 | 58 | detector = ActionDetector() |
| @@ -66,14 +62,21 @@ | ||
| 66 | 62 | |
| 67 | 63 | |
| 68 | 64 | class TestLLMExtract: |
| 69 | 65 | def test_llm_extraction(self): |
| 70 | 66 | pm = MagicMock() |
| 71 | - pm.chat.return_value = json.dumps([ | |
| 72 | - {"action": "Deploy new version", "assignee": "Bob", "deadline": "Friday", | |
| 73 | - "priority": "high", "context": "Production release"} | |
| 74 | - ]) | |
| 67 | + pm.chat.return_value = json.dumps( | |
| 68 | + [ | |
| 69 | + { | |
| 70 | + "action": "Deploy new version", | |
| 71 | + "assignee": "Bob", | |
| 72 | + "deadline": "Friday", | |
| 73 | + "priority": "high", | |
| 74 | + "context": "Production release", | |
| 75 | + } | |
| 76 | + ] | |
| 77 | + ) | |
| 75 | 78 | detector = ActionDetector(provider_manager=pm) |
| 76 | 79 | items = detector.detect_from_transcript("Deploy new version by Friday.") |
| 77 | 80 | assert len(items) == 1 |
| 78 | 81 | assert items[0].action == "Deploy new version" |
| 79 | 82 | assert items[0].assignee == "Bob" |
| @@ -102,28 +105,37 @@ | ||
| 102 | 105 | items = detector.detect_from_transcript("Update the docs.") |
| 103 | 106 | assert items == [] |
| 104 | 107 | |
| 105 | 108 | def test_llm_skips_items_without_action(self): |
| 106 | 109 | pm = MagicMock() |
| 107 | - pm.chat.return_value = json.dumps([ | |
| 108 | - {"action": "Valid action", "assignee": None}, | |
| 109 | - {"assignee": "Alice"}, # No action field | |
| 110 | - {"action": "", "assignee": "Bob"}, # Empty action | |
| 111 | - ]) | |
| 110 | + pm.chat.return_value = json.dumps( | |
| 111 | + [ | |
| 112 | + {"action": "Valid action", "assignee": None}, | |
| 113 | + {"assignee": "Alice"}, # No action field | |
| 114 | + {"action": "", "assignee": "Bob"}, # Empty action | |
| 115 | + ] | |
| 116 | + ) | |
| 112 | 117 | detector = ActionDetector(provider_manager=pm) |
| 113 | 118 | items = detector.detect_from_transcript("Some text.") |
| 114 | 119 | assert len(items) == 1 |
| 115 | 120 | assert items[0].action == "Valid action" |
| 116 | 121 | |
| 117 | 122 | |
| 118 | 123 | class TestDetectFromDiagrams: |
| 119 | 124 | def test_dict_diagrams(self): |
| 120 | 125 | pm = MagicMock() |
| 121 | - pm.chat.return_value = json.dumps([ | |
| 122 | - {"action": "Migrate database", "assignee": None, "deadline": None, | |
| 123 | - "priority": None, "context": None}, | |
| 124 | - ]) | |
| 126 | + pm.chat.return_value = json.dumps( | |
| 127 | + [ | |
| 128 | + { | |
| 129 | + "action": "Migrate database", | |
| 130 | + "assignee": None, | |
| 131 | + "deadline": None, | |
| 132 | + "priority": None, | |
| 133 | + "context": None, | |
| 134 | + }, | |
| 135 | + ] | |
| 136 | + ) | |
| 125 | 137 | detector = ActionDetector(provider_manager=pm) |
| 126 | 138 | diagrams = [ |
| 127 | 139 | {"text_content": "Step 1: Migrate database", "elements": ["DB", "Migration"]}, |
| 128 | 140 | ] |
| 129 | 141 | items = detector.detect_from_diagrams(diagrams) |
| @@ -130,14 +142,21 @@ | ||
| 130 | 142 | assert len(items) == 1 |
| 131 | 143 | assert items[0].source == "diagram" |
| 132 | 144 | |
| 133 | 145 | def test_object_diagrams(self): |
| 134 | 146 | pm = MagicMock() |
| 135 | - pm.chat.return_value = json.dumps([ | |
| 136 | - {"action": "Update API", "assignee": None, "deadline": None, | |
| 137 | - "priority": None, "context": None}, | |
| 138 | - ]) | |
| 147 | + pm.chat.return_value = json.dumps( | |
| 148 | + [ | |
| 149 | + { | |
| 150 | + "action": "Update API", | |
| 151 | + "assignee": None, | |
| 152 | + "deadline": None, | |
| 153 | + "priority": None, | |
| 154 | + "context": None, | |
| 155 | + }, | |
| 156 | + ] | |
| 157 | + ) | |
| 139 | 158 | detector = ActionDetector(provider_manager=pm) |
| 140 | 159 | |
| 141 | 160 | class FakeDiagram: |
| 142 | 161 | text_content = "Update API endpoints" |
| 143 | 162 | elements = ["API", "Gateway"] |
| @@ -153,11 +172,14 @@ | ||
| 153 | 172 | assert items == [] |
| 154 | 173 | |
| 155 | 174 | def test_pattern_fallback_for_diagrams(self): |
| 156 | 175 | detector = ActionDetector() # No provider |
| 157 | 176 | diagrams = [ |
| 158 | - {"text_content": "We need to update the configuration before deployment.", "elements": []}, | |
| 177 | + { | |
| 178 | + "text_content": "We need to update the configuration before deployment.", | |
| 179 | + "elements": [], | |
| 180 | + }, | |
| 159 | 181 | ] |
| 160 | 182 | items = detector.detect_from_diagrams(diagrams) |
| 161 | 183 | assert len(items) >= 1 |
| 162 | 184 | assert items[0].source == "diagram" |
| 163 | 185 | |
| @@ -191,16 +213,18 @@ | ||
| 191 | 213 | |
| 192 | 214 | |
| 193 | 215 | class TestAttachTimestamps: |
| 194 | 216 | def test_attaches_matching_segment(self): |
| 195 | 217 | detector = ActionDetector() |
| 196 | - items = [ | |
| 218 | + [ | |
| 197 | 219 | ActionItem(action="We need to update the database schema before release"), |
| 198 | 220 | ] |
| 199 | 221 | segments = [ |
| 200 | 222 | TranscriptSegment(start=0.0, end=5.0, text="Welcome to the meeting."), |
| 201 | - TranscriptSegment(start=5.0, end=15.0, text="We need to update the database schema before release."), | |
| 223 | + TranscriptSegment( | |
| 224 | + start=5.0, end=15.0, text="We need to update the database schema before release." | |
| 225 | + ), | |
| 202 | 226 | TranscriptSegment(start=15.0, end=20.0, text="Any questions?"), |
| 203 | 227 | ] |
| 204 | 228 | detector.detect_from_transcript( |
| 205 | 229 | "We need to update the database schema before release.", |
| 206 | 230 | segments=segments, |
| 207 | 231 |
| --- tests/test_agent.py | ||
| +++ tests/test_agent.py | ||
| @@ -1,11 +1,9 @@ | ||
| 1 | 1 | """Tests for the agentic processing orchestrator.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from unittest.mock import MagicMock, patch | |
| 5 | - | |
| 6 | -import pytest | |
| 4 | +from unittest.mock import MagicMock | |
| 7 | 5 | |
| 8 | 6 | from video_processor.agent.orchestrator import AgentOrchestrator |
| 9 | 7 | |
| 10 | 8 | |
| 11 | 9 | class TestPlanCreation: |
| @@ -99,16 +97,18 @@ | ||
| 99 | 97 | agent.insights.append("should not modify internal") |
| 100 | 98 | assert len(agent._insights) == 2 |
| 101 | 99 | |
| 102 | 100 | def test_deep_analysis_populates_insights(self): |
| 103 | 101 | pm = MagicMock() |
| 104 | - pm.chat.return_value = json.dumps({ | |
| 105 | - "decisions": ["Decided to use microservices"], | |
| 106 | - "risks": ["Timeline is tight"], | |
| 107 | - "follow_ups": [], | |
| 108 | - "tensions": [], | |
| 109 | - }) | |
| 102 | + pm.chat.return_value = json.dumps( | |
| 103 | + { | |
| 104 | + "decisions": ["Decided to use microservices"], | |
| 105 | + "risks": ["Timeline is tight"], | |
| 106 | + "follow_ups": [], | |
| 107 | + "tensions": [], | |
| 108 | + } | |
| 109 | + ) | |
| 110 | 110 | agent = AgentOrchestrator(provider_manager=pm) |
| 111 | 111 | agent._results["transcribe"] = {"text": "Some long transcript text here"} |
| 112 | 112 | result = agent._deep_analysis("/tmp") |
| 113 | 113 | assert "decisions" in result |
| 114 | 114 | assert any("microservices" in i for i in agent._insights) |
| 115 | 115 |
| --- tests/test_api_cache.py | ||
| +++ tests/test_api_cache.py | ||
| @@ -1,12 +1,9 @@ | ||
| 1 | 1 | """Tests for API response cache.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import time |
| 5 | 4 | |
| 6 | -import pytest | |
| 7 | - | |
| 8 | 5 | from video_processor.utils.api_cache import ApiCache |
| 9 | 6 | |
| 10 | 7 | |
| 11 | 8 | class TestApiCache: |
| 12 | 9 | def test_set_and_get(self, tmp_path): |
| @@ -71,13 +68,13 @@ | ||
| 71 | 68 | cache_b.set("key", "value_b") |
| 72 | 69 | assert cache_a.get("key") == "value_a" |
| 73 | 70 | assert cache_b.get("key") == "value_b" |
| 74 | 71 | |
| 75 | 72 | def test_creates_namespace_dir(self, tmp_path): |
| 76 | - cache = ApiCache(tmp_path / "sub", namespace="deep") | |
| 73 | + ApiCache(tmp_path / "sub", namespace="deep") | |
| 77 | 74 | assert (tmp_path / "sub" / "deep").exists() |
| 78 | 75 | |
| 79 | 76 | def test_cache_path_uses_hash(self, tmp_path): |
| 80 | 77 | cache = ApiCache(tmp_path, namespace="test") |
| 81 | 78 | path = cache.get_cache_path("my_key") |
| 82 | 79 | assert path.suffix == ".json" |
| 83 | 80 | assert path.parent.name == "test" |
| 84 | 81 |
| --- tests/test_audio_extractor.py | ||
| +++ tests/test_audio_extractor.py | ||
| @@ -1,65 +1,65 @@ | ||
| 1 | 1 | """Tests for the audio extractor module.""" |
| 2 | -import os | |
| 2 | + | |
| 3 | 3 | import tempfile |
| 4 | 4 | from pathlib import Path |
| 5 | -from unittest.mock import patch, MagicMock | |
| 5 | +from unittest.mock import MagicMock, patch | |
| 6 | 6 | |
| 7 | 7 | import numpy as np |
| 8 | -import pytest | |
| 9 | 8 | |
| 10 | 9 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 10 | + | |
| 11 | 11 | |
| 12 | 12 | class TestAudioExtractor: |
| 13 | 13 | """Test suite for AudioExtractor class.""" |
| 14 | - | |
| 14 | + | |
| 15 | 15 | def test_init(self): |
| 16 | 16 | """Test initialization of AudioExtractor.""" |
| 17 | 17 | # Default parameters |
| 18 | 18 | extractor = AudioExtractor() |
| 19 | 19 | assert extractor.sample_rate == 16000 |
| 20 | 20 | assert extractor.mono is True |
| 21 | - | |
| 21 | + | |
| 22 | 22 | # Custom parameters |
| 23 | 23 | extractor = AudioExtractor(sample_rate=44100, mono=False) |
| 24 | 24 | assert extractor.sample_rate == 44100 |
| 25 | 25 | assert extractor.mono is False |
| 26 | - | |
| 27 | - @patch('subprocess.run') | |
| 26 | + | |
| 27 | + @patch("subprocess.run") | |
| 28 | 28 | def test_extract_audio(self, mock_run): |
| 29 | 29 | """Test audio extraction from video.""" |
| 30 | 30 | # Mock the subprocess.run call |
| 31 | 31 | mock_result = MagicMock() |
| 32 | 32 | mock_result.returncode = 0 |
| 33 | 33 | mock_run.return_value = mock_result |
| 34 | - | |
| 34 | + | |
| 35 | 35 | with tempfile.TemporaryDirectory() as temp_dir: |
| 36 | 36 | # Create a dummy video file |
| 37 | 37 | video_path = Path(temp_dir) / "test_video.mp4" |
| 38 | 38 | with open(video_path, "wb") as f: |
| 39 | 39 | f.write(b"dummy video content") |
| 40 | - | |
| 40 | + | |
| 41 | 41 | # Extract audio |
| 42 | 42 | extractor = AudioExtractor() |
| 43 | - | |
| 43 | + | |
| 44 | 44 | # Test with default output path |
| 45 | 45 | output_path = extractor.extract_audio(video_path) |
| 46 | 46 | assert output_path == video_path.with_suffix(".wav") |
| 47 | - | |
| 47 | + | |
| 48 | 48 | # Test with custom output path |
| 49 | 49 | custom_output = Path(temp_dir) / "custom_audio.wav" |
| 50 | 50 | output_path = extractor.extract_audio(video_path, custom_output) |
| 51 | 51 | assert output_path == custom_output |
| 52 | - | |
| 52 | + | |
| 53 | 53 | # Verify subprocess.run was called with correct arguments |
| 54 | 54 | mock_run.assert_called() |
| 55 | 55 | args, kwargs = mock_run.call_args |
| 56 | 56 | assert "ffmpeg" in args[0] |
| 57 | 57 | assert "-i" in args[0] |
| 58 | 58 | assert str(video_path) in args[0] |
| 59 | - | |
| 60 | - @patch('soundfile.info') | |
| 59 | + | |
| 60 | + @patch("soundfile.info") | |
| 61 | 61 | def test_get_audio_properties(self, mock_sf_info): |
| 62 | 62 | """Test getting audio properties.""" |
| 63 | 63 | # Mock soundfile.info |
| 64 | 64 | mock_info = MagicMock() |
| 65 | 65 | mock_info.duration = 10.5 |
| @@ -66,55 +66,49 @@ | ||
| 66 | 66 | mock_info.samplerate = 16000 |
| 67 | 67 | mock_info.channels = 1 |
| 68 | 68 | mock_info.format = "WAV" |
| 69 | 69 | mock_info.subtype = "PCM_16" |
| 70 | 70 | mock_sf_info.return_value = mock_info |
| 71 | - | |
| 71 | + | |
| 72 | 72 | with tempfile.TemporaryDirectory() as temp_dir: |
| 73 | 73 | # Create a dummy audio file |
| 74 | 74 | audio_path = Path(temp_dir) / "test_audio.wav" |
| 75 | 75 | with open(audio_path, "wb") as f: |
| 76 | 76 | f.write(b"dummy audio content") |
| 77 | - | |
| 77 | + | |
| 78 | 78 | # Get properties |
| 79 | 79 | extractor = AudioExtractor() |
| 80 | 80 | props = extractor.get_audio_properties(audio_path) |
| 81 | - | |
| 81 | + | |
| 82 | 82 | # Verify properties |
| 83 | 83 | assert props["duration"] == 10.5 |
| 84 | 84 | assert props["sample_rate"] == 16000 |
| 85 | 85 | assert props["channels"] == 1 |
| 86 | 86 | assert props["format"] == "WAV" |
| 87 | 87 | assert props["subtype"] == "PCM_16" |
| 88 | 88 | assert props["path"] == str(audio_path) |
| 89 | - | |
| 89 | + | |
| 90 | 90 | def test_segment_audio(self): |
| 91 | 91 | """Test audio segmentation.""" |
| 92 | 92 | # Create a dummy audio array (1 second at 16kHz) |
| 93 | 93 | audio_data = np.ones(16000) |
| 94 | 94 | sample_rate = 16000 |
| 95 | - | |
| 95 | + | |
| 96 | 96 | extractor = AudioExtractor() |
| 97 | - | |
| 97 | + | |
| 98 | 98 | # Test with 500ms segments, no overlap |
| 99 | 99 | segments = extractor.segment_audio( |
| 100 | - audio_data, | |
| 101 | - sample_rate, | |
| 102 | - segment_length_ms=500, | |
| 103 | - overlap_ms=0 | |
| 100 | + audio_data, sample_rate, segment_length_ms=500, overlap_ms=0 | |
| 104 | 101 | ) |
| 105 | - | |
| 102 | + | |
| 106 | 103 | # Should produce 2 segments of 8000 samples each |
| 107 | 104 | assert len(segments) == 2 |
| 108 | 105 | assert len(segments[0]) == 8000 |
| 109 | 106 | assert len(segments[1]) == 8000 |
| 110 | - | |
| 107 | + | |
| 111 | 108 | # Test with 600ms segments, 100ms overlap |
| 112 | 109 | segments = extractor.segment_audio( |
| 113 | - audio_data, | |
| 114 | - sample_rate, | |
| 115 | - segment_length_ms=600, | |
| 116 | - overlap_ms=100 | |
| 117 | - ) | |
| 118 | - | |
| 119 | - # Should produce 2 segments (with overlap) | |
| 120 | - assert len(segments) == 2 | |
| 110 | + audio_data, sample_rate, segment_length_ms=600, overlap_ms=100 | |
| 111 | + ) | |
| 112 | + | |
| 113 | + # Should produce 2 segments (with overlap) | |
| 114 | + assert len(segments) == 2 | |
| 121 | 115 |
| --- tests/test_audio_extractor.py | |
| +++ tests/test_audio_extractor.py | |
| @@ -1,65 +1,65 @@ | |
| 1 | """Tests for the audio extractor module.""" |
| 2 | import os |
| 3 | import tempfile |
| 4 | from pathlib import Path |
| 5 | from unittest.mock import patch, MagicMock |
| 6 | |
| 7 | import numpy as np |
| 8 | import pytest |
| 9 | |
| 10 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 11 | |
| 12 | class TestAudioExtractor: |
| 13 | """Test suite for AudioExtractor class.""" |
| 14 | |
| 15 | def test_init(self): |
| 16 | """Test initialization of AudioExtractor.""" |
| 17 | # Default parameters |
| 18 | extractor = AudioExtractor() |
| 19 | assert extractor.sample_rate == 16000 |
| 20 | assert extractor.mono is True |
| 21 | |
| 22 | # Custom parameters |
| 23 | extractor = AudioExtractor(sample_rate=44100, mono=False) |
| 24 | assert extractor.sample_rate == 44100 |
| 25 | assert extractor.mono is False |
| 26 | |
| 27 | @patch('subprocess.run') |
| 28 | def test_extract_audio(self, mock_run): |
| 29 | """Test audio extraction from video.""" |
| 30 | # Mock the subprocess.run call |
| 31 | mock_result = MagicMock() |
| 32 | mock_result.returncode = 0 |
| 33 | mock_run.return_value = mock_result |
| 34 | |
| 35 | with tempfile.TemporaryDirectory() as temp_dir: |
| 36 | # Create a dummy video file |
| 37 | video_path = Path(temp_dir) / "test_video.mp4" |
| 38 | with open(video_path, "wb") as f: |
| 39 | f.write(b"dummy video content") |
| 40 | |
| 41 | # Extract audio |
| 42 | extractor = AudioExtractor() |
| 43 | |
| 44 | # Test with default output path |
| 45 | output_path = extractor.extract_audio(video_path) |
| 46 | assert output_path == video_path.with_suffix(".wav") |
| 47 | |
| 48 | # Test with custom output path |
| 49 | custom_output = Path(temp_dir) / "custom_audio.wav" |
| 50 | output_path = extractor.extract_audio(video_path, custom_output) |
| 51 | assert output_path == custom_output |
| 52 | |
| 53 | # Verify subprocess.run was called with correct arguments |
| 54 | mock_run.assert_called() |
| 55 | args, kwargs = mock_run.call_args |
| 56 | assert "ffmpeg" in args[0] |
| 57 | assert "-i" in args[0] |
| 58 | assert str(video_path) in args[0] |
| 59 | |
| 60 | @patch('soundfile.info') |
| 61 | def test_get_audio_properties(self, mock_sf_info): |
| 62 | """Test getting audio properties.""" |
| 63 | # Mock soundfile.info |
| 64 | mock_info = MagicMock() |
| 65 | mock_info.duration = 10.5 |
| @@ -66,55 +66,49 @@ | |
| 66 | mock_info.samplerate = 16000 |
| 67 | mock_info.channels = 1 |
| 68 | mock_info.format = "WAV" |
| 69 | mock_info.subtype = "PCM_16" |
| 70 | mock_sf_info.return_value = mock_info |
| 71 | |
| 72 | with tempfile.TemporaryDirectory() as temp_dir: |
| 73 | # Create a dummy audio file |
| 74 | audio_path = Path(temp_dir) / "test_audio.wav" |
| 75 | with open(audio_path, "wb") as f: |
| 76 | f.write(b"dummy audio content") |
| 77 | |
| 78 | # Get properties |
| 79 | extractor = AudioExtractor() |
| 80 | props = extractor.get_audio_properties(audio_path) |
| 81 | |
| 82 | # Verify properties |
| 83 | assert props["duration"] == 10.5 |
| 84 | assert props["sample_rate"] == 16000 |
| 85 | assert props["channels"] == 1 |
| 86 | assert props["format"] == "WAV" |
| 87 | assert props["subtype"] == "PCM_16" |
| 88 | assert props["path"] == str(audio_path) |
| 89 | |
| 90 | def test_segment_audio(self): |
| 91 | """Test audio segmentation.""" |
| 92 | # Create a dummy audio array (1 second at 16kHz) |
| 93 | audio_data = np.ones(16000) |
| 94 | sample_rate = 16000 |
| 95 | |
| 96 | extractor = AudioExtractor() |
| 97 | |
| 98 | # Test with 500ms segments, no overlap |
| 99 | segments = extractor.segment_audio( |
| 100 | audio_data, |
| 101 | sample_rate, |
| 102 | segment_length_ms=500, |
| 103 | overlap_ms=0 |
| 104 | ) |
| 105 | |
| 106 | # Should produce 2 segments of 8000 samples each |
| 107 | assert len(segments) == 2 |
| 108 | assert len(segments[0]) == 8000 |
| 109 | assert len(segments[1]) == 8000 |
| 110 | |
| 111 | # Test with 600ms segments, 100ms overlap |
| 112 | segments = extractor.segment_audio( |
| 113 | audio_data, |
| 114 | sample_rate, |
| 115 | segment_length_ms=600, |
| 116 | overlap_ms=100 |
| 117 | ) |
| 118 | |
| 119 | # Should produce 2 segments (with overlap) |
| 120 | assert len(segments) == 2 |
| 121 |
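The segment counts asserted in this test follow from simple window arithmetic: one second at 16 kHz is 16000 samples, and windows advance by `segment_length - overlap`. A minimal sketch of that segmentation, assuming a clipped final window (a hypothetical stand-in, not the real `AudioExtractor.segment_audio`):

```python
import numpy as np

def segment_audio(audio, sample_rate, segment_length_ms, overlap_ms=0):
    # Hypothetical stand-in for AudioExtractor.segment_audio:
    # fixed-length windows advancing by (length - overlap) samples.
    seg_len = sample_rate * segment_length_ms // 1000
    step = seg_len - sample_rate * overlap_ms // 1000
    return [audio[start:start + seg_len] for start in range(0, len(audio), step)]

audio = np.ones(16000)  # 1 second at 16 kHz
print([len(s) for s in segment_audio(audio, 16000, 500)])                  # [8000, 8000]
print([len(s) for s in segment_audio(audio, 16000, 600, overlap_ms=100)])  # [9600, 8000]
```

With 500 ms windows the step is 8000 samples, yielding exactly two full segments; with 600 ms windows and 100 ms overlap the step is still 8000 samples, so the second window starts at sample 8000 and is clipped at the end of the array, matching the two-segment assertion.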
| --- tests/test_batch.py | ||
| +++ tests/test_batch.py | ||
| @@ -1,11 +1,8 @@ | ||
| 1 | 1 | """Tests for batch processing and knowledge graph merging.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from pathlib import Path | |
| 5 | - | |
| 6 | -import pytest | |
| 7 | 4 | |
| 8 | 5 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 9 | 6 | from video_processor.integrators.plan_generator import PlanGenerator |
| 10 | 7 | from video_processor.models import ( |
| 11 | 8 | ActionItem, |
| 12 | 9 |
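The two deleted imports above are classic findings of ruff's `F401` (unused import) rule. A rough approximation of that check using the standard `ast` module (greatly simplified; ruff's real analysis handles re-exports, `__all__`, string annotations, and much more):

```python
import ast

def unused_imports(source: str) -> list[str]:
    """Report imported names that never appear as a Name node afterwards."""
    tree = ast.parse(source)
    imported, used = [], set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported += [a.asname or a.name.split(".")[0] for a in node.names]
        elif isinstance(node, ast.ImportFrom):
            imported += [a.asname or a.name for a in node.names]
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return [name for name in imported if name not in used]

src = "import json\nfrom pathlib import Path\n\nprint(json.dumps({}))\n"
print(unused_imports(src))  # ['Path']
```

In practice the same cleanup is done automatically with `ruff check --select F401 --fix`, which is presumably how the 400-to-0 lint fix in this commit handled it.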
| --- tests/test_cloud_sources.py | ||
| +++ tests/test_cloud_sources.py | ||
| @@ -134,11 +134,13 @@ | ||
| 134 | 134 | @patch("video_processor.sources.google_drive.GoogleDriveSource._auth_service_account") |
| 135 | 135 | def test_authenticate_import_error(self, mock_auth): |
| 136 | 136 | from video_processor.sources.google_drive import GoogleDriveSource |
| 137 | 137 | |
| 138 | 138 | source = GoogleDriveSource() |
| 139 | - with patch.dict("sys.modules", {"google.oauth2": None, "google.oauth2.service_account": None}): | |
| 139 | + with patch.dict( | |
| 140 | + "sys.modules", {"google.oauth2": None, "google.oauth2.service_account": None} | |
| 141 | + ): | |
| 140 | 142 | # The import will fail inside authenticate |
| 141 | 143 | result = source.authenticate() |
| 142 | 144 | assert result is False |
| 143 | 145 | |
| 144 | 146 | |
| @@ -188,19 +190,24 @@ | ||
| 188 | 190 | def test_auth_saved_token(self, tmp_path): |
| 189 | 191 | pytest.importorskip("dropbox") |
| 190 | 192 | from video_processor.sources.dropbox_source import DropboxSource |
| 191 | 193 | |
| 192 | 194 | token_file = tmp_path / "token.json" |
| 193 | - token_file.write_text(json.dumps({ | |
| 194 | - "refresh_token": "rt_test", | |
| 195 | - "app_key": "key", | |
| 196 | - "app_secret": "secret", | |
| 197 | - })) | |
| 195 | + token_file.write_text( | |
| 196 | + json.dumps( | |
| 197 | + { | |
| 198 | + "refresh_token": "rt_test", | |
| 199 | + "app_key": "key", | |
| 200 | + "app_secret": "secret", | |
| 201 | + } | |
| 202 | + ) | |
| 203 | + ) | |
| 198 | 204 | |
| 199 | 205 | source = DropboxSource(token_path=token_file, app_key="key", app_secret="secret") |
| 200 | 206 | |
| 201 | 207 | mock_dbx = MagicMock() |
| 202 | 208 | with patch("dropbox.Dropbox", return_value=mock_dbx): |
| 203 | 209 | import dropbox |
| 210 | + | |
| 204 | 211 | result = source._auth_saved_token(dropbox) |
| 205 | 212 | assert result is True |
| 206 | 213 | assert source.dbx is mock_dbx |
| 207 | 214 |
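The `patch.dict("sys.modules", {...: None})` idiom in `test_authenticate_import_error` relies on documented import-system behavior: if `sys.modules[name]` is `None`, a subsequent `import name` raises `ModuleNotFoundError` (a subclass of `ImportError`). A self-contained sketch, using `json` as a stand-in for the optional `google.oauth2` dependency:

```python
from unittest.mock import patch

def load_optional():
    # Mirrors the authenticate() pattern: import inside the function,
    # fall back gracefully when the dependency is unavailable.
    try:
        import json  # stand-in for the optional google.oauth2 import
        return True
    except ImportError:
        return False

print(load_optional())  # True: json imports normally
with patch.dict("sys.modules", {"json": None}):
    print(load_optional())  # False: the None entry forces ImportError
```

Because `patch.dict` restores the original `sys.modules` entries on exit, the rest of the test suite is unaffected.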
| --- tests/test_content_analyzer.py | ||
| +++ tests/test_content_analyzer.py | ||
| @@ -1,11 +1,9 @@ | ||
| 1 | 1 | """Tests for content cross-referencing between transcript and diagram entities.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from unittest.mock import MagicMock, patch | |
| 5 | - | |
| 6 | -import pytest | |
| 4 | +from unittest.mock import MagicMock | |
| 7 | 5 | |
| 8 | 6 | from video_processor.analyzers.content_analyzer import ContentAnalyzer |
| 9 | 7 | from video_processor.models import Entity, KeyPoint |
| 10 | 8 | |
| 11 | 9 | |
| @@ -74,13 +72,15 @@ | ||
| 74 | 72 | |
| 75 | 73 | |
| 76 | 74 | class TestFuzzyMatch: |
| 77 | 75 | def test_fuzzy_match_with_llm(self): |
| 78 | 76 | pm = MagicMock() |
| 79 | - pm.chat.return_value = json.dumps([ | |
| 80 | - {"transcript": "K8s", "diagram": "Kubernetes"}, | |
| 81 | - ]) | |
| 77 | + pm.chat.return_value = json.dumps( | |
| 78 | + [ | |
| 79 | + {"transcript": "K8s", "diagram": "Kubernetes"}, | |
| 80 | + ] | |
| 81 | + ) | |
| 82 | 82 | analyzer = ContentAnalyzer(provider_manager=pm) |
| 83 | 83 | |
| 84 | 84 | t_entities = [ |
| 85 | 85 | Entity(name="K8s", type="technology", descriptions=["Container orchestration"]), |
| 86 | 86 | ] |
| @@ -189,11 +189,13 @@ | ||
| 189 | 189 | assert len(result[0].related_diagrams) == 2 |
| 190 | 190 | |
| 191 | 191 | def test_details_used_for_matching(self): |
| 192 | 192 | analyzer = ContentAnalyzer() |
| 193 | 193 | kps = [ |
| 194 | - KeyPoint(point="Architecture overview", details="Uses Docker and Kubernetes for deployment"), | |
| 194 | + KeyPoint( | |
| 195 | + point="Architecture overview", details="Uses Docker and Kubernetes for deployment" | |
| 196 | + ), | |
| 195 | 197 | ] |
| 196 | 198 | diagrams = [ |
| 197 | 199 | {"elements": ["Docker", "Kubernetes"], "text_content": "deployment infrastructure"}, |
| 198 | 200 | ] |
| 199 | 201 | result = analyzer.enrich_key_points(kps, diagrams, "") |
| 200 | 202 |
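The mocked-provider pattern in `test_fuzzy_match_with_llm` can be distilled: the provider's `chat` returns a JSON list of name pairs, which the analyzer turns into a transcript-to-diagram mapping. A simplified stand-in (the `fuzzy_match` helper below is hypothetical, not the real `ContentAnalyzer` method):

```python
import json
from unittest.mock import MagicMock

def fuzzy_match(pm, transcript_names, diagram_names):
    # Ask the provider which names co-refer; parse its JSON reply into a dict.
    reply = pm.chat(f"Match {transcript_names} to {diagram_names}; reply as JSON pairs.")
    return {p["transcript"]: p["diagram"] for p in json.loads(reply)}

pm = MagicMock()
pm.chat.return_value = json.dumps([{"transcript": "K8s", "diagram": "Kubernetes"}])
print(fuzzy_match(pm, ["K8s"], ["Kubernetes"]))  # {'K8s': 'Kubernetes'}
```

Setting `return_value` on the `MagicMock` lets the test exercise the JSON-parsing and mapping logic without any real LLM call.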
| --- tests/test_diagram_analyzer.py | ||
| +++ tests/test_diagram_analyzer.py | ||
| @@ -1,18 +1,17 @@ | ||
| 1 | 1 | """Tests for the rewritten diagram analyzer.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from pathlib import Path | |
| 5 | -from unittest.mock import MagicMock, patch | |
| 4 | +from unittest.mock import MagicMock | |
| 6 | 5 | |
| 7 | 6 | import pytest |
| 8 | 7 | |
| 9 | 8 | from video_processor.analyzers.diagram_analyzer import ( |
| 10 | 9 | DiagramAnalyzer, |
| 11 | 10 | _parse_json_response, |
| 12 | 11 | ) |
| 13 | -from video_processor.models import DiagramResult, DiagramType, ScreenCapture | |
| 12 | +from video_processor.models import DiagramType | |
| 14 | 13 | |
| 15 | 14 | |
| 16 | 15 | class TestParseJsonResponse: |
| 17 | 16 | def test_plain_json(self): |
| 18 | 17 | result = _parse_json_response('{"key": "value"}') |
| @@ -50,27 +49,31 @@ | ||
| 50 | 49 | fp = tmp_path / "frame_0.jpg" |
| 51 | 50 | fp.write_bytes(b"\xff\xd8\xff fake image data") |
| 52 | 51 | return fp |
| 53 | 52 | |
| 54 | 53 | def test_classify_frame_diagram(self, analyzer, mock_pm, fake_frame): |
| 55 | - mock_pm.analyze_image.return_value = json.dumps({ | |
| 56 | - "is_diagram": True, | |
| 57 | - "diagram_type": "flowchart", | |
| 58 | - "confidence": 0.85, | |
| 59 | - "brief_description": "A flowchart showing login process" | |
| 60 | - }) | |
| 54 | + mock_pm.analyze_image.return_value = json.dumps( | |
| 55 | + { | |
| 56 | + "is_diagram": True, | |
| 57 | + "diagram_type": "flowchart", | |
| 58 | + "confidence": 0.85, | |
| 59 | + "brief_description": "A flowchart showing login process", | |
| 60 | + } | |
| 61 | + ) | |
| 61 | 62 | result = analyzer.classify_frame(fake_frame) |
| 62 | 63 | assert result["is_diagram"] is True |
| 63 | 64 | assert result["confidence"] == 0.85 |
| 64 | 65 | |
| 65 | 66 | def test_classify_frame_not_diagram(self, analyzer, mock_pm, fake_frame): |
| 66 | - mock_pm.analyze_image.return_value = json.dumps({ | |
| 67 | - "is_diagram": False, | |
| 68 | - "diagram_type": "unknown", | |
| 69 | - "confidence": 0.1, | |
| 70 | - "brief_description": "A person speaking" | |
| 71 | - }) | |
| 67 | + mock_pm.analyze_image.return_value = json.dumps( | |
| 68 | + { | |
| 69 | + "is_diagram": False, | |
| 70 | + "diagram_type": "unknown", | |
| 71 | + "confidence": 0.1, | |
| 72 | + "brief_description": "A person speaking", | |
| 73 | + } | |
| 74 | + ) | |
| 72 | 75 | result = analyzer.classify_frame(fake_frame) |
| 73 | 76 | assert result["is_diagram"] is False |
| 74 | 77 | |
| 75 | 78 | def test_classify_frame_failure(self, analyzer, mock_pm, fake_frame): |
| 76 | 79 | mock_pm.analyze_image.return_value = "I cannot parse this image" |
| @@ -77,19 +80,21 @@ | ||
| 77 | 80 | result = analyzer.classify_frame(fake_frame) |
| 78 | 81 | assert result["is_diagram"] is False |
| 79 | 82 | assert result["confidence"] == 0.0 |
| 80 | 83 | |
| 81 | 84 | def test_analyze_single_pass(self, analyzer, mock_pm, fake_frame): |
| 82 | - mock_pm.analyze_image.return_value = json.dumps({ | |
| 83 | - "diagram_type": "architecture", | |
| 84 | - "description": "Microservices architecture", | |
| 85 | - "text_content": "Service A, Service B", | |
| 86 | - "elements": ["Service A", "Service B"], | |
| 87 | - "relationships": ["A -> B: calls"], | |
| 88 | - "mermaid": "graph LR\n A-->B", | |
| 89 | - "chart_data": None | |
| 90 | - }) | |
| 85 | + mock_pm.analyze_image.return_value = json.dumps( | |
| 86 | + { | |
| 87 | + "diagram_type": "architecture", | |
| 88 | + "description": "Microservices architecture", | |
| 89 | + "text_content": "Service A, Service B", | |
| 90 | + "elements": ["Service A", "Service B"], | |
| 91 | + "relationships": ["A -> B: calls"], | |
| 92 | + "mermaid": "graph LR\n A-->B", | |
| 93 | + "chart_data": None, | |
| 94 | + } | |
| 95 | + ) | |
| 91 | 96 | result = analyzer.analyze_diagram_single_pass(fake_frame) |
| 92 | 97 | assert result["diagram_type"] == "architecture" |
| 93 | 98 | assert result["mermaid"] == "graph LR\n A-->B" |
| 94 | 99 | |
| 95 | 100 | def test_process_frames_high_confidence_diagram(self, analyzer, mock_pm, tmp_path): |
| @@ -105,38 +110,62 @@ | ||
| 105 | 110 | |
| 106 | 111 | # Frame 0: high confidence diagram |
| 107 | 112 | # Frame 1: low confidence (skip) |
| 108 | 113 | # Frame 2: medium confidence (screengrab) |
| 109 | 114 | classify_responses = [ |
| 110 | - json.dumps({"is_diagram": True, "diagram_type": "flowchart", "confidence": 0.9, "brief_description": "flow"}), | |
| 111 | - json.dumps({"is_diagram": False, "diagram_type": "unknown", "confidence": 0.1, "brief_description": "nothing"}), | |
| 112 | - json.dumps({"is_diagram": True, "diagram_type": "slide", "confidence": 0.5, "brief_description": "a slide"}), | |
| 113 | - ] | |
| 114 | - analysis_response = json.dumps({ | |
| 115 | - "diagram_type": "flowchart", | |
| 116 | - "description": "Login flow", | |
| 117 | - "text_content": "Start -> End", | |
| 118 | - "elements": ["Start", "End"], | |
| 119 | - "relationships": ["Start -> End"], | |
| 120 | - "mermaid": "graph LR\n Start-->End", | |
| 121 | - "chart_data": None | |
| 122 | - }) | |
| 115 | + json.dumps( | |
| 116 | + { | |
| 117 | + "is_diagram": True, | |
| 118 | + "diagram_type": "flowchart", | |
| 119 | + "confidence": 0.9, | |
| 120 | + "brief_description": "flow", | |
| 121 | + } | |
| 122 | + ), | |
| 123 | + json.dumps( | |
| 124 | + { | |
| 125 | + "is_diagram": False, | |
| 126 | + "diagram_type": "unknown", | |
| 127 | + "confidence": 0.1, | |
| 128 | + "brief_description": "nothing", | |
| 129 | + } | |
| 130 | + ), | |
| 131 | + json.dumps( | |
| 132 | + { | |
| 133 | + "is_diagram": True, | |
| 134 | + "diagram_type": "slide", | |
| 135 | + "confidence": 0.5, | |
| 136 | + "brief_description": "a slide", | |
| 137 | + } | |
| 138 | + ), | |
| 139 | + ] | |
| 140 | + analysis_response = json.dumps( | |
| 141 | + { | |
| 142 | + "diagram_type": "flowchart", | |
| 143 | + "description": "Login flow", | |
| 144 | + "text_content": "Start -> End", | |
| 145 | + "elements": ["Start", "End"], | |
| 146 | + "relationships": ["Start -> End"], | |
| 147 | + "mermaid": "graph LR\n Start-->End", | |
| 148 | + "chart_data": None, | |
| 149 | + } | |
| 150 | + ) | |
| 123 | 151 | |
| 124 | 152 | # Calls are interleaved per-frame: |
| 125 | 153 | # call 0: classify frame 0 (high conf) |
| 126 | 154 | # call 1: analyze frame 0 (full analysis) |
| 127 | 155 | # call 2: classify frame 1 (low conf - skip) |
| 128 | 156 | # call 3: classify frame 2 (medium conf) |
| 129 | 157 | # call 4: caption frame 2 (screengrab) |
| 130 | 158 | call_sequence = [ |
| 131 | - classify_responses[0], # classify frame 0 | |
| 132 | - analysis_response, # analyze frame 0 | |
| 133 | - classify_responses[1], # classify frame 1 | |
| 134 | - classify_responses[2], # classify frame 2 | |
| 159 | + classify_responses[0], # classify frame 0 | |
| 160 | + analysis_response, # analyze frame 0 | |
| 161 | + classify_responses[1], # classify frame 1 | |
| 162 | + classify_responses[2], # classify frame 2 | |
| 135 | 163 | "A slide about something", # caption frame 2 |
| 136 | 164 | ] |
| 137 | 165 | call_count = [0] |
| 166 | + | |
| 138 | 167 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 139 | 168 | idx = call_count[0] |
| 140 | 169 | call_count[0] += 1 |
| 141 | 170 | return call_sequence[idx] |
| 142 | 171 | |
| @@ -164,15 +193,23 @@ | ||
| 164 | 193 | fp.write_bytes(b"\xff\xd8\xff fake") |
| 165 | 194 | captures_dir = tmp_path / "captures" |
| 166 | 195 | |
| 167 | 196 | # High confidence classification but analysis fails |
| 168 | 197 | call_count = [0] |
| 198 | + | |
| 169 | 199 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 170 | 200 | idx = call_count[0] |
| 171 | 201 | call_count[0] += 1 |
| 172 | 202 | if idx == 0: |
| 173 | - return json.dumps({"is_diagram": True, "diagram_type": "chart", "confidence": 0.8, "brief_description": "chart"}) | |
| 203 | + return json.dumps( | |
| 204 | + { | |
| 205 | + "is_diagram": True, | |
| 206 | + "diagram_type": "chart", | |
| 207 | + "confidence": 0.8, | |
| 208 | + "brief_description": "chart", | |
| 209 | + } | |
| 210 | + ) | |
| 174 | 211 | if idx == 1: |
| 175 | 212 | return "This is not valid JSON" # Analysis fails |
| 176 | 213 | return "A chart showing data" # Caption |
| 177 | 214 | |
| 178 | 215 | mock_pm.analyze_image.side_effect = side_effect |
| 179 | 216 |
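The hand-rolled `call_count` counter in these tests is one way to script a sequence of mock replies; `unittest.mock` also accepts an iterable `side_effect`, yielding one canned return per call regardless of arguments (a function-based `side_effect`, as used above, remains the better fit when the reply must depend on the call index or arguments):

```python
import json
from unittest.mock import MagicMock

pm = MagicMock()
pm.analyze_image.side_effect = [
    json.dumps({"is_diagram": True, "confidence": 0.9}),  # call 0: classify
    json.dumps({"diagram_type": "flowchart"}),            # call 1: analyze
    "A slide about something",                            # call 2: caption
]

print(json.loads(pm.analyze_image(b"img", "classify"))["confidence"])   # 0.9
print(json.loads(pm.analyze_image(b"img", "analyze"))["diagram_type"])  # flowchart
print(pm.analyze_image(b"img", "caption", max_tokens=4096))             # A slide about something
```

Note that an iterable `side_effect` raises `StopIteration` once exhausted, which doubles as a guard against unexpected extra calls.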
| 135 | "confidence": 0.5, |
| 136 | "brief_description": "a slide", |
| 137 | } |
| 138 | ), |
| 139 | ] |
| 140 | analysis_response = json.dumps( |
| 141 | { |
| 142 | "diagram_type": "flowchart", |
| 143 | "description": "Login flow", |
| 144 | "text_content": "Start -> End", |
| 145 | "elements": ["Start", "End"], |
| 146 | "relationships": ["Start -> End"], |
| 147 | "mermaid": "graph LR\n Start-->End", |
| 148 | "chart_data": None, |
| 149 | } |
| 150 | ) |
| 151 | |
| 152 | # Calls are interleaved per-frame: |
| 153 | # call 0: classify frame 0 (high conf) |
| 154 | # call 1: analyze frame 0 (full analysis) |
| 155 | # call 2: classify frame 1 (low conf - skip) |
| 156 | # call 3: classify frame 2 (medium conf) |
| 157 | # call 4: caption frame 2 (screengrab) |
| 158 | call_sequence = [ |
| 159 | classify_responses[0], # classify frame 0 |
| 160 | analysis_response, # analyze frame 0 |
| 161 | classify_responses[1], # classify frame 1 |
| 162 | classify_responses[2], # classify frame 2 |
| 163 | "A slide about something", # caption frame 2 |
| 164 | ] |
| 165 | call_count = [0] |
| 166 | |
| 167 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 168 | idx = call_count[0] |
| 169 | call_count[0] += 1 |
| 170 | return call_sequence[idx] |
| 171 | |
| @@ -164,15 +193,23 @@ | |
| 193 | fp.write_bytes(b"\xff\xd8\xff fake") |
| 194 | captures_dir = tmp_path / "captures" |
| 195 | |
| 196 | # High confidence classification but analysis fails |
| 197 | call_count = [0] |
| 198 | |
| 199 | def side_effect(image_bytes, prompt, max_tokens=4096): |
| 200 | idx = call_count[0] |
| 201 | call_count[0] += 1 |
| 202 | if idx == 0: |
| 203 | return json.dumps( |
| 204 | { |
| 205 | "is_diagram": True, |
| 206 | "diagram_type": "chart", |
| 207 | "confidence": 0.8, |
| 208 | "brief_description": "chart", |
| 209 | } |
| 210 | ) |
| 211 | if idx == 1: |
| 212 | return "This is not valid JSON" # Analysis fails |
| 213 | return "A chart showing data" # Caption |
| 214 | |
| 215 | mock_pm.analyze_image.side_effect = side_effect |
| 216 |
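The tests above drive a multi-call mock with a hand-rolled `call_count` list closed over by `side_effect`. `unittest.mock` also accepts any iterable for `side_effect`, which removes the bookkeeping entirely: each call consumes the next value. A minimal sketch of the failure-path scenario (the `mock_pm` / `analyze_image` names mirror the tests; the response strings are the same ones used there):

```python
import json
from unittest.mock import MagicMock

# side_effect given an iterable: call 1 returns the first element,
# call 2 the second, and so on — no manual index needed.
mock_pm = MagicMock()
mock_pm.analyze_image.side_effect = [
    json.dumps({"is_diagram": True, "diagram_type": "chart",
                "confidence": 0.8, "brief_description": "chart"}),
    "This is not valid JSON",   # analysis step fails to parse
    "A chart showing data",     # caption fallback
]

first = mock_pm.analyze_image(b"bytes", "classify prompt")
second = mock_pm.analyze_image(b"bytes", "analyze prompt")
third = mock_pm.analyze_image(b"bytes", "caption prompt")
```

One trade-off: once the iterable is exhausted, further calls raise `StopIteration`, so the explicit `call_sequence` + counter style in the diff is the safer choice when the exact number of calls is itself under test.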
+12
-9
| --- tests/test_frame_extractor.py | ||
| +++ tests/test_frame_extractor.py | ||
| @@ -1,19 +1,19 @@ | ||
| 1 | 1 | """Tests for the frame extractor module.""" |
| 2 | + | |
| 2 | 3 | import os |
| 3 | 4 | import tempfile |
| 4 | -from pathlib import Path | |
| 5 | 5 | |
| 6 | 6 | import numpy as np |
| 7 | 7 | import pytest |
| 8 | 8 | |
| 9 | 9 | from video_processor.extractors.frame_extractor import ( |
| 10 | 10 | calculate_frame_difference, |
| 11 | - extract_frames, | |
| 12 | 11 | is_gpu_available, |
| 13 | - save_frames | |
| 12 | + save_frames, | |
| 14 | 13 | ) |
| 14 | + | |
| 15 | 15 | |
| 16 | 16 | # Create dummy test frames |
| 17 | 17 | @pytest.fixture |
| 18 | 18 | def dummy_frames(): |
| 19 | 19 | # Create a list of dummy frames with different content |
| @@ -21,42 +21,45 @@ | ||
| 21 | 21 | for i in range(3): |
| 22 | 22 | # Create frame with different intensity for each |
| 23 | 23 | frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50) |
| 24 | 24 | frames.append(frame) |
| 25 | 25 | return frames |
| 26 | + | |
| 26 | 27 | |
| 27 | 28 | def test_calculate_frame_difference(): |
| 28 | 29 | """Test frame difference calculation.""" |
| 29 | 30 | # Create two frames with some difference |
| 30 | 31 | frame1 = np.zeros((100, 100, 3), dtype=np.uint8) |
| 31 | 32 | frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128 # 50% intensity |
| 32 | - | |
| 33 | + | |
| 33 | 34 | # Calculate difference |
| 34 | 35 | diff = calculate_frame_difference(frame1, frame2) |
| 35 | - | |
| 36 | + | |
| 36 | 37 | # Expected difference is around 128/255 = 0.5 |
| 37 | 38 | assert 0.45 <= diff <= 0.55 |
| 38 | - | |
| 39 | + | |
| 39 | 40 | # Test identical frames |
| 40 | 41 | diff_identical = calculate_frame_difference(frame1, frame1.copy()) |
| 41 | 42 | assert diff_identical < 0.001 # Should be very close to 0 |
| 43 | + | |
| 42 | 44 | |
| 43 | 45 | def test_is_gpu_available(): |
| 44 | 46 | """Test GPU availability check.""" |
| 45 | 47 | # This just tests that the function runs without error |
| 46 | 48 | # We don't assert the result because it depends on the system |
| 47 | 49 | result = is_gpu_available() |
| 48 | 50 | assert isinstance(result, bool) |
| 51 | + | |
| 49 | 52 | |
| 50 | 53 | def test_save_frames(dummy_frames): |
| 51 | 54 | """Test saving frames to disk.""" |
| 52 | 55 | with tempfile.TemporaryDirectory() as temp_dir: |
| 53 | 56 | # Save frames |
| 54 | 57 | paths = save_frames(dummy_frames, temp_dir, "test_frame") |
| 55 | - | |
| 58 | + | |
| 56 | 59 | # Check that we got the correct number of paths |
| 57 | 60 | assert len(paths) == len(dummy_frames) |
| 58 | - | |
| 61 | + | |
| 59 | 62 | # Check that files were created |
| 60 | 63 | for path in paths: |
| 61 | 64 | assert os.path.exists(path) |
| 62 | - assert os.path.getsize(path) > 0 # Files should have content | |
| 65 | + assert os.path.getsize(path) > 0 # Files should have content | |
| 63 | 66 |
| --- tests/test_frame_extractor.py | |
| +++ tests/test_frame_extractor.py | |
| @@ -1,19 +1,19 @@ | |
| 1 | """Tests for the frame extractor module.""" |
| 2 | import os |
| 3 | import tempfile |
| 4 | from pathlib import Path |
| 5 | |
| 6 | import numpy as np |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.extractors.frame_extractor import ( |
| 10 | calculate_frame_difference, |
| 11 | extract_frames, |
| 12 | is_gpu_available, |
| 13 | save_frames |
| 14 | ) |
| 15 | |
| 16 | # Create dummy test frames |
| 17 | @pytest.fixture |
| 18 | def dummy_frames(): |
| 19 | # Create a list of dummy frames with different content |
| @@ -21,42 +21,45 @@ | |
| 21 | for i in range(3): |
| 22 | # Create frame with different intensity for each |
| 23 | frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50) |
| 24 | frames.append(frame) |
| 25 | return frames |
| 26 | |
| 27 | def test_calculate_frame_difference(): |
| 28 | """Test frame difference calculation.""" |
| 29 | # Create two frames with some difference |
| 30 | frame1 = np.zeros((100, 100, 3), dtype=np.uint8) |
| 31 | frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128 # 50% intensity |
| 32 | |
| 33 | # Calculate difference |
| 34 | diff = calculate_frame_difference(frame1, frame2) |
| 35 | |
| 36 | # Expected difference is around 128/255 = 0.5 |
| 37 | assert 0.45 <= diff <= 0.55 |
| 38 | |
| 39 | # Test identical frames |
| 40 | diff_identical = calculate_frame_difference(frame1, frame1.copy()) |
| 41 | assert diff_identical < 0.001 # Should be very close to 0 |
| 42 | |
| 43 | def test_is_gpu_available(): |
| 44 | """Test GPU availability check.""" |
| 45 | # This just tests that the function runs without error |
| 46 | # We don't assert the result because it depends on the system |
| 47 | result = is_gpu_available() |
| 48 | assert isinstance(result, bool) |
| 49 | |
| 50 | def test_save_frames(dummy_frames): |
| 51 | """Test saving frames to disk.""" |
| 52 | with tempfile.TemporaryDirectory() as temp_dir: |
| 53 | # Save frames |
| 54 | paths = save_frames(dummy_frames, temp_dir, "test_frame") |
| 55 | |
| 56 | # Check that we got the correct number of paths |
| 57 | assert len(paths) == len(dummy_frames) |
| 58 | |
| 59 | # Check that files were created |
| 60 | for path in paths: |
| 61 | assert os.path.exists(path) |
| 62 | assert os.path.getsize(path) > 0 # Files should have content |
| 63 |
| --- tests/test_frame_extractor.py | |
| +++ tests/test_frame_extractor.py | |
| @@ -1,19 +1,19 @@ | |
| 1 | """Tests for the frame extractor module.""" |
| 2 | |
| 3 | import os |
| 4 | import tempfile |
| 5 | |
| 6 | import numpy as np |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.extractors.frame_extractor import ( |
| 10 | calculate_frame_difference, |
| 11 | is_gpu_available, |
| 12 | save_frames, |
| 13 | ) |
| 14 | |
| 15 | |
| 16 | # Create dummy test frames |
| 17 | @pytest.fixture |
| 18 | def dummy_frames(): |
| 19 | # Create a list of dummy frames with different content |
| @@ -21,42 +21,45 @@ | |
| 21 | for i in range(3): |
| 22 | # Create frame with different intensity for each |
| 23 | frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50) |
| 24 | frames.append(frame) |
| 25 | return frames |
| 26 | |
| 27 | |
| 28 | def test_calculate_frame_difference(): |
| 29 | """Test frame difference calculation.""" |
| 30 | # Create two frames with some difference |
| 31 | frame1 = np.zeros((100, 100, 3), dtype=np.uint8) |
| 32 | frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128 # 50% intensity |
| 33 | |
| 34 | # Calculate difference |
| 35 | diff = calculate_frame_difference(frame1, frame2) |
| 36 | |
| 37 | # Expected difference is around 128/255 = 0.5 |
| 38 | assert 0.45 <= diff <= 0.55 |
| 39 | |
| 40 | # Test identical frames |
| 41 | diff_identical = calculate_frame_difference(frame1, frame1.copy()) |
| 42 | assert diff_identical < 0.001 # Should be very close to 0 |
| 43 | |
| 44 | |
| 45 | def test_is_gpu_available(): |
| 46 | """Test GPU availability check.""" |
| 47 | # This just tests that the function runs without error |
| 48 | # We don't assert the result because it depends on the system |
| 49 | result = is_gpu_available() |
| 50 | assert isinstance(result, bool) |
| 51 | |
| 52 | |
| 53 | def test_save_frames(dummy_frames): |
| 54 | """Test saving frames to disk.""" |
| 55 | with tempfile.TemporaryDirectory() as temp_dir: |
| 56 | # Save frames |
| 57 | paths = save_frames(dummy_frames, temp_dir, "test_frame") |
| 58 | |
| 59 | # Check that we got the correct number of paths |
| 60 | assert len(paths) == len(dummy_frames) |
| 61 | |
| 62 | # Check that files were created |
| 63 | for path in paths: |
| 64 | assert os.path.exists(path) |
| 65 | assert os.path.getsize(path) > 0 # Files should have content |
| 66 |
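`test_calculate_frame_difference` pins the result for a black frame vs. a uniform 128-intensity frame to the 0.45–0.55 band, which implies a mean absolute pixel difference normalized by 255. The repo's actual implementation is not shown in this diff; a minimal sketch consistent with those assertions (the `frame_difference` name is a stand-in, not the module's real function):

```python
import numpy as np

def frame_difference(frame1: np.ndarray, frame2: np.ndarray) -> float:
    """Mean absolute per-pixel difference, normalized to [0, 1]."""
    # Widen to int16 first so the uint8 subtraction cannot wrap around.
    diff = np.abs(frame1.astype(np.int16) - frame2.astype(np.int16))
    return float(diff.mean() / 255.0)

black = np.zeros((100, 100, 3), dtype=np.uint8)
grey = np.ones((100, 100, 3), dtype=np.uint8) * 128

d = frame_difference(black, grey)   # 128/255 ~= 0.502, inside 0.45..0.55
```

The `astype(np.int16)` cast is the important detail: subtracting two `uint8` arrays directly wraps modulo 256 and would silently understate the difference.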
+2
-4
| --- tests/test_json_parsing.py | ||
| +++ tests/test_json_parsing.py | ||
| @@ -1,25 +1,23 @@ | ||
| 1 | 1 | """Tests for robust JSON parsing from LLM responses.""" |
| 2 | 2 | |
| 3 | -import pytest | |
| 4 | - | |
| 5 | 3 | from video_processor.utils.json_parsing import parse_json_from_response |
| 6 | 4 | |
| 7 | 5 | |
| 8 | 6 | class TestParseJsonFromResponse: |
| 9 | 7 | def test_direct_dict(self): |
| 10 | 8 | assert parse_json_from_response('{"key": "value"}') == {"key": "value"} |
| 11 | 9 | |
| 12 | 10 | def test_direct_array(self): |
| 13 | - assert parse_json_from_response('[1, 2, 3]') == [1, 2, 3] | |
| 11 | + assert parse_json_from_response("[1, 2, 3]") == [1, 2, 3] | |
| 14 | 12 | |
| 15 | 13 | def test_markdown_fenced_json(self): |
| 16 | 14 | text = '```json\n{"key": "value"}\n```' |
| 17 | 15 | assert parse_json_from_response(text) == {"key": "value"} |
| 18 | 16 | |
| 19 | 17 | def test_markdown_fenced_no_lang(self): |
| 20 | - text = '```\n[1, 2]\n```' | |
| 18 | + text = "```\n[1, 2]\n```" | |
| 21 | 19 | assert parse_json_from_response(text) == [1, 2] |
| 22 | 20 | |
| 23 | 21 | def test_json_embedded_in_text(self): |
| 24 | 22 | text = 'Here is the result:\n{"name": "test", "value": 42}\nEnd of result.' |
| 25 | 23 | result = parse_json_from_response(text) |
| 26 | 24 |
| --- tests/test_json_parsing.py | |
| +++ tests/test_json_parsing.py | |
| @@ -1,25 +1,23 @@ | |
| 1 | """Tests for robust JSON parsing from LLM responses.""" |
| 2 | |
| 3 | import pytest |
| 4 | |
| 5 | from video_processor.utils.json_parsing import parse_json_from_response |
| 6 | |
| 7 | |
| 8 | class TestParseJsonFromResponse: |
| 9 | def test_direct_dict(self): |
| 10 | assert parse_json_from_response('{"key": "value"}') == {"key": "value"} |
| 11 | |
| 12 | def test_direct_array(self): |
| 13 | assert parse_json_from_response('[1, 2, 3]') == [1, 2, 3] |
| 14 | |
| 15 | def test_markdown_fenced_json(self): |
| 16 | text = '```json\n{"key": "value"}\n```' |
| 17 | assert parse_json_from_response(text) == {"key": "value"} |
| 18 | |
| 19 | def test_markdown_fenced_no_lang(self): |
| 20 | text = '```\n[1, 2]\n```' |
| 21 | assert parse_json_from_response(text) == [1, 2] |
| 22 | |
| 23 | def test_json_embedded_in_text(self): |
| 24 | text = 'Here is the result:\n{"name": "test", "value": 42}\nEnd of result.' |
| 25 | result = parse_json_from_response(text) |
| 26 |
| --- tests/test_json_parsing.py | |
| +++ tests/test_json_parsing.py | |
| @@ -1,25 +1,23 @@ | |
| 1 | """Tests for robust JSON parsing from LLM responses.""" |
| 2 | |
| 3 | from video_processor.utils.json_parsing import parse_json_from_response |
| 4 | |
| 5 | |
| 6 | class TestParseJsonFromResponse: |
| 7 | def test_direct_dict(self): |
| 8 | assert parse_json_from_response('{"key": "value"}') == {"key": "value"} |
| 9 | |
| 10 | def test_direct_array(self): |
| 11 | assert parse_json_from_response("[1, 2, 3]") == [1, 2, 3] |
| 12 | |
| 13 | def test_markdown_fenced_json(self): |
| 14 | text = '```json\n{"key": "value"}\n```' |
| 15 | assert parse_json_from_response(text) == {"key": "value"} |
| 16 | |
| 17 | def test_markdown_fenced_no_lang(self): |
| 18 | text = "```\n[1, 2]\n```" |
| 19 | assert parse_json_from_response(text) == [1, 2] |
| 20 | |
| 21 | def test_json_embedded_in_text(self): |
| 22 | text = 'Here is the result:\n{"name": "test", "value": 42}\nEnd of result.' |
| 23 | result = parse_json_from_response(text) |
| 24 |
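The `test_json_parsing.py` cases cover three LLM-response shapes: bare JSON, a markdown-fenced block (with or without a language tag), and JSON embedded in surrounding prose. A minimal sketch of a parser satisfying all three, under the assumption that the repo's `parse_json_from_response` follows the same try-then-fall-back order (the `parse_json` name here is hypothetical):

```python
import json
import re

def parse_json(text: str):
    """Parse JSON out of an LLM reply: direct, fenced, or embedded."""
    # 1. Happy path: the whole reply is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 2. Markdown fence, with or without a language tag: ```json ... ```
    m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    # 3. Last resort: the outermost {...} or [...] span in the text.
    m = re.search(r"(\{.*\}|\[.*\])", text, re.DOTALL)
    if m:
        return json.loads(m.group(1))
    raise ValueError("no JSON found in response")
```

The greedy `.*` in step 3 grabs from the first opening brace to the last closing one, which handles nested objects but would misfire if the reply contained two unrelated JSON blobs — an edge the tests above do not exercise.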
+11
-7
| --- tests/test_models.py | ||
| +++ tests/test_models.py | ||
| @@ -1,11 +1,7 @@ | ||
| 1 | 1 | """Tests for pydantic data models.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | - | |
| 5 | -import pytest | |
| 6 | - | |
| 7 | 3 | from video_processor.models import ( |
| 8 | 4 | ActionItem, |
| 9 | 5 | BatchManifest, |
| 10 | 6 | BatchVideoEntry, |
| 11 | 7 | DiagramResult, |
| @@ -66,11 +62,13 @@ | ||
| 66 | 62 | assert restored == item |
| 67 | 63 | |
| 68 | 64 | |
| 69 | 65 | class TestKeyPoint: |
| 70 | 66 | def test_with_related_diagrams(self): |
| 71 | - kp = KeyPoint(point="System uses microservices", topic="Architecture", related_diagrams=[0, 2]) | |
| 67 | + kp = KeyPoint( | |
| 68 | + point="System uses microservices", topic="Architecture", related_diagrams=[0, 2] | |
| 69 | + ) | |
| 72 | 70 | assert kp.related_diagrams == [0, 2] |
| 73 | 71 | |
| 74 | 72 | def test_round_trip(self): |
| 75 | 73 | kp = KeyPoint(point="Test", details="Detail", timestamp=42.0, source="diagram") |
| 76 | 74 | restored = KeyPoint.model_validate_json(kp.model_dump_json()) |
| @@ -120,11 +118,15 @@ | ||
| 120 | 118 | sc = ScreenCapture(frame_index=10, caption="Architecture overview slide", confidence=0.5) |
| 121 | 119 | assert sc.image_path is None |
| 122 | 120 | |
| 123 | 121 | def test_round_trip(self): |
| 124 | 122 | sc = ScreenCapture( |
| 125 | - frame_index=7, timestamp=30.0, caption="Timeline", image_path="captures/capture_0.jpg", confidence=0.45 | |
| 123 | + frame_index=7, | |
| 124 | + timestamp=30.0, | |
| 125 | + caption="Timeline", | |
| 126 | + image_path="captures/capture_0.jpg", | |
| 127 | + confidence=0.45, | |
| 126 | 128 | ) |
| 127 | 129 | restored = ScreenCapture.model_validate_json(sc.model_dump_json()) |
| 128 | 130 | assert restored == sc |
| 129 | 131 | |
| 130 | 132 | |
| @@ -171,11 +173,13 @@ | ||
| 171 | 173 | assert m.screen_captures == [] |
| 172 | 174 | assert m.stats.frames_extracted == 0 |
| 173 | 175 | |
| 174 | 176 | def test_full_round_trip(self): |
| 175 | 177 | m = VideoManifest( |
| 176 | - video=VideoMetadata(title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0), | |
| 178 | + video=VideoMetadata( | |
| 179 | + title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0 | |
| 180 | + ), | |
| 177 | 181 | stats=ProcessingStats( |
| 178 | 182 | frames_extracted=50, |
| 179 | 183 | diagrams_detected=3, |
| 180 | 184 | screen_captures=2, |
| 181 | 185 | models_used={"vision": "gpt-4o", "chat": "claude-sonnet-4-5"}, |
| 182 | 186 |
| --- tests/test_models.py | |
| +++ tests/test_models.py | |
| @@ -1,11 +1,7 @@ | |
| 1 | """Tests for pydantic data models.""" |
| 2 | |
| 3 | import json |
| 4 | |
| 5 | import pytest |
| 6 | |
| 7 | from video_processor.models import ( |
| 8 | ActionItem, |
| 9 | BatchManifest, |
| 10 | BatchVideoEntry, |
| 11 | DiagramResult, |
| @@ -66,11 +62,13 @@ | |
| 66 | assert restored == item |
| 67 | |
| 68 | |
| 69 | class TestKeyPoint: |
| 70 | def test_with_related_diagrams(self): |
| 71 | kp = KeyPoint(point="System uses microservices", topic="Architecture", related_diagrams=[0, 2]) |
| 72 | assert kp.related_diagrams == [0, 2] |
| 73 | |
| 74 | def test_round_trip(self): |
| 75 | kp = KeyPoint(point="Test", details="Detail", timestamp=42.0, source="diagram") |
| 76 | restored = KeyPoint.model_validate_json(kp.model_dump_json()) |
| @@ -120,11 +118,15 @@ | |
| 120 | sc = ScreenCapture(frame_index=10, caption="Architecture overview slide", confidence=0.5) |
| 121 | assert sc.image_path is None |
| 122 | |
| 123 | def test_round_trip(self): |
| 124 | sc = ScreenCapture( |
| 125 | frame_index=7, timestamp=30.0, caption="Timeline", image_path="captures/capture_0.jpg", confidence=0.45 |
| 126 | ) |
| 127 | restored = ScreenCapture.model_validate_json(sc.model_dump_json()) |
| 128 | assert restored == sc |
| 129 | |
| 130 | |
| @@ -171,11 +173,13 @@ | |
| 171 | assert m.screen_captures == [] |
| 172 | assert m.stats.frames_extracted == 0 |
| 173 | |
| 174 | def test_full_round_trip(self): |
| 175 | m = VideoManifest( |
| 176 | video=VideoMetadata(title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0), |
| 177 | stats=ProcessingStats( |
| 178 | frames_extracted=50, |
| 179 | diagrams_detected=3, |
| 180 | screen_captures=2, |
| 181 | models_used={"vision": "gpt-4o", "chat": "claude-sonnet-4-5"}, |
| 182 |
| --- tests/test_models.py | |
| +++ tests/test_models.py | |
| @@ -1,11 +1,7 @@ | |
| 1 | """Tests for pydantic data models.""" |
| 2 | |
| 3 | from video_processor.models import ( |
| 4 | ActionItem, |
| 5 | BatchManifest, |
| 6 | BatchVideoEntry, |
| 7 | DiagramResult, |
| @@ -66,11 +62,13 @@ | |
| 62 | assert restored == item |
| 63 | |
| 64 | |
| 65 | class TestKeyPoint: |
| 66 | def test_with_related_diagrams(self): |
| 67 | kp = KeyPoint( |
| 68 | point="System uses microservices", topic="Architecture", related_diagrams=[0, 2] |
| 69 | ) |
| 70 | assert kp.related_diagrams == [0, 2] |
| 71 | |
| 72 | def test_round_trip(self): |
| 73 | kp = KeyPoint(point="Test", details="Detail", timestamp=42.0, source="diagram") |
| 74 | restored = KeyPoint.model_validate_json(kp.model_dump_json()) |
| @@ -120,11 +118,15 @@ | |
| 118 | sc = ScreenCapture(frame_index=10, caption="Architecture overview slide", confidence=0.5) |
| 119 | assert sc.image_path is None |
| 120 | |
| 121 | def test_round_trip(self): |
| 122 | sc = ScreenCapture( |
| 123 | frame_index=7, |
| 124 | timestamp=30.0, |
| 125 | caption="Timeline", |
| 126 | image_path="captures/capture_0.jpg", |
| 127 | confidence=0.45, |
| 128 | ) |
| 129 | restored = ScreenCapture.model_validate_json(sc.model_dump_json()) |
| 130 | assert restored == sc |
| 131 | |
| 132 | |
| @@ -171,11 +173,13 @@ | |
| 173 | assert m.screen_captures == [] |
| 174 | assert m.stats.frames_extracted == 0 |
| 175 | |
| 176 | def test_full_round_trip(self): |
| 177 | m = VideoManifest( |
| 178 | video=VideoMetadata( |
| 179 | title="Meeting", source_path="/tmp/video.mp4", duration_seconds=3600.0 |
| 180 | ), |
| 181 | stats=ProcessingStats( |
| 182 | frames_extracted=50, |
| 183 | diagrams_detected=3, |
| 184 | screen_captures=2, |
| 185 | models_used={"vision": "gpt-4o", "chat": "claude-sonnet-4-5"}, |
| 186 |
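The recurring pattern in `test_models.py` is a serialization round trip: `model_dump_json` followed by `model_validate_json`, then equality against the original. A minimal sketch with a hypothetical stand-in model (the real `KeyPoint` has more fields; this only illustrates the Pydantic v2 round-trip idiom the tests rely on):

```python
from typing import Optional

from pydantic import BaseModel

class KeyPointSketch(BaseModel):
    """Hypothetical stand-in for the repo's KeyPoint model."""
    point: str
    topic: Optional[str] = None
    related_diagrams: list[int] = []  # Pydantic copies mutable defaults per-instance

kp = KeyPointSketch(point="System uses microservices", related_diagrams=[0, 2])

# Serialize to a JSON string, then rebuild a fresh instance from it.
restored = KeyPointSketch.model_validate_json(kp.model_dump_json())
```

Pydantic models compare field-by-field, so `restored == kp` holds after the round trip — which is exactly what the `test_round_trip` cases assert.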
| --- tests/test_output_structure.py | ||
| +++ tests/test_output_structure.py | ||
| @@ -1,12 +1,8 @@ | ||
| 1 | 1 | """Tests for output structure and manifest I/O.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -import tempfile | |
| 5 | -from pathlib import Path | |
| 6 | - | |
| 7 | -import pytest | |
| 8 | 4 | |
| 9 | 5 | from video_processor.models import ( |
| 10 | 6 | ActionItem, |
| 11 | 7 | BatchManifest, |
| 12 | 8 | BatchVideoEntry, |
| 13 | 9 |
| --- tests/test_output_structure.py | |
| +++ tests/test_output_structure.py | |
| @@ -1,12 +1,8 @@ | |
| 1 | """Tests for output structure and manifest I/O.""" |
| 2 | |
| 3 | import json |
| 4 | import tempfile |
| 5 | from pathlib import Path |
| 6 | |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.models import ( |
| 10 | ActionItem, |
| 11 | BatchManifest, |
| 12 | BatchVideoEntry, |
| 13 |
| --- tests/test_output_structure.py | |
| +++ tests/test_output_structure.py | |
| @@ -1,12 +1,8 @@ | |
| 1 | """Tests for output structure and manifest I/O.""" |
| 2 | |
| 3 | import json |
| 4 | |
| 5 | from video_processor.models import ( |
| 6 | ActionItem, |
| 7 | BatchManifest, |
| 8 | BatchVideoEntry, |
| 9 |
+33
-23
| --- tests/test_pipeline.py | ||
| +++ tests/test_pipeline.py | ||
| @@ -1,14 +1,11 @@ | ||
| 1 | 1 | """Tests for the core video processing pipeline.""" |
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | -from pathlib import Path | |
| 5 | -from unittest.mock import MagicMock, patch | |
| 4 | +from unittest.mock import MagicMock | |
| 6 | 5 | |
| 7 | -import pytest | |
| 8 | - | |
| 9 | -from video_processor.pipeline import _extract_key_points, _extract_action_items, _format_srt_time | |
| 6 | +from video_processor.pipeline import _extract_action_items, _extract_key_points, _format_srt_time | |
| 10 | 7 | |
| 11 | 8 | |
| 12 | 9 | class TestFormatSrtTime: |
| 13 | 10 | def test_zero(self): |
| 14 | 11 | assert _format_srt_time(0) == "00:00:00,000" |
| @@ -28,27 +25,31 @@ | ||
| 28 | 25 | |
| 29 | 26 | |
| 30 | 27 | class TestExtractKeyPoints: |
| 31 | 28 | def test_parses_valid_response(self): |
| 32 | 29 | pm = MagicMock() |
| 33 | - pm.chat.return_value = json.dumps([ | |
| 34 | - {"point": "Main point", "topic": "Architecture", "details": "Some details"}, | |
| 35 | - {"point": "Second point", "topic": None, "details": None}, | |
| 36 | - ]) | |
| 30 | + pm.chat.return_value = json.dumps( | |
| 31 | + [ | |
| 32 | + {"point": "Main point", "topic": "Architecture", "details": "Some details"}, | |
| 33 | + {"point": "Second point", "topic": None, "details": None}, | |
| 34 | + ] | |
| 35 | + ) | |
| 37 | 36 | result = _extract_key_points(pm, "Some transcript text here") |
| 38 | 37 | assert len(result) == 2 |
| 39 | 38 | assert result[0].point == "Main point" |
| 40 | 39 | assert result[0].topic == "Architecture" |
| 41 | 40 | assert result[1].point == "Second point" |
| 42 | 41 | |
| 43 | 42 | def test_skips_invalid_items(self): |
| 44 | 43 | pm = MagicMock() |
| 45 | - pm.chat.return_value = json.dumps([ | |
| 46 | - {"point": "Valid", "topic": None}, | |
| 47 | - {"topic": "No point field"}, | |
| 48 | - {"point": "", "topic": "Empty point"}, | |
| 49 | - ]) | |
| 44 | + pm.chat.return_value = json.dumps( | |
| 45 | + [ | |
| 46 | + {"point": "Valid", "topic": None}, | |
| 47 | + {"topic": "No point field"}, | |
| 48 | + {"point": "", "topic": "Empty point"}, | |
| 49 | + ] | |
| 50 | + ) | |
| 50 | 51 | result = _extract_key_points(pm, "text") |
| 51 | 52 | assert len(result) == 1 |
| 52 | 53 | assert result[0].point == "Valid" |
| 53 | 54 | |
| 54 | 55 | def test_handles_error(self): |
| @@ -65,29 +66,38 @@ | ||
| 65 | 66 | |
| 66 | 67 | |
| 67 | 68 | class TestExtractActionItems: |
| 68 | 69 | def test_parses_valid_response(self): |
| 69 | 70 | pm = MagicMock() |
| 70 | - pm.chat.return_value = json.dumps([ | |
| 71 | - {"action": "Deploy fix", "assignee": "Bob", "deadline": "Friday", | |
| 72 | - "priority": "high", "context": "Production"}, | |
| 73 | - ]) | |
| 71 | + pm.chat.return_value = json.dumps( | |
| 72 | + [ | |
| 73 | + { | |
| 74 | + "action": "Deploy fix", | |
| 75 | + "assignee": "Bob", | |
| 76 | + "deadline": "Friday", | |
| 77 | + "priority": "high", | |
| 78 | + "context": "Production", | |
| 79 | + }, | |
| 80 | + ] | |
| 81 | + ) | |
| 74 | 82 | result = _extract_action_items(pm, "Some transcript text") |
| 75 | 83 | assert len(result) == 1 |
| 76 | 84 | assert result[0].action == "Deploy fix" |
| 77 | 85 | assert result[0].assignee == "Bob" |
| 78 | 86 | |
| 79 | 87 | def test_skips_invalid_items(self): |
| 80 | 88 | pm = MagicMock() |
| 81 | - pm.chat.return_value = json.dumps([ | |
| 82 | - {"action": "Valid action"}, | |
| 83 | - {"assignee": "No action field"}, | |
| 84 | - {"action": ""}, | |
| 85 | - ]) | |
| 89 | + pm.chat.return_value = json.dumps( | |
| 90 | + [ | |
| 91 | + {"action": "Valid action"}, | |
| 92 | + {"assignee": "No action field"}, | |
| 93 | + {"action": ""}, | |
| 94 | + ] | |
| 95 | + ) | |
| 86 | 96 | result = _extract_action_items(pm, "text") |
| 87 | 97 | assert len(result) == 1 |
| 88 | 98 | |
| 89 | 99 | def test_handles_error(self): |
| 90 | 100 | pm = MagicMock() |
| 91 | 101 | pm.chat.side_effect = Exception("API down") |
| 92 | 102 | result = _extract_action_items(pm, "text") |
| 93 | 103 | assert result == [] |
| 94 | 104 |
| --- tests/test_pipeline.py | |
| +++ tests/test_pipeline.py | |
| @@ -1,14 +1,11 @@ | |
| 1 | """Tests for the core video processing pipeline.""" |
| 2 | |
| 3 | import json |
| 4 | from pathlib import Path |
| 5 | from unittest.mock import MagicMock, patch |
| 6 | |
| 7 | import pytest |
| 8 | |
| 9 | from video_processor.pipeline import _extract_key_points, _extract_action_items, _format_srt_time |
| 10 | |
| 11 | |
| 12 | class TestFormatSrtTime: |
| 13 | def test_zero(self): |
| 14 | assert _format_srt_time(0) == "00:00:00,000" |
| @@ -28,27 +25,31 @@ | |
| 28 | |
| 29 | |
| 30 | class TestExtractKeyPoints: |
| 31 | def test_parses_valid_response(self): |
| 32 | pm = MagicMock() |
| 33 | pm.chat.return_value = json.dumps([ |
| 34 | {"point": "Main point", "topic": "Architecture", "details": "Some details"}, |
| 35 | {"point": "Second point", "topic": None, "details": None}, |
| 36 | ]) |
| 37 | result = _extract_key_points(pm, "Some transcript text here") |
| 38 | assert len(result) == 2 |
| 39 | assert result[0].point == "Main point" |
| 40 | assert result[0].topic == "Architecture" |
| 41 | assert result[1].point == "Second point" |
| 42 | |
| 43 | def test_skips_invalid_items(self): |
| 44 | pm = MagicMock() |
| 45 | pm.chat.return_value = json.dumps([ |
| 46 | {"point": "Valid", "topic": None}, |
| 47 | {"topic": "No point field"}, |
| 48 | {"point": "", "topic": "Empty point"}, |
| 49 | ]) |
| 50 | result = _extract_key_points(pm, "text") |
| 51 | assert len(result) == 1 |
| 52 | assert result[0].point == "Valid" |
| 53 | |
| 54 | def test_handles_error(self): |
| @@ -65,29 +66,38 @@ | |
| 65 | |
| 66 | |
| 67 | class TestExtractActionItems: |
| 68 | def test_parses_valid_response(self): |
| 69 | pm = MagicMock() |
| 70 | pm.chat.return_value = json.dumps([ |
| 71 | {"action": "Deploy fix", "assignee": "Bob", "deadline": "Friday", |
| 72 | "priority": "high", "context": "Production"}, |
| 73 | ]) |
| 74 | result = _extract_action_items(pm, "Some transcript text") |
| 75 | assert len(result) == 1 |
| 76 | assert result[0].action == "Deploy fix" |
| 77 | assert result[0].assignee == "Bob" |
| 78 | |
| 79 | def test_skips_invalid_items(self): |
| 80 | pm = MagicMock() |
| 81 | pm.chat.return_value = json.dumps([ |
| 82 | {"action": "Valid action"}, |
| 83 | {"assignee": "No action field"}, |
| 84 | {"action": ""}, |
| 85 | ]) |
| 86 | result = _extract_action_items(pm, "text") |
| 87 | assert len(result) == 1 |
| 88 | |
| 89 | def test_handles_error(self): |
| 90 | pm = MagicMock() |
| 91 | pm.chat.side_effect = Exception("API down") |
| 92 | result = _extract_action_items(pm, "text") |
| 93 | assert result == [] |
| 94 |
| --- tests/test_pipeline.py | |
| +++ tests/test_pipeline.py | |
| @@ -1,14 +1,11 @@ | |
| 1 | """Tests for the core video processing pipeline.""" |
| 2 | |
| 3 | import json |
| 4 | from unittest.mock import MagicMock |
| 5 | |
| 6 | from video_processor.pipeline import _extract_action_items, _extract_key_points, _format_srt_time |
| 7 | |
| 8 | |
| 9 | class TestFormatSrtTime: |
| 10 | def test_zero(self): |
| --- tests/test_prompt_templates.py | ||
| +++ tests/test_prompt_templates.py | ||
| @@ -1,9 +1,7 @@ | ||
| 1 | 1 | """Tests for prompt template management.""" |
| 2 | 2 | |
| 3 | -import pytest | |
| 4 | - | |
| 5 | 3 | from video_processor.utils.prompt_templates import ( |
| 6 | 4 | DEFAULT_TEMPLATES, |
| 7 | 5 | PromptTemplate, |
| 8 | 6 | default_prompt_manager, |
| 9 | 7 | ) |
| 10 | 8 |
| --- tests/test_providers.py | ||
| +++ tests/test_providers.py | ||
| @@ -1,11 +1,9 @@ | ||
| 1 | 1 | """Tests for the provider abstraction layer.""" |
| 2 | 2 | |
| 3 | 3 | from unittest.mock import MagicMock, patch |
| 4 | 4 | |
| 5 | -import pytest | |
| 6 | - | |
| 7 | 5 | from video_processor.providers.base import BaseProvider, ModelInfo |
| 8 | 6 | from video_processor.providers.manager import ProviderManager |
| 9 | 7 | |
| 10 | 8 | |
| 11 | 9 | class TestModelInfo: |
| @@ -13,11 +11,16 @@ | ||
| 13 | 11 | m = ModelInfo(id="gpt-4o", provider="openai", capabilities=["chat", "vision"]) |
| 14 | 12 | assert m.id == "gpt-4o" |
| 15 | 13 | assert "vision" in m.capabilities |
| 16 | 14 | |
| 17 | 15 | def test_round_trip(self): |
| 18 | - m = ModelInfo(id="claude-sonnet-4-5-20250929", provider="anthropic", display_name="Claude Sonnet", capabilities=["chat", "vision"]) | |
| 16 | + m = ModelInfo( | |
| 17 | + id="claude-sonnet-4-5-20250929", | |
| 18 | + provider="anthropic", | |
| 19 | + display_name="Claude Sonnet", | |
| 20 | + capabilities=["chat", "vision"], | |
| 21 | + ) | |
| 19 | 22 | restored = ModelInfo.model_validate_json(m.model_dump_json()) |
| 20 | 23 | assert restored == m |
| 21 | 24 | |
| 22 | 25 | |
| 23 | 26 | class TestProviderManager: |
| @@ -107,23 +110,26 @@ | ||
| 107 | 110 | class TestDiscovery: |
| 108 | 111 | @patch("video_processor.providers.discovery._cached_models", None) |
| 109 | 112 | @patch.dict("os.environ", {}, clear=True) |
| 110 | 113 | def test_discover_skips_missing_keys(self): |
| 111 | 114 | from video_processor.providers.discovery import discover_available_models |
| 115 | + | |
| 112 | 116 | # No API keys -> empty list, no errors |
| 113 | 117 | models = discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""}) |
| 114 | 118 | assert models == [] |
| 115 | 119 | |
| 116 | 120 | @patch.dict("os.environ", {}, clear=True) |
| 117 | 121 | @patch("video_processor.providers.discovery._cached_models", None) |
| 118 | 122 | def test_discover_caches_results(self): |
| 119 | 123 | from video_processor.providers import discovery |
| 120 | 124 | |
| 121 | - models = discovery.discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""}) | |
| 125 | + models = discovery.discover_available_models( | |
| 126 | + api_keys={"openai": "", "anthropic": "", "gemini": ""} | |
| 127 | + ) | |
| 122 | 128 | assert models == [] |
| 123 | 129 | # Second call should use cache |
| 124 | 130 | models2 = discovery.discover_available_models(api_keys={"openai": "key"}) |
| 125 | 131 | assert models2 == [] # Still cached empty result |
| 126 | 132 | |
| 127 | 133 | # Force refresh |
| 128 | 134 | discovery.clear_discovery_cache() |
| 129 | 135 | # Would try to connect with real key, so skip that test |
| 130 | 136 |
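The discovery tests above patch a module-level `_cached_models` and pass explicit `api_keys` so no real provider is contacted. A minimal sketch of that cache-and-skip pattern (illustrative names, not the actual `video_processor` API):

```python
# Module-level cache, as exercised by the discovery tests.
_cached_models = None


def discover_available_models(api_keys, force_refresh=False):
    """Return provider names with non-empty API keys, caching the result."""
    global _cached_models
    if _cached_models is not None and not force_refresh:
        return _cached_models  # second call hits the cache
    # Providers with missing or empty keys are skipped without error
    _cached_models = [name for name, key in api_keys.items() if key]
    return _cached_models


def clear_discovery_cache():
    """Force the next discovery call to re-query providers."""
    global _cached_models
    _cached_models = None
```

Patching `_cached_models` to `None` in each test (as the hunk does with `@patch`) keeps tests independent of call order.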
| --- tests/test_rendering.py | ||
| +++ tests/test_rendering.py | ||
| @@ -1,12 +1,8 @@ | ||
| 1 | 1 | """Tests for rendering and export utilities.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | -from pathlib import Path | |
| 5 | -from unittest.mock import MagicMock, patch | |
| 6 | - | |
| 7 | -import pytest | |
| 3 | +from unittest.mock import patch | |
| 8 | 4 | |
| 9 | 5 | from video_processor.models import ( |
| 10 | 6 | ActionItem, |
| 11 | 7 | DiagramResult, |
| 12 | 8 | DiagramType, |
| @@ -101,11 +97,11 @@ | ||
| 101 | 97 | assert result == {} |
| 102 | 98 | |
| 103 | 99 | def test_creates_output_dir(self, tmp_path): |
| 104 | 100 | nested = tmp_path / "charts" / "output" |
| 105 | 101 | data = {"labels": ["A"], "values": [1], "chart_type": "bar"} |
| 106 | - result = reproduce_chart(data, nested, "test") | |
| 102 | + reproduce_chart(data, nested, "test") | |
| 107 | 103 | assert nested.exists() |
| 108 | 104 | |
| 109 | 105 | |
| 110 | 106 | class TestExportAllFormats: |
| 111 | 107 | def _make_manifest(self) -> VideoManifest: |
| @@ -180,11 +176,11 @@ | ||
| 180 | 176 | ], |
| 181 | 177 | ) |
| 182 | 178 | (tmp_path / "results").mkdir() |
| 183 | 179 | (tmp_path / "diagrams").mkdir() |
| 184 | 180 | |
| 185 | - result = export_all_formats(tmp_path, manifest) | |
| 181 | + export_all_formats(tmp_path, manifest) | |
| 186 | 182 | # Chart should be reproduced |
| 187 | 183 | chart_svg = tmp_path / "diagrams" / "diagram_0_chart.svg" |
| 188 | 184 | assert chart_svg.exists() |
| 189 | 185 | |
| 190 | 186 | |
| 191 | 187 |
| --- video_processor/agent/orchestrator.py | ||
| +++ video_processor/agent/orchestrator.py | ||
| @@ -5,17 +5,13 @@ | ||
| 5 | 5 | import time |
| 6 | 6 | from pathlib import Path |
| 7 | 7 | from typing import Any, Dict, List, Optional |
| 8 | 8 | |
| 9 | 9 | from video_processor.models import ( |
| 10 | - ActionItem, | |
| 11 | - DiagramResult, | |
| 12 | - KeyPoint, | |
| 13 | - ScreenCapture, | |
| 10 | + ProcessingStats, | |
| 14 | 11 | VideoManifest, |
| 15 | 12 | VideoMetadata, |
| 16 | - ProcessingStats, | |
| 17 | 13 | ) |
| 18 | 14 | from video_processor.providers.manager import ProviderManager |
| 19 | 15 | |
| 20 | 16 | logger = logging.getLogger(__name__) |
| 21 | 17 | |
| @@ -107,13 +103,11 @@ | ||
| 107 | 103 | plan.append({"step": "generate_reports", "priority": "required"}) |
| 108 | 104 | |
| 109 | 105 | self._plan = plan |
| 110 | 106 | return plan |
| 111 | 107 | |
| 112 | - def _execute_step( | |
| 113 | - self, step: Dict[str, Any], input_path: Path, output_dir: Path | |
| 114 | - ) -> None: | |
| 108 | + def _execute_step(self, step: Dict[str, Any], input_path: Path, output_dir: Path) -> None: | |
| 115 | 109 | """Execute a single step with retry logic.""" |
| 116 | 110 | step_name = step["step"] |
| 117 | 111 | logger.info(f"Agent step: {step_name}") |
| 118 | 112 | |
| 119 | 113 | for attempt in range(1, self.max_retries + 1): |
| @@ -141,13 +135,11 @@ | ||
| 141 | 135 | result = self._run_step(fallback, input_path, output_dir) |
| 142 | 136 | self._results[step_name] = result |
| 143 | 137 | except Exception as fe: |
| 144 | 138 | logger.error(f"Fallback {fallback} also failed: {fe}") |
| 145 | 139 | |
| 146 | - def _run_step( | |
| 147 | - self, step_name: str, input_path: Path, output_dir: Path | |
| 148 | - ) -> Any: | |
| 140 | + def _run_step(self, step_name: str, input_path: Path, output_dir: Path) -> Any: | |
| 149 | 141 | """Run a specific processing step.""" |
| 150 | 142 | from video_processor.output_structure import create_video_output_dirs |
| 151 | 143 | |
| 152 | 144 | dirs = create_video_output_dirs(output_dir, input_path.stem) |
| 153 | 145 | |
| @@ -177,13 +169,11 @@ | ||
| 177 | 169 | transcription = self.pm.transcribe_audio(audio_path) |
| 178 | 170 | text = transcription.get("text", "") |
| 179 | 171 | |
| 180 | 172 | # Save transcript |
| 181 | 173 | dirs["transcript"].mkdir(parents=True, exist_ok=True) |
| 182 | - (dirs["transcript"] / "transcript.json").write_text( | |
| 183 | - json.dumps(transcription, indent=2) | |
| 184 | - ) | |
| 174 | + (dirs["transcript"] / "transcript.json").write_text(json.dumps(transcription, indent=2)) | |
| 185 | 175 | (dirs["transcript"] / "transcript.txt").write_text(text) |
| 186 | 176 | return transcription |
| 187 | 177 | |
| 188 | 178 | elif step_name == "detect_diagrams": |
| 189 | 179 | from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer |
| @@ -256,23 +246,19 @@ | ||
| 256 | 246 | """Adapt the plan based on step results.""" |
| 257 | 247 | |
| 258 | 248 | if completed_step == "transcribe": |
| 259 | 249 | text = result.get("text", "") if isinstance(result, dict) else "" |
| 260 | 250 | # If transcript is very long, add deep analysis |
| 261 | - if len(text) > 10000 and not any( | |
| 262 | - s["step"] == "deep_analysis" for s in self._plan | |
| 263 | - ): | |
| 251 | + if len(text) > 10000 and not any(s["step"] == "deep_analysis" for s in self._plan): | |
| 264 | 252 | self._plan.append({"step": "deep_analysis", "priority": "adaptive"}) |
| 265 | 253 | logger.info("Agent adapted: adding deep analysis for long transcript") |
| 266 | 254 | |
| 267 | 255 | elif completed_step == "detect_diagrams": |
| 268 | 256 | diagrams = result.get("diagrams", []) if isinstance(result, dict) else [] |
| 269 | 257 | captures = result.get("captures", []) if isinstance(result, dict) else [] |
| 270 | 258 | # If many diagrams found, ensure cross-referencing |
| 271 | - if len(diagrams) >= 3 and not any( | |
| 272 | - s["step"] == "cross_reference" for s in self._plan | |
| 273 | - ): | |
| 259 | + if len(diagrams) >= 3 and not any(s["step"] == "cross_reference" for s in self._plan): | |
| 274 | 260 | self._plan.append({"step": "cross_reference", "priority": "adaptive"}) |
| 275 | 261 | logger.info("Agent adapted: adding cross-reference for diagram-heavy video") |
| 276 | 262 | |
| 277 | 263 | if len(captures) > len(diagrams): |
| 278 | 264 | self._insights.append( |
| @@ -358,11 +344,11 @@ | ||
| 358 | 344 | |
| 359 | 345 | transcript = self._results.get("transcribe", {}) |
| 360 | 346 | kp_result = self._results.get("extract_key_points", {}) |
| 361 | 347 | key_points = kp_result.get("key_points", []) |
| 362 | 348 | ai_result = self._results.get("extract_action_items", {}) |
| 363 | - action_items = ai_result.get("action_items", []) | |
| 349 | + ai_result.get("action_items", []) | |
| 364 | 350 | diagram_result = self._results.get("detect_diagrams", {}) |
| 365 | 351 | diagrams = diagram_result.get("diagrams", []) |
| 366 | 352 | kg_result = self._results.get("build_knowledge_graph", {}) |
| 367 | 353 | kg = kg_result.get("knowledge_graph") |
| 368 | 354 | |
| 369 | 355 |
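The reflowed `_adapt_plan` conditions above follow one pattern: append a step only when a threshold is crossed and the step is not already planned. A standalone sketch of that check (function name and threshold are illustrative):

```python
def maybe_add_deep_analysis(plan, transcript_text, threshold=10000):
    """Append a deep_analysis step for long transcripts, at most once."""
    if len(transcript_text) > threshold and not any(
        s["step"] == "deep_analysis" for s in plan
    ):
        plan.append({"step": "deep_analysis", "priority": "adaptive"})
    return plan
```

The `not any(...)` guard is what makes the adaptation idempotent when `_adapt_plan` runs after every completed step.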
| --- video_processor/analyzers/action_detector.py | ||
| +++ video_processor/analyzers/action_detector.py | ||
| @@ -150,23 +150,25 @@ | ||
| 150 | 150 | return [] |
| 151 | 151 | |
| 152 | 152 | def _pattern_extract(self, text: str) -> List[ActionItem]: |
| 153 | 153 | """Extract action items using regex pattern matching.""" |
| 154 | 154 | items: List[ActionItem] = [] |
| 155 | - sentences = re.split(r'[.!?]\s+', text) | |
| 155 | + sentences = re.split(r"[.!?]\s+", text) | |
| 156 | 156 | |
| 157 | 157 | for sentence in sentences: |
| 158 | 158 | sentence = sentence.strip() |
| 159 | 159 | if not sentence or len(sentence) < 10: |
| 160 | 160 | continue |
| 161 | 161 | |
| 162 | 162 | for pattern in _ACTION_PATTERNS: |
| 163 | 163 | if pattern.search(sentence): |
| 164 | - items.append(ActionItem( | |
| 165 | - action=sentence, | |
| 166 | - source="transcript", | |
| 167 | - )) | |
| 164 | + items.append( | |
| 165 | + ActionItem( | |
| 166 | + action=sentence, | |
| 167 | + source="transcript", | |
| 168 | + ) | |
| 169 | + ) | |
| 168 | 170 | break # One match per sentence is enough |
| 169 | 171 | |
| 170 | 172 | return items |
| 171 | 173 | |
| 172 | 174 | def _attach_timestamps( |
| 173 | 175 |
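The `_pattern_extract` hunk above splits the transcript into sentences with `re.split(r"[.!?]\s+", text)` and keeps the first pattern match per sentence. A minimal standalone sketch (the patterns here are illustrative; the real `_ACTION_PATTERNS` list lives in `action_detector.py`):

```python
import re

# Illustrative action-signal patterns, not the project's actual list.
_ACTION_PATTERNS = [
    re.compile(r"\b(will|should|must|need to|todo)\b", re.IGNORECASE),
]


def pattern_extract(text):
    """Return sentences that look like action items."""
    items = []
    # Split on sentence-ending punctuation followed by whitespace
    for sentence in re.split(r"[.!?]\s+", text):
        sentence = sentence.strip()
        if len(sentence) < 10:  # skip fragments
            continue
        if any(p.search(sentence) for p in _ACTION_PATTERNS):
            items.append(sentence)  # one match per sentence is enough
    return items
```

Note that `re.split` consumes the punctuation between sentences, so only a trailing sentence keeps its final period.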
| --- video_processor/analyzers/content_analyzer.py | ||
| +++ video_processor/analyzers/content_analyzer.py | ||
| @@ -58,18 +58,18 @@ | ||
| 58 | 58 | ) |
| 59 | 59 | |
| 60 | 60 | # LLM fuzzy matching for unmatched entities |
| 61 | 61 | if self.pm: |
| 62 | 62 | unmatched_t = [ |
| 63 | - e for e in transcript_entities if e.name.lower() not in { | |
| 64 | - d.name.lower() for d in diagram_entities | |
| 65 | - } | |
| 63 | + e | |
| 64 | + for e in transcript_entities | |
| 65 | + if e.name.lower() not in {d.name.lower() for d in diagram_entities} | |
| 66 | 66 | ] |
| 67 | 67 | unmatched_d = [ |
| 68 | - e for e in diagram_entities if e.name.lower() not in { | |
| 69 | - t.name.lower() for t in transcript_entities | |
| 70 | - } | |
| 68 | + e | |
| 69 | + for e in diagram_entities | |
| 70 | + if e.name.lower() not in {t.name.lower() for t in transcript_entities} | |
| 71 | 71 | ] |
| 72 | 72 | |
| 73 | 73 | if unmatched_t and unmatched_d: |
| 74 | 74 | matches = self._fuzzy_match(unmatched_t, unmatched_d) |
| 75 | 75 | for t_name, d_name in matches: |
| @@ -136,11 +136,13 @@ | ||
| 136 | 136 | |
| 137 | 137 | # Build diagram entity index |
| 138 | 138 | diagram_entities: dict[int, set[str]] = {} |
| 139 | 139 | for i, d in enumerate(diagrams): |
| 140 | 140 | elements = d.get("elements", []) if isinstance(d, dict) else getattr(d, "elements", []) |
| 141 | - text = d.get("text_content", "") if isinstance(d, dict) else getattr(d, "text_content", "") | |
| 141 | + text = ( | |
| 142 | + d.get("text_content", "") if isinstance(d, dict) else getattr(d, "text_content", "") | |
| 143 | + ) | |
| 142 | 144 | entities = set(str(e).lower() for e in elements) |
| 143 | 145 | if text: |
| 144 | 146 | entities.update(word.lower() for word in text.split() if len(word) > 3) |
| 145 | 147 | diagram_entities[i] = entities |
| 146 | 148 | |
| 147 | 149 |
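The reflowed comprehensions above compute, case-insensitively, which transcript entities have no diagram counterpart and vice versa. The same logic on plain strings (helper name is illustrative):

```python
def split_unmatched(transcript_names, diagram_names):
    """Return (transcript-only, diagram-only) names, compared case-insensitively."""
    diagram_lower = {d.lower() for d in diagram_names}
    transcript_lower = {t.lower() for t in transcript_names}
    unmatched_t = [t for t in transcript_names if t.lower() not in diagram_lower]
    unmatched_d = [d for d in diagram_names if d.lower() not in transcript_lower]
    return unmatched_t, unmatched_d
```

Building the lowercase sets once keeps each membership test O(1) instead of re-scanning the other list per entity.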
| --- video_processor/analyzers/diagram_analyzer.py | ||
| +++ video_processor/analyzers/diagram_analyzer.py | ||
| @@ -24,23 +24,25 @@ | ||
| 24 | 24 | shared/presented content, NOT people or camera views. |
| 25 | 25 | |
| 26 | 26 | Return ONLY a JSON object (no markdown fences): |
| 27 | 27 | { |
| 28 | 28 | "is_diagram": true/false, |
| 29 | - "diagram_type": "flowchart"|"sequence"|"architecture"|"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 29 | + "diagram_type": "flowchart"|"sequence"|"architecture" | |
| 30 | + |"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 30 | 31 | "confidence": 0.0 to 1.0, |
| 31 | 32 | "content_type": "slide"|"diagram"|"document"|"screen_share"|"whiteboard"|"chart"|"person"|"other", |
| 32 | 33 | "brief_description": "one-sentence description of what you see" |
| 33 | 34 | } |
| 34 | 35 | """ |
| 35 | 36 | |
| 36 | 37 | # Single-pass analysis prompt — extracts everything in one call |
| 37 | 38 | _ANALYSIS_PROMPT = """\ |
| 38 | -Analyze this diagram/visual content comprehensively. Extract ALL of the following in a single JSON response (no markdown fences): | |
| 39 | - | |
| 39 | +Analyze this diagram/visual content comprehensively. Extract ALL of the | |
| 40 | +following in a single JSON response (no markdown fences): | |
| 40 | 41 | { |
| 41 | - "diagram_type": "flowchart"|"sequence"|"architecture"|"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 42 | + "diagram_type": "flowchart"|"sequence"|"architecture" | |
| 43 | + |"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown", | |
| 42 | 44 | "description": "detailed description of the visual content", |
| 43 | 45 | "text_content": "all visible text, preserving structure", |
| 44 | 46 | "elements": ["list", "of", "identified", "elements/components"], |
| 45 | 47 | "relationships": ["element A -> element B: relationship", ...], |
| 46 | 48 | "mermaid": "mermaid diagram syntax representing this visual (graph LR, sequenceDiagram, etc.)", |
| @@ -68,12 +70,11 @@ | ||
| 68 | 70 | # Strip markdown fences |
| 69 | 71 | cleaned = text.strip() |
| 70 | 72 | if cleaned.startswith("```"): |
| 71 | 73 | lines = cleaned.split("\n") |
| 72 | 74 | # Remove first and last fence lines |
| 73 | - lines = [l for l in lines if not l.strip().startswith("```")] | |
| 74 | - cleaned = "\n".join(lines) | |
| 75 | + lines = [line for line in lines if not line.strip().startswith("```")] | |
| 75 | 76 | try: |
| 76 | 77 | return json.loads(cleaned) |
| 77 | 78 | except json.JSONDecodeError: |
| 78 | 79 | # Try to find JSON object in the text |
| 79 | 80 | start = cleaned.find("{") |
| @@ -105,11 +106,16 @@ | ||
| 105 | 106 | """ |
| 106 | 107 | image_bytes = _read_image_bytes(image_path) |
| 107 | 108 | raw = self.pm.analyze_image(image_bytes, _CLASSIFY_PROMPT, max_tokens=512) |
| 108 | 109 | result = _parse_json_response(raw) |
| 109 | 110 | if result is None: |
| 110 | - return {"is_diagram": False, "diagram_type": "unknown", "confidence": 0.0, "brief_description": ""} | |
| 111 | + return { | |
| 112 | + "is_diagram": False, | |
| 113 | + "diagram_type": "unknown", | |
| 114 | + "confidence": 0.0, | |
| 115 | + "brief_description": "", | |
| 116 | + } | |
| 111 | 117 | return result |
| 112 | 118 | |
| 113 | 119 | def analyze_diagram_single_pass(self, image_path: Union[str, Path]) -> dict: |
| 114 | 120 | """ |
| 115 | 121 | Full single-pass diagram analysis — description, text, mermaid, chart data. |
| @@ -163,15 +169,19 @@ | ||
| 163 | 169 | logger.debug(f"Frame {i}: confidence {confidence:.2f} below threshold, skipping") |
| 164 | 170 | continue |
| 165 | 171 | |
| 166 | 172 | if confidence >= 0.7: |
| 167 | 173 | # Full diagram analysis |
| 168 | - logger.info(f"Frame {i}: diagram detected (confidence {confidence:.2f}), analyzing...") | |
| 174 | + logger.info( | |
| 175 | + f"Frame {i}: diagram detected (confidence {confidence:.2f}), analyzing..." | |
| 176 | + ) | |
| 169 | 177 | try: |
| 170 | 178 | analysis = self.analyze_diagram_single_pass(fp) |
| 171 | 179 | except Exception as e: |
| 172 | - logger.warning(f"Diagram analysis failed for frame {i}: {e}, falling back to screengrab") | |
| 180 | + logger.warning( | |
| 181 | + f"Diagram analysis failed for frame {i}: {e}, falling back to screengrab" | |
| 182 | + ) | |
| 173 | 183 | analysis = {} |
| 174 | 184 | |
| 175 | 185 | if not analysis: |
| 176 | 186 | # Analysis failed — fall back to screengrab |
| 177 | 187 | capture = self._save_screengrab(fp, i, capture_idx, captures_dir, confidence) |
| @@ -221,16 +231,20 @@ | ||
| 221 | 231 | diagrams.append(dr) |
| 222 | 232 | diagram_idx += 1 |
| 223 | 233 | |
| 224 | 234 | else: |
| 225 | 235 | # Screengrab fallback (0.3 <= confidence < 0.7) |
| 226 | - logger.info(f"Frame {i}: uncertain (confidence {confidence:.2f}), saving as screengrab") | |
| 236 | + logger.info( | |
| 237 | + f"Frame {i}: uncertain (confidence {confidence:.2f}), saving as screengrab" | |
| 238 | + ) | |
| 227 | 239 | capture = self._save_screengrab(fp, i, capture_idx, captures_dir, confidence) |
| 228 | 240 | captures.append(capture) |
| 229 | 241 | capture_idx += 1 |
| 230 | 242 | |
| 231 | - logger.info(f"Diagram processing complete: {len(diagrams)} diagrams, {len(captures)} screengrabs") | |
| 243 | + logger.info( | |
| 244 | + f"Diagram processing complete: {len(diagrams)} diagrams, {len(captures)} screengrabs" | |
| 245 | + ) | |
| 232 | 246 | return diagrams, captures |
| 233 | 247 | |
| 234 | 248 | def _save_screengrab( |
| 235 | 249 | self, |
| 236 | 250 | frame_path: Path, |
| 237 | 251 |
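The `_parse_json_response` hunk above strips markdown fences before parsing and falls back to extracting the outermost brace-delimited span. A self-contained sketch of that approach (the function name and exact fallback behavior are an approximation of the real module, not a copy of it):

```python
import json


def parse_json_response(text: str):
    """Parse an LLM reply that may wrap JSON in ``` fences or extra prose."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop fence lines, then rejoin the remainder.
        lines = [line for line in cleaned.split("\n") if not line.strip().startswith("```")]
        cleaned = "\n".join(lines)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, if any.
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start != -1 and end > start:
            try:
                return json.loads(cleaned[start : end + 1])
            except json.JSONDecodeError:
                return None
        return None
```

Returning `None` on failure lets callers substitute a default payload, as the classifier's `{"is_diagram": False, ...}` fallback in the diff does.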
| --- video_processor/cli/commands.py | ||
| +++ video_processor/cli/commands.py | ||
| @@ -2,13 +2,11 @@ | ||
| 2 | 2 | |
| 3 | 3 | import json |
| 4 | 4 | import logging |
| 5 | 5 | import os |
| 6 | 6 | import sys |
| 7 | -import time | |
| 8 | 7 | from pathlib import Path |
| 9 | -from typing import List, Optional | |
| 10 | 8 | |
| 11 | 9 | import click |
| 12 | 10 | import colorlog |
| 13 | 11 | from tqdm import tqdm |
| 14 | 12 | |
| @@ -49,23 +47,32 @@ | ||
| 49 | 47 | if ctx.invoked_subcommand is None: |
| 50 | 48 | _interactive_menu(ctx) |
| 51 | 49 | |
| 52 | 50 | |
| 53 | 51 | @cli.command() |
| 54 | -@click.option("--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path") | |
| 52 | +@click.option( | |
| 53 | + "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path" | |
| 54 | +) | |
| 55 | 55 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 56 | 56 | @click.option( |
| 57 | 57 | "--depth", |
| 58 | 58 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 59 | 59 | default="standard", |
| 60 | 60 | help="Processing depth", |
| 61 | 61 | ) |
| 62 | -@click.option("--focus", type=str, help='Comma-separated focus areas (e.g., "diagrams,action-items")') | |
| 62 | +@click.option( | |
| 63 | + "--focus", type=str, help='Comma-separated focus areas (e.g., "diagrams,action-items")' | |
| 64 | +) | |
| 63 | 65 | @click.option("--use-gpu", is_flag=True, help="Enable GPU acceleration if available") |
| 64 | 66 | @click.option("--sampling-rate", type=float, default=0.5, help="Frame sampling rate") |
| 65 | 67 | @click.option("--change-threshold", type=float, default=0.15, help="Visual change threshold") |
| 66 | -@click.option("--periodic-capture", type=float, default=30.0, help="Capture a frame every N seconds regardless of change (0 to disable)") | |
| 68 | +@click.option( | |
| 69 | + "--periodic-capture", | |
| 70 | + type=float, | |
| 71 | + default=30.0, | |
| 72 | + help="Capture a frame every N seconds regardless of change (0 to disable)", | |
| 73 | +) | |
| 67 | 74 | @click.option("--title", type=str, help="Title for the analysis report") |
| 68 | 75 | @click.option( |
| 69 | 76 | "--provider", |
| 70 | 77 | "-p", |
| 71 | 78 | type=click.Choice(["auto", "openai", "anthropic", "gemini"]), |
| @@ -102,11 +109,11 @@ | ||
| 102 | 109 | chat_model=chat_model, |
| 103 | 110 | provider=prov, |
| 104 | 111 | ) |
| 105 | 112 | |
| 106 | 113 | try: |
| 107 | - manifest = process_single_video( | |
| 114 | + process_single_video( | |
| 108 | 115 | input_path=input, |
| 109 | 116 | output_dir=output, |
| 110 | 117 | provider_manager=pm, |
| 111 | 118 | depth=depth, |
| 112 | 119 | focus_areas=focus_areas, |
| @@ -127,11 +134,13 @@ | ||
| 127 | 134 | traceback.print_exc() |
| 128 | 135 | sys.exit(1) |
| 129 | 136 | |
| 130 | 137 | |
| 131 | 138 | @cli.command() |
| 132 | -@click.option("--input-dir", "-i", type=click.Path(), default=None, help="Local directory of videos") | |
| 139 | +@click.option( | |
| 140 | + "--input-dir", "-i", type=click.Path(), default=None, help="Local directory of videos" | |
| 141 | +) | |
| 133 | 142 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 134 | 143 | @click.option( |
| 135 | 144 | "--depth", |
| 136 | 145 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 137 | 146 | default="standard", |
| @@ -159,20 +168,35 @@ | ||
| 159 | 168 | default="local", |
| 160 | 169 | help="Video source (local directory, Google Drive, or Dropbox)", |
| 161 | 170 | ) |
| 162 | 171 | @click.option("--folder-id", type=str, default=None, help="Google Drive folder ID") |
| 163 | 172 | @click.option("--folder-path", type=str, default=None, help="Cloud folder path") |
| 164 | -@click.option("--recursive/--no-recursive", default=True, help="Recurse into subfolders (default: recursive)") | |
| 173 | +@click.option( | |
| 174 | + "--recursive/--no-recursive", default=True, help="Recurse into subfolders (default: recursive)" | |
| 175 | +) | |
| 165 | 176 | @click.pass_context |
| 166 | -def batch(ctx, input_dir, output, depth, pattern, title, provider, vision_model, chat_model, source, folder_id, folder_path, recursive): | |
| 177 | +def batch( | |
| 178 | + ctx, | |
| 179 | + input_dir, | |
| 180 | + output, | |
| 181 | + depth, | |
| 182 | + pattern, | |
| 183 | + title, | |
| 184 | + provider, | |
| 185 | + vision_model, | |
| 186 | + chat_model, | |
| 187 | + source, | |
| 188 | + folder_id, | |
| 189 | + folder_path, | |
| 190 | + recursive, | |
| 191 | +): | |
| 167 | 192 | """Process a folder of videos in batch.""" |
| 168 | 193 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 169 | 194 | from video_processor.integrators.plan_generator import PlanGenerator |
| 170 | 195 | from video_processor.models import BatchManifest, BatchVideoEntry |
| 171 | 196 | from video_processor.output_structure import ( |
| 172 | 197 | create_batch_output_dirs, |
| 173 | - read_video_manifest, | |
| 174 | 198 | write_batch_manifest, |
| 175 | 199 | ) |
| 176 | 200 | from video_processor.pipeline import process_single_video |
| 177 | 201 | from video_processor.providers.manager import ProviderManager |
| 178 | 202 | |
| @@ -190,21 +214,23 @@ | ||
| 190 | 214 | |
| 191 | 215 | cloud = GoogleDriveSource() |
| 192 | 216 | if not cloud.authenticate(): |
| 193 | 217 | logging.error("Google Drive authentication failed") |
| 194 | 218 | sys.exit(1) |
| 195 | - cloud_files = cloud.list_videos(folder_id=folder_id, folder_path=folder_path, patterns=patterns, recursive=recursive) | |
| 196 | - local_paths = cloud.download_all(cloud_files, download_dir) | |
| 219 | + cloud_files = cloud.list_videos( | |
| 220 | + folder_id=folder_id, folder_path=folder_path, patterns=patterns, recursive=recursive | |
| 221 | + ) | |
| 222 | + cloud.download_all(cloud_files, download_dir) | |
| 197 | 223 | elif source == "dropbox": |
| 198 | 224 | from video_processor.sources.dropbox_source import DropboxSource |
| 199 | 225 | |
| 200 | 226 | cloud = DropboxSource() |
| 201 | 227 | if not cloud.authenticate(): |
| 202 | 228 | logging.error("Dropbox authentication failed") |
| 203 | 229 | sys.exit(1) |
| 204 | 230 | cloud_files = cloud.list_videos(folder_path=folder_path, patterns=patterns) |
| 205 | - local_paths = cloud.download_all(cloud_files, download_dir) | |
| 231 | + cloud.download_all(cloud_files, download_dir) | |
| 206 | 232 | else: |
| 207 | 233 | logging.error(f"Unknown source: {source}") |
| 208 | 234 | sys.exit(1) |
| 209 | 235 | |
| 210 | 236 | input_dir = download_dir |
| @@ -302,11 +328,14 @@ | ||
| 302 | 328 | batch_summary_md="batch_summary.md", |
| 303 | 329 | merged_knowledge_graph_json="knowledge_graph.json", |
| 304 | 330 | ) |
| 305 | 331 | write_batch_manifest(batch_manifest, output) |
| 306 | 332 | click.echo(pm.usage.format_summary()) |
| 307 | - click.echo(f"\n Batch complete: {batch_manifest.completed_videos}/{batch_manifest.total_videos} succeeded") | |
| 333 | + click.echo( | |
| 334 | + f"\n Batch complete: {batch_manifest.completed_videos}" | |
| 335 | + f"/{batch_manifest.total_videos} succeeded" | |
| 336 | + ) | |
| 308 | 337 | click.echo(f" Results: {output}/batch_manifest.json") |
| 309 | 338 | |
| 310 | 339 | |
| 311 | 340 | @cli.command("list-models") |
| 312 | 341 | @click.pass_context |
| @@ -374,11 +403,13 @@ | ||
| 374 | 403 | traceback.print_exc() |
| 375 | 404 | sys.exit(1) |
| 376 | 405 | |
| 377 | 406 | |
| 378 | 407 | @cli.command("agent-analyze") |
| 379 | -@click.option("--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path") | |
| 408 | +@click.option( | |
| 409 | + "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path" | |
| 410 | +) | |
| 380 | 411 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 381 | 412 | @click.option( |
| 382 | 413 | "--depth", |
| 383 | 414 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 384 | 415 | default="standard", |
| 385 | 416 |
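The batch-summary change in the diff above splits a long f-string into adjacent literals inside parentheses, relying on Python's compile-time concatenation of adjacent string literals, so the rendered output is unchanged. A minimal stdlib-only illustration (the counts here are made up):

```python
completed_videos, total_videos = 7, 9

# Adjacent f-string literals inside parentheses are concatenated at
# compile time, so the wrapped form renders identically to one long line.
wrapped = (
    f"\n Batch complete: {completed_videos}"
    f"/{total_videos} succeeded"
)
single_line = f"\n Batch complete: {completed_videos}/{total_videos} succeeded"
print(wrapped == single_line)  # True
```

This is the standard way to satisfy ruff's E501 line-length rule without introducing `+` concatenation at runtime.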
| 338 | |
| 339 | |
| 340 | @cli.command("list-models") |
| 341 | @click.pass_context |
| @@ -374,11 +403,13 @@ | |
| 403 | traceback.print_exc() |
| 404 | sys.exit(1) |
| 405 | |
| 406 | |
| 407 | @cli.command("agent-analyze") |
| 408 | @click.option( |
| 409 | "--input", "-i", required=True, type=click.Path(exists=True), help="Input video file path" |
| 410 | ) |
| 411 | @click.option("--output", "-o", required=True, type=click.Path(), help="Output directory") |
| 412 | @click.option( |
| 413 | "--depth", |
| 414 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 415 | default="standard", |
| 416 |
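The `batch` command above dispatches on `--source`, authenticating a cloud source and bailing out with `sys.exit(1)` on failure or on an unknown source value. A minimal sketch of that dispatch pattern — `FakeSource` is a stand-in for the real `GoogleDriveSource`/`DropboxSource` classes, and the option keys (`"gdrive"`, `"dropbox"`) are illustrative:

```python
import logging
import sys


class FakeSource:
    """Stand-in for GoogleDriveSource / DropboxSource from video_processor.sources."""

    def authenticate(self):
        return True

    def list_videos(self, **kwargs):
        return ["a.mp4", "b.mp4"]

    def download_all(self, files, dest):
        return [f"{dest}/{name}" for name in files]


def resolve_source(source, factories):
    """Look up the cloud source by name; exit on unknown source or failed auth."""
    try:
        cloud = factories[source]()
    except KeyError:
        logging.error(f"Unknown source: {source}")
        sys.exit(1)
    if not cloud.authenticate():
        logging.error(f"{source} authentication failed")
        sys.exit(1)
    return cloud


cloud = resolve_source("gdrive", {"gdrive": FakeSource, "dropbox": FakeSource})
files = cloud.list_videos(folder_id=None, recursive=True)
downloaded = cloud.download_all(files, "downloads")
```

Keeping the authenticate/list/download surface identical across sources is what lets the command fall through to the same `input_dir = download_dir` line regardless of provider.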
| --- video_processor/cli/output_formatter.py | ||
| +++ video_processor/cli/output_formatter.py | ||
| @@ -1,42 +1,42 @@ | ||
| 1 | 1 | """Output formatting for PlanOpticon analysis results.""" |
| 2 | 2 | |
| 3 | 3 | import html |
| 4 | -import json | |
| 5 | 4 | import logging |
| 6 | 5 | import shutil |
| 7 | 6 | from pathlib import Path |
| 8 | 7 | from typing import Dict, List, Optional, Union |
| 9 | 8 | |
| 10 | 9 | logger = logging.getLogger(__name__) |
| 10 | + | |
| 11 | 11 | |
| 12 | 12 | class OutputFormatter: |
| 13 | 13 | """Formats and organizes output from video analysis.""" |
| 14 | - | |
| 14 | + | |
| 15 | 15 | def __init__(self, output_dir: Union[str, Path]): |
| 16 | 16 | """ |
| 17 | 17 | Initialize output formatter. |
| 18 | - | |
| 18 | + | |
| 19 | 19 | Parameters |
| 20 | 20 | ---------- |
| 21 | 21 | output_dir : str or Path |
| 22 | 22 | Output directory for formatted content |
| 23 | 23 | """ |
| 24 | 24 | self.output_dir = Path(output_dir) |
| 25 | 25 | self.output_dir.mkdir(parents=True, exist_ok=True) |
| 26 | - | |
| 26 | + | |
| 27 | 27 | def organize_outputs( |
| 28 | 28 | self, |
| 29 | 29 | markdown_path: Union[str, Path], |
| 30 | 30 | knowledge_graph_path: Union[str, Path], |
| 31 | 31 | diagrams: List[Dict], |
| 32 | 32 | frames_dir: Optional[Union[str, Path]] = None, |
| 33 | - transcript_path: Optional[Union[str, Path]] = None | |
| 33 | + transcript_path: Optional[Union[str, Path]] = None, | |
| 34 | 34 | ) -> Dict: |
| 35 | 35 | """ |
| 36 | 36 | Organize outputs into a consistent structure. |
| 37 | - | |
| 37 | + | |
| 38 | 38 | Parameters |
| 39 | 39 | ---------- |
| 40 | 40 | markdown_path : str or Path |
| 41 | 41 | Path to markdown analysis |
| 42 | 42 | knowledge_graph_path : str or Path |
| @@ -45,84 +45,84 @@ | ||
| 45 | 45 | List of diagram analysis results |
| 46 | 46 | frames_dir : str or Path, optional |
| 47 | 47 | Directory with extracted frames |
| 48 | 48 | transcript_path : str or Path, optional |
| 49 | 49 | Path to transcript file |
| 50 | - | |
| 50 | + | |
| 51 | 51 | Returns |
| 52 | 52 | ------- |
| 53 | 53 | dict |
| 54 | 54 | Dictionary with organized output paths |
| 55 | 55 | """ |
| 56 | 56 | # Create output structure |
| 57 | 57 | md_dir = self.output_dir / "markdown" |
| 58 | 58 | diagrams_dir = self.output_dir / "diagrams" |
| 59 | 59 | data_dir = self.output_dir / "data" |
| 60 | - | |
| 60 | + | |
| 61 | 61 | md_dir.mkdir(exist_ok=True) |
| 62 | 62 | diagrams_dir.mkdir(exist_ok=True) |
| 63 | 63 | data_dir.mkdir(exist_ok=True) |
| 64 | - | |
| 64 | + | |
| 65 | 65 | # Copy markdown file |
| 66 | 66 | markdown_path = Path(markdown_path) |
| 67 | 67 | md_output = md_dir / markdown_path.name |
| 68 | 68 | shutil.copy2(markdown_path, md_output) |
| 69 | - | |
| 69 | + | |
| 70 | 70 | # Copy knowledge graph |
| 71 | 71 | kg_path = Path(knowledge_graph_path) |
| 72 | 72 | kg_output = data_dir / kg_path.name |
| 73 | 73 | shutil.copy2(kg_path, kg_output) |
| 74 | - | |
| 74 | + | |
| 75 | 75 | # Copy diagram images if available |
| 76 | 76 | diagram_images = [] |
| 77 | 77 | for diagram in diagrams: |
| 78 | 78 | if "image_path" in diagram and diagram["image_path"]: |
| 79 | 79 | img_path = Path(diagram["image_path"]) |
| 80 | 80 | if img_path.exists(): |
| 81 | 81 | img_output = diagrams_dir / img_path.name |
| 82 | 82 | shutil.copy2(img_path, img_output) |
| 83 | 83 | diagram_images.append(str(img_output)) |
| 84 | - | |
| 84 | + | |
| 85 | 85 | # Copy transcript if provided |
| 86 | 86 | transcript_output = None |
| 87 | 87 | if transcript_path: |
| 88 | 88 | transcript_path = Path(transcript_path) |
| 89 | 89 | if transcript_path.exists(): |
| 90 | 90 | transcript_output = data_dir / transcript_path.name |
| 91 | 91 | shutil.copy2(transcript_path, transcript_output) |
| 92 | - | |
| 92 | + | |
| 93 | 93 | # Copy selected frames if provided |
| 94 | 94 | frame_outputs = [] |
| 95 | 95 | if frames_dir: |
| 96 | 96 | frames_dir = Path(frames_dir) |
| 97 | 97 | if frames_dir.exists(): |
| 98 | 98 | frames_output_dir = self.output_dir / "frames" |
| 99 | 99 | frames_output_dir.mkdir(exist_ok=True) |
| 100 | - | |
| 100 | + | |
| 101 | 101 | # Copy a limited number of representative frames |
| 102 | 102 | frame_files = sorted(list(frames_dir.glob("*.jpg"))) |
| 103 | 103 | max_frames = min(10, len(frame_files)) |
| 104 | 104 | step = max(1, len(frame_files) // max_frames) |
| 105 | - | |
| 105 | + | |
| 106 | 106 | for i in range(0, len(frame_files), step): |
| 107 | 107 | if len(frame_outputs) >= max_frames: |
| 108 | 108 | break |
| 109 | - | |
| 109 | + | |
| 110 | 110 | frame = frame_files[i] |
| 111 | 111 | frame_output = frames_output_dir / frame.name |
| 112 | 112 | shutil.copy2(frame, frame_output) |
| 113 | 113 | frame_outputs.append(str(frame_output)) |
| 114 | - | |
| 114 | + | |
| 115 | 115 | # Return organized paths |
| 116 | 116 | return { |
| 117 | 117 | "markdown": str(md_output), |
| 118 | 118 | "knowledge_graph": str(kg_output), |
| 119 | 119 | "diagram_images": diagram_images, |
| 120 | 120 | "frames": frame_outputs, |
| 121 | - "transcript": str(transcript_output) if transcript_output else None | |
| 121 | + "transcript": str(transcript_output) if transcript_output else None, | |
| 122 | 122 | } |
| 123 | - | |
| 123 | + | |
| 124 | 124 | def create_html_index(self, outputs: Dict) -> Path: |
| 125 | 125 | """ |
| 126 | 126 | Create HTML index page for outputs. |
| 127 | 127 | |
| 128 | 128 | Parameters |
| @@ -142,11 +142,12 @@ | ||
| 142 | 142 | "<!DOCTYPE html>", |
| 143 | 143 | "<html>", |
| 144 | 144 | "<head>", |
| 145 | 145 | " <title>PlanOpticon Analysis Results</title>", |
| 146 | 146 | " <style>", |
| 147 | - " body { font-family: Arial, sans-serif; margin: 0; padding: 20px; line-height: 1.6; }", | |
| 147 | + " body { font-family: Arial, sans-serif;" | |
| 148 | + " margin: 0; padding: 20px; line-height: 1.6; }", | |
| 148 | 149 | " .container { max-width: 1200px; margin: 0 auto; }", |
| 149 | 150 | " h1 { color: #333; }", |
| 150 | 151 | " h2 { color: #555; margin-top: 30px; }", |
| 151 | 152 | " .section { margin-bottom: 30px; }", |
| 152 | 153 | " .files { display: flex; flex-wrap: wrap; }", |
| @@ -158,11 +159,11 @@ | ||
| 158 | 159 | " </style>", |
| 159 | 160 | "</head>", |
| 160 | 161 | "<body>", |
| 161 | 162 | "<div class='container'>", |
| 162 | 163 | " <h1>PlanOpticon Analysis Results</h1>", |
| 163 | - "" | |
| 164 | + "", | |
| 164 | 165 | ] |
| 165 | 166 | |
| 166 | 167 | # Add markdown section |
| 167 | 168 | if outputs.get("markdown"): |
| 168 | 169 | md_path = Path(outputs["markdown"]) |
| @@ -228,11 +229,13 @@ | ||
| 228 | 229 | lines.append(" <ul>") |
| 229 | 230 | |
| 230 | 231 | for data_path in data_files: |
| 231 | 232 | data_rel = esc(str(data_path.relative_to(self.output_dir))) |
| 232 | 233 | data_name = esc(data_path.name) |
| 233 | - lines.append(f" <li><a href='{data_rel}' target='_blank'>{data_name}</a></li>") | |
| 234 | + lines.append( | |
| 235 | + f" <li><a href='{data_rel}' target='_blank'>{data_name}</a></li>" | |
| 236 | + ) | |
| 234 | 237 | |
| 235 | 238 | lines.append(" </ul>") |
| 236 | 239 | lines.append(" </div>") |
| 237 | 240 | |
| 238 | 241 | # Close HTML |
| 239 | 242 |
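`organize_outputs` above copies at most 10 evenly spaced "representative" frames (`max_frames = min(10, len)`, `step = len // max_frames`). That selection can be distilled into a hypothetical helper — the empty-input guard is an addition here, since `len(items) // min(10, 0)` would raise `ZeroDivisionError` on an empty frames directory:

```python
def pick_representative(items, limit=10):
    """Evenly spaced sampling: cap at `limit` items, stepping through the sorted list."""
    if not items:  # guard the len(items) // max_items division below
        return []
    max_items = min(limit, len(items))
    step = max(1, len(items) // max_items)
    picked = []
    for i in range(0, len(items), step):
        if len(picked) >= max_items:
            break
        picked.append(items[i])
    return picked


print(pick_representative(list(range(25))))  # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The same indices drive which `*.jpg` files get copied into `frames/`, so the HTML index shows a spread across the whole video rather than the first ten frames.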
| --- video_processor/extractors/__init__.py | ||
| +++ video_processor/extractors/__init__.py | ||
| @@ -1,17 +1,17 @@ | ||
| 1 | +from video_processor.extractors.audio_extractor import AudioExtractor | |
| 1 | 2 | from video_processor.extractors.frame_extractor import ( |
| 2 | - extract_frames, | |
| 3 | - save_frames, | |
| 4 | - calculate_frame_difference, | |
| 5 | - is_gpu_available | |
| 3 | + calculate_frame_difference, | |
| 4 | + extract_frames, | |
| 5 | + is_gpu_available, | |
| 6 | + save_frames, | |
| 6 | 7 | ) |
| 7 | -from video_processor.extractors.audio_extractor import AudioExtractor | |
| 8 | 8 | from video_processor.extractors.text_extractor import TextExtractor |
| 9 | 9 | |
| 10 | 10 | __all__ = [ |
| 11 | - 'extract_frames', | |
| 12 | - 'save_frames', | |
| 13 | - 'calculate_frame_difference', | |
| 14 | - 'is_gpu_available', | |
| 15 | - 'AudioExtractor', | |
| 16 | - 'TextExtractor', | |
| 11 | + "extract_frames", | |
| 12 | + "save_frames", | |
| 13 | + "calculate_frame_difference", | |
| 14 | + "is_gpu_available", | |
| 15 | + "AudioExtractor", | |
| 16 | + "TextExtractor", | |
| 17 | 17 | ] |
| 18 | 18 |
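The reordered imports and quote changes above are ruff's isort and formatter rules at work, and the commit message notes that `prompt_templates.py` is excluded from E501 because its LLM prompt strings run long. A plausible `pyproject.toml` fragment for that setup — the line length, rule selection, and module path are assumptions, not copied from the repo:

```toml
[tool.ruff]
line-length = 100          # assumed; the repo's actual limit may differ

[tool.ruff.lint]
select = ["E", "F", "I"]   # pycodestyle, pyflakes, import sorting (isort)

[tool.ruff.lint.per-file-ignores]
# LLM prompt strings are kept on single lines for readability (path assumed)
"video_processor/prompt_templates.py" = ["E501"]
```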
| --- video_processor/extractors/audio_extractor.py | ||
| +++ video_processor/extractors/audio_extractor.py | ||
| @@ -1,172 +1,170 @@ | ||
| 1 | 1 | """Audio extraction and processing module for video analysis.""" |
| 2 | + | |
| 2 | 3 | import logging |
| 3 | -import os | |
| 4 | 4 | import subprocess |
| 5 | 5 | from pathlib import Path |
| 6 | 6 | from typing import Dict, Optional, Tuple, Union |
| 7 | 7 | |
| 8 | 8 | import librosa |
| 9 | 9 | import numpy as np |
| 10 | 10 | import soundfile as sf |
| 11 | 11 | |
| 12 | 12 | logger = logging.getLogger(__name__) |
| 13 | + | |
| 13 | 14 | |
| 14 | 15 | class AudioExtractor: |
| 15 | 16 | """Extract and process audio from video files.""" |
| 16 | - | |
| 17 | + | |
| 17 | 18 | def __init__(self, sample_rate: int = 16000, mono: bool = True): |
| 18 | 19 | """ |
| 19 | 20 | Initialize the audio extractor. |
| 20 | - | |
| 21 | + | |
| 21 | 22 | Parameters |
| 22 | 23 | ---------- |
| 23 | 24 | sample_rate : int |
| 24 | 25 | Target sample rate for extracted audio |
| 25 | 26 | mono : bool |
| 26 | 27 | Whether to convert audio to mono |
| 27 | 28 | """ |
| 28 | 29 | self.sample_rate = sample_rate |
| 29 | 30 | self.mono = mono |
| 30 | - | |
| 31 | + | |
| 31 | 32 | def extract_audio( |
| 32 | - self, | |
| 33 | - video_path: Union[str, Path], | |
| 34 | - output_path: Optional[Union[str, Path]] = None, | |
| 35 | - format: str = "wav" | |
| 33 | + self, | |
| 34 | + video_path: Union[str, Path], | |
| 35 | + output_path: Optional[Union[str, Path]] = None, | |
| 36 | + format: str = "wav", | |
| 36 | 37 | ) -> Path: |
| 37 | 38 | """ |
| 38 | 39 | Extract audio from video file. |
| 39 | - | |
| 40 | + | |
| 40 | 41 | Parameters |
| 41 | 42 | ---------- |
| 42 | 43 | video_path : str or Path |
| 43 | 44 | Path to video file |
| 44 | 45 | output_path : str or Path, optional |
| 45 | 46 | Path to save extracted audio (if None, saves alongside video) |
| 46 | 47 | format : str |
| 47 | 48 | Audio format to save (wav, mp3, etc.) |
| 48 | - | |
| 49 | + | |
| 49 | 50 | Returns |
| 50 | 51 | ------- |
| 51 | 52 | Path |
| 52 | 53 | Path to extracted audio file |
| 53 | 54 | """ |
| 54 | 55 | video_path = Path(video_path) |
| 55 | 56 | if not video_path.exists(): |
| 56 | 57 | raise FileNotFoundError(f"Video file not found: {video_path}") |
| 57 | - | |
| 58 | + | |
| 58 | 59 | # Generate output path if not provided |
| 59 | 60 | if output_path is None: |
| 60 | 61 | output_path = video_path.with_suffix(f".{format}") |
| 61 | 62 | else: |
| 62 | 63 | output_path = Path(output_path) |
| 63 | - | |
| 64 | + | |
| 64 | 65 | # Ensure output directory exists |
| 65 | 66 | output_path.parent.mkdir(parents=True, exist_ok=True) |
| 66 | - | |
| 67 | + | |
| 67 | 68 | # Extract audio using ffmpeg |
| 68 | 69 | try: |
| 69 | 70 | cmd = [ |
| 70 | - "ffmpeg", | |
| 71 | - "-i", str(video_path), | |
| 72 | - "-vn", # No video | |
| 73 | - "-acodec", "pcm_s16le", # PCM 16-bit little-endian | |
| 74 | - "-ar", str(self.sample_rate), # Sample rate | |
| 75 | - "-ac", "1" if self.mono else "2", # Channels (mono or stereo) | |
| 76 | - "-y", # Overwrite output | |
| 77 | - str(output_path) | |
| 78 | - ] | |
| 79 | - | |
| 80 | - # Run ffmpeg command | |
| 81 | - result = subprocess.run( | |
| 82 | - cmd, | |
| 83 | - stdout=subprocess.PIPE, | |
| 84 | - stderr=subprocess.PIPE, | |
| 85 | - check=True | |
| 86 | - ) | |
| 87 | - | |
| 71 | + "ffmpeg", | |
| 72 | + "-i", | |
| 73 | + str(video_path), | |
| 74 | + "-vn", # No video | |
| 75 | + "-acodec", | |
| 76 | + "pcm_s16le", # PCM 16-bit little-endian | |
| 77 | + "-ar", | |
| 78 | + str(self.sample_rate), # Sample rate | |
| 79 | + "-ac", | |
| 80 | + "1" if self.mono else "2", # Channels (mono or stereo) | |
| 81 | + "-y", # Overwrite output | |
| 82 | + str(output_path), | |
| 83 | + ] | |
| 84 | + | |
| 85 | + # Run ffmpeg command | |
| 86 | + subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True) | |
| 87 | + | |
| 88 | 88 | logger.info(f"Extracted audio from {video_path} to {output_path}") |
| 89 | 89 | return output_path |
| 90 | - | |
| 90 | + | |
| 91 | 91 | except subprocess.CalledProcessError as e: |
| 92 | 92 | logger.error(f"Failed to extract audio: {e.stderr.decode()}") |
| 93 | 93 | raise RuntimeError(f"Failed to extract audio: {e.stderr.decode()}") |
| 94 | 94 | except Exception as e: |
| 95 | 95 | logger.error(f"Error extracting audio: {str(e)}") |
| 96 | 96 | raise |
| 97 | - | |
| 97 | + | |
| 98 | 98 | def load_audio(self, audio_path: Union[str, Path]) -> Tuple[np.ndarray, int]: |
| 99 | 99 | """ |
| 100 | 100 | Load audio file into memory. |
| 101 | - | |
| 101 | + | |
| 102 | 102 | Parameters |
| 103 | 103 | ---------- |
| 104 | 104 | audio_path : str or Path |
| 105 | 105 | Path to audio file |
| 106 | - | |
| 106 | + | |
| 107 | 107 | Returns |
| 108 | 108 | ------- |
| 109 | 109 | tuple |
| 110 | 110 | (audio_data, sample_rate) |
| 111 | 111 | """ |
| 112 | 112 | audio_path = Path(audio_path) |
| 113 | 113 | if not audio_path.exists(): |
| 114 | 114 | raise FileNotFoundError(f"Audio file not found: {audio_path}") |
| 115 | - | |
| 115 | + | |
| 116 | 116 | # Load audio data |
| 117 | 117 | audio_data, sr = librosa.load( |
| 118 | - audio_path, | |
| 119 | - sr=self.sample_rate if self.sample_rate else None, | |
| 120 | - mono=self.mono | |
| 118 | + audio_path, sr=self.sample_rate if self.sample_rate else None, mono=self.mono | |
| 121 | 119 | ) |
| 122 | - | |
| 120 | + | |
| 123 | 121 | logger.info(f"Loaded audio from {audio_path}: shape={audio_data.shape}, sr={sr}") |
| 124 | 122 | return audio_data, sr |
| 125 | - | |
| 123 | + | |
| 126 | 124 | def get_audio_properties(self, audio_path: Union[str, Path]) -> Dict: |
| 127 | 125 | """ |
| 128 | 126 | Get properties of audio file. |
| 129 | - | |
| 127 | + | |
| 130 | 128 | Parameters |
| 131 | 129 | ---------- |
| 132 | 130 | audio_path : str or Path |
| 133 | 131 | Path to audio file |
| 134 | - | |
| 132 | + | |
| 135 | 133 | Returns |
| 136 | 134 | ------- |
| 137 | 135 | dict |
| 138 | 136 | Audio properties (duration, sample_rate, channels, etc.) |
| 139 | 137 | """ |
| 140 | 138 | audio_path = Path(audio_path) |
| 141 | 139 | if not audio_path.exists(): |
| 142 | 140 | raise FileNotFoundError(f"Audio file not found: {audio_path}") |
| 143 | - | |
| 141 | + | |
| 144 | 142 | # Get audio info |
| 145 | 143 | info = sf.info(audio_path) |
| 146 | - | |
| 144 | + | |
| 147 | 145 | properties = { |
| 148 | 146 | "duration": info.duration, |
| 149 | 147 | "sample_rate": info.samplerate, |
| 150 | 148 | "channels": info.channels, |
| 151 | 149 | "format": info.format, |
| 152 | 150 | "subtype": info.subtype, |
| 153 | - "path": str(audio_path) | |
| 151 | + "path": str(audio_path), | |
| 154 | 152 | } |
| 155 | - | |
| 153 | + | |
| 156 | 154 | return properties |
| 157 | - | |
| 155 | + | |
| 158 | 156 | def segment_audio( |
| 159 | 157 | self, |
| 160 | 158 | audio_data: np.ndarray, |
| 161 | 159 | sample_rate: int, |
| 162 | 160 | segment_length_ms: int = 30000, |
| 163 | - overlap_ms: int = 0 | |
| 161 | + overlap_ms: int = 0, | |
| 164 | 162 | ) -> list: |
| 165 | 163 | """ |
| 166 | 164 | Segment audio into chunks. |
| 167 | - | |
| 165 | + | |
| 168 | 166 | Parameters |
| 169 | 167 | ---------- |
| 170 | 168 | audio_data : np.ndarray |
| 171 | 169 | Audio data |
| 172 | 170 | sample_rate : int |
| @@ -173,65 +171,62 @@ | ||
| 173 | 171 | Sample rate of audio |
| 174 | 172 | segment_length_ms : int |
| 175 | 173 | Length of segments in milliseconds |
| 176 | 174 | overlap_ms : int |
| 177 | 175 | Overlap between segments in milliseconds |
| 178 | - | |
| 176 | + | |
| 179 | 177 | Returns |
| 180 | 178 | ------- |
| 181 | 179 | list |
| 182 | 180 | List of audio segments as numpy arrays |
| 183 | 181 | """ |
| 184 | 182 | # Convert ms to samples |
| 185 | 183 | segment_length_samples = int(segment_length_ms * sample_rate / 1000) |
| 186 | 184 | overlap_samples = int(overlap_ms * sample_rate / 1000) |
| 187 | - | |
| 185 | + | |
| 188 | 186 | # Calculate hop length |
| 189 | 187 | hop_length = segment_length_samples - overlap_samples |
| 190 | - | |
| 188 | + | |
| 191 | 189 | # Initialize segments list |
| 192 | 190 | segments = [] |
| 193 | - | |
| 191 | + | |
| 194 | 192 | # Generate segments |
| 195 | 193 | for i in range(0, len(audio_data), hop_length): |
| 196 | 194 | end_idx = min(i + segment_length_samples, len(audio_data)) |
| 197 | 195 | segment = audio_data[i:end_idx] |
| 198 | - | |
| 196 | + | |
| 199 | 197 | # Only add if segment is long enough (at least 50% of target length) |
| 200 | 198 | if len(segment) >= segment_length_samples * 0.5: |
| 201 | 199 | segments.append(segment) |
| 202 | - | |
| 200 | + | |
| 203 | 201 | # Break if we've reached the end |
| 204 | 202 | if end_idx == len(audio_data): |
| 205 | 203 | break |
| 206 | - | |
| 204 | + | |
| 207 | 205 | logger.info(f"Segmented audio into {len(segments)} chunks") |
| 208 | 206 | return segments |
| 209 | - | |
| 207 | + | |
| 210 | 208 | def save_segment( |
| 211 | - self, | |
| 212 | - segment: np.ndarray, | |
| 213 | - output_path: Union[str, Path], | |
| 214 | - sample_rate: int | |
| 209 | + self, segment: np.ndarray, output_path: Union[str, Path], sample_rate: int | |
| 215 | 210 | ) -> Path: |
| 216 | 211 | """ |
| 217 | 212 | Save audio segment to file. |
| 218 | - | |
| 213 | + | |
| 219 | 214 | Parameters |
| 220 | 215 | ---------- |
| 221 | 216 | segment : np.ndarray |
| 222 | 217 | Audio segment data |
| 223 | 218 | output_path : str or Path |
| 224 | 219 | Path to save segment |
| 225 | 220 | sample_rate : int |
| 226 | 221 | Sample rate of segment |
| 227 | - | |
| 222 | + | |
| 228 | 223 | Returns |
| 229 | 224 | ------- |
| 230 | 225 | Path |
| 231 | 226 | Path to saved segment |
| 232 | 227 | """ |
| 233 | 228 | output_path = Path(output_path) |
| 234 | 229 | output_path.parent.mkdir(parents=True, exist_ok=True) |
| 235 | - | |
| 230 | + | |
| 236 | 231 | sf.write(output_path, segment, sample_rate) |
| 237 | 232 | return output_path |
| 238 | 233 |
| --- video_processor/extractors/frame_extractor.py | ||
| +++ video_processor/extractors/frame_extractor.py | ||
| @@ -1,6 +1,7 @@ | ||
| 1 | 1 | """Frame extraction module for video processing.""" |
| 2 | + | |
| 2 | 3 | import functools |
| 3 | 4 | import logging |
| 4 | 5 | from pathlib import Path |
| 5 | 6 | from typing import List, Optional, Tuple, Union |
| 6 | 7 | |
| @@ -112,44 +113,49 @@ | ||
| 112 | 113 | filtered.append(frame) |
| 113 | 114 | |
| 114 | 115 | if removed: |
| 115 | 116 | logger.info(f"Filtered out {removed}/{len(frames)} people/webcam frames") |
| 116 | 117 | return filtered, removed |
| 118 | + | |
| 117 | 119 | |
| 118 | 120 | def is_gpu_available() -> bool: |
| 119 | 121 | """Check if GPU acceleration is available for OpenCV.""" |
| 120 | 122 | try: |
| 121 | 123 | # Check if CUDA is available |
| 122 | 124 | count = cv2.cuda.getCudaEnabledDeviceCount() |
| 123 | 125 | return count > 0 |
| 124 | 126 | except Exception: |
| 125 | 127 | return False |
| 128 | + | |
| 126 | 129 | |
| 127 | 130 | def gpu_accelerated(func): |
| 128 | 131 | """Decorator to use GPU implementation when available.""" |
| 132 | + | |
| 129 | 133 | @functools.wraps(func) |
| 130 | 134 | def wrapper(*args, **kwargs): |
| 131 | - if is_gpu_available() and not kwargs.get('disable_gpu'): | |
| 135 | + if is_gpu_available() and not kwargs.get("disable_gpu"): | |
| 132 | 136 | # Remove the disable_gpu kwarg if it exists |
| 133 | - kwargs.pop('disable_gpu', None) | |
| 137 | + kwargs.pop("disable_gpu", None) | |
| 134 | 138 | return func_gpu(*args, **kwargs) |
| 135 | 139 | # Remove the disable_gpu kwarg if it exists |
| 136 | - kwargs.pop('disable_gpu', None) | |
| 140 | + kwargs.pop("disable_gpu", None) | |
| 137 | 141 | return func(*args, **kwargs) |
| 142 | + | |
| 138 | 143 | return wrapper |
| 144 | + | |
| 139 | 145 | |
| 140 | 146 | def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float: |
| 141 | 147 | """ |
| 142 | 148 | Calculate the difference between two frames. |
| 143 | - | |
| 149 | + | |
| 144 | 150 | Parameters |
| 145 | 151 | ---------- |
| 146 | 152 | prev_frame : np.ndarray |
| 147 | 153 | Previous frame |
| 148 | 154 | curr_frame : np.ndarray |
| 149 | 155 | Current frame |
| 150 | - | |
| 156 | + | |
| 151 | 157 | Returns |
| 152 | 158 | ------- |
| 153 | 159 | float |
| 154 | 160 | Difference score between 0 and 1 |
| 155 | 161 | """ |
| @@ -156,30 +162,31 @@ | ||
| 156 | 162 | # Convert to grayscale |
| 157 | 163 | if len(prev_frame.shape) == 3: |
| 158 | 164 | prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY) |
| 159 | 165 | else: |
| 160 | 166 | prev_gray = prev_frame |
| 161 | - | |
| 167 | + | |
| 162 | 168 | if len(curr_frame.shape) == 3: |
| 163 | 169 | curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY) |
| 164 | 170 | else: |
| 165 | 171 | curr_gray = curr_frame |
| 166 | - | |
| 172 | + | |
| 167 | 173 | # Calculate absolute difference |
| 168 | 174 | diff = cv2.absdiff(prev_gray, curr_gray) |
| 169 | - | |
| 175 | + | |
| 170 | 176 | # Normalize and return mean difference |
| 171 | 177 | return np.mean(diff) / 255.0 |
| 178 | + | |
| 172 | 179 | |
| 173 | 180 | @gpu_accelerated |
| 174 | 181 | def extract_frames( |
| 175 | 182 | video_path: Union[str, Path], |
| 176 | 183 | sampling_rate: float = 1.0, |
| 177 | 184 | change_threshold: float = 0.15, |
| 178 | 185 | periodic_capture_seconds: float = 30.0, |
| 179 | 186 | max_frames: Optional[int] = None, |
| 180 | - resize_to: Optional[Tuple[int, int]] = None | |
| 187 | + resize_to: Optional[Tuple[int, int]] = None, | |
| 181 | 188 | ) -> List[np.ndarray]: |
| 182 | 189 | """ |
| 183 | 190 | Extract frames from video based on visual change detection + periodic capture. |
| 184 | 191 | |
| 185 | 192 | Two capture strategies work together: |
| @@ -273,11 +280,13 @@ | ||
| 273 | 280 | if diff > change_threshold: |
| 274 | 281 | should_capture = True |
| 275 | 282 | reason = f"change={diff:.3f}" |
| 276 | 283 | |
| 277 | 284 | # Periodic capture — even if change is small |
| 278 | - elif periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval: | |
| 285 | + elif ( | |
| 286 | + periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval | |
| 287 | + ): | |
| 279 | 288 | should_capture = True |
| 280 | 289 | reason = "periodic" |
| 281 | 290 | |
| 282 | 291 | if should_capture: |
| 283 | 292 | extracted_frames.append(frame) |
| @@ -299,41 +308,45 @@ | ||
| 299 | 308 | |
| 300 | 309 | pbar.close() |
| 301 | 310 | cap.release() |
| 302 | 311 | logger.info(f"Extracted {len(extracted_frames)} frames from {frame_count} total frames") |
| 303 | 312 | return extracted_frames |
| 313 | + | |
| 304 | 314 | |
| 305 | 315 | def func_gpu(*args, **kwargs): |
| 306 | 316 | """GPU-accelerated version of extract_frames.""" |
| 307 | 317 | # This would be implemented with CUDA acceleration |
| 308 | 318 | # For now, fall back to the unwrapped CPU version |
| 309 | 319 | logger.info("GPU acceleration not yet implemented, falling back to CPU") |
| 310 | 320 | return extract_frames.__wrapped__(*args, **kwargs) |
| 311 | 321 | |
| 312 | -def save_frames(frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame") -> List[Path]: | |
| 322 | + | |
| 323 | +def save_frames( | |
| 324 | + frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame" | |
| 325 | +) -> List[Path]: | |
| 313 | 326 | """ |
| 314 | 327 | Save extracted frames to disk. |
| 315 | - | |
| 328 | + | |
| 316 | 329 | Parameters |
| 317 | 330 | ---------- |
| 318 | 331 | frames : list |
| 319 | 332 | List of frames to save |
| 320 | 333 | output_dir : str or Path |
| 321 | 334 | Directory to save frames in |
| 322 | 335 | base_filename : str |
| 323 | 336 | Base name for frame files |
| 324 | - | |
| 337 | + | |
| 325 | 338 | Returns |
| 326 | 339 | ------- |
| 327 | 340 | list |
| 328 | 341 | List of paths to saved frame files |
| 329 | 342 | """ |
| 330 | 343 | output_dir = Path(output_dir) |
| 331 | 344 | output_dir.mkdir(parents=True, exist_ok=True) |
| 332 | - | |
| 345 | + | |
| 333 | 346 | saved_paths = [] |
| 334 | 347 | for i, frame in enumerate(frames): |
| 335 | 348 | output_path = output_dir / f"{base_filename}_{i:04d}.jpg" |
| 336 | 349 | cv2.imwrite(str(output_path), frame) |
| 337 | 350 | saved_paths.append(output_path) |
| 338 | - | |
| 351 | + | |
| 339 | 352 | return saved_paths |
| 340 | 353 |
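The `calculate_frame_difference` metric in the diff above reduces to the mean absolute pixel difference scaled into [0, 1] for 8-bit frames. A dependency-free sketch on flattened grayscale pixel lists (the OpenCV version additionally converts BGR frames to grayscale with `cv2.cvtColor` first; the list-based form here is illustrative):

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference, normalized to [0, 1] for 8-bit pixels."""
    assert len(prev) == len(curr) and prev, "frames must match and be non-empty"
    total = sum(abs(a - b) for a, b in zip(prev, curr))
    return total / (len(prev) * 255.0)
```

Identical frames score 0.0 and an all-black frame against an all-white frame scores 1.0; `extract_frames` captures whenever this score exceeds `change_threshold` (0.15 by default).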
| --- video_processor/extractors/frame_extractor.py | |
| +++ video_processor/extractors/frame_extractor.py | |
| @@ -1,6 +1,7 @@ | |
| 1 | """Frame extraction module for video processing.""" |
| 2 | import functools |
| 3 | import logging |
| 4 | from pathlib import Path |
| 5 | from typing import List, Optional, Tuple, Union |
| 6 | |
| @@ -112,44 +113,49 @@ | |
| 112 | filtered.append(frame) |
| 113 | |
| 114 | if removed: |
| 115 | logger.info(f"Filtered out {removed}/{len(frames)} people/webcam frames") |
| 116 | return filtered, removed |
| 117 | |
| 118 | def is_gpu_available() -> bool: |
| 119 | """Check if GPU acceleration is available for OpenCV.""" |
| 120 | try: |
| 121 | # Check if CUDA is available |
| 122 | count = cv2.cuda.getCudaEnabledDeviceCount() |
| 123 | return count > 0 |
| 124 | except Exception: |
| 125 | return False |
| 126 | |
| 127 | def gpu_accelerated(func): |
| 128 | """Decorator to use GPU implementation when available.""" |
| 129 | @functools.wraps(func) |
| 130 | def wrapper(*args, **kwargs): |
| 131 | if is_gpu_available() and not kwargs.get('disable_gpu'): |
| 132 | # Remove the disable_gpu kwarg if it exists |
| 133 | kwargs.pop('disable_gpu', None) |
| 134 | return func_gpu(*args, **kwargs) |
| 135 | # Remove the disable_gpu kwarg if it exists |
| 136 | kwargs.pop('disable_gpu', None) |
| 137 | return func(*args, **kwargs) |
| 138 | return wrapper |
| 139 | |
| 140 | def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float: |
| 141 | """ |
| 142 | Calculate the difference between two frames. |
| 143 | |
| 144 | Parameters |
| 145 | ---------- |
| 146 | prev_frame : np.ndarray |
| 147 | Previous frame |
| 148 | curr_frame : np.ndarray |
| 149 | Current frame |
| 150 | |
| 151 | Returns |
| 152 | ------- |
| 153 | float |
| 154 | Difference score between 0 and 1 |
| 155 | """ |
| @@ -156,30 +162,31 @@ | |
| 156 | # Convert to grayscale |
| 157 | if len(prev_frame.shape) == 3: |
| 158 | prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY) |
| 159 | else: |
| 160 | prev_gray = prev_frame |
| 161 | |
| 162 | if len(curr_frame.shape) == 3: |
| 163 | curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY) |
| 164 | else: |
| 165 | curr_gray = curr_frame |
| 166 | |
| 167 | # Calculate absolute difference |
| 168 | diff = cv2.absdiff(prev_gray, curr_gray) |
| 169 | |
| 170 | # Normalize and return mean difference |
| 171 | return np.mean(diff) / 255.0 |
| 172 | |
| 173 | @gpu_accelerated |
| 174 | def extract_frames( |
| 175 | video_path: Union[str, Path], |
| 176 | sampling_rate: float = 1.0, |
| 177 | change_threshold: float = 0.15, |
| 178 | periodic_capture_seconds: float = 30.0, |
| 179 | max_frames: Optional[int] = None, |
| 180 | resize_to: Optional[Tuple[int, int]] = None |
| 181 | ) -> List[np.ndarray]: |
| 182 | """ |
| 183 | Extract frames from video based on visual change detection + periodic capture. |
| 184 | |
| 185 | Two capture strategies work together: |
| @@ -273,11 +280,13 @@ | |
| 273 | if diff > change_threshold: |
| 274 | should_capture = True |
| 275 | reason = f"change={diff:.3f}" |
| 276 | |
| 277 | # Periodic capture — even if change is small |
| 278 | elif periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval: |
| 279 | should_capture = True |
| 280 | reason = "periodic" |
| 281 | |
| 282 | if should_capture: |
| 283 | extracted_frames.append(frame) |
| @@ -299,41 +308,45 @@ | |
| 299 | |
| 300 | pbar.close() |
| 301 | cap.release() |
| 302 | logger.info(f"Extracted {len(extracted_frames)} frames from {frame_count} total frames") |
| 303 | return extracted_frames |
| 304 | |
| 305 | def func_gpu(*args, **kwargs): |
| 306 | """GPU-accelerated version of extract_frames.""" |
| 307 | # This would be implemented with CUDA acceleration |
| 308 | # For now, fall back to the unwrapped CPU version |
| 309 | logger.info("GPU acceleration not yet implemented, falling back to CPU") |
| 310 | return extract_frames.__wrapped__(*args, **kwargs) |
| 311 | |
| 312 | def save_frames(frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame") -> List[Path]: |
| 313 | """ |
| 314 | Save extracted frames to disk. |
| 315 | |
| 316 | Parameters |
| 317 | ---------- |
| 318 | frames : list |
| 319 | List of frames to save |
| 320 | output_dir : str or Path |
| 321 | Directory to save frames in |
| 322 | base_filename : str |
| 323 | Base name for frame files |
| 324 | |
| 325 | Returns |
| 326 | ------- |
| 327 | list |
| 328 | List of paths to saved frame files |
| 329 | """ |
| 330 | output_dir = Path(output_dir) |
| 331 | output_dir.mkdir(parents=True, exist_ok=True) |
| 332 | |
| 333 | saved_paths = [] |
| 334 | for i, frame in enumerate(frames): |
| 335 | output_path = output_dir / f"{base_filename}_{i:04d}.jpg" |
| 336 | cv2.imwrite(str(output_path), frame) |
| 337 | saved_paths.append(output_path) |
| 338 | |
| 339 | return saved_paths |
| 340 |
| --- video_processor/extractors/frame_extractor.py | |
| +++ video_processor/extractors/frame_extractor.py | |
| @@ -1,6 +1,7 @@ | |
| 1 | """Frame extraction module for video processing.""" |
| 2 | |
| 3 | import functools |
| 4 | import logging |
| 5 | from pathlib import Path |
| 6 | from typing import List, Optional, Tuple, Union |
| 7 | |
| @@ -112,44 +113,49 @@ | |
| 113 | filtered.append(frame) |
| 114 | |
| 115 | if removed: |
| 116 | logger.info(f"Filtered out {removed}/{len(frames)} people/webcam frames") |
| 117 | return filtered, removed |
| 118 | |
| 119 | |
| 120 | def is_gpu_available() -> bool: |
| 121 | """Check if GPU acceleration is available for OpenCV.""" |
| 122 | try: |
| 123 | # Check if CUDA is available |
| 124 | count = cv2.cuda.getCudaEnabledDeviceCount() |
| 125 | return count > 0 |
| 126 | except Exception: |
| 127 | return False |
| 128 | |
| 129 | |
| 130 | def gpu_accelerated(func): |
| 131 | """Decorator to use GPU implementation when available.""" |
| 132 | |
| 133 | @functools.wraps(func) |
| 134 | def wrapper(*args, **kwargs): |
| 135 | if is_gpu_available() and not kwargs.get("disable_gpu"): |
| 136 | # Remove the disable_gpu kwarg if it exists |
| 137 | kwargs.pop("disable_gpu", None) |
| 138 | return func_gpu(*args, **kwargs) |
| 139 | # Remove the disable_gpu kwarg if it exists |
| 140 | kwargs.pop("disable_gpu", None) |
| 141 | return func(*args, **kwargs) |
| 142 | |
| 143 | return wrapper |
| 144 | |
| 145 | |
| 146 | def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float: |
| 147 | """ |
| 148 | Calculate the difference between two frames. |
| 149 | |
| 150 | Parameters |
| 151 | ---------- |
| 152 | prev_frame : np.ndarray |
| 153 | Previous frame |
| 154 | curr_frame : np.ndarray |
| 155 | Current frame |
| 156 | |
| 157 | Returns |
| 158 | ------- |
| 159 | float |
| 160 | Difference score between 0 and 1 |
| 161 | """ |
| @@ -156,30 +162,31 @@ | |
| 162 | # Convert to grayscale |
| 163 | if len(prev_frame.shape) == 3: |
| 164 | prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY) |
| 165 | else: |
| 166 | prev_gray = prev_frame |
| 167 | |
| 168 | if len(curr_frame.shape) == 3: |
| 169 | curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY) |
| 170 | else: |
| 171 | curr_gray = curr_frame |
| 172 | |
| 173 | # Calculate absolute difference |
| 174 | diff = cv2.absdiff(prev_gray, curr_gray) |
| 175 | |
| 176 | # Normalize and return mean difference |
| 177 | return np.mean(diff) / 255.0 |
| 178 | |
| 179 | |
| 180 | @gpu_accelerated |
| 181 | def extract_frames( |
| 182 | video_path: Union[str, Path], |
| 183 | sampling_rate: float = 1.0, |
| 184 | change_threshold: float = 0.15, |
| 185 | periodic_capture_seconds: float = 30.0, |
| 186 | max_frames: Optional[int] = None, |
| 187 | resize_to: Optional[Tuple[int, int]] = None, |
| 188 | ) -> List[np.ndarray]: |
| 189 | """ |
| 190 | Extract frames from video based on visual change detection + periodic capture. |
| 191 | |
| 192 | Two capture strategies work together: |
| @@ -273,11 +280,13 @@ | |
| 280 | if diff > change_threshold: |
| 281 | should_capture = True |
| 282 | reason = f"change={diff:.3f}" |
| 283 | |
| 284 | # Periodic capture — even if change is small |
| 285 | elif ( |
| 286 | periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval |
| 287 | ): |
| 288 | should_capture = True |
| 289 | reason = "periodic" |
| 290 | |
| 291 | if should_capture: |
| 292 | extracted_frames.append(frame) |
| @@ -299,41 +308,45 @@ | |
| 308 | |
| 309 | pbar.close() |
| 310 | cap.release() |
| 311 | logger.info(f"Extracted {len(extracted_frames)} frames from {frame_count} total frames") |
| 312 | return extracted_frames |
| 313 | |
| 314 | |
| 315 | def func_gpu(*args, **kwargs): |
| 316 | """GPU-accelerated version of extract_frames.""" |
| 317 | # This would be implemented with CUDA acceleration |
| 318 | # For now, fall back to the unwrapped CPU version |
| 319 | logger.info("GPU acceleration not yet implemented, falling back to CPU") |
| 320 | return extract_frames.__wrapped__(*args, **kwargs) |
| 321 | |
| 322 | |
| 323 | def save_frames( |
| 324 | frames: List[np.ndarray], output_dir: Union[str, Path], base_filename: str = "frame" |
| 325 | ) -> List[Path]: |
| 326 | """ |
| 327 | Save extracted frames to disk. |
| 328 | |
| 329 | Parameters |
| 330 | ---------- |
| 331 | frames : list |
| 332 | List of frames to save |
| 333 | output_dir : str or Path |
| 334 | Directory to save frames in |
| 335 | base_filename : str |
| 336 | Base name for frame files |
| 337 | |
| 338 | Returns |
| 339 | ------- |
| 340 | list |
| 341 | List of paths to saved frame files |
| 342 | """ |
| 343 | output_dir = Path(output_dir) |
| 344 | output_dir.mkdir(parents=True, exist_ok=True) |
| 345 | |
| 346 | saved_paths = [] |
| 347 | for i, frame in enumerate(frames): |
| 348 | output_path = output_dir / f"{base_filename}_{i:04d}.jpg" |
| 349 | cv2.imwrite(str(output_path), frame) |
| 350 | saved_paths.append(output_path) |
| 351 | |
| 352 | return saved_paths |
| 353 |
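The `extract_frames` hunks above combine two capture strategies: a visual-change threshold and a periodic fallback for static scenes. A minimal, OpenCV-free sketch of that decision logic — `periodic_interval` here stands in for `periodic_capture_seconds * fps`, which the real code derives from the video:

```python
import numpy as np


def frame_difference(prev_gray: np.ndarray, curr_gray: np.ndarray) -> float:
    """Mean absolute pixel difference, normalized to [0, 1] for 8-bit frames."""
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    return float(diff.mean() / 255.0)


def should_capture(diff: float, frames_since_capture: int,
                   change_threshold: float = 0.15,
                   periodic_interval: int = 900) -> tuple:
    """Capture on visual change, or periodically even when the scene is static."""
    if diff > change_threshold:
        return True, f"change={diff:.3f}"
    if periodic_interval > 0 and frames_since_capture >= periodic_interval:
        return True, "periodic"
    return False, ""


black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
print(should_capture(frame_difference(black, white), 0))    # (True, 'change=1.000')
print(should_capture(frame_difference(black, black), 900))  # (True, 'periodic')
```

Identical frames score 0.0 and a full black-to-white change scores 1.0, matching the `np.mean(diff) / 255.0` normalization in the diff.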
| --- video_processor/extractors/text_extractor.py | ||
| +++ video_processor/extractors/text_extractor.py | ||
| @@ -1,48 +1,51 @@ | ||
| 1 | 1 | """Text extraction module for frames and diagrams.""" |
| 2 | + | |
| 2 | 3 | import logging |
| 3 | 4 | from pathlib import Path |
| 4 | 5 | from typing import Dict, List, Optional, Tuple, Union |
| 5 | 6 | |
| 6 | 7 | import cv2 |
| 7 | 8 | import numpy as np |
| 8 | 9 | |
| 9 | 10 | logger = logging.getLogger(__name__) |
| 11 | + | |
| 10 | 12 | |
| 11 | 13 | class TextExtractor: |
| 12 | 14 | """Extract text from images, frames, and diagrams.""" |
| 13 | - | |
| 15 | + | |
| 14 | 16 | def __init__(self, tesseract_path: Optional[str] = None): |
| 15 | 17 | """ |
| 16 | 18 | Initialize text extractor. |
| 17 | - | |
| 19 | + | |
| 18 | 20 | Parameters |
| 19 | 21 | ---------- |
| 20 | 22 | tesseract_path : str, optional |
| 21 | 23 | Path to tesseract executable for local OCR |
| 22 | 24 | """ |
| 23 | 25 | self.tesseract_path = tesseract_path |
| 24 | - | |
| 26 | + | |
| 25 | 27 | # Check if we're using tesseract locally |
| 26 | 28 | self.use_local_ocr = False |
| 27 | 29 | if tesseract_path: |
| 28 | 30 | try: |
| 29 | 31 | import pytesseract |
| 32 | + | |
| 30 | 33 | pytesseract.pytesseract.tesseract_cmd = tesseract_path |
| 31 | 34 | self.use_local_ocr = True |
| 32 | 35 | except ImportError: |
| 33 | 36 | logger.warning("pytesseract not installed, local OCR unavailable") |
| 34 | - | |
| 37 | + | |
| 35 | 38 | def preprocess_image(self, image: np.ndarray) -> np.ndarray: |
| 36 | 39 | """ |
| 37 | 40 | Preprocess image for better text extraction. |
| 38 | - | |
| 41 | + | |
| 39 | 42 | Parameters |
| 40 | 43 | ---------- |
| 41 | 44 | image : np.ndarray |
| 42 | 45 | Input image |
| 43 | - | |
| 46 | + | |
| 44 | 47 | Returns |
| 45 | 48 | ------- |
| 46 | 49 | np.ndarray |
| 47 | 50 | Preprocessed image |
| 48 | 51 | """ |
| @@ -49,66 +52,61 @@ | ||
| 49 | 52 | # Convert to grayscale if not already |
| 50 | 53 | if len(image.shape) == 3: |
| 51 | 54 | gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) |
| 52 | 55 | else: |
| 53 | 56 | gray = image |
| 54 | - | |
| 57 | + | |
| 55 | 58 | # Apply adaptive thresholding |
| 56 | 59 | thresh = cv2.adaptiveThreshold( |
| 57 | - gray, | |
| 58 | - 255, | |
| 59 | - cv2.ADAPTIVE_THRESH_GAUSSIAN_C, | |
| 60 | - cv2.THRESH_BINARY_INV, | |
| 61 | - 11, | |
| 62 | - 2 | |
| 63 | - ) | |
| 64 | - | |
| 60 | + gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2 | |
| 61 | + ) | |
| 62 | + | |
| 65 | 63 | # Noise removal |
| 66 | 64 | kernel = np.ones((1, 1), np.uint8) |
| 67 | 65 | opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel) |
| 68 | - | |
| 66 | + | |
| 69 | 67 | # Invert back |
| 70 | 68 | result = cv2.bitwise_not(opening) |
| 71 | - | |
| 69 | + | |
| 72 | 70 | return result |
| 73 | - | |
| 71 | + | |
| 74 | 72 | def extract_text_local(self, image: np.ndarray) -> str: |
| 75 | 73 | """ |
| 76 | 74 | Extract text from image using local OCR (Tesseract). |
| 77 | - | |
| 75 | + | |
| 78 | 76 | Parameters |
| 79 | 77 | ---------- |
| 80 | 78 | image : np.ndarray |
| 81 | 79 | Input image |
| 82 | - | |
| 80 | + | |
| 83 | 81 | Returns |
| 84 | 82 | ------- |
| 85 | 83 | str |
| 86 | 84 | Extracted text |
| 87 | 85 | """ |
| 88 | 86 | if not self.use_local_ocr: |
| 89 | 87 | raise RuntimeError("Local OCR not configured") |
| 90 | - | |
| 88 | + | |
| 91 | 89 | import pytesseract |
| 92 | - | |
| 90 | + | |
| 93 | 91 | # Preprocess image |
| 94 | 92 | processed = self.preprocess_image(image) |
| 95 | - | |
| 93 | + | |
| 96 | 94 | # Extract text |
| 97 | 95 | text = pytesseract.image_to_string(processed) |
| 98 | - | |
| 96 | + | |
| 99 | 97 | return text |
| 100 | - | |
| 98 | + | |
| 101 | 99 | def detect_text_regions(self, image: np.ndarray) -> List[Tuple[int, int, int, int]]: |
| 102 | 100 | """ |
| 103 | 101 | Detect potential text regions in image. |
| 104 | - | |
| 102 | + | |
| 105 | 103 | Parameters |
| 106 | 104 | ---------- |
| 107 | 105 | image : np.ndarray |
| 108 | 106 | Input image |
| 109 | - | |
| 107 | + | |
| 110 | 108 | Returns |
| 111 | 109 | ------- |
| 112 | 110 | list |
| 113 | 111 | List of bounding boxes for text regions (x, y, w, h) |
| 114 | 112 | """ |
| @@ -115,179 +113,182 @@ | ||
| 115 | 113 | # Convert to grayscale |
| 116 | 114 | if len(image.shape) == 3: |
| 117 | 115 | gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) |
| 118 | 116 | else: |
| 119 | 117 | gray = image |
| 120 | - | |
| 118 | + | |
| 121 | 119 | # Apply MSER (Maximally Stable Extremal Regions) |
| 122 | 120 | mser = cv2.MSER_create() |
| 123 | 121 | regions, _ = mser.detectRegions(gray) |
| 124 | - | |
| 122 | + | |
| 125 | 123 | # Convert regions to bounding boxes |
| 126 | 124 | bboxes = [] |
| 127 | 125 | for region in regions: |
| 128 | 126 | x, y, w, h = cv2.boundingRect(region.reshape(-1, 1, 2)) |
| 129 | - | |
| 127 | + | |
| 130 | 128 | # Apply filtering criteria for text-like regions |
| 131 | 129 | aspect_ratio = w / float(h) |
| 132 | 130 | if 0.1 < aspect_ratio < 10 and h > 5 and w > 5: |
| 133 | 131 | bboxes.append((x, y, w, h)) |
| 134 | - | |
| 132 | + | |
| 135 | 133 | # Merge overlapping boxes |
| 136 | 134 | merged_bboxes = self._merge_overlapping_boxes(bboxes) |
| 137 | - | |
| 135 | + | |
| 138 | 136 | logger.debug(f"Detected {len(merged_bboxes)} text regions") |
| 139 | 137 | return merged_bboxes |
| 140 | - | |
| 141 | - def _merge_overlapping_boxes(self, boxes: List[Tuple[int, int, int, int]]) -> List[Tuple[int, int, int, int]]: | |
| 138 | + | |
| 139 | + def _merge_overlapping_boxes( | |
| 140 | + self, boxes: List[Tuple[int, int, int, int]] | |
| 141 | + ) -> List[Tuple[int, int, int, int]]: | |
| 142 | 142 | """ |
| 143 | 143 | Merge overlapping bounding boxes. |
| 144 | - | |
| 144 | + | |
| 145 | 145 | Parameters |
| 146 | 146 | ---------- |
| 147 | 147 | boxes : list |
| 148 | 148 | List of bounding boxes (x, y, w, h) |
| 149 | - | |
| 149 | + | |
| 150 | 150 | Returns |
| 151 | 151 | ------- |
| 152 | 152 | list |
| 153 | 153 | Merged bounding boxes |
| 154 | 154 | """ |
| 155 | 155 | if not boxes: |
| 156 | 156 | return [] |
| 157 | - | |
| 157 | + | |
| 158 | 158 | # Sort boxes by x coordinate |
| 159 | 159 | sorted_boxes = sorted(boxes, key=lambda b: b[0]) |
| 160 | - | |
| 160 | + | |
| 161 | 161 | merged = [] |
| 162 | 162 | current = list(sorted_boxes[0]) |
| 163 | - | |
| 163 | + | |
| 164 | 164 | for box in sorted_boxes[1:]: |
| 165 | 165 | # Check if current box overlaps with the next one |
| 166 | - if (current[0] <= box[0] + box[2] and | |
| 167 | - box[0] <= current[0] + current[2] and | |
| 168 | - current[1] <= box[1] + box[3] and | |
| 169 | - box[1] <= current[1] + current[3]): | |
| 170 | - | |
| 166 | + if ( | |
| 167 | + current[0] <= box[0] + box[2] | |
| 168 | + and box[0] <= current[0] + current[2] | |
| 169 | + and current[1] <= box[1] + box[3] | |
| 170 | + and box[1] <= current[1] + current[3] | |
| 171 | + ): | |
| 171 | 172 | # Calculate merged box |
| 172 | 173 | x1 = min(current[0], box[0]) |
| 173 | 174 | y1 = min(current[1], box[1]) |
| 174 | 175 | x2 = max(current[0] + current[2], box[0] + box[2]) |
| 175 | 176 | y2 = max(current[1] + current[3], box[1] + box[3]) |
| 176 | - | |
| 177 | + | |
| 177 | 178 | # Update current box |
| 178 | 179 | current = [x1, y1, x2 - x1, y2 - y1] |
| 179 | 180 | else: |
| 180 | 181 | # Add current box to merged list and update current |
| 181 | 182 | merged.append(tuple(current)) |
| 182 | 183 | current = list(box) |
| 183 | - | |
| 184 | + | |
| 184 | 185 | # Add the last box |
| 185 | 186 | merged.append(tuple(current)) |
| 186 | - | |
| 187 | + | |
| 187 | 188 | return merged |
| 188 | - | |
| 189 | + | |
| 189 | 190 | def extract_text_from_regions( |
| 190 | - self, | |
| 191 | - image: np.ndarray, | |
| 192 | - regions: List[Tuple[int, int, int, int]] | |
| 191 | + self, image: np.ndarray, regions: List[Tuple[int, int, int, int]] | |
| 193 | 192 | ) -> Dict[Tuple[int, int, int, int], str]: |
| 194 | 193 | """ |
| 195 | 194 | Extract text from specified regions in image. |
| 196 | - | |
| 195 | + | |
| 197 | 196 | Parameters |
| 198 | 197 | ---------- |
| 199 | 198 | image : np.ndarray |
| 200 | 199 | Input image |
| 201 | 200 | regions : list |
| 202 | 201 | List of regions as (x, y, w, h) |
| 203 | - | |
| 202 | + | |
| 204 | 203 | Returns |
| 205 | 204 | ------- |
| 206 | 205 | dict |
| 207 | 206 | Dictionary of {region: text} |
| 208 | 207 | """ |
| 209 | 208 | results = {} |
| 210 | - | |
| 209 | + | |
| 211 | 210 | for region in regions: |
| 212 | 211 | x, y, w, h = region |
| 213 | - | |
| 212 | + | |
| 214 | 213 | # Extract region |
| 215 | - roi = image[y:y+h, x:x+w] | |
| 216 | - | |
| 214 | + roi = image[y : y + h, x : x + w] | |
| 215 | + | |
| 217 | 216 | # Skip empty regions |
| 218 | 217 | if roi.size == 0: |
| 219 | 218 | continue |
| 220 | - | |
| 219 | + | |
| 221 | 220 | # Extract text |
| 222 | 221 | if self.use_local_ocr: |
| 223 | 222 | text = self.extract_text_local(roi) |
| 224 | 223 | else: |
| 225 | 224 | text = "API-based text extraction not yet implemented" |
| 226 | - | |
| 225 | + | |
| 227 | 226 | # Store non-empty results |
| 228 | 227 | if text.strip(): |
| 229 | 228 | results[region] = text.strip() |
| 230 | - | |
| 229 | + | |
| 231 | 230 | return results |
| 232 | - | |
| 231 | + | |
| 233 | 232 | def extract_text_from_image(self, image: np.ndarray, detect_regions: bool = True) -> str: |
| 234 | 233 | """ |
| 235 | 234 | Extract text from entire image. |
| 236 | - | |
| 235 | + | |
| 237 | 236 | Parameters |
| 238 | 237 | ---------- |
| 239 | 238 | image : np.ndarray |
| 240 | 239 | Input image |
| 241 | 240 | detect_regions : bool |
| 242 | 241 | Whether to detect and process text regions separately |
| 243 | - | |
| 242 | + | |
| 244 | 243 | Returns |
| 245 | 244 | ------- |
| 246 | 245 | str |
| 247 | 246 | Extracted text |
| 248 | 247 | """ |
| 249 | 248 | if detect_regions: |
| 250 | 249 | # Detect regions and extract text from each |
| 251 | 250 | regions = self.detect_text_regions(image) |
| 252 | 251 | region_texts = self.extract_text_from_regions(image, regions) |
| 253 | - | |
| 252 | + | |
| 254 | 253 | # Combine text from all regions |
| 255 | 254 | text = "\n".join(region_texts.values()) |
| 256 | 255 | else: |
| 257 | 256 | # Extract text from entire image |
| 258 | 257 | if self.use_local_ocr: |
| 259 | 258 | text = self.extract_text_local(image) |
| 260 | 259 | else: |
| 261 | 260 | text = "API-based text extraction not yet implemented" |
| 262 | - | |
| 261 | + | |
| 263 | 262 | return text |
| 264 | - | |
| 265 | - def extract_text_from_file(self, image_path: Union[str, Path], detect_regions: bool = True) -> str: | |
| 263 | + | |
| 264 | + def extract_text_from_file( | |
| 265 | + self, image_path: Union[str, Path], detect_regions: bool = True | |
| 266 | + ) -> str: | |
| 266 | 267 | """ |
| 267 | 268 | Extract text from image file. |
| 268 | - | |
| 269 | + | |
| 269 | 270 | Parameters |
| 270 | 271 | ---------- |
| 271 | 272 | image_path : str or Path |
| 272 | 273 | Path to image file |
| 273 | 274 | detect_regions : bool |
| 274 | 275 | Whether to detect and process text regions separately |
| 275 | - | |
| 276 | + | |
| 276 | 277 | Returns |
| 277 | 278 | ------- |
| 278 | 279 | str |
| 279 | 280 | Extracted text |
| 280 | 281 | """ |
| 281 | 282 | image_path = Path(image_path) |
| 282 | 283 | if not image_path.exists(): |
| 283 | 284 | raise FileNotFoundError(f"Image file not found: {image_path}") |
| 284 | - | |
| 285 | + | |
| 285 | 286 | # Load image |
| 286 | 287 | image = cv2.imread(str(image_path)) |
| 287 | 288 | if image is None: |
| 288 | 289 | raise ValueError(f"Failed to load image: {image_path}") |
| 289 | - | |
| 290 | + | |
| 290 | 291 | # Extract text |
| 291 | 292 | text = self.extract_text_from_image(image, detect_regions) |
| 292 | - | |
| 293 | + | |
| 293 | 294 | return text |
| 294 | 295 |
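The `_merge_overlapping_boxes` method reformatted in this diff is a single left-to-right sweep over boxes sorted by x. A standalone sketch of the same algorithm, with plain tuples in place of the class context:

```python
def merge_overlapping_boxes(boxes):
    """Merge overlapping (x, y, w, h) boxes in one sweep sorted by x coordinate."""
    if not boxes:
        return []
    sorted_boxes = sorted(boxes, key=lambda b: b[0])
    merged = []
    current = list(sorted_boxes[0])
    for x, y, w, h in sorted_boxes[1:]:
        cx, cy, cw, ch = current
        # Axis-aligned rectangle overlap test
        if cx <= x + w and x <= cx + cw and cy <= y + h and y <= cy + ch:
            # Grow the current box to the union of the two rectangles
            x1, y1 = min(cx, x), min(cy, y)
            x2, y2 = max(cx + cw, x + w), max(cy + ch, y + h)
            current = [x1, y1, x2 - x1, y2 - y1]
        else:
            merged.append(tuple(current))
            current = [x, y, w, h]
    merged.append(tuple(current))
    return merged


print(merge_overlapping_boxes([(0, 0, 10, 10), (5, 5, 10, 10), (30, 0, 5, 5)]))
# [(0, 0, 15, 15), (30, 0, 5, 5)]
```

Note the sweep only compares each box against the current accumulator, so it merges chains of overlaps along x but, like the original, does not revisit earlier merged boxes.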
| --- video_processor/integrators/knowledge_graph.py | ||
| +++ video_processor/integrators/knowledge_graph.py | ||
| @@ -1,8 +1,7 @@ | ||
| 1 | 1 | """Knowledge graph integration for organizing extracted content.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Dict, List, Optional, Union |
| 7 | 6 | |
| 8 | 7 | from tqdm import tqdm |
| @@ -33,18 +32,24 @@ | ||
| 33 | 32 | [{"role": "user", "content": prompt}], |
| 34 | 33 | max_tokens=4096, |
| 35 | 34 | temperature=temperature, |
| 36 | 35 | ) |
| 37 | 36 | |
| 38 | - def extract_entities_and_relationships(self, text: str) -> tuple[List[Entity], List[Relationship]]: | |
| 37 | + def extract_entities_and_relationships( | |
| 38 | + self, text: str | |
| 39 | + ) -> tuple[List[Entity], List[Relationship]]: | |
| 39 | 40 | """Extract entities and relationships in a single LLM call.""" |
| 40 | 41 | prompt = ( |
| 41 | 42 | "Extract all notable entities and relationships from the following content.\n\n" |
| 42 | 43 | f"CONTENT:\n{text}\n\n" |
| 43 | 44 | "Return a JSON object with two keys:\n" |
| 44 | - '- "entities": array of {"name": "...", "type": "person|concept|technology|organization|time", "description": "brief description"}\n' | |
| 45 | - '- "relationships": array of {"source": "entity name", "target": "entity name", "type": "relationship description"}\n\n' | |
| 45 | + '- "entities": array of {"name": "...", ' | |
| 46 | + '"type": "person|concept|technology|organization|time", ' | |
| 47 | + '"description": "brief description"}\n' | |
| 48 | + '- "relationships": array of {"source": "entity name", ' | |
| 49 | + '"target": "entity name", ' | |
| 50 | + '"type": "relationship description"}\n\n' | |
| 46 | 51 | "Return ONLY the JSON object." |
| 47 | 52 | ) |
| 48 | 53 | raw = self._chat(prompt) |
| 49 | 54 | parsed = parse_json_from_response(raw) |
| 50 | 55 | |
| @@ -52,32 +57,38 @@ | ||
| 52 | 57 | rels = [] |
| 53 | 58 | |
| 54 | 59 | if isinstance(parsed, dict): |
| 55 | 60 | for item in parsed.get("entities", []): |
| 56 | 61 | if isinstance(item, dict) and "name" in item: |
| 57 | - entities.append(Entity( | |
| 58 | - name=item["name"], | |
| 59 | - type=item.get("type", "concept"), | |
| 60 | - descriptions=[item["description"]] if item.get("description") else [], | |
| 61 | - )) | |
| 62 | - entity_names = {e.name for e in entities} | |
| 62 | + entities.append( | |
| 63 | + Entity( | |
| 64 | + name=item["name"], | |
| 65 | + type=item.get("type", "concept"), | |
| 66 | + descriptions=[item["description"]] if item.get("description") else [], | |
| 67 | + ) | |
| 68 | + ) | |
| 69 | + {e.name for e in entities} | |
| 63 | 70 | for item in parsed.get("relationships", []): |
| 64 | 71 | if isinstance(item, dict) and "source" in item and "target" in item: |
| 65 | - rels.append(Relationship( | |
| 66 | - source=item["source"], | |
| 67 | - target=item["target"], | |
| 68 | - type=item.get("type", "related_to"), | |
| 69 | - )) | |
| 72 | + rels.append( | |
| 73 | + Relationship( | |
| 74 | + source=item["source"], | |
| 75 | + target=item["target"], | |
| 76 | + type=item.get("type", "related_to"), | |
| 77 | + ) | |
| 78 | + ) | |
| 70 | 79 | elif isinstance(parsed, list): |
| 71 | 80 | # Fallback: if model returns a flat entity list |
| 72 | 81 | for item in parsed: |
| 73 | 82 | if isinstance(item, dict) and "name" in item: |
| 74 | - entities.append(Entity( | |
| 75 | - name=item["name"], | |
| 76 | - type=item.get("type", "concept"), | |
| 77 | - descriptions=[item["description"]] if item.get("description") else [], | |
| 78 | - )) | |
| 83 | + entities.append( | |
| 84 | + Entity( | |
| 85 | + name=item["name"], | |
| 86 | + type=item.get("type", "concept"), | |
| 87 | + descriptions=[item["description"]] if item.get("description") else [], | |
| 88 | + ) | |
| 89 | + ) | |
| 79 | 90 | |
| 80 | 91 | return entities, rels |
| 81 | 92 | |
| 82 | 93 | def add_content(self, text: str, source: str, timestamp: Optional[float] = None) -> None: |
| 83 | 94 | """Add content to knowledge graph by extracting entities and relationships.""" |
| @@ -84,39 +95,45 @@ | ||
| 84 | 95 | entities, relationships = self.extract_entities_and_relationships(text) |
| 85 | 96 | |
| 86 | 97 | for entity in entities: |
| 87 | 98 | eid = entity.name |
| 88 | 99 | if eid in self.nodes: |
| 89 | - self.nodes[eid]["occurrences"].append({ | |
| 90 | - "source": source, | |
| 91 | - "timestamp": timestamp, | |
| 92 | - "text": text[:100] + "..." if len(text) > 100 else text, | |
| 93 | - }) | |
| 100 | + self.nodes[eid]["occurrences"].append( | |
| 101 | + { | |
| 102 | + "source": source, | |
| 103 | + "timestamp": timestamp, | |
| 104 | + "text": text[:100] + "..." if len(text) > 100 else text, | |
| 105 | + } | |
| 106 | + ) | |
| 94 | 107 | if entity.descriptions: |
| 95 | 108 | self.nodes[eid]["descriptions"].update(entity.descriptions) |
| 96 | 109 | else: |
| 97 | 110 | self.nodes[eid] = { |
| 98 | 111 | "id": eid, |
| 99 | 112 | "name": entity.name, |
| 100 | 113 | "type": entity.type, |
| 101 | 114 | "descriptions": set(entity.descriptions), |
| 102 | - "occurrences": [{ | |
| 103 | - "source": source, | |
| 104 | - "timestamp": timestamp, | |
| 105 | - "text": text[:100] + "..." if len(text) > 100 else text, | |
| 106 | - }], | |
| 115 | + "occurrences": [ | |
| 116 | + { | |
| 117 | + "source": source, | |
| 118 | + "timestamp": timestamp, | |
| 119 | + "text": text[:100] + "..." if len(text) > 100 else text, | |
| 120 | + } | |
| 121 | + ], | |
| 107 | 122 | } |
| 108 | 123 | |
| 109 | 124 | for rel in relationships: |
| 110 | 125 | if rel.source in self.nodes and rel.target in self.nodes: |
| 111 | - self.relationships.append({ | |
| 112 | - "source": rel.source, | |
| 113 | - "target": rel.target, | |
| 114 | - "type": rel.type, | |
| 115 | - "content_source": source, | |
| 116 | - "timestamp": timestamp, | |
| 117 | - }) | |
| 126 | + self.relationships.append( | |
| 127 | + { | |
| 128 | + "source": rel.source, | |
| 129 | + "target": rel.target, | |
| 130 | + "type": rel.type, | |
| 131 | + "content_source": source, | |
| 132 | + "timestamp": timestamp, | |
| 133 | + } | |
| 134 | + ) | |
| 118 | 135 | |
| 119 | 136 | def process_transcript(self, transcript: Dict, batch_size: int = 10) -> None: |
| 120 | 137 | """Process transcript segments into knowledge graph, batching for efficiency.""" |
| 121 | 138 | if "segments" not in transcript: |
| 122 | 139 | logger.warning("Transcript missing segments") |
| @@ -137,17 +154,15 @@ | ||
| 137 | 154 | } |
| 138 | 155 | |
| 139 | 156 | # Batch segments together for fewer API calls |
| 140 | 157 | batches = [] |
| 141 | 158 | for start in range(0, len(segments), batch_size): |
| 142 | - batches.append(segments[start:start + batch_size]) | |
| 159 | + batches.append(segments[start : start + batch_size]) | |
| 143 | 160 | |
| 144 | 161 | for batch in tqdm(batches, desc="Building knowledge graph", unit="batch"): |
| 145 | 162 | # Combine batch text |
| 146 | - combined_text = " ".join( | |
| 147 | - seg["text"] for seg in batch if "text" in seg | |
| 148 | - ) | |
| 163 | + combined_text = " ".join(seg["text"] for seg in batch if "text" in seg) | |
| 149 | 164 | if not combined_text.strip(): |
| 150 | 165 | continue |
| 151 | 166 | |
| 152 | 167 | # Use first segment's timestamp as batch timestamp |
| 153 | 168 | batch_start_idx = segments.index(batch[0]) |
| @@ -169,29 +184,33 @@ | ||
| 169 | 184 | self.nodes[diagram_id] = { |
| 170 | 185 | "id": diagram_id, |
| 171 | 186 | "name": f"Diagram {i}", |
| 172 | 187 | "type": "diagram", |
| 173 | 188 | "descriptions": {"Visual diagram from video"}, |
| 174 | - "occurrences": [{ | |
| 175 | - "source": source if text_content else f"diagram_{i}", | |
| 176 | - "frame_index": diagram.get("frame_index"), | |
| 177 | - }], | |
| 189 | + "occurrences": [ | |
| 190 | + { | |
| 191 | + "source": source if text_content else f"diagram_{i}", | |
| 192 | + "frame_index": diagram.get("frame_index"), | |
| 193 | + } | |
| 194 | + ], | |
| 178 | 195 | } |
| 179 | 196 | |
| 180 | 197 | def to_data(self) -> KnowledgeGraphData: |
| 181 | 198 | """Convert to pydantic KnowledgeGraphData model.""" |
| 182 | 199 | nodes = [] |
| 183 | 200 | for node in self.nodes.values(): |
| 184 | 201 | descs = node.get("descriptions", set()) |
| 185 | 202 | if isinstance(descs, set): |
| 186 | 203 | descs = list(descs) |
| 187 | - nodes.append(Entity( | |
| 188 | - name=node["name"], | |
| 189 | - type=node.get("type", "concept"), | |
| 190 | - descriptions=descs, | |
| 191 | - occurrences=node.get("occurrences", []), | |
| 192 | - )) | |
| 204 | + nodes.append( | |
| 205 | + Entity( | |
| 206 | + name=node["name"], | |
| 207 | + type=node.get("type", "concept"), | |
| 208 | + descriptions=descs, | |
| 209 | + occurrences=node.get("occurrences", []), | |
| 210 | + ) | |
| 211 | + ) | |
| 193 | 212 | |
| 194 | 213 | rels = [ |
| 195 | 214 | Relationship( |
| 196 | 215 | source=r["source"], |
| 197 | 216 | target=r["target"], |
| @@ -280,11 +299,12 @@ | ||
| 280 | 299 | def generate_mermaid(self, max_nodes: int = 30) -> str: |
| 281 | 300 | """Generate Mermaid visualization code.""" |
| 282 | 301 | node_importance = {} |
| 283 | 302 | for node_id in self.nodes: |
| 284 | 303 | count = sum( |
| 285 | - 1 for rel in self.relationships | |
| 304 | + 1 | |
| 305 | + for rel in self.relationships | |
| 286 | 306 | if rel["source"] == node_id or rel["target"] == node_id |
| 287 | 307 | ) |
| 288 | 308 | node_importance[node_id] = count |
| 289 | 309 | |
| 290 | 310 | important = sorted(node_importance.items(), key=lambda x: x[1], reverse=True) |
| 291 | 311 |
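The reflowed prompt above still requests the same JSON shape, and the extractor keeps its dict-or-flat-list fallback. A self-contained sketch of that parsing contract, using plain `json.loads` and tuples in place of the project's `parse_json_from_response` helper and pydantic models (both simplifications are assumptions for illustration):

```python
import json


def parse_extraction(raw: str):
    """Parse the {"entities": [...], "relationships": [...]} shape the
    prompt requests, with the same fallback as
    extract_entities_and_relationships: a bare list is treated as a
    flat entity list with no relationships."""
    parsed = json.loads(raw)
    entities, rels = [], []
    if isinstance(parsed, dict):
        entities = [e["name"] for e in parsed.get("entities", [])
                    if isinstance(e, dict) and "name" in e]
        rels = [(r["source"], r["target"])
                for r in parsed.get("relationships", [])
                if isinstance(r, dict) and "source" in r and "target" in r]
    elif isinstance(parsed, list):
        entities = [e["name"] for e in parsed
                    if isinstance(e, dict) and "name" in e]
    return entities, rels


raw = ('{"entities": [{"name": "PlanOpticon", "type": "technology"}], '
       '"relationships": [{"source": "PlanOpticon", "target": "OCR", '
       '"type": "uses"}]}')
print(parse_extraction(raw))
```

The guards on each item mean a partially malformed response degrades to fewer entities rather than an exception, which matches the defensive style of the real method.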
| --- video_processor/integrators/plan_generator.py | ||
| +++ video_processor/integrators/plan_generator.py | ||
| @@ -1,14 +1,13 @@ | ||
| 1 | 1 | """Plan generation for creating structured markdown output.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Dict, List, Optional, Union |
| 7 | 6 | |
| 8 | 7 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 9 | -from video_processor.models import BatchManifest, VideoManifest | |
| 8 | +from video_processor.models import VideoManifest | |
| 10 | 9 | from video_processor.providers.manager import ProviderManager |
| 11 | 10 | |
| 12 | 11 | logger = logging.getLogger(__name__) |
| 13 | 12 | |
| 14 | 13 | |
| @@ -36,11 +35,13 @@ | ||
| 36 | 35 | """Generate summary from transcript.""" |
| 37 | 36 | full_text = "" |
| 38 | 37 | if "segments" in transcript: |
| 39 | 38 | for segment in transcript["segments"]: |
| 40 | 39 | if "text" in segment: |
| 41 | - speaker = f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else "" | |
| 40 | + speaker = ( | |
| 41 | + f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else "" | |
| 42 | + ) | |
| 42 | 43 | full_text += f"{speaker}{segment['text']}\n\n" |
| 43 | 44 | |
| 44 | 45 | if not full_text.strip(): |
| 45 | 46 | full_text = transcript.get("text", "") |
| 46 | 47 | |
| 47 | 48 |
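The wrapped conditional above only changes layout; the flattening logic is unchanged. A standalone sketch of that behavior, with a hypothetical `flatten_transcript` name for what is an inline loop in the real method:

```python
def flatten_transcript(transcript: dict) -> str:
    """Prefix each segment with its speaker when one is present,
    and fall back to the top-level "text" field when no segment
    yields any content, mirroring the summary-generation loop."""
    full_text = ""
    for segment in transcript.get("segments", []):
        if "text" in segment:
            speaker = (
                f"{segment.get('speaker', 'Speaker')}: " if "speaker" in segment else ""
            )
            full_text += f"{speaker}{segment['text']}\n\n"
    if not full_text.strip():
        full_text = transcript.get("text", "")
    return full_text


demo = {"segments": [{"speaker": "Alice", "text": "Kickoff"}, {"text": "Next steps"}]}
print(flatten_transcript(demo))
```

Segments without a `speaker` key get no prefix at all, rather than a generic "Speaker:" label; the default inside `segment.get` only applies in the (unusual) case where `"speaker"` exists with a falsy value.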
| --- video_processor/models.py | ||
| +++ video_processor/models.py | ||
| @@ -1,17 +1,17 @@ | ||
| 1 | 1 | """Pydantic data models for PlanOpticon output.""" |
| 2 | 2 | |
| 3 | 3 | from datetime import datetime |
| 4 | 4 | from enum import Enum |
| 5 | -from pathlib import Path | |
| 6 | 5 | from typing import Any, Dict, List, Optional |
| 7 | 6 | |
| 8 | 7 | from pydantic import BaseModel, Field |
| 9 | 8 | |
| 10 | 9 | |
| 11 | 10 | class DiagramType(str, Enum): |
| 12 | 11 | """Types of visual content detected in video frames.""" |
| 12 | + | |
| 13 | 13 | flowchart = "flowchart" |
| 14 | 14 | sequence = "sequence" |
| 15 | 15 | architecture = "architecture" |
| 16 | 16 | whiteboard = "whiteboard" |
| 17 | 17 | chart = "chart" |
| @@ -21,10 +21,11 @@ | ||
| 21 | 21 | unknown = "unknown" |
| 22 | 22 | |
| 23 | 23 | |
| 24 | 24 | class OutputFormat(str, Enum): |
| 25 | 25 | """Available output formats.""" |
| 26 | + | |
| 26 | 27 | markdown = "markdown" |
| 27 | 28 | json = "json" |
| 28 | 29 | html = "html" |
| 29 | 30 | pdf = "pdf" |
| 30 | 31 | svg = "svg" |
| @@ -31,39 +32,47 @@ | ||
| 31 | 32 | png = "png" |
| 32 | 33 | |
| 33 | 34 | |
| 34 | 35 | class TranscriptSegment(BaseModel): |
| 35 | 36 | """A single segment of transcribed audio.""" |
| 37 | + | |
| 36 | 38 | start: float = Field(description="Start time in seconds") |
| 37 | 39 | end: float = Field(description="End time in seconds") |
| 38 | 40 | text: str = Field(description="Transcribed text") |
| 39 | 41 | speaker: Optional[str] = Field(default=None, description="Speaker identifier") |
| 40 | 42 | confidence: Optional[float] = Field(default=None, description="Transcription confidence 0-1") |
| 41 | 43 | |
| 42 | 44 | |
| 43 | 45 | class ActionItem(BaseModel): |
| 44 | 46 | """An action item extracted from content.""" |
| 47 | + | |
| 45 | 48 | action: str = Field(description="The action to be taken") |
| 46 | 49 | assignee: Optional[str] = Field(default=None, description="Person responsible") |
| 47 | 50 | deadline: Optional[str] = Field(default=None, description="Deadline or timeframe") |
| 48 | 51 | priority: Optional[str] = Field(default=None, description="Priority level") |
| 49 | 52 | context: Optional[str] = Field(default=None, description="Additional context") |
| 50 | - source: Optional[str] = Field(default=None, description="Where this was found (transcript/diagram)") | |
| 53 | + source: Optional[str] = Field( | |
| 54 | + default=None, description="Where this was found (transcript/diagram)" | |
| 55 | + ) | |
| 51 | 56 | |
| 52 | 57 | |
| 53 | 58 | class KeyPoint(BaseModel): |
| 54 | 59 | """A key point extracted from content.""" |
| 60 | + | |
| 55 | 61 | point: str = Field(description="The key point") |
| 56 | 62 | topic: Optional[str] = Field(default=None, description="Topic or category") |
| 57 | 63 | details: Optional[str] = Field(default=None, description="Supporting details") |
| 58 | 64 | timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)") |
| 59 | 65 | source: Optional[str] = Field(default=None, description="Where this was found") |
| 60 | - related_diagrams: List[int] = Field(default_factory=list, description="Indices of related diagrams") | |
| 66 | + related_diagrams: List[int] = Field( | |
| 67 | + default_factory=list, description="Indices of related diagrams" | |
| 68 | + ) | |
| 61 | 69 | |
| 62 | 70 | |
| 63 | 71 | class DiagramResult(BaseModel): |
| 64 | 72 | """Result from diagram extraction and analysis.""" |
| 73 | + | |
| 65 | 74 | frame_index: int = Field(description="Index of the source frame") |
| 66 | 75 | timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)") |
| 67 | 76 | diagram_type: DiagramType = Field(default=DiagramType.unknown, description="Type of diagram") |
| 68 | 77 | confidence: float = Field(default=0.0, description="Detection confidence 0-1") |
| 69 | 78 | description: Optional[str] = Field(default=None, description="Description of the diagram") |
| @@ -70,85 +79,95 @@ | ||
| 70 | 79 | text_content: Optional[str] = Field(default=None, description="Text visible in the diagram") |
| 71 | 80 | elements: List[str] = Field(default_factory=list, description="Identified elements") |
| 72 | 81 | relationships: List[str] = Field(default_factory=list, description="Identified relationships") |
| 73 | 82 | mermaid: Optional[str] = Field(default=None, description="Mermaid syntax representation") |
| 74 | 83 | chart_data: Optional[Dict[str, Any]] = Field( |
| 75 | - default=None, | |
| 76 | - description="Chart data for reproduction (labels, values, chart_type)" | |
| 84 | + default=None, description="Chart data for reproduction (labels, values, chart_type)" | |
| 77 | 85 | ) |
| 78 | 86 | image_path: Optional[str] = Field(default=None, description="Relative path to original frame") |
| 79 | 87 | svg_path: Optional[str] = Field(default=None, description="Relative path to rendered SVG") |
| 80 | 88 | png_path: Optional[str] = Field(default=None, description="Relative path to rendered PNG") |
| 81 | 89 | mermaid_path: Optional[str] = Field(default=None, description="Relative path to mermaid source") |
| 82 | 90 | |
| 83 | 91 | |
| 84 | 92 | class ScreenCapture(BaseModel): |
| 85 | 93 | """A screengrab fallback when diagram extraction fails or is uncertain.""" |
| 94 | + | |
| 86 | 95 | frame_index: int = Field(description="Index of the source frame") |
| 87 | 96 | timestamp: Optional[float] = Field(default=None, description="Timestamp in video (seconds)") |
| 88 | 97 | caption: Optional[str] = Field(default=None, description="Brief description of the content") |
| 89 | 98 | image_path: Optional[str] = Field(default=None, description="Relative path to screenshot") |
| 90 | - confidence: float = Field(default=0.0, description="Detection confidence that triggered fallback") | |
| 99 | + confidence: float = Field( | |
| 100 | + default=0.0, description="Detection confidence that triggered fallback" | |
| 101 | + ) | |
| 91 | 102 | |
| 92 | 103 | |
| 93 | 104 | class Entity(BaseModel): |
| 94 | 105 | """An entity in the knowledge graph.""" |
| 106 | + | |
| 95 | 107 | name: str = Field(description="Entity name") |
| 96 | 108 | type: str = Field(default="concept", description="Entity type (person, concept, time, diagram)") |
| 97 | 109 | descriptions: List[str] = Field(default_factory=list, description="Descriptions of this entity") |
| 98 | - source: Optional[str] = Field(default=None, description="Source attribution (transcript/diagram/both)") | |
| 110 | + source: Optional[str] = Field( | |
| 111 | + default=None, description="Source attribution (transcript/diagram/both)" | |
| 112 | + ) | |
| 99 | 113 | occurrences: List[Dict[str, Any]] = Field( |
| 100 | - default_factory=list, | |
| 101 | - description="List of occurrences with source, timestamp, text" | |
| 114 | + default_factory=list, description="List of occurrences with source, timestamp, text" | |
| 102 | 115 | ) |
| 103 | 116 | |
| 104 | 117 | |
| 105 | 118 | class Relationship(BaseModel): |
| 106 | 119 | """A relationship between entities in the knowledge graph.""" |
| 120 | + | |
| 107 | 121 | source: str = Field(description="Source entity name") |
| 108 | 122 | target: str = Field(description="Target entity name") |
| 109 | 123 | type: str = Field(default="related_to", description="Relationship type") |
| 110 | 124 | content_source: Optional[str] = Field(default=None, description="Content source identifier") |
| 111 | 125 | timestamp: Optional[float] = Field(default=None, description="Timestamp in seconds") |
| 112 | 126 | |
| 113 | 127 | |
| 114 | 128 | class KnowledgeGraphData(BaseModel): |
| 115 | 129 | """Serializable knowledge graph data.""" |
| 130 | + | |
| 116 | 131 | nodes: List[Entity] = Field(default_factory=list, description="Graph nodes/entities") |
| 117 | - relationships: List[Relationship] = Field(default_factory=list, description="Graph relationships") | |
| 132 | + relationships: List[Relationship] = Field( | |
| 133 | + default_factory=list, description="Graph relationships" | |
| 134 | + ) | |
| 118 | 135 | |
| 119 | 136 | |
| 120 | 137 | class ProcessingStats(BaseModel): |
| 121 | 138 | """Statistics about a processing run.""" |
| 139 | + | |
| 122 | 140 | start_time: Optional[str] = Field(default=None, description="ISO format start time") |
| 123 | 141 | end_time: Optional[str] = Field(default=None, description="ISO format end time") |
| 124 | 142 | duration_seconds: Optional[float] = Field(default=None, description="Total processing time") |
| 125 | 143 | frames_extracted: int = Field(default=0) |
| 126 | 144 | people_frames_filtered: int = Field(default=0) |
| 127 | 145 | diagrams_detected: int = Field(default=0) |
| 128 | 146 | screen_captures: int = Field(default=0) |
| 129 | 147 | transcript_duration_seconds: Optional[float] = Field(default=None) |
| 130 | 148 | models_used: Dict[str, str] = Field( |
| 131 | - default_factory=dict, | |
| 132 | - description="Map of task to model used (e.g. vision: gpt-4o)" | |
| 149 | + default_factory=dict, description="Map of task to model used (e.g. vision: gpt-4o)" | |
| 133 | 150 | ) |
| 134 | 151 | |
| 135 | 152 | |
| 136 | 153 | class VideoMetadata(BaseModel): |
| 137 | 154 | """Metadata about the source video.""" |
| 155 | + | |
| 138 | 156 | title: str = Field(description="Video title") |
| 139 | 157 | source_path: Optional[str] = Field(default=None, description="Original video file path") |
| 140 | 158 | duration_seconds: Optional[float] = Field(default=None, description="Video duration") |
| 141 | 159 | resolution: Optional[str] = Field(default=None, description="Video resolution (e.g. 1920x1080)") |
| 142 | 160 | processed_at: str = Field( |
| 143 | 161 | default_factory=lambda: datetime.now().isoformat(), |
| 144 | - description="ISO format processing timestamp" | |
| 162 | + description="ISO format processing timestamp", | |
| 145 | 163 | ) |
| 146 | 164 | |
| 147 | 165 | |
| 148 | 166 | class VideoManifest(BaseModel): |
| 149 | 167 | """Manifest for a single video processing run - the single source of truth.""" |
| 168 | + | |
| 150 | 169 | version: str = Field(default="1.0", description="Manifest schema version") |
| 151 | 170 | video: VideoMetadata = Field(description="Source video metadata") |
| 152 | 171 | stats: ProcessingStats = Field(default_factory=ProcessingStats) |
| 153 | 172 | |
| 154 | 173 | # Relative paths to output files |
| @@ -167,15 +186,18 @@ | ||
| 167 | 186 | action_items: List[ActionItem] = Field(default_factory=list) |
| 168 | 187 | diagrams: List[DiagramResult] = Field(default_factory=list) |
| 169 | 188 | screen_captures: List[ScreenCapture] = Field(default_factory=list) |
| 170 | 189 | |
| 171 | 190 | # Frame paths |
| 172 | - frame_paths: List[str] = Field(default_factory=list, description="Relative paths to extracted frames") | |
| 191 | + frame_paths: List[str] = Field( | |
| 192 | + default_factory=list, description="Relative paths to extracted frames" | |
| 193 | + ) | |
| 173 | 194 | |
| 174 | 195 | |
| 175 | 196 | class BatchVideoEntry(BaseModel): |
| 176 | 197 | """Summary of a single video within a batch.""" |
| 198 | + | |
| 177 | 199 | video_name: str |
| 178 | 200 | manifest_path: str = Field(description="Relative path to video manifest") |
| 179 | 201 | status: str = Field(default="pending", description="pending/completed/failed") |
| 180 | 202 | error: Optional[str] = Field(default=None, description="Error message if failed") |
| 181 | 203 | diagrams_count: int = Field(default=0) |
| @@ -184,15 +206,14 @@ | ||
| 184 | 206 | duration_seconds: Optional[float] = Field(default=None) |
| 185 | 207 | |
| 186 | 208 | |
| 187 | 209 | class BatchManifest(BaseModel): |
| 188 | 210 | """Manifest for a batch processing run.""" |
| 211 | + | |
| 189 | 212 | version: str = Field(default="1.0") |
| 190 | 213 | title: str = Field(default="Batch Processing Results") |
| 191 | - processed_at: str = Field( | |
| 192 | - default_factory=lambda: datetime.now().isoformat() | |
| 193 | - ) | |
| 214 | + processed_at: str = Field(default_factory=lambda: datetime.now().isoformat()) | |
| 194 | 215 | stats: ProcessingStats = Field(default_factory=ProcessingStats) |
| 195 | 216 | |
| 196 | 217 | videos: List[BatchVideoEntry] = Field(default_factory=list) |
| 197 | 218 | |
| 198 | 219 | # Aggregated counts |
| 199 | 220 |
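These are pydantic v2 models (the pipeline elsewhere in this commit calls `model_validate_json`), so a `VideoManifest` round-trips through JSON losslessly. A stdlib-only sketch of that round-trip — field names taken from the diff above, sample values invented purely for illustration:

```python
import json

# Hypothetical sample shaped like the manifest fields in the diff above;
# the real code serializes with pydantic's model_dump_json()/model_validate_json().
manifest = {
    "version": "1.0",
    "video": {"title": "example.mp4", "duration_seconds": 912.4},
    "stats": {"frames_extracted": 48, "diagrams_detected": 3},
    "key_points": [{"point": "Ship v1", "timestamp": 120.0}],
    "frame_paths": ["frames/frame_0001.png"],
}

serialized = json.dumps(manifest, indent=2)
restored = json.loads(serialized)
assert restored == manifest  # lossless round-trip
```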
| --- video_processor/output_structure.py | ||
| +++ video_processor/output_structure.py | ||
| @@ -1,8 +1,7 @@ | ||
| 1 | 1 | """Standardized output directory structure and manifest I/O for PlanOpticon.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Dict |
| 7 | 6 | |
| 8 | 7 | from video_processor.models import BatchManifest, VideoManifest |
| 9 | 8 |
| --- video_processor/pipeline.py | ||
| +++ video_processor/pipeline.py | ||
| @@ -9,11 +9,15 @@ | ||
| 9 | 9 | |
| 10 | 10 | from tqdm import tqdm |
| 11 | 11 | |
| 12 | 12 | from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer |
| 13 | 13 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 14 | -from video_processor.extractors.frame_extractor import extract_frames, filter_people_frames, save_frames | |
| 14 | +from video_processor.extractors.frame_extractor import ( | |
| 15 | + extract_frames, | |
| 16 | + filter_people_frames, | |
| 17 | + save_frames, | |
| 18 | +) | |
| 15 | 19 | from video_processor.integrators.knowledge_graph import KnowledgeGraph |
| 16 | 20 | from video_processor.integrators.plan_generator import PlanGenerator |
| 17 | 21 | from video_processor.models import ( |
| 18 | 22 | ActionItem, |
| 19 | 23 | KeyPoint, |
| @@ -145,13 +149,11 @@ | ||
| 145 | 149 | srt_lines = [] |
| 146 | 150 | for i, seg in enumerate(segments): |
| 147 | 151 | start = seg.get("start", 0) |
| 148 | 152 | end = seg.get("end", 0) |
| 149 | 153 | srt_lines.append(str(i + 1)) |
| 150 | - srt_lines.append( | |
| 151 | - f"{_format_srt_time(start)} --> {_format_srt_time(end)}" | |
| 152 | - ) | |
| 154 | + srt_lines.append(f"{_format_srt_time(start)} --> {_format_srt_time(end)}") | |
| 153 | 155 | srt_lines.append(seg.get("text", "").strip()) |
| 154 | 156 | srt_lines.append("") |
| 155 | 157 | transcript_srt.write_text("\n".join(srt_lines)) |
| 156 | 158 | pipeline_bar.update(1) |
| 157 | 159 | |
| @@ -158,14 +160,17 @@ | ||
| 158 | 160 | # --- Step 4: Diagram extraction --- |
| 159 | 161 | pm.usage.start_step("Visual analysis") |
| 160 | 162 | pipeline_bar.set_description("Pipeline: analyzing visuals") |
| 161 | 163 | diagrams = [] |
| 162 | 164 | screen_captures = [] |
| 163 | - existing_diagrams = sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else [] | |
| 165 | + existing_diagrams = ( | |
| 166 | + sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else [] | |
| 167 | + ) | |
| 164 | 168 | if existing_diagrams: |
| 165 | 169 | logger.info(f"Resuming: found {len(existing_diagrams)} diagrams on disk, skipping analysis") |
| 166 | 170 | from video_processor.models import DiagramResult |
| 171 | + | |
| 167 | 172 | for dj in existing_diagrams: |
| 168 | 173 | try: |
| 169 | 174 | diagrams.append(DiagramResult.model_validate_json(dj.read_text())) |
| 170 | 175 | except Exception as e: |
| 171 | 176 | logger.warning(f"Failed to load diagram {dj}: {e}") |
| @@ -208,16 +213,12 @@ | ||
| 208 | 213 | pipeline_bar.set_description("Pipeline: extracting key points") |
| 209 | 214 | kp_path = dirs["results"] / "key_points.json" |
| 210 | 215 | ai_path = dirs["results"] / "action_items.json" |
| 211 | 216 | if kp_path.exists() and ai_path.exists(): |
| 212 | 217 | logger.info("Resuming: found key points and action items on disk") |
| 213 | - key_points = [ | |
| 214 | - KeyPoint(**item) for item in json.loads(kp_path.read_text()) | |
| 215 | - ] | |
| 216 | - action_items = [ | |
| 217 | - ActionItem(**item) for item in json.loads(ai_path.read_text()) | |
| 218 | - ] | |
| 218 | + key_points = [KeyPoint(**item) for item in json.loads(kp_path.read_text())] | |
| 219 | + action_items = [ActionItem(**item) for item in json.loads(ai_path.read_text())] | |
| 219 | 220 | else: |
| 220 | 221 | key_points = _extract_key_points(pm, transcript_text) |
| 221 | 222 | action_items = _extract_action_items(pm, transcript_text) |
| 222 | 223 | |
| 223 | 224 | kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2)) |
| @@ -286,13 +287,15 @@ | ||
| 286 | 287 | pipeline_bar.close() |
| 287 | 288 | |
| 288 | 289 | # Write manifest |
| 289 | 290 | write_video_manifest(manifest, output_dir) |
| 290 | 291 | |
| 291 | - logger.info(f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, " | |
| 292 | - f"{len(screen_captures)} captures, {len(key_points)} key points, " | |
| 293 | - f"{len(action_items)} action items") | |
| 292 | + logger.info( | |
| 293 | + f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, " | |
| 294 | + f"{len(screen_captures)} captures, {len(key_points)} key points, " | |
| 295 | + f"{len(action_items)} action items" | |
| 296 | + ) | |
| 294 | 297 | |
| 295 | 298 | return manifest |
| 296 | 299 | |
| 297 | 300 | |
| 298 | 301 | def _extract_key_points(pm: ProviderManager, text: str) -> list[KeyPoint]: |
| 299 | 302 |
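`_format_srt_time` is called in the SRT-writing hunk above but its body is not part of this diff. A minimal sketch of what such a helper typically looks like — the name and exact rounding behavior are assumptions; SRT timestamps use `HH:MM:SS,mmm` with a comma before the milliseconds:

```python
def format_srt_time(seconds: float) -> str:
    # Assumed implementation of the _format_srt_time helper referenced above.
    ms = int(round(seconds * 1000))
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

print(format_srt_time(3661.5))  # → 01:01:01,500
```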
| --- video_processor/providers/anthropic_provider.py | ||
| +++ video_processor/providers/anthropic_provider.py | ||
| @@ -97,14 +97,16 @@ | ||
| 97 | 97 | try: |
| 98 | 98 | page = self.client.models.list(limit=100) |
| 99 | 99 | for m in page.data: |
| 100 | 100 | mid = m.id |
| 101 | 101 | caps = ["chat", "vision"] # All Claude models support chat + vision |
| 102 | - models.append(ModelInfo( | |
| 103 | - id=mid, | |
| 104 | - provider="anthropic", | |
| 105 | - display_name=getattr(m, "display_name", mid), | |
| 106 | - capabilities=caps, | |
| 107 | - )) | |
| 102 | + models.append( | |
| 103 | + ModelInfo( | |
| 104 | + id=mid, | |
| 105 | + provider="anthropic", | |
| 106 | + display_name=getattr(m, "display_name", mid), | |
| 107 | + capabilities=caps, | |
| 108 | + ) | |
| 109 | + ) | |
| 108 | 110 | except Exception as e: |
| 109 | 111 | logger.warning(f"Failed to list Anthropic models: {e}") |
| 110 | 112 | return sorted(models, key=lambda m: m.id) |
| 111 | 113 |
| --- video_processor/providers/base.py | ||
| +++ video_processor/providers/base.py | ||
| @@ -7,16 +7,16 @@ | ||
| 7 | 7 | from pydantic import BaseModel, Field |
| 8 | 8 | |
| 9 | 9 | |
| 10 | 10 | class ModelInfo(BaseModel): |
| 11 | 11 | """Information about an available model.""" |
| 12 | + | |
| 12 | 13 | id: str = Field(description="Model identifier (e.g. gpt-4o)") |
| 13 | 14 | provider: str = Field(description="Provider name (openai, anthropic, gemini)") |
| 14 | 15 | display_name: str = Field(default="", description="Human-readable name") |
| 15 | 16 | capabilities: List[str] = Field( |
| 16 | - default_factory=list, | |
| 17 | - description="Model capabilities: chat, vision, audio, embedding" | |
| 17 | + default_factory=list, description="Model capabilities: chat, vision, audio, embedding" | |
| 18 | 18 | ) |
| 19 | 19 | |
| 20 | 20 | |
| 21 | 21 | class BaseProvider(ABC): |
| 22 | 22 | """Abstract base for all provider implementations.""" |
| 23 | 23 |
| --- video_processor/providers/discovery.py | ||
| +++ video_processor/providers/discovery.py | ||
| @@ -38,10 +38,11 @@ | ||
| 38 | 38 | |
| 39 | 39 | # OpenAI |
| 40 | 40 | if keys.get("openai"): |
| 41 | 41 | try: |
| 42 | 42 | from video_processor.providers.openai_provider import OpenAIProvider |
| 43 | + | |
| 43 | 44 | provider = OpenAIProvider(api_key=keys["openai"]) |
| 44 | 45 | models = provider.list_models() |
| 45 | 46 | logger.info(f"Discovered {len(models)} OpenAI models") |
| 46 | 47 | all_models.extend(models) |
| 47 | 48 | except Exception as e: |
| @@ -49,10 +50,11 @@ | ||
| 49 | 50 | |
| 50 | 51 | # Anthropic |
| 51 | 52 | if keys.get("anthropic"): |
| 52 | 53 | try: |
| 53 | 54 | from video_processor.providers.anthropic_provider import AnthropicProvider |
| 55 | + | |
| 54 | 56 | provider = AnthropicProvider(api_key=keys["anthropic"]) |
| 55 | 57 | models = provider.list_models() |
| 56 | 58 | logger.info(f"Discovered {len(models)} Anthropic models") |
| 57 | 59 | all_models.extend(models) |
| 58 | 60 | except Exception as e: |
| @@ -62,10 +64,11 @@ | ||
| 62 | 64 | gemini_key = keys.get("gemini") |
| 63 | 65 | gemini_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS", "") |
| 64 | 66 | if gemini_key or gemini_creds: |
| 65 | 67 | try: |
| 66 | 68 | from video_processor.providers.gemini_provider import GeminiProvider |
| 69 | + | |
| 67 | 70 | provider = GeminiProvider( |
| 68 | 71 | api_key=gemini_key or None, |
| 69 | 72 | credentials_path=gemini_creds or None, |
| 70 | 73 | ) |
| 71 | 74 | models = provider.list_models() |
| 72 | 75 |
| --- video_processor/providers/gemini_provider.py | ||
| +++ video_processor/providers/gemini_provider.py | ||
| @@ -29,16 +29,15 @@ | ||
| 29 | 29 | ): |
| 30 | 30 | self.api_key = api_key or os.getenv("GEMINI_API_KEY") |
| 31 | 31 | self.credentials_path = credentials_path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS") |
| 32 | 32 | |
| 33 | 33 | if not self.api_key and not self.credentials_path: |
| 34 | - raise ValueError( | |
| 35 | - "Neither GEMINI_API_KEY nor GOOGLE_APPLICATION_CREDENTIALS is set" | |
| 36 | - ) | |
| 34 | + raise ValueError("Neither GEMINI_API_KEY nor GOOGLE_APPLICATION_CREDENTIALS is set") | |
| 37 | 35 | |
| 38 | 36 | try: |
| 39 | 37 | from google import genai |
| 38 | + | |
| 40 | 39 | self._genai = genai |
| 41 | 40 | |
| 42 | 41 | if self.api_key: |
| 43 | 42 | self.client = genai.Client(api_key=self.api_key) |
| 44 | 43 | else: |
| @@ -55,12 +54,11 @@ | ||
| 55 | 54 | project=project, |
| 56 | 55 | location=location, |
| 57 | 56 | ) |
| 58 | 57 | except ImportError: |
| 59 | 58 | raise ImportError( |
| 60 | - "google-genai package not installed. " | |
| 61 | - "Install with: pip install google-genai" | |
| 59 | + "google-genai package not installed. Install with: pip install google-genai" | |
| 62 | 60 | ) |
| 63 | 61 | |
| 64 | 62 | def chat( |
| 65 | 63 | self, |
| 66 | 64 | messages: list[dict], |
| @@ -73,14 +71,16 @@ | ||
| 73 | 71 | model = model or "gemini-2.5-flash" |
| 74 | 72 | # Convert OpenAI-style messages to Gemini contents |
| 75 | 73 | contents = [] |
| 76 | 74 | for msg in messages: |
| 77 | 75 | role = "user" if msg["role"] == "user" else "model" |
| 78 | - contents.append(types.Content( | |
| 79 | - role=role, | |
| 80 | - parts=[types.Part.from_text(text=msg["content"])], | |
| 81 | - )) | |
| 76 | + contents.append( | |
| 77 | + types.Content( | |
| 78 | + role=role, | |
| 79 | + parts=[types.Part.from_text(text=msg["content"])], | |
| 80 | + ) | |
| 81 | + ) | |
| 82 | 82 | |
| 83 | 83 | response = self.client.models.generate_content( |
| 84 | 84 | model=model, |
| 85 | 85 | contents=contents, |
| 86 | 86 | config=types.GenerateContentConfig( |
| @@ -168,10 +168,11 @@ | ||
| 168 | 168 | ), |
| 169 | 169 | ) |
| 170 | 170 | |
| 171 | 171 | # Parse JSON response |
| 172 | 172 | import json |
| 173 | + | |
| 173 | 174 | try: |
| 174 | 175 | data = json.loads(response.text) |
| 175 | 176 | except (json.JSONDecodeError, TypeError): |
| 176 | 177 | data = {"text": response.text or "", "segments": []} |
| 177 | 178 | |
| @@ -190,11 +191,11 @@ | ||
| 190 | 191 | for m in self.client.models.list(): |
| 191 | 192 | mid = m.name or "" |
| 192 | 193 | # Strip prefix variants from different API modes |
| 193 | 194 | for prefix in ("models/", "publishers/google/models/"): |
| 194 | 195 | if mid.startswith(prefix): |
| 195 | - mid = mid[len(prefix):] | |
| 196 | + mid = mid[len(prefix) :] | |
| 196 | 197 | break |
| 197 | 198 | display = getattr(m, "display_name", mid) or mid |
| 198 | 199 | |
| 199 | 200 | caps = [] |
| 200 | 201 | mid_lower = mid.lower() |
| @@ -206,14 +207,16 @@ | ||
| 206 | 207 | caps.append("audio") |
| 207 | 208 | if "embedding" in mid_lower: |
| 208 | 209 | caps.append("embedding") |
| 209 | 210 | |
| 210 | 211 | if caps: |
| 211 | - models.append(ModelInfo( | |
| 212 | - id=mid, | |
| 213 | - provider="gemini", | |
| 214 | - display_name=display, | |
| 215 | - capabilities=caps, | |
| 216 | - )) | |
| 212 | + models.append( | |
| 213 | + ModelInfo( | |
| 214 | + id=mid, | |
| 215 | + provider="gemini", | |
| 216 | + display_name=display, | |
| 217 | + capabilities=caps, | |
| 218 | + ) | |
| 219 | + ) | |
| 217 | 220 | except Exception as e: |
| 218 | 221 | logger.warning(f"Failed to list Gemini models: {e}") |
| 219 | 222 | return sorted(models, key=lambda m: m.id) |
| 220 | 223 |
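The prefix-stripping loop in `list_models` (reformatted above to ruff's slice style, `mid[len(prefix) :]`) can also be expressed with `str.removeprefix`, available since Python 3.9, which returns the string unchanged when the prefix is absent. A stdlib-only sketch of the equivalent logic (the function name is illustrative, not from the codebase):

```python
def strip_model_prefix(mid: str) -> str:
    """Drop the first matching API-mode prefix from a Gemini model name."""
    for prefix in ("models/", "publishers/google/models/"):
        if mid.startswith(prefix):
            # removeprefix replaces the manual mid[len(prefix):] slice.
            return mid.removeprefix(prefix)
    return mid
```

Either form avoids the off-by-one risk of hand-computed slice bounds; the explicit `startswith` check preserves the original's first-match-wins behavior.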
| --- video_processor/providers/manager.py | ||
| +++ video_processor/providers/manager.py | ||
| @@ -1,9 +1,8 @@ | ||
| 1 | 1 | """ProviderManager - unified interface for routing API calls to the best available provider.""" |
| 2 | 2 | |
| 3 | 3 | import logging |
| 4 | -import os | |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Optional |
| 7 | 6 | |
| 8 | 7 | from dotenv import load_dotenv |
| 9 | 8 | |
| @@ -67,11 +66,13 @@ | ||
| 67 | 66 | |
| 68 | 67 | # If a single provider is forced, apply it |
| 69 | 68 | if provider: |
| 70 | 69 | self.vision_model = vision_model or self._default_for_provider(provider, "vision") |
| 71 | 70 | self.chat_model = chat_model or self._default_for_provider(provider, "chat") |
| 72 | - self.transcription_model = transcription_model or self._default_for_provider(provider, "audio") | |
| 71 | + self.transcription_model = transcription_model or self._default_for_provider( | |
| 72 | + provider, "audio" | |
| 73 | + ) | |
| 73 | 74 | else: |
| 74 | 75 | self.vision_model = vision_model |
| 75 | 76 | self.chat_model = chat_model |
| 76 | 77 | self.transcription_model = transcription_model |
| 77 | 78 | |
| @@ -80,34 +81,51 @@ | ||
| 80 | 81 | @staticmethod |
| 81 | 82 | def _default_for_provider(provider: str, capability: str) -> str: |
| 82 | 83 | """Return the default model for a provider/capability combo.""" |
| 83 | 84 | defaults = { |
| 84 | 85 | "openai": {"chat": "gpt-4o", "vision": "gpt-4o", "audio": "whisper-1"}, |
| 85 | - "anthropic": {"chat": "claude-sonnet-4-5-20250929", "vision": "claude-sonnet-4-5-20250929", "audio": ""}, | |
| 86 | - "gemini": {"chat": "gemini-2.5-flash", "vision": "gemini-2.5-flash", "audio": "gemini-2.5-flash"}, | |
| 86 | + "anthropic": { | |
| 87 | + "chat": "claude-sonnet-4-5-20250929", | |
| 88 | + "vision": "claude-sonnet-4-5-20250929", | |
| 89 | + "audio": "", | |
| 90 | + }, | |
| 91 | + "gemini": { | |
| 92 | + "chat": "gemini-2.5-flash", | |
| 93 | + "vision": "gemini-2.5-flash", | |
| 94 | + "audio": "gemini-2.5-flash", | |
| 95 | + }, | |
| 87 | 96 | } |
| 88 | 97 | return defaults.get(provider, {}).get(capability, "") |
| 89 | 98 | |
| 90 | 99 | def _get_provider(self, provider_name: str) -> BaseProvider: |
| 91 | 100 | """Lazily initialize and cache a provider instance.""" |
| 92 | 101 | if provider_name not in self._providers: |
| 93 | 102 | if provider_name == "openai": |
| 94 | 103 | from video_processor.providers.openai_provider import OpenAIProvider |
| 104 | + | |
| 95 | 105 | self._providers[provider_name] = OpenAIProvider() |
| 96 | 106 | elif provider_name == "anthropic": |
| 97 | 107 | from video_processor.providers.anthropic_provider import AnthropicProvider |
| 108 | + | |
| 98 | 109 | self._providers[provider_name] = AnthropicProvider() |
| 99 | 110 | elif provider_name == "gemini": |
| 100 | 111 | from video_processor.providers.gemini_provider import GeminiProvider |
| 112 | + | |
| 101 | 113 | self._providers[provider_name] = GeminiProvider() |
| 102 | 114 | else: |
| 103 | 115 | raise ValueError(f"Unknown provider: {provider_name}") |
| 104 | 116 | return self._providers[provider_name] |
| 105 | 117 | |
| 106 | 118 | def _provider_for_model(self, model_id: str) -> str: |
| 107 | 119 | """Infer the provider from a model id.""" |
| 108 | - if model_id.startswith("gpt-") or model_id.startswith("o1") or model_id.startswith("o3") or model_id.startswith("o4") or model_id.startswith("whisper"): | |
| 120 | + if ( | |
| 121 | + model_id.startswith("gpt-") | |
| 122 | + or model_id.startswith("o1") | |
| 123 | + or model_id.startswith("o3") | |
| 124 | + or model_id.startswith("o4") | |
| 125 | + or model_id.startswith("whisper") | |
| 126 | + ): | |
| 109 | 127 | return "openai" |
| 110 | 128 | if model_id.startswith("claude-"): |
| 111 | 129 | return "anthropic" |
| 112 | 130 | if model_id.startswith("gemini-"): |
| 113 | 131 | return "gemini" |
| @@ -121,11 +139,13 @@ | ||
| 121 | 139 | def _get_available_models(self) -> list[ModelInfo]: |
| 122 | 140 | if self._available_models is None: |
| 123 | 141 | self._available_models = discover_available_models() |
| 124 | 142 | return self._available_models |
| 125 | 143 | |
| 126 | - def _resolve_model(self, explicit: Optional[str], capability: str, preferences: list[tuple[str, str]]) -> tuple[str, str]: | |
| 144 | + def _resolve_model( | |
| 145 | + self, explicit: Optional[str], capability: str, preferences: list[tuple[str, str]] | |
| 146 | + ) -> tuple[str, str]: | |
| 127 | 147 | """ |
| 128 | 148 | Resolve which (provider, model) to use for a capability. |
| 129 | 149 | |
| 130 | 150 | Returns (provider_name, model_id). |
| 131 | 151 | """ |
| @@ -169,11 +189,13 @@ | ||
| 169 | 189 | ) -> str: |
| 170 | 190 | """Send a chat completion to the best available provider.""" |
| 171 | 191 | prov_name, model = self._resolve_model(self.chat_model, "chat", _CHAT_PREFERENCES) |
| 172 | 192 | logger.info(f"Chat: using {prov_name}/{model}") |
| 173 | 193 | provider = self._get_provider(prov_name) |
| 174 | - result = provider.chat(messages, max_tokens=max_tokens, temperature=temperature, model=model) | |
| 194 | + result = provider.chat( | |
| 195 | + messages, max_tokens=max_tokens, temperature=temperature, model=model | |
| 196 | + ) | |
| 175 | 197 | self._track(provider, prov_name, model) |
| 176 | 198 | return result |
| 177 | 199 | |
| 178 | 200 | def analyze_image( |
| 179 | 201 | self, |
| 180 | 202 |
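The wrapped condition in `_provider_for_model` chains five `or`-ed `startswith` calls. Since `str.startswith` also accepts a tuple of prefixes, the same routing fits one call per provider; a sketch under that assumption (the module-level tuple and the empty-string fallthrough are illustrative, not from the commit):

```python
# Prefixes that identify OpenAI model ids, collapsed into one tuple.
_OPENAI_PREFIXES = ("gpt-", "o1", "o3", "o4", "whisper")


def provider_for_model(model_id: str) -> str:
    """Infer the provider from a model id (returns "" when unknown)."""
    if model_id.startswith(_OPENAI_PREFIXES):
        return "openai"
    if model_id.startswith("claude-"):
        return "anthropic"
    if model_id.startswith("gemini-"):
        return "gemini"
    return ""
```

The tuple form sidesteps the line-length wrapping entirely while keeping the prefix list in one place.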
| --- video_processor/providers/openai_provider.py | ||
| +++ video_processor/providers/openai_provider.py | ||
| @@ -13,11 +13,22 @@ | ||
| 13 | 13 | |
| 14 | 14 | load_dotenv() |
| 15 | 15 | logger = logging.getLogger(__name__) |
| 16 | 16 | |
| 17 | 17 | # Models known to have vision capability |
| 18 | -_VISION_MODELS = {"gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano", "o1", "o3", "o3-mini", "o4-mini"} | |
| 18 | +_VISION_MODELS = { | |
| 19 | + "gpt-4o", | |
| 20 | + "gpt-4o-mini", | |
| 21 | + "gpt-4-turbo", | |
| 22 | + "gpt-4.1", | |
| 23 | + "gpt-4.1-mini", | |
| 24 | + "gpt-4.1-nano", | |
| 25 | + "o1", | |
| 26 | + "o3", | |
| 27 | + "o3-mini", | |
| 28 | + "o4-mini", | |
| 29 | +} | |
| 19 | 30 | _AUDIO_MODELS = {"whisper-1"} |
| 20 | 31 | |
| 21 | 32 | |
| 22 | 33 | class OpenAIProvider(BaseProvider): |
| 23 | 34 | """OpenAI API provider.""" |
| @@ -44,11 +55,13 @@ | ||
| 44 | 55 | max_tokens=max_tokens, |
| 45 | 56 | temperature=temperature, |
| 46 | 57 | ) |
| 47 | 58 | self._last_usage = { |
| 48 | 59 | "input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0, |
| 49 | - "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0, | |
| 60 | + "output_tokens": getattr(response.usage, "completion_tokens", 0) | |
| 61 | + if response.usage | |
| 62 | + else 0, | |
| 50 | 63 | } |
| 51 | 64 | return response.choices[0].message.content or "" |
| 52 | 65 | |
| 53 | 66 | def analyze_image( |
| 54 | 67 | self, |
| @@ -75,11 +88,13 @@ | ||
| 75 | 88 | ], |
| 76 | 89 | max_tokens=max_tokens, |
| 77 | 90 | ) |
| 78 | 91 | self._last_usage = { |
| 79 | 92 | "input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0, |
| 80 | - "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0, | |
| 93 | + "output_tokens": getattr(response.usage, "completion_tokens", 0) | |
| 94 | + if response.usage | |
| 95 | + else 0, | |
| 81 | 96 | } |
| 82 | 97 | return response.choices[0].message.content or "" |
| 83 | 98 | |
| 84 | 99 | # Whisper API limit is 25MB |
| 85 | 100 | _MAX_FILE_SIZE = 25 * 1024 * 1024 |
| @@ -101,13 +116,11 @@ | ||
| 101 | 116 | logger.info( |
| 102 | 117 | f"Audio file {file_size / 1024 / 1024:.1f}MB exceeds Whisper 25MB limit, chunking..." |
| 103 | 118 | ) |
| 104 | 119 | return self._transcribe_chunked(audio_path, language, model) |
| 105 | 120 | |
| 106 | - def _transcribe_single( | |
| 107 | - self, audio_path: Path, language: Optional[str], model: str | |
| 108 | - ) -> dict: | |
| 121 | + def _transcribe_single(self, audio_path: Path, language: Optional[str], model: str) -> dict: | |
| 109 | 122 | """Transcribe a single audio file.""" |
| 110 | 123 | with open(audio_path, "rb") as f: |
| 111 | 124 | kwargs = {"model": model, "file": f} |
| 112 | 125 | if language: |
| 113 | 126 | kwargs["language"] = language |
| @@ -128,15 +141,14 @@ | ||
| 128 | 141 | "duration": getattr(response, "duration", None), |
| 129 | 142 | "provider": "openai", |
| 130 | 143 | "model": model, |
| 131 | 144 | } |
| 132 | 145 | |
| 133 | - def _transcribe_chunked( | |
| 134 | - self, audio_path: Path, language: Optional[str], model: str | |
| 135 | - ) -> dict: | |
| 146 | + def _transcribe_chunked(self, audio_path: Path, language: Optional[str], model: str) -> dict: | |
| 136 | 147 | """Split audio into chunks under 25MB and transcribe each.""" |
| 137 | 148 | import tempfile |
| 149 | + | |
| 138 | 150 | from video_processor.extractors.audio_extractor import AudioExtractor |
| 139 | 151 | |
| 140 | 152 | extractor = AudioExtractor() |
| 141 | 153 | audio_data, sr = extractor.load_audio(audio_path) |
| 142 | 154 | total_duration = len(audio_data) / sr |
| @@ -164,15 +176,17 @@ | ||
| 164 | 176 | logger.info(f"Transcribing chunk {i + 1}/{len(segments_data)}...") |
| 165 | 177 | result = self._transcribe_single(chunk_path, language, model) |
| 166 | 178 | |
| 167 | 179 | all_text.append(result["text"]) |
| 168 | 180 | for seg in result.get("segments", []): |
| 169 | - all_segments.append({ | |
| 170 | - "start": seg["start"] + time_offset, | |
| 171 | - "end": seg["end"] + time_offset, | |
| 172 | - "text": seg["text"], | |
| 173 | - }) | |
| 181 | + all_segments.append( | |
| 182 | + { | |
| 183 | + "start": seg["start"] + time_offset, | |
| 184 | + "end": seg["end"] + time_offset, | |
| 185 | + "text": seg["text"], | |
| 186 | + } | |
| 187 | + ) | |
| 174 | 188 | |
| 175 | 189 | if not detected_language and result.get("language"): |
| 176 | 190 | detected_language = result["language"] |
| 177 | 191 | |
| 178 | 192 | time_offset += len(chunk) / sr |
| @@ -200,14 +214,16 @@ | ||
| 200 | 214 | if mid in _AUDIO_MODELS or mid.startswith("whisper"): |
| 201 | 215 | caps.append("audio") |
| 202 | 216 | if "embedding" in mid: |
| 203 | 217 | caps.append("embedding") |
| 204 | 218 | if caps: |
| 205 | - models.append(ModelInfo( | |
| 206 | - id=mid, | |
| 207 | - provider="openai", | |
| 208 | - display_name=mid, | |
| 209 | - capabilities=caps, | |
| 210 | - )) | |
| 219 | + models.append( | |
| 220 | + ModelInfo( | |
| 221 | + id=mid, | |
| 222 | + provider="openai", | |
| 223 | + display_name=mid, | |
| 224 | + capabilities=caps, | |
| 225 | + ) | |
| 226 | + ) | |
| 211 | 227 | except Exception as e: |
| 212 | 228 | logger.warning(f"Failed to list OpenAI models: {e}") |
| 213 | 229 | return sorted(models, key=lambda m: m.id) |
| 214 | 230 |
| --- video_processor/providers/whisper_local.py | ||
| +++ video_processor/providers/whisper_local.py | ||
| @@ -69,13 +69,11 @@ | ||
| 69 | 69 | return |
| 70 | 70 | |
| 71 | 71 | try: |
| 72 | 72 | import whisper |
| 73 | 73 | except ImportError: |
| 74 | - raise ImportError( | |
| 75 | - "openai-whisper not installed. Run: pip install openai-whisper torch" | |
| 76 | - ) | |
| 74 | + raise ImportError("openai-whisper not installed. Run: pip install openai-whisper torch") | |
| 77 | 75 | |
| 78 | 76 | logger.info(f"Loading Whisper {self.model_size} model on {self.device}...") |
| 79 | 77 | self._model = whisper.load_model(self.model_size, device=self.device) |
| 80 | 78 | logger.info("Whisper model loaded") |
| 81 | 79 | |
| @@ -125,10 +123,11 @@ | ||
| 125 | 123 | |
| 126 | 124 | @staticmethod |
| 127 | 125 | def is_available() -> bool: |
| 128 | 126 | """Check if local Whisper is installed and usable.""" |
| 129 | 127 | try: |
| 130 | - import whisper | |
| 131 | - import torch | |
| 128 | + import torch # noqa: F401 | |
| 129 | + import whisper # noqa: F401 | |
| 130 | + | |
| 132 | 131 | return True |
| 133 | 132 | except ImportError: |
| 134 | 133 | return False |
| 135 | 134 |
| --- video_processor/sources/base.py | ||
| +++ video_processor/sources/base.py | ||
| @@ -10,10 +10,11 @@ | ||
| 10 | 10 | logger = logging.getLogger(__name__) |
| 11 | 11 | |
| 12 | 12 | |
| 13 | 13 | class SourceFile(BaseModel): |
| 14 | 14 | """A file available in a cloud source.""" |
| 15 | + | |
| 15 | 16 | name: str = Field(description="File name") |
| 16 | 17 | id: str = Field(description="Provider-specific file identifier") |
| 17 | 18 | size_bytes: Optional[int] = Field(default=None, description="File size in bytes") |
| 18 | 19 | mime_type: Optional[str] = Field(default=None, description="MIME type") |
| 19 | 20 | modified_at: Optional[str] = Field(default=None, description="Last modified timestamp") |
| 20 | 21 |
| --- video_processor/sources/dropbox_source.py | ||
| +++ video_processor/sources/dropbox_source.py | ||
| @@ -56,13 +56,11 @@ | ||
| 56 | 56 | def authenticate(self) -> bool: |
| 57 | 57 | """Authenticate with Dropbox API.""" |
| 58 | 58 | try: |
| 59 | 59 | import dropbox |
| 60 | 60 | except ImportError: |
| 61 | - logger.error( | |
| 62 | - "Dropbox SDK not installed. Run: pip install planopticon[dropbox]" | |
| 63 | - ) | |
| 61 | + logger.error("Dropbox SDK not installed. Run: pip install planopticon[dropbox]") | |
| 64 | 62 | return False |
| 65 | 63 | |
| 66 | 64 | # Try direct access token first |
| 67 | 65 | if self.access_token: |
| 68 | 66 | return self._auth_token(dropbox) |
| @@ -109,13 +107,11 @@ | ||
| 109 | 107 | return False |
| 110 | 108 | |
| 111 | 109 | def _auth_oauth(self, dropbox) -> bool: |
| 112 | 110 | """Run OAuth2 PKCE flow.""" |
| 113 | 111 | if not self.app_key: |
| 114 | - logger.error( | |
| 115 | - "Dropbox app key not configured. Set DROPBOX_APP_KEY env var." | |
| 116 | - ) | |
| 112 | + logger.error("Dropbox app key not configured. Set DROPBOX_APP_KEY env var.") | |
| 117 | 113 | return False |
| 118 | 114 | |
| 119 | 115 | try: |
| 120 | 116 | flow = dropbox.DropboxOAuth2FlowNoRedirect( |
| 121 | 117 | consumer_key=self.app_key, |
| @@ -187,13 +183,11 @@ | ||
| 187 | 183 | ext = Path(entry.name).suffix.lower() |
| 188 | 184 | if ext not in VIDEO_EXTENSIONS: |
| 189 | 185 | continue |
| 190 | 186 | |
| 191 | 187 | if patterns: |
| 192 | - if not any( | |
| 193 | - entry.name.endswith(p.replace("*", "")) for p in patterns | |
| 194 | - ): | |
| 188 | + if not any(entry.name.endswith(p.replace("*", "")) for p in patterns): | |
| 195 | 189 | continue |
| 196 | 190 | |
| 197 | 191 | files.append( |
| 198 | 192 | SourceFile( |
| 199 | 193 | name=entry.name, |
| 200 | 194 |
| --- video_processor/sources/google_drive.py | ||
| +++ video_processor/sources/google_drive.py | ||
| @@ -65,27 +65,23 @@ | ||
| 65 | 65 | If True, force service account auth. If False, force OAuth. |
| 66 | 66 | If None, auto-detect from credentials file. |
| 67 | 67 | token_path : Path, optional |
| 68 | 68 | Where to store/load OAuth tokens. Defaults to ~/.planopticon/google_drive_token.json |
| 69 | 69 | """ |
| 70 | - self.credentials_path = credentials_path or os.environ.get( | |
| 71 | - "GOOGLE_APPLICATION_CREDENTIALS" | |
| 72 | - ) | |
| 70 | + self.credentials_path = credentials_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") | |
| 73 | 71 | self.use_service_account = use_service_account |
| 74 | 72 | self.token_path = token_path or _TOKEN_PATH |
| 75 | 73 | self.service = None |
| 76 | 74 | self._creds = None |
| 77 | 75 | |
| 78 | 76 | def authenticate(self) -> bool: |
| 79 | 77 | """Authenticate with Google Drive API.""" |
| 80 | 78 | try: |
| 81 | - from google.oauth2 import service_account as sa_module | |
| 79 | + from google.oauth2 import service_account as sa_module # noqa: F401 | |
| 82 | 80 | from googleapiclient.discovery import build |
| 83 | 81 | except ImportError: |
| 84 | - logger.error( | |
| 85 | - "Google API client not installed. Run: pip install planopticon[gdrive]" | |
| 86 | - ) | |
| 82 | + logger.error("Google API client not installed. Run: pip install planopticon[gdrive]") | |
| 87 | 83 | return False |
| 88 | 84 | |
| 89 | 85 | # Determine auth method |
| 90 | 86 | if self.use_service_account is True or ( |
| 91 | 87 | self.use_service_account is None and self._is_service_account() |
| @@ -130,23 +126,19 @@ | ||
| 130 | 126 | try: |
| 131 | 127 | from google.auth.transport.requests import Request |
| 132 | 128 | from google.oauth2.credentials import Credentials |
| 133 | 129 | from google_auth_oauthlib.flow import InstalledAppFlow |
| 134 | 130 | except ImportError: |
| 135 | - logger.error( | |
| 136 | - "OAuth libraries not installed. Run: pip install planopticon[gdrive]" | |
| 137 | - ) | |
| 131 | + logger.error("OAuth libraries not installed. Run: pip install planopticon[gdrive]") | |
| 138 | 132 | return False |
| 139 | 133 | |
| 140 | 134 | creds = None |
| 141 | 135 | |
| 142 | 136 | # Load existing token |
| 143 | 137 | if self.token_path.exists(): |
| 144 | 138 | try: |
| 145 | - creds = Credentials.from_authorized_user_file( | |
| 146 | - str(self.token_path), SCOPES | |
| 147 | - ) | |
| 139 | + creds = Credentials.from_authorized_user_file(str(self.token_path), SCOPES) | |
| 148 | 140 | except Exception: |
| 149 | 141 | pass |
| 150 | 142 | |
| 151 | 143 | # Refresh or run new flow |
| 152 | 144 | if creds and creds.expired and creds.refresh_token: |
| @@ -251,13 +243,11 @@ | ||
| 251 | 243 | query_parts = [] |
| 252 | 244 | |
| 253 | 245 | if folder_id: |
| 254 | 246 | query_parts.append(f"'{folder_id}' in parents") |
| 255 | 247 | |
| 256 | - mime_conditions = " or ".join( | |
| 257 | - f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES | |
| 258 | - ) | |
| 248 | + mime_conditions = " or ".join(f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES) | |
| 259 | 249 | query_parts.append(f"({mime_conditions})") |
| 260 | 250 | query_parts.append("trashed=false") |
| 261 | 251 | |
| 262 | 252 | query = " and ".join(query_parts) |
| 263 | 253 | page_token = None |
| @@ -275,13 +265,11 @@ | ||
| 275 | 265 | .execute() |
| 276 | 266 | ) |
| 277 | 267 | |
| 278 | 268 | for f in response.get("files", []): |
| 279 | 269 | name = f.get("name", "") |
| 280 | - if patterns and not any( | |
| 281 | - name.endswith(p.replace("*", "")) for p in patterns | |
| 282 | - ): | |
| 270 | + if patterns and not any(name.endswith(p.replace("*", "")) for p in patterns): | |
| 283 | 271 | continue |
| 284 | 272 | |
| 285 | 273 | out.append( |
| 286 | 274 | SourceFile( |
| 287 | 275 | name=name, |
| @@ -336,11 +324,10 @@ | ||
| 336 | 324 | """Download a file from Google Drive.""" |
| 337 | 325 | if not self.service: |
| 338 | 326 | raise RuntimeError("Not authenticated. Call authenticate() first.") |
| 339 | 327 | |
| 340 | 328 | from googleapiclient.http import MediaIoBaseDownload |
| 341 | - import io | |
| 342 | 329 | |
| 343 | 330 | destination = Path(destination) |
| 344 | 331 | destination.parent.mkdir(parents=True, exist_ok=True) |
| 345 | 332 | |
| 346 | 333 | request = self.service.files().get_media(fileId=file.id) |
| 347 | 334 |
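The `list_files` hunk above collapses the MIME filter onto one line without changing what it builds. The query-string assembly can be exercised in isolation; this is a sketch, with `VIDEO_MIME_TYPES` stood in for by a two-entry tuple (an assumption for illustration, not the module's actual constant):

```python
def build_drive_query(folder_id=None, video_mime_types=("video/mp4", "video/quicktime")):
    """Assemble a Drive v3 files.list query as in GoogleDriveSource.list_files."""
    query_parts = []
    if folder_id:
        # Restrict to a single parent folder
        query_parts.append(f"'{folder_id}' in parents")
    # One mimeType clause per video type, OR-ed together
    mime_conditions = " or ".join(f"mimeType='{mt}'" for mt in video_mime_types)
    query_parts.append(f"({mime_conditions})")
    query_parts.append("trashed=false")
    return " and ".join(query_parts)
```

With a folder ID this yields `'<id>' in parents and (mimeType=... or mimeType=...) and trashed=false`, which is the string passed as the `q` parameter to `files().list()`.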
+50
-56
| --- video_processor/utils/api_cache.py | ||
| +++ video_processor/utils/api_cache.py | ||
| @@ -1,28 +1,30 @@ | ||
| 1 | 1 | """Caching system for API responses to reduce API calls and costs.""" |
| 2 | + | |
| 3 | +import hashlib | |
| 2 | 4 | import json |
| 3 | 5 | import logging |
| 4 | 6 | import os |
| 5 | 7 | import time |
| 6 | -import hashlib | |
| 7 | 8 | from pathlib import Path |
| 8 | 9 | from typing import Any, Dict, Optional, Union |
| 9 | 10 | |
| 10 | 11 | logger = logging.getLogger(__name__) |
| 12 | + | |
| 11 | 13 | |
| 12 | 14 | class ApiCache: |
| 13 | 15 | """Disk-based API response cache.""" |
| 14 | - | |
| 16 | + | |
| 15 | 17 | def __init__( |
| 16 | - self, | |
| 17 | - cache_dir: Union[str, Path], | |
| 18 | + self, | |
| 19 | + cache_dir: Union[str, Path], | |
| 18 | 20 | namespace: str = "default", |
| 19 | - ttl: int = 86400 # 24 hours in seconds | |
| 21 | + ttl: int = 86400, # 24 hours in seconds | |
| 20 | 22 | ): |
| 21 | 23 | """ |
| 22 | 24 | Initialize API cache. |
| 23 | - | |
| 25 | + | |
| 24 | 26 | Parameters |
| 25 | 27 | ---------- |
| 26 | 28 | cache_dir : str or Path |
| 27 | 29 | Directory for cache files |
| 28 | 30 | namespace : str |
| @@ -31,206 +33,198 @@ | ||
| 31 | 33 | Time-to-live for cache entries in seconds |
| 32 | 34 | """ |
| 33 | 35 | self.cache_dir = Path(cache_dir) |
| 34 | 36 | self.namespace = namespace |
| 35 | 37 | self.ttl = ttl |
| 36 | - | |
| 38 | + | |
| 37 | 39 | # Ensure namespace directory exists |
| 38 | 40 | self.namespace_dir = self.cache_dir / namespace |
| 39 | 41 | self.namespace_dir.mkdir(parents=True, exist_ok=True) |
| 40 | - | |
| 42 | + | |
| 41 | 43 | logger.debug(f"Initialized API cache in {self.namespace_dir}") |
| 42 | - | |
| 44 | + | |
| 43 | 45 | def get_cache_path(self, key: str) -> Path: |
| 44 | 46 | """ |
| 45 | 47 | Get path to cache file for key. |
| 46 | - | |
| 48 | + | |
| 47 | 49 | Parameters |
| 48 | 50 | ---------- |
| 49 | 51 | key : str |
| 50 | 52 | Cache key |
| 51 | - | |
| 53 | + | |
| 52 | 54 | Returns |
| 53 | 55 | ------- |
| 54 | 56 | Path |
| 55 | 57 | Path to cache file |
| 56 | 58 | """ |
| 57 | 59 | # Hash the key to ensure valid filename |
| 58 | 60 | hashed_key = hashlib.md5(key.encode()).hexdigest() |
| 59 | 61 | return self.namespace_dir / f"{hashed_key}.json" |
| 60 | - | |
| 62 | + | |
| 61 | 63 | def get(self, key: str) -> Optional[Any]: |
| 62 | 64 | """ |
| 63 | 65 | Get value from cache. |
| 64 | - | |
| 66 | + | |
| 65 | 67 | Parameters |
| 66 | 68 | ---------- |
| 67 | 69 | key : str |
| 68 | 70 | Cache key |
| 69 | - | |
| 71 | + | |
| 70 | 72 | Returns |
| 71 | 73 | ------- |
| 72 | 74 | object or None |
| 73 | 75 | Cached value if available and not expired, None otherwise |
| 74 | 76 | """ |
| 75 | 77 | cache_path = self.get_cache_path(key) |
| 76 | - | |
| 78 | + | |
| 77 | 79 | # Check if cache file exists |
| 78 | 80 | if not cache_path.exists(): |
| 79 | 81 | return None |
| 80 | - | |
| 82 | + | |
| 81 | 83 | try: |
| 82 | 84 | # Read cache file |
| 83 | 85 | with open(cache_path, "r", encoding="utf-8") as f: |
| 84 | 86 | cache_data = json.load(f) |
| 85 | - | |
| 87 | + | |
| 86 | 88 | # Check if cache entry is expired |
| 87 | 89 | timestamp = cache_data.get("timestamp", 0) |
| 88 | 90 | now = time.time() |
| 89 | - | |
| 91 | + | |
| 90 | 92 | if now - timestamp > self.ttl: |
| 91 | 93 | logger.debug(f"Cache entry expired for {key}") |
| 92 | 94 | return None |
| 93 | - | |
| 95 | + | |
| 94 | 96 | logger.debug(f"Cache hit for {key}") |
| 95 | 97 | return cache_data.get("value") |
| 96 | - | |
| 98 | + | |
| 97 | 99 | except Exception as e: |
| 98 | 100 | logger.warning(f"Error reading cache: {str(e)}") |
| 99 | 101 | return None |
| 100 | - | |
| 102 | + | |
| 101 | 103 | def set(self, key: str, value: Any) -> bool: |
| 102 | 104 | """ |
| 103 | 105 | Set value in cache. |
| 104 | - | |
| 106 | + | |
| 105 | 107 | Parameters |
| 106 | 108 | ---------- |
| 107 | 109 | key : str |
| 108 | 110 | Cache key |
| 109 | 111 | value : object |
| 110 | 112 | Value to cache (must be JSON serializable) |
| 111 | - | |
| 113 | + | |
| 112 | 114 | Returns |
| 113 | 115 | ------- |
| 114 | 116 | bool |
| 115 | 117 | True if successful, False otherwise |
| 116 | 118 | """ |
| 117 | 119 | cache_path = self.get_cache_path(key) |
| 118 | - | |
| 120 | + | |
| 119 | 121 | try: |
| 120 | 122 | # Prepare cache data |
| 121 | - cache_data = { | |
| 122 | - "timestamp": time.time(), | |
| 123 | - "value": value | |
| 124 | - } | |
| 125 | - | |
| 123 | + cache_data = {"timestamp": time.time(), "value": value} | |
| 124 | + | |
| 126 | 125 | # Write to cache file |
| 127 | 126 | with open(cache_path, "w", encoding="utf-8") as f: |
| 128 | 127 | json.dump(cache_data, f, ensure_ascii=False) |
| 129 | - | |
| 128 | + | |
| 130 | 129 | logger.debug(f"Cached value for {key}") |
| 131 | 130 | return True |
| 132 | - | |
| 131 | + | |
| 133 | 132 | except Exception as e: |
| 134 | 133 | logger.warning(f"Error writing to cache: {str(e)}") |
| 135 | 134 | return False |
| 136 | - | |
| 135 | + | |
| 137 | 136 | def invalidate(self, key: str) -> bool: |
| 138 | 137 | """ |
| 139 | 138 | Invalidate cache entry. |
| 140 | - | |
| 139 | + | |
| 141 | 140 | Parameters |
| 142 | 141 | ---------- |
| 143 | 142 | key : str |
| 144 | 143 | Cache key |
| 145 | - | |
| 144 | + | |
| 146 | 145 | Returns |
| 147 | 146 | ------- |
| 148 | 147 | bool |
| 149 | 148 | True if entry was removed, False otherwise |
| 150 | 149 | """ |
| 151 | 150 | cache_path = self.get_cache_path(key) |
| 152 | - | |
| 151 | + | |
| 153 | 152 | if cache_path.exists(): |
| 154 | 153 | try: |
| 155 | 154 | os.remove(cache_path) |
| 156 | 155 | logger.debug(f"Invalidated cache for {key}") |
| 157 | 156 | return True |
| 158 | 157 | except Exception as e: |
| 159 | 158 | logger.warning(f"Error invalidating cache: {str(e)}") |
| 160 | - | |
| 159 | + | |
| 161 | 160 | return False |
| 162 | - | |
| 161 | + | |
| 163 | 162 | def clear(self, older_than: Optional[int] = None) -> int: |
| 164 | 163 | """ |
| 165 | 164 | Clear all cache entries or entries older than specified time. |
| 166 | - | |
| 165 | + | |
| 167 | 166 | Parameters |
| 168 | 167 | ---------- |
| 169 | 168 | older_than : int, optional |
| 170 | 169 | Clear entries older than this many seconds |
| 171 | - | |
| 170 | + | |
| 172 | 171 | Returns |
| 173 | 172 | ------- |
| 174 | 173 | int |
| 175 | 174 | Number of entries cleared |
| 176 | 175 | """ |
| 177 | 176 | count = 0 |
| 178 | 177 | now = time.time() |
| 179 | - | |
| 178 | + | |
| 180 | 179 | for cache_file in self.namespace_dir.glob("*.json"): |
| 181 | 180 | try: |
| 182 | 181 | # Check file age if criteria provided |
| 183 | 182 | if older_than is not None: |
| 184 | 183 | file_age = now - os.path.getmtime(cache_file) |
| 185 | 184 | if file_age <= older_than: |
| 186 | 185 | continue |
| 187 | - | |
| 186 | + | |
| 188 | 187 | # Remove file |
| 189 | 188 | os.remove(cache_file) |
| 190 | 189 | count += 1 |
| 191 | - | |
| 190 | + | |
| 192 | 191 | except Exception as e: |
| 193 | 192 | logger.warning(f"Error clearing cache file {cache_file}: {str(e)}") |
| 194 | - | |
| 193 | + | |
| 195 | 194 | logger.info(f"Cleared {count} cache entries from {self.namespace}") |
| 196 | 195 | return count |
| 197 | - | |
| 196 | + | |
| 198 | 197 | def get_stats(self) -> Dict: |
| 199 | 198 | """ |
| 200 | 199 | Get cache statistics. |
| 201 | - | |
| 200 | + | |
| 202 | 201 | Returns |
| 203 | 202 | ------- |
| 204 | 203 | dict |
| 205 | 204 | Cache statistics |
| 206 | 205 | """ |
| 207 | 206 | cache_files = list(self.namespace_dir.glob("*.json")) |
| 208 | 207 | total_size = sum(os.path.getsize(f) for f in cache_files) |
| 209 | - | |
| 208 | + | |
| 210 | 209 | # Analyze age distribution |
| 211 | 210 | now = time.time() |
| 212 | - age_distribution = { | |
| 213 | - "1h": 0, | |
| 214 | - "6h": 0, | |
| 215 | - "24h": 0, | |
| 216 | - "older": 0 | |
| 217 | - } | |
| 218 | - | |
| 211 | + age_distribution = {"1h": 0, "6h": 0, "24h": 0, "older": 0} | |
| 212 | + | |
| 219 | 213 | for cache_file in cache_files: |
| 220 | 214 | file_age = now - os.path.getmtime(cache_file) |
| 221 | - | |
| 215 | + | |
| 222 | 216 | if file_age <= 3600: # 1 hour |
| 223 | 217 | age_distribution["1h"] += 1 |
| 224 | 218 | elif file_age <= 21600: # 6 hours |
| 225 | 219 | age_distribution["6h"] += 1 |
| 226 | 220 | elif file_age <= 86400: # 24 hours |
| 227 | 221 | age_distribution["24h"] += 1 |
| 228 | 222 | else: |
| 229 | 223 | age_distribution["older"] += 1 |
| 230 | - | |
| 224 | + | |
| 231 | 225 | return { |
| 232 | 226 | "namespace": self.namespace, |
| 233 | 227 | "entry_count": len(cache_files), |
| 234 | 228 | "total_size_bytes": total_size, |
| 235 | - "age_distribution": age_distribution | |
| 229 | + "age_distribution": age_distribution, | |
| 236 | 230 | } |
| 237 | 231 |
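The `ApiCache` changes above are formatting-only (import ordering, trailing commas, collapsed dict literals); the get/set/TTL behavior is unchanged. That behavior can be sketched standalone with plain functions mirroring the class's logic (function names here are illustrative, not the module's API):

```python
import hashlib
import json
import time
from pathlib import Path


def cache_path(cache_dir: Path, key: str) -> Path:
    # Hash the key so it is always a valid filename (as in ApiCache.get_cache_path)
    return cache_dir / f"{hashlib.md5(key.encode()).hexdigest()}.json"


def cache_set(cache_dir: Path, key: str, value) -> None:
    # Store the value alongside a write timestamp, as ApiCache.set does
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_path(cache_dir, key).write_text(
        json.dumps({"timestamp": time.time(), "value": value}), encoding="utf-8"
    )


def cache_get(cache_dir: Path, key: str, ttl: int = 86400):
    # Return the cached value, or None on a miss or an expired entry
    path = cache_path(cache_dir, key)
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    if time.time() - data.get("timestamp", 0) > ttl:
        return None  # older than the TTL window
    return data.get("value")
```

The TTL check compares wall-clock age against the configured window rather than storing an expiry time, which keeps entries reusable if the TTL is later raised.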
+7
-4
| --- video_processor/utils/export.py | ||
| +++ video_processor/utils/export.py | ||
| @@ -1,15 +1,14 @@ | ||
| 1 | 1 | """Multi-format output orchestration.""" |
| 2 | 2 | |
| 3 | -import json | |
| 4 | 3 | import logging |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from typing import Optional |
| 7 | 6 | |
| 8 | 7 | from tqdm import tqdm |
| 9 | 8 | |
| 10 | -from video_processor.models import DiagramResult, VideoManifest | |
| 9 | +from video_processor.models import VideoManifest | |
| 11 | 10 | from video_processor.utils.rendering import render_mermaid, reproduce_chart |
| 12 | 11 | |
| 13 | 12 | logger = logging.getLogger(__name__) |
| 14 | 13 | |
| 15 | 14 | |
| @@ -79,11 +78,13 @@ | ||
| 79 | 78 | svg_path = output_dir / d.svg_path if d.svg_path else None |
| 80 | 79 | if svg_path and svg_path.exists(): |
| 81 | 80 | svg_content = svg_path.read_text() |
| 82 | 81 | diag_html += f'<div class="diagram">{svg_content}</div>' |
| 83 | 82 | elif d.image_path: |
| 84 | - diag_html += f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">' | |
| 83 | + diag_html += ( | |
| 84 | + f'<img src="{d.image_path}" alt="Diagram {i + 1}" style="max-width:100%">' | |
| 85 | + ) | |
| 85 | 86 | if d.mermaid: |
| 86 | 87 | diag_html += f'<pre class="mermaid">{d.mermaid}</pre>' |
| 87 | 88 | sections.append(diag_html) |
| 88 | 89 | |
| 89 | 90 | title = manifest.video.title or "PlanOpticon Analysis" |
| @@ -155,11 +156,13 @@ | ||
| 155 | 156 | Updates manifest with output file paths and returns it. |
| 156 | 157 | """ |
| 157 | 158 | output_dir = Path(output_dir) |
| 158 | 159 | |
| 159 | 160 | # Render mermaid diagrams to SVG/PNG |
| 160 | - for i, diagram in enumerate(tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else []): | |
| 161 | + for i, diagram in enumerate( | |
| 162 | + tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else [] | |
| 163 | + ): | |
| 161 | 164 | if diagram.mermaid: |
| 162 | 165 | diagrams_dir = output_dir / "diagrams" |
| 163 | 166 | prefix = f"diagram_{i}" |
| 164 | 167 | paths = render_mermaid(diagram.mermaid, diagrams_dir, prefix) |
| 165 | 168 | if "svg" in paths: |
| 166 | 169 |
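The E501 reflow in `export.py` above wraps a guarded-progress-bar pattern: `tqdm` wraps the iterable only when the diagram list is non-empty, so an empty manifest never prints a stalled progress bar. A standalone sketch of that pattern (the sample list is invented; a no-op fallback stands in for `tqdm` if it is not installed):

```python
# Sketch of the guarded tqdm-enumerate pattern reflowed in export.py.
try:
    from tqdm import tqdm
except ImportError:  # fall back to a no-op wrapper if tqdm is absent
    def tqdm(iterable, **kwargs):
        return iterable

# Hypothetical Mermaid sources; an empty string marks a diagram without one.
diagrams = ["graph TD; A-->B", "", "sequenceDiagram"]

rendered = []
for i, mermaid_src in enumerate(
    tqdm(diagrams, desc="Rendering diagrams", unit="diag") if diagrams else []
):
    if mermaid_src:  # entries without Mermaid source are skipped
        rendered.append(f"diagram_{i}")
print(rendered)  # ['diagram_0', 'diagram_2']
```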

| --- video_processor/utils/prompt_templates.py | ||
| +++ video_processor/utils/prompt_templates.py | ||
| @@ -1,152 +1,153 @@ | ||
| 1 | 1 | """Prompt templates for LLM-based content analysis.""" |
| 2 | -import json | |
| 2 | + | |
| 3 | 3 | import logging |
| 4 | -import os | |
| 5 | 4 | from pathlib import Path |
| 6 | 5 | from string import Template |
| 7 | -from typing import Any, Dict, List, Optional, Union | |
| 6 | +from typing import Dict, Optional, Union | |
| 8 | 7 | |
| 9 | 8 | logger = logging.getLogger(__name__) |
| 9 | + | |
| 10 | 10 | |
| 11 | 11 | class PromptTemplate: |
| 12 | 12 | """Template manager for LLM prompts.""" |
| 13 | - | |
| 13 | + | |
| 14 | 14 | def __init__( |
| 15 | - self, | |
| 15 | + self, | |
| 16 | 16 | templates_dir: Optional[Union[str, Path]] = None, |
| 17 | - default_templates: Optional[Dict[str, str]] = None | |
| 17 | + default_templates: Optional[Dict[str, str]] = None, | |
| 18 | 18 | ): |
| 19 | 19 | """ |
| 20 | 20 | Initialize prompt template manager. |
| 21 | - | |
| 21 | + | |
| 22 | 22 | Parameters |
| 23 | 23 | ---------- |
| 24 | 24 | templates_dir : str or Path, optional |
| 25 | 25 | Directory containing template files |
| 26 | 26 | default_templates : dict, optional |
| 27 | 27 | Default templates to use |
| 28 | 28 | """ |
| 29 | 29 | self.templates_dir = Path(templates_dir) if templates_dir else None |
| 30 | 30 | self.templates = {} |
| 31 | - | |
| 31 | + | |
| 32 | 32 | # Load default templates |
| 33 | 33 | if default_templates: |
| 34 | 34 | self.templates.update(default_templates) |
| 35 | - | |
| 35 | + | |
| 36 | 36 | # Load templates from directory if provided |
| 37 | 37 | if self.templates_dir and self.templates_dir.exists(): |
| 38 | 38 | self._load_templates_from_dir() |
| 39 | - | |
| 39 | + | |
| 40 | 40 | def _load_templates_from_dir(self) -> None: |
| 41 | 41 | """Load templates from template directory.""" |
| 42 | 42 | if not self.templates_dir: |
| 43 | 43 | return |
| 44 | - | |
| 44 | + | |
| 45 | 45 | for template_file in self.templates_dir.glob("*.txt"): |
| 46 | 46 | template_name = template_file.stem |
| 47 | 47 | try: |
| 48 | 48 | with open(template_file, "r", encoding="utf-8") as f: |
| 49 | 49 | template_content = f.read() |
| 50 | 50 | self.templates[template_name] = template_content |
| 51 | 51 | logger.debug(f"Loaded template: {template_name}") |
| 52 | 52 | except Exception as e: |
| 53 | 53 | logger.warning(f"Error loading template {template_name}: {str(e)}") |
| 54 | - | |
| 54 | + | |
| 55 | 55 | def get_template(self, template_name: str) -> Optional[Template]: |
| 56 | 56 | """ |
| 57 | 57 | Get template by name. |
| 58 | - | |
| 58 | + | |
| 59 | 59 | Parameters |
| 60 | 60 | ---------- |
| 61 | 61 | template_name : str |
| 62 | 62 | Template name |
| 63 | - | |
| 63 | + | |
| 64 | 64 | Returns |
| 65 | 65 | ------- |
| 66 | 66 | Template or None |
| 67 | 67 | Template object if found, None otherwise |
| 68 | 68 | """ |
| 69 | 69 | if template_name not in self.templates: |
| 70 | 70 | logger.warning(f"Template not found: {template_name}") |
| 71 | 71 | return None |
| 72 | - | |
| 72 | + | |
| 73 | 73 | return Template(self.templates[template_name]) |
| 74 | - | |
| 74 | + | |
| 75 | 75 | def format_prompt(self, template_name: str, **kwargs) -> Optional[str]: |
| 76 | 76 | """ |
| 77 | 77 | Format prompt with provided parameters. |
| 78 | - | |
| 78 | + | |
| 79 | 79 | Parameters |
| 80 | 80 | ---------- |
| 81 | 81 | template_name : str |
| 82 | 82 | Template name |
| 83 | 83 | **kwargs : dict |
| 84 | 84 | Template parameters |
| 85 | - | |
| 85 | + | |
| 86 | 86 | Returns |
| 87 | 87 | ------- |
| 88 | 88 | str or None |
| 89 | 89 | Formatted prompt if template exists, None otherwise |
| 90 | 90 | """ |
| 91 | 91 | template = self.get_template(template_name) |
| 92 | 92 | if not template: |
| 93 | 93 | return None |
| 94 | - | |
| 94 | + | |
| 95 | 95 | try: |
| 96 | 96 | return template.safe_substitute(**kwargs) |
| 97 | 97 | except Exception as e: |
| 98 | 98 | logger.error(f"Error formatting template {template_name}: {str(e)}") |
| 99 | 99 | return None |
| 100 | - | |
| 100 | + | |
| 101 | 101 | def add_template(self, template_name: str, template_content: str) -> None: |
| 102 | 102 | """ |
| 103 | 103 | Add or update template. |
| 104 | - | |
| 104 | + | |
| 105 | 105 | Parameters |
| 106 | 106 | ---------- |
| 107 | 107 | template_name : str |
| 108 | 108 | Template name |
| 109 | 109 | template_content : str |
| 110 | 110 | Template content |
| 111 | 111 | """ |
| 112 | 112 | self.templates[template_name] = template_content |
| 113 | - | |
| 113 | + | |
| 114 | 114 | def save_template(self, template_name: str) -> bool: |
| 115 | 115 | """ |
| 116 | 116 | Save template to file. |
| 117 | - | |
| 117 | + | |
| 118 | 118 | Parameters |
| 119 | 119 | ---------- |
| 120 | 120 | template_name : str |
| 121 | 121 | Template name |
| 122 | - | |
| 122 | + | |
| 123 | 123 | Returns |
| 124 | 124 | ------- |
| 125 | 125 | bool |
| 126 | 126 | True if successful, False otherwise |
| 127 | 127 | """ |
| 128 | 128 | if not self.templates_dir: |
| 129 | 129 | logger.error("Templates directory not set") |
| 130 | 130 | return False |
| 131 | - | |
| 131 | + | |
| 132 | 132 | if template_name not in self.templates: |
| 133 | 133 | logger.warning(f"Template not found: {template_name}") |
| 134 | 134 | return False |
| 135 | - | |
| 135 | + | |
| 136 | 136 | try: |
| 137 | 137 | self.templates_dir.mkdir(parents=True, exist_ok=True) |
| 138 | 138 | template_path = self.templates_dir / f"{template_name}.txt" |
| 139 | - | |
| 139 | + | |
| 140 | 140 | with open(template_path, "w", encoding="utf-8") as f: |
| 141 | 141 | f.write(self.templates[template_name]) |
| 142 | - | |
| 142 | + | |
| 143 | 143 | logger.debug(f"Saved template: {template_name}") |
| 144 | 144 | return True |
| 145 | 145 | except Exception as e: |
| 146 | 146 | logger.error(f"Error saving template {template_name}: {str(e)}") |
| 147 | 147 | return False |
| 148 | + | |
| 148 | 149 | |
| 149 | 150 | # Default prompt templates |
| 150 | 151 | DEFAULT_TEMPLATES = { |
| 151 | 152 | "content_analysis": """ |
| 152 | 153 | Analyze the provided video content and extract key information: |
| @@ -161,50 +162,48 @@ | ||
| 161 | 162 | - Main topics and themes |
| 162 | 163 | - Key points for each topic |
| 163 | 164 | - Important details or facts |
| 164 | 165 | - Action items or follow-ups |
| 165 | 166 | - Relationships between concepts |
| 166 | - | |
| 167 | + | |
| 167 | 168 | Format the output as structured markdown. |
| 168 | 169 | """, |
| 169 | - | |
| 170 | 170 | "diagram_extraction": """ |
| 171 | - Analyze the following image that contains a diagram, whiteboard content, or other visual information. | |
| 172 | - | |
| 171 | + Analyze the following image that contains a diagram, whiteboard content, | |
| 172 | + or other visual information. | |
| 173 | + | |
| 173 | 174 | Extract and convert this visual information into a structured representation. |
| 174 | - | |
| 175 | + | |
| 175 | 176 | If it's a flowchart, process diagram, or similar structured visual: |
| 176 | 177 | - Identify the components and their relationships |
| 177 | 178 | - Preserve the logical flow and structure |
| 178 | 179 | - Convert it to mermaid diagram syntax |
| 179 | - | |
| 180 | + | |
| 180 | 181 | If it's a whiteboard with text, bullet points, or unstructured content: |
| 181 | 182 | - Extract all text elements |
| 182 | 183 | - Preserve hierarchical organization if present |
| 183 | 184 | - Maintain any emphasized or highlighted elements |
| 184 | - | |
| 185 | + | |
| 185 | 186 | Image context: $image_context |
| 186 | - | |
| 187 | + | |
| 187 | 188 | Return the results as markdown with appropriate structure. |
| 188 | 189 | """, |
| 189 | - | |
| 190 | 190 | "action_item_detection": """ |
| 191 | 191 | Review the following transcript and identify all action items, commitments, or follow-up tasks. |
| 192 | - | |
| 192 | + | |
| 193 | 193 | TRANSCRIPT: |
| 194 | 194 | $transcript |
| 195 | - | |
| 195 | + | |
| 196 | 196 | For each action item, extract: |
| 197 | 197 | - The specific action to be taken |
| 198 | 198 | - Who is responsible (if mentioned) |
| 199 | 199 | - Any deadlines or timeframes |
| 200 | 200 | - Priority level (if indicated) |
| 201 | 201 | - Context or additional details |
| 202 | - | |
| 202 | + | |
| 203 | 203 | Format the results as a structured list of action items. |
| 204 | 204 | """, |
| 205 | - | |
| 206 | 205 | "content_summary": """ |
| 207 | 206 | Provide a concise summary of the following content: |
| 208 | 207 | |
| 209 | 208 | $content |
| 210 | 209 | |
| @@ -214,11 +213,10 @@ | ||
| 214 | 213 | - Focus on the most important information |
| 215 | 214 | - Maintain a neutral, objective tone |
| 216 | 215 | |
| 217 | 216 | Format the summary as clear, readable text. |
| 218 | 217 | """, |
| 219 | - | |
| 220 | 218 | "summary_generation": """ |
| 221 | 219 | Generate a comprehensive summary of the following transcript content. |
| 222 | 220 | |
| 223 | 221 | CONTENT: |
| 224 | 222 | $content |
| @@ -229,11 +227,10 @@ | ||
| 229 | 227 | - Notes any important context or background |
| 230 | 228 | - Is 3-5 paragraphs long |
| 231 | 229 | |
| 232 | 230 | Write in clear, professional prose. |
| 233 | 231 | """, |
| 234 | - | |
| 235 | 232 | "key_points_extraction": """ |
| 236 | 233 | Extract the key points from the following content. |
| 237 | 234 | |
| 238 | 235 | CONTENT: |
| 239 | 236 | $content |
| @@ -243,31 +240,30 @@ | ||
| 243 | 240 | - "topic": category or topic area (optional) |
| 244 | 241 | - "details": supporting details (optional) |
| 245 | 242 | |
| 246 | 243 | Example format: |
| 247 | 244 | [ |
| 248 | - {"point": "The system uses microservices architecture", "topic": "Architecture", "details": "Each service handles a specific domain"}, | |
| 249 | - {"point": "Migration is planned for Q2", "topic": "Timeline", "details": null} | |
| 245 | + {"point": "The system uses microservices architecture", | |
| 246 | + "topic": "Architecture", "details": "Each service handles a specific domain"}, | |
| 250 | 247 | ] |
| 251 | 248 | |
| 252 | 249 | Return ONLY the JSON array, no additional text. |
| 253 | 250 | """, |
| 254 | - | |
| 255 | 251 | "entity_extraction": """ |
| 256 | - Extract all notable entities (people, concepts, technologies, organizations, time references) from the following content. | |
| 257 | - | |
| 252 | + Extract all notable entities (people, concepts, technologies, organizations, | |
| 253 | + time references) from the following content. | |
| 258 | 254 | CONTENT: |
| 259 | 255 | $content |
| 260 | 256 | |
| 261 | 257 | Return a JSON array of entity objects: |
| 262 | 258 | [ |
| 263 | - {"name": "entity name", "type": "person|concept|technology|organization|time", "description": "brief description"} | |
| 264 | - ] | |
| 259 | + {"name": "entity name", | |
| 260 | + "type": "person|concept|technology|organization|time", | |
| 261 | + "description": "brief description"} | |
| 265 | 262 | |
| 266 | 263 | Return ONLY the JSON array, no additional text. |
| 267 | 264 | """, |
| 268 | - | |
| 269 | 265 | "relationship_extraction": """ |
| 270 | 266 | Given the following content and entities, identify relationships between them. |
| 271 | 267 | |
| 272 | 268 | CONTENT: |
| 273 | 269 | $content |
| @@ -275,16 +271,15 @@ | ||
| 275 | 271 | KNOWN ENTITIES: |
| 276 | 272 | $entities |
| 277 | 273 | |
| 278 | 274 | Return a JSON array of relationship objects: |
| 279 | 275 | [ |
| 280 | - {"source": "entity A", "target": "entity B", "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"} | |
| 281 | - ] | |
| 276 | + {"source": "entity A", "target": "entity B", | |
| 277 | + "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"} | |
| 282 | 278 | |
| 283 | 279 | Return ONLY the JSON array, no additional text. |
| 284 | 280 | """, |
| 285 | - | |
| 286 | 281 | "diagram_analysis": """ |
| 287 | 282 | Analyze the following text extracted from a diagram or visual element. |
| 288 | 283 | |
| 289 | 284 | DIAGRAM TEXT: |
| 290 | 285 | $diagram_text |
| @@ -303,11 +298,10 @@ | ||
| 303 | 298 | "summary": "brief description of what the diagram shows" |
| 304 | 299 | } |
| 305 | 300 | |
| 306 | 301 | Return ONLY the JSON object, no additional text. |
| 307 | 302 | """, |
| 308 | - | |
| 309 | 303 | "mermaid_generation": """ |
| 310 | 304 | Convert the following diagram information into valid Mermaid diagram syntax. |
| 311 | 305 | |
| 312 | 306 | Diagram Type: $diagram_type |
| 313 | 307 | Text Content: $text_content |
| @@ -315,10 +309,10 @@ | ||
| 315 | 309 | |
| 316 | 310 | Generate a Mermaid diagram that accurately represents the visual structure. |
| 317 | 311 | Use the appropriate Mermaid diagram type (graph, sequenceDiagram, classDiagram, etc.). |
| 318 | 312 | |
| 319 | 313 | Return ONLY the Mermaid code, no markdown fences or explanations. |
| 320 | - """ | |
| 314 | + """, | |
| 321 | 315 | } |
| 322 | 316 | |
| 323 | 317 | # Create default prompt template manager |
| 324 | 318 | default_prompt_manager = PromptTemplate(default_templates=DEFAULT_TEMPLATES) |
| 325 | 319 |
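The `PromptTemplate` class in the diff above leans on `string.Template.safe_substitute`, which leaves unresolved `$placeholders` in place instead of raising `KeyError` — this is why `format_prompt` rarely reaches its `except` branch even with missing parameters. A minimal sketch (template text and parameter names are invented):

```python
# Sketch of the substitution behavior PromptTemplate relies on.
# safe_substitute never raises on a missing key; the unresolved
# placeholder simply survives in the output string.
from string import Template

tmpl = Template("Analyze $content for $audience")

partial = tmpl.safe_substitute(content="the Q2 transcript")
print(partial)  # Analyze the Q2 transcript for $audience

full = tmpl.safe_substitute(content="the Q2 transcript", audience="executives")
print(full)  # Analyze the Q2 transcript for executives
```

By contrast, `Template.substitute` would raise `KeyError` on the first call; choosing `safe_substitute` trades early failure for prompts that may ship with a literal `$placeholder` left in.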
| 192 | |
| 193 | TRANSCRIPT: |
| 194 | $transcript |
| 195 | |
| 196 | For each action item, extract: |
| 197 | - The specific action to be taken |
| 198 | - Who is responsible (if mentioned) |
| 199 | - Any deadlines or timeframes |
| 200 | - Priority level (if indicated) |
| 201 | - Context or additional details |
| 202 | |
| 203 | Format the results as a structured list of action items. |
| 204 | """, |
| 205 | "content_summary": """ |
| 206 | Provide a concise summary of the following content: |
| 207 | |
| 208 | $content |
| 209 | |
| @@ -214,11 +213,10 @@ | |
| 213 | - Focus on the most important information |
| 214 | - Maintain a neutral, objective tone |
| 215 | |
| 216 | Format the summary as clear, readable text. |
| 217 | """, |
| 218 | "summary_generation": """ |
| 219 | Generate a comprehensive summary of the following transcript content. |
| 220 | |
| 221 | CONTENT: |
| 222 | $content |
| @@ -229,11 +227,10 @@ | |
| 227 | - Notes any important context or background |
| 228 | - Is 3-5 paragraphs long |
| 229 | |
| 230 | Write in clear, professional prose. |
| 231 | """, |
| 232 | "key_points_extraction": """ |
| 233 | Extract the key points from the following content. |
| 234 | |
| 235 | CONTENT: |
| 236 | $content |
| @@ -243,31 +240,30 @@ | |
| 240 | - "topic": category or topic area (optional) |
| 241 | - "details": supporting details (optional) |
| 242 | |
| 243 | Example format: |
| 244 | [ |
| 245 | {"point": "The system uses microservices architecture", |
| 246 | "topic": "Architecture", "details": "Each service handles a specific domain"}, |
| 247 | ] |
| 248 | |
| 249 | Return ONLY the JSON array, no additional text. |
| 250 | """, |
| 251 | "entity_extraction": """ |
| 252 | Extract all notable entities (people, concepts, technologies, organizations, |
| 253 | time references) from the following content. |
| 254 | CONTENT: |
| 255 | $content |
| 256 | |
| 257 | Return a JSON array of entity objects: |
| 258 | [ |
| 259 | {"name": "entity name", |
| 260 | "type": "person|concept|technology|organization|time", |
| 261 | "description": "brief description"} |
| 262 | ] |
| 263 | Return ONLY the JSON array, no additional text. |
| 264 | """, |
| 265 | "relationship_extraction": """ |
| 266 | Given the following content and entities, identify relationships between them. |
| 267 | |
| 268 | CONTENT: |
| 269 | $content |
| @@ -275,16 +271,15 @@ | |
| 271 | KNOWN ENTITIES: |
| 272 | $entities |
| 273 | |
| 274 | Return a JSON array of relationship objects: |
| 275 | [ |
| 276 | {"source": "entity A", "target": "entity B", |
| 277 | "type": "relationship type (e.g., uses, manages, depends_on, created_by, part_of)"} |
| 278 | ] |
| 279 | Return ONLY the JSON array, no additional text. |
| 280 | """, |
| 281 | "diagram_analysis": """ |
| 282 | Analyze the following text extracted from a diagram or visual element. |
| 283 | |
| 284 | DIAGRAM TEXT: |
| 285 | $diagram_text |
| @@ -303,11 +298,10 @@ | |
| 298 | "summary": "brief description of what the diagram shows" |
| 299 | } |
| 300 | |
| 301 | Return ONLY the JSON object, no additional text. |
| 302 | """, |
| 303 | "mermaid_generation": """ |
| 304 | Convert the following diagram information into valid Mermaid diagram syntax. |
| 305 | |
| 306 | Diagram Type: $diagram_type |
| 307 | Text Content: $text_content |
| @@ -315,10 +309,10 @@ | |
| 309 | |
| 310 | Generate a Mermaid diagram that accurately represents the visual structure. |
| 311 | Use the appropriate Mermaid diagram type (graph, sequenceDiagram, classDiagram, etc.). |
| 312 | |
| 313 | Return ONLY the Mermaid code, no markdown fences or explanations. |
| 314 | """, |
| 315 | } |
| 316 | |
| 317 | # Create default prompt template manager |
| 318 | default_prompt_manager = PromptTemplate(default_templates=DEFAULT_TEMPLATES) |
| 319 |
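The template manager above is built on `string.Template`, and `format_prompt` calls `safe_substitute` rather than `substitute`. A standalone sketch of why that matters (the dict and keys below are illustrative, not the real `DEFAULT_TEMPLATES` contents): unknown placeholders are left intact instead of raising `KeyError`.

```python
from string import Template

# Illustrative reproduction of format_prompt's core behavior; this dict
# is an example, not the module's actual DEFAULT_TEMPLATES.
templates = {
    "content_summary": "Provide a concise summary of the following content:\n\n$content",
}

tmpl = Template(templates["content_summary"])
prompt = tmpl.safe_substitute(content="Quarterly planning transcript")

# safe_substitute leaves any placeholder it has no value for untouched,
# so a missing kwarg degrades gracefully instead of raising KeyError.
partial = Template("$content ($image_context)").safe_substitute(content="x")
```

This is also why `format_prompt` wraps the call in a broad `try/except`: with `safe_substitute`, formatting failures are rare, but malformed `$` sequences can still raise `ValueError`.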
| --- video_processor/utils/rendering.py | ||
| +++ video_processor/utils/rendering.py | ||
| @@ -1,10 +1,10 @@ | ||
| 1 | 1 | """Mermaid rendering and chart reproduction utilities.""" |
| 2 | 2 | |
| 3 | 3 | import logging |
| 4 | 4 | from pathlib import Path |
| 5 | -from typing import Dict, Optional | |
| 5 | +from typing import Dict | |
| 6 | 6 | |
| 7 | 7 | logger = logging.getLogger(__name__) |
| 8 | 8 | |
| 9 | 9 | |
| 10 | 10 | def render_mermaid(mermaid_code: str, output_dir: str | Path, name: str) -> Dict[str, Path]: |
| @@ -47,15 +47,20 @@ | ||
| 47 | 47 | png_content = rendered.img_response |
| 48 | 48 | if png_content: |
| 49 | 49 | if isinstance(png_content, bytes): |
| 50 | 50 | png_path.write_bytes(png_content) |
| 51 | 51 | else: |
| 52 | - png_path.write_bytes(png_content.encode() if isinstance(png_content, str) else png_content) | |
| 52 | + png_path.write_bytes( | |
| 53 | + png_content.encode() if isinstance(png_content, str) else png_content | |
| 54 | + ) | |
| 53 | 55 | result["png"] = png_path |
| 54 | 56 | |
| 55 | 57 | except ImportError: |
| 56 | - logger.warning("mermaid-py not installed, skipping SVG/PNG rendering. Install with: pip install mermaid-py") | |
| 58 | + logger.warning( | |
| 59 | + "mermaid-py not installed, skipping SVG/PNG rendering. " | |
| 60 | + "Install with: pip install mermaid-py" | |
| 61 | + ) | |
| 57 | 62 | except Exception as e: |
| 58 | 63 | logger.warning(f"Mermaid rendering failed for '{name}': {e}") |
| 59 | 64 | |
| 60 | 65 | return result |
| 61 | 66 | |
| 62 | 67 |
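The bytes-or-str guard reformatted in this hunk exists because `img_response` from mermaid-py may arrive as either `bytes` or `str` depending on version. A self-contained sketch of the same write pattern (the helper name here is hypothetical, not part of the module):

```python
import tempfile
from pathlib import Path

def write_image(content, path: Path) -> None:
    # Same guard as in render_mermaid: write bytes directly, encode str.
    if isinstance(content, bytes):
        path.write_bytes(content)
    else:
        path.write_bytes(content.encode())

results = []
with tempfile.TemporaryDirectory() as d:
    png = Path(d) / "flow.png"
    write_image(b"\x89PNG", png)             # bytes branch
    results.append(png.read_bytes())
    write_image("<svg>fallback</svg>", png)  # str branch
    results.append(png.read_bytes())
```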
| --- video_processor/utils/usage_tracker.py | ||
| +++ video_processor/utils/usage_tracker.py | ||
| @@ -2,11 +2,10 @@ | ||
| 2 | 2 | |
| 3 | 3 | import time |
| 4 | 4 | from dataclasses import dataclass, field |
| 5 | 5 | from typing import Optional |
| 6 | 6 | |
| 7 | - | |
| 8 | 7 | # Cost per million tokens (USD) — updated Feb 2025 |
| 9 | 8 | _MODEL_PRICING = { |
| 10 | 9 | # Anthropic |
| 11 | 10 | "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00}, |
| 12 | 11 | "claude-haiku-3-5-20241022": {"input": 0.80, "output": 4.00}, |
| @@ -26,10 +25,11 @@ | ||
| 26 | 25 | |
| 27 | 26 | |
| 28 | 27 | @dataclass |
| 29 | 28 | class ModelUsage: |
| 30 | 29 | """Accumulated usage for a single model.""" |
| 30 | + | |
| 31 | 31 | provider: str = "" |
| 32 | 32 | model: str = "" |
| 33 | 33 | calls: int = 0 |
| 34 | 34 | input_tokens: int = 0 |
| 35 | 35 | output_tokens: int = 0 |
| @@ -59,10 +59,11 @@ | ||
| 59 | 59 | |
| 60 | 60 | |
| 61 | 61 | @dataclass |
| 62 | 62 | class StepTiming: |
| 63 | 63 | """Timing for a single pipeline step.""" |
| 64 | + | |
| 64 | 65 | name: str |
| 65 | 66 | start_time: float = 0.0 |
| 66 | 67 | end_time: float = 0.0 |
| 67 | 68 | |
| 68 | 69 | @property |
| @@ -73,10 +74,11 @@ | ||
| 73 | 74 | |
| 74 | 75 | |
| 75 | 76 | @dataclass |
| 76 | 77 | class UsageTracker: |
| 77 | 78 | """Tracks API usage, costs, and timing across a pipeline run.""" |
| 79 | + | |
| 78 | 80 | _models: dict = field(default_factory=dict) |
| 79 | 81 | _steps: list = field(default_factory=list) |
| 80 | 82 | _current_step: Optional[StepTiming] = field(default=None) |
| 81 | 83 | _start_time: float = field(default_factory=time.time) |
| 82 | 84 | |
| @@ -160,25 +162,28 @@ | ||
| 160 | 162 | ) |
| 161 | 163 | |
| 162 | 164 | # API usage |
| 163 | 165 | if self._models: |
| 164 | 166 | lines.append(f"\n API Calls: {self.total_api_calls}") |
| 165 | - lines.append(f" Tokens: {self.total_tokens:,} " | |
| 166 | - f"({self.total_input_tokens:,} in / {self.total_output_tokens:,} out)") | |
| 167 | + lines.append( | |
| 168 | + f" Tokens: {self.total_tokens:,} " | |
| 169 | + f"({self.total_input_tokens:,} in / {self.total_output_tokens:,} out)" | |
| 170 | + ) | |
| 167 | 171 | lines.append("") |
| 168 | 172 | lines.append(f" {'Model':<35} {'Calls':>6} {'In Tok':>8} {'Out Tok':>8} {'Cost':>8}") |
| 169 | - lines.append(f" {'-'*35} {'-'*6} {'-'*8} {'-'*8} {'-'*8}") | |
| 173 | + lines.append(f" {'-' * 35} {'-' * 6} {'-' * 8} {'-' * 8} {'-' * 8}") | |
| 170 | 174 | for key in sorted(self._models.keys()): |
| 171 | 175 | u = self._models[key] |
| 172 | 176 | cost_str = f"${u.estimated_cost:.4f}" if u.estimated_cost > 0 else "free" |
| 173 | 177 | if u.audio_minutes > 0: |
| 174 | 178 | lines.append( |
| 175 | 179 | f" {key:<35} {u.calls:>6} {u.audio_minutes:>7.1f}m {'-':>8} {cost_str:>8}" |
| 176 | 180 | ) |
| 177 | 181 | else: |
| 178 | 182 | lines.append( |
| 179 | - f" {key:<35} {u.calls:>6} {u.input_tokens:>8,} {u.output_tokens:>8,} {cost_str:>8}" | |
| 183 | + f" {key:<35} {u.calls:>6} " | |
| 184 | + f"{u.input_tokens:>8,} {u.output_tokens:>8,} {cost_str:>8}" | |
| 180 | 185 | ) |
| 181 | 186 | |
| 182 | 187 | lines.append(f"\n Estimated total cost: ${self.total_cost:.4f}") |
| 183 | 188 | |
| 184 | 189 | lines.append("=" * 60) |
| 185 | 190 | |
| 186 | 191 |
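The `_MODEL_PRICING` table is keyed per million tokens (USD), so a per-model cost presumably reduces to the arithmetic below. The tracker's actual accumulation methods fall outside this hunk, so `estimate_cost` here is a hypothetical standalone sketch, not the class's API:

```python
# Assumed pricing entry copied from the _MODEL_PRICING hunk above.
PRICING = {"claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00}}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Unknown models fall back to zero cost ("free" in the summary table).
    rates = PRICING.get(model, {"input": 0.0, "output": 0.0})
    return (input_tokens / 1_000_000) * rates["input"] + (
        output_tokens / 1_000_000
    ) * rates["output"]

cost = estimate_cost("claude-sonnet-4-5-20250929", 200_000, 50_000)
# 0.2 * 3.00 + 0.05 * 15.00 = 1.35
```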
D
work_plan.md
-188
| --- a/work_plan.md | ||
| +++ b/work_plan.md | ||
| @@ -1,188 +0,0 @@ | ||
| 1 | -PlanOpticon Development Roadmap | |
| 2 | -This document outlines the development milestones and actionable tasks for implementing the PlanOpticon video analysis system, prioritizing rapid delivery of useful outputs. | |
| 3 | -Milestone 1: Core Video Processing & Markdown Output | |
| 4 | -Goal: Process a video and produce markdown notes and mermaid diagrams | |
| 5 | -Infrastructure Setup | |
| 6 | - | |
| 7 | - Initialize project repository structure | |
| 8 | - Implement basic CLI with argparse | |
| 9 | - Create configuration management system | |
| 10 | - Set up logging framework | |
| 11 | - | |
| 12 | -Video & Audio Processing | |
| 13 | - | |
| 14 | - Implement video frame extraction | |
| 15 | - Create audio extraction pipeline | |
| 16 | - Build frame sampling strategy based on visual changes | |
| 17 | - Implement basic scene detection using cloud APIs | |
| 18 | - | |
| 19 | -Transcription & Analysis | |
| 20 | - | |
| 21 | - Integrate with cloud speech-to-text APIs (e.g., OpenAI Whisper API, Google Speech-to-Text) | |
| 22 | - Implement text analysis using LLM APIs (e.g., Claude API, GPT-4 API) | |
| 23 | - Build keyword and key point extraction via API integration | |
| 24 | - Create prompt templates for effective LLM content analysis | |
| 25 | - | |
| 26 | -Diagram Generation | |
| 27 | - | |
| 28 | - Create flow visualization module using mermaid syntax | |
| 29 | - Implement relationship mapping for detected topics | |
| 30 | - Build timeline representation generator | |
| 31 | - Leverage computer vision APIs (e.g., GPT-4 Vision, Google Cloud Vision) for diagram extraction from slides/whiteboards | |
| 32 | - | |
| 33 | -Markdown Output Generation | |
| 34 | - | |
| 35 | - Implement structured markdown generator | |
| 36 | - Create templating system for output | |
| 37 | - Build mermaid diagram integration | |
| 38 | - Develop table of contents generator | |
| 39 | - | |
| 40 | -Testing & Validation | |
| 41 | - | |
| 42 | - Set up basic testing infrastructure | |
| 43 | - Create sample videos for testing | |
| 44 | - Implement quality checks for outputs | |
| 45 | - Build simple validation metrics | |
| 46 | - | |
| 47 | -Success Criteria: | |
| 48 | - | |
| 49 | -Run script with a video input and receive markdown output with embedded mermaid diagrams | |
| 50 | -Content correctly captures main topics and relationships | |
| 51 | -Basic structure includes headings, bullet points, and at least one diagram | |
| 52 | - | |
| 53 | -Milestone 2: Advanced Content Analysis | |
| 54 | -Goal: Enhance extraction quality and content organization | |
| 55 | -Improved Speech Processing | |
| 56 | - | |
| 57 | - Integrate specialized speaker diarization APIs | |
| 58 | - Create transcript segmentation via LLM prompting | |
| 59 | - Build timestamp synchronization with content | |
| 60 | - Implement API-based vocabulary detection and handling | |
| 61 | - | |
| 62 | -Enhanced Visual Analysis | |
| 63 | - | |
| 64 | - Optimize prompts for vision APIs to detect diagrams and charts | |
| 65 | - Create efficient frame selection for API cost management | |
| 66 | - Build structured prompt chains for detailed visual analysis | |
| 67 | - Implement caching mechanism for API responses | |
| 68 | - | |
| 69 | -Content Organization | |
| 70 | - | |
| 71 | - Implement hierarchical topic modeling | |
| 72 | - Create concept relationship mapping | |
| 73 | - Build content categorization | |
| 74 | - Develop importance scoring for extracted points | |
| 75 | - | |
| 76 | -Quality Improvements | |
| 77 | - | |
| 78 | - Implement noise filtering for audio | |
| 79 | - Create redundancy reduction in notes | |
| 80 | - Build context preservation mechanisms | |
| 81 | - Develop content verification systems | |
| 82 | - | |
| 83 | -Milestone 3: Action Item & Knowledge Extraction | |
| 84 | -Goal: Identify action items and build knowledge structures | |
| 85 | -Action Item Detection | |
| 86 | - | |
| 87 | - Implement commitment language recognition | |
| 88 | - Create deadline and timeframe extraction | |
| 89 | - Build responsibility attribution | |
| 90 | - Develop priority estimation | |
| 91 | - | |
| 92 | -Knowledge Organization | |
| 93 | - | |
| 94 | - Implement knowledge graph construction | |
| 95 | - Create entity recognition and linking | |
| 96 | - Build cross-reference system | |
| 97 | - Develop temporal relationship tracking | |
| 98 | - | |
| 99 | -Enhanced Output Options | |
| 100 | - | |
| 101 | - Implement JSON structured data output | |
| 102 | - Create SVG diagram generation | |
| 103 | - Build interactive HTML output option | |
| 104 | - Develop customizable templates | |
| 105 | - | |
| 106 | -Integration Components | |
| 107 | - | |
| 108 | - Implement unified data model | |
| 109 | - Create serialization framework | |
| 110 | - Build persistence layer for results | |
| 111 | - Develop query interface for extracted knowledge | |
| 112 | - | |
| 113 | -Milestone 4: Optimization & Deployment | |
| 114 | -Goal: Enhance performance and create deployment package | |
| 115 | -Performance Optimization | |
| 116 | - | |
| 117 | - Implement GPU acceleration for core algorithms | |
| 118 | - Create ARM-specific optimizations | |
| 119 | - Build memory usage optimization | |
| 120 | - Develop parallel processing capabilities | |
| 121 | - | |
| 122 | -System Packaging | |
| 123 | - | |
| 124 | - Implement dependency management | |
| 125 | - Create installation scripts | |
| 126 | - Build comprehensive documentation | |
| 127 | - Develop container deployment option | |
| 128 | - | |
| 129 | -Advanced Features | |
| 130 | - | |
| 131 | - Implement custom domain adaptation | |
| 132 | - Create multi-video correlation | |
| 133 | - Build confidence scoring for extraction | |
| 134 | - Develop automated quality assessment | |
| 135 | - | |
| 136 | -User Experience | |
| 137 | - | |
| 138 | - Implement progress reporting | |
| 139 | - Create error handling and recovery | |
| 140 | - Build output customization options | |
| 141 | - Develop feedback collection mechanism | |
| 142 | - | |
| 143 | -Priority Matrix | |
| 144 | -FeatureImportanceTechnical ComplexityDependenciesPriorityVideo Frame ExtractionHighLowNoneP0Audio TranscriptionHighMediumAudio ExtractionP0Markdown GenerationHighLowContent AnalysisP0Mermaid Diagram CreationHighMediumContent AnalysisP0Topic ExtractionHighMediumTranscriptionP0Basic CLIHighLowNoneP0Speaker DiarizationMediumHighAudio ExtractionP2Visual Element DetectionHighHighFrame ExtractionP1Action Item DetectionMediumMediumTranscriptionP1GPU AccelerationLowMediumCore ProcessingP3ARM OptimizationMediumMediumCore ProcessingP2Installation PackageMediumLowWorking SystemP2 | |
| 145 | -Implementation Approach | |
| 146 | -To achieve the first milestone efficiently: | |
| 147 | - | |
| 148 | -Leverage Existing Cloud APIs | |
| 149 | - | |
| 150 | -Integrate with cloud speech-to-text services rather than building models | |
| 151 | -Use vision APIs for image/slide/whiteboard analysis | |
| 152 | -Employ LLM APIs (OpenAI, Anthropic, etc.) for content analysis and summarization | |
| 153 | -Implement API fallbacks and retries for robustness | |
| 154 | - | |
| 155 | - | |
| 156 | -Focus on Pipeline Integration | |
| 157 | - | |
| 158 | -Build connectors between components | |
| 159 | -Ensure data flows properly through the system | |
| 160 | -Create uniform data structures for interoperability | |
| 161 | - | |
| 162 | - | |
| 163 | -Build for Extensibility | |
| 164 | - | |
| 165 | -Design plugin architecture from the beginning | |
| 166 | -Use configuration-driven approach where possible | |
| 167 | -Create clear interfaces between components | |
| 168 | - | |
| 169 | - | |
| 170 | -Iterative Refinement | |
| 171 | - | |
| 172 | -Implement basic functionality first | |
| 173 | -Add sophistication in subsequent iterations | |
| 174 | -Collect feedback after each milestone | |
| 175 | - | |
| 176 | - | |
| 177 | - | |
| 178 | -Next Steps | |
| 179 | -After completing this roadmap, potential future enhancements include: | |
| 180 | - | |
| 181 | -Real-time processing capabilities | |
| 182 | -Integration with video conferencing platforms | |
| 183 | -Collaborative annotation and editing features | |
| 184 | -Domain-specific model fine-tuning | |
| 185 | -Multi-language support | |
| 186 | -Customizable output formats | |
| 187 | - | |
| 188 | -This roadmap provides a clear path to developing PlanOpticon with a focus on delivering value quickly through a milestone-based approach, prioritizing the generation of markdown notes and mermaid diagrams as the first outcome. |
| --- a/work_plan.md | |
| +++ b/work_plan.md | |
| @@ -1,188 +0,0 @@ | |
| 1 | PlanOpticon Development Roadmap |
| 2 | This document outlines the development milestones and actionable tasks for implementing the PlanOpticon video analysis system, prioritizing rapid delivery of useful outputs. |
| 3 | Milestone 1: Core Video Processing & Markdown Output |
| 4 | Goal: Process a video and produce markdown notes and mermaid diagrams |
| 5 | Infrastructure Setup |
| 6 | |
| 7 | Initialize project repository structure |
| 8 | Implement basic CLI with argparse |
| 9 | Create configuration management system |
| 10 | Set up logging framework |
| 11 | |
| 12 | Video & Audio Processing |
| 13 | |
| 14 | Implement video frame extraction |
| 15 | Create audio extraction pipeline |
| 16 | Build frame sampling strategy based on visual changes |
| 17 | Implement basic scene detection using cloud APIs |
| 18 | |
| 19 | Transcription & Analysis |
| 20 | |
| 21 | Integrate with cloud speech-to-text APIs (e.g., OpenAI Whisper API, Google Speech-to-Text) |
| 22 | Implement text analysis using LLM APIs (e.g., Claude API, GPT-4 API) |
| 23 | Build keyword and key point extraction via API integration |
| 24 | Create prompt templates for effective LLM content analysis |
| 25 | |
| 26 | Diagram Generation |
| 27 | |
| 28 | Create flow visualization module using mermaid syntax |
| 29 | Implement relationship mapping for detected topics |
| 30 | Build timeline representation generator |
| 31 | Leverage computer vision APIs (e.g., GPT-4 Vision, Google Cloud Vision) for diagram extraction from slides/whiteboards |
| 32 | |
| 33 | Markdown Output Generation |
| 34 | |
| 35 | Implement structured markdown generator |
| 36 | Create templating system for output |
| 37 | Build mermaid diagram integration |
| 38 | Develop table of contents generator |
| 39 | |
| 40 | Testing & Validation |
| 41 | |
| 42 | Set up basic testing infrastructure |
| 43 | Create sample videos for testing |
| 44 | Implement quality checks for outputs |
| 45 | Build simple validation metrics |
| 46 | |
| 47 | Success Criteria: |
| 48 | |
| 49 | Run script with a video input and receive markdown output with embedded mermaid diagrams |
| 50 | Content correctly captures main topics and relationships |
| 51 | Basic structure includes headings, bullet points, and at least one diagram |
| 52 | |
| 53 | Milestone 2: Advanced Content Analysis |
| 54 | Goal: Enhance extraction quality and content organization |
| 55 | Improved Speech Processing |
| 56 | |
| 57 | Integrate specialized speaker diarization APIs |
| 58 | Create transcript segmentation via LLM prompting |
| 59 | Build timestamp synchronization with content |
| 60 | Implement API-based vocabulary detection and handling |
| 61 | |
| 62 | Enhanced Visual Analysis |
| 63 | |
| 64 | Optimize prompts for vision APIs to detect diagrams and charts |
| 65 | Create efficient frame selection for API cost management |
| 66 | Build structured prompt chains for detailed visual analysis |
| 67 | Implement caching mechanism for API responses |
| 68 | |
| 69 | Content Organization |
| 70 | |
| 71 | Implement hierarchical topic modeling |
| 72 | Create concept relationship mapping |
| 73 | Build content categorization |
| 74 | Develop importance scoring for extracted points |
| 75 | |
| 76 | Quality Improvements |
| 77 | |
| 78 | Implement noise filtering for audio |
| 79 | Create redundancy reduction in notes |
| 80 | Build context preservation mechanisms |
| 81 | Develop content verification systems |
| 82 | |
| 83 | Milestone 3: Action Item & Knowledge Extraction |
| 84 | Goal: Identify action items and build knowledge structures |
| 85 | Action Item Detection |
| 86 | |
| 87 | Implement commitment language recognition |
| 88 | Create deadline and timeframe extraction |
| 89 | Build responsibility attribution |
| 90 | Develop priority estimation |
| 91 | |
| 92 | Knowledge Organization |
| 93 | |
| 94 | Implement knowledge graph construction |
| 95 | Create entity recognition and linking |
| 96 | Build cross-reference system |
| 97 | Develop temporal relationship tracking |
| 98 | |
| 99 | Enhanced Output Options |
| 100 | |
| 101 | Implement JSON structured data output |
| 102 | Create SVG diagram generation |
| 103 | Build interactive HTML output option |
| 104 | Develop customizable templates |
| 105 | |
| 106 | Integration Components |
| 107 | |
| 108 | Implement unified data model |
| 109 | Create serialization framework |
| 110 | Build persistence layer for results |
| 111 | Develop query interface for extracted knowledge |
| 112 | |
| 113 | Milestone 4: Optimization & Deployment |
| 114 | Goal: Enhance performance and create deployment package |
| 115 | Performance Optimization |
| 116 | |
| 117 | Implement GPU acceleration for core algorithms |
| 118 | Create ARM-specific optimizations |
| 119 | Build memory usage optimization |
| 120 | Develop parallel processing capabilities |
| 121 | |
| 122 | System Packaging |
| 123 | |
| 124 | Implement dependency management |
| 125 | Create installation scripts |
| 126 | Build comprehensive documentation |
| 127 | Develop container deployment option |
| 128 | |
| 129 | Advanced Features |
| 130 | |
| 131 | Implement custom domain adaptation |
| 132 | Create multi-video correlation |
| 133 | Build confidence scoring for extraction |
| 134 | Develop automated quality assessment |
| 135 | |
| 136 | User Experience |
| 137 | |
| 138 | Implement progress reporting |
| 139 | Create error handling and recovery |
| 140 | Build output customization options |
| 141 | Develop feedback collection mechanism |
| 142 | |
| 143 | Priority Matrix |
| Feature | Importance | Technical Complexity | Dependencies | Priority |
| --- | --- | --- | --- | --- |
| Video Frame Extraction | High | Low | None | P0 |
| Audio Transcription | High | Medium | Audio Extraction | P0 |
| Markdown Generation | High | Low | Content Analysis | P0 |
| Mermaid Diagram Creation | High | Medium | Content Analysis | P0 |
| Topic Extraction | High | Medium | Transcription | P0 |
| Basic CLI | High | Low | None | P0 |
| Speaker Diarization | Medium | High | Audio Extraction | P2 |
| Visual Element Detection | High | High | Frame Extraction | P1 |
| Action Item Detection | Medium | Medium | Transcription | P1 |
| GPU Acceleration | Low | Medium | Core Processing | P3 |
| ARM Optimization | Medium | Medium | Core Processing | P2 |
| Installation Package | Medium | Low | Working System | P2 |
| 145 | Implementation Approach |
| 146 | To achieve the first milestone efficiently: |
| 147 | |
| 148 | Leverage Existing Cloud APIs |
| 149 | |
| 150 | Integrate with cloud speech-to-text services rather than building models |
| 151 | Use vision APIs for image/slide/whiteboard analysis |
| 152 | Employ LLM APIs (OpenAI, Anthropic, etc.) for content analysis and summarization |
| 153 | Implement API fallbacks and retries for robustness |
| 154 | |
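The fallbacks-and-retries point above could be sketched as a wrapper that tries each provider in order with exponential backoff — illustrative only; the provider-callable interface is an assumption, and a real version would catch provider-specific errors rather than bare `Exception`:

```python
import time

def call_with_fallback(providers, prompt, retries=3, base_delay=1.0, sleep=time.sleep):
    """Try each provider in order; retry transient failures with backoff."""
    last_error = None
    for call in providers:
        for attempt in range(retries):
            try:
                return call(prompt)
            except Exception as exc:
                last_error = exc
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"All providers failed: {last_error}")
```

Injecting `sleep` keeps the wrapper testable without real delays.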
| 155 | |
| 156 | Focus on Pipeline Integration |
| 157 | |
| 158 | Build connectors between components |
| 159 | Ensure data flows properly through the system |
| 160 | Create uniform data structures for interoperability |
| 161 | |
| 162 | |
| 163 | Build for Extensibility |
| 164 | |
| 165 | Design plugin architecture from the beginning |
| 166 | Use configuration-driven approach where possible |
| 167 | Create clear interfaces between components |
| 168 | |
| 169 | |
| 170 | Iterative Refinement |
| 171 | |
| 172 | Implement basic functionality first |
| 173 | Add sophistication in subsequent iterations |
| 174 | Collect feedback after each milestone |
| 175 | |
| 176 | |
| 177 | |
| 178 | Next Steps |
| 179 | After completing this roadmap, potential future enhancements include: |
| 180 | |
| 181 | Real-time processing capabilities |
| 182 | Integration with video conferencing platforms |
| 183 | Collaborative annotation and editing features |
| 184 | Domain-specific model fine-tuning |
| 185 | Multi-language support |
| 186 | Customizable output formats |
| 187 | |
| 188 | This roadmap provides a clear, milestone-based path for developing PlanOpticon, prioritizing quick delivery of value by making Markdown notes and Mermaid diagrams the first outcome. |
| --- a/work_plan.md | |
| +++ b/work_plan.md | |
| @@ -1,188 +0,0 @@ | |