PlanOpticon

Add face detection, periodic capture, KG batching, checkpoint/resume, and production hardening

- Face detection filter removes webcam/people frames using OpenCV Haar cascade with size-based filtering (min 90px) to preserve screen-share sidebar thumbnails
- Periodic frame capture (every 30s) catches slow-evolving content like document scrolling that falls below the change detection threshold
- Knowledge graph batches 10 segments per API call, reducing calls from ~2200 to ~110 and combining entity/relationship extraction into a single prompt
- Checkpoint/resume on all 7 pipeline steps — skips completed work on re-run
- Even sampling for diagram analysis across the full video instead of the first N frames
- Local Whisper transcription on MPS, usage tracking, progress bars
- Audio/frame extractors, text extractor, output formatter, API cache
- Platform-specific requirements (Apple Silicon, GPU), setup script
- Updated docs: CLI reference, pipeline architecture, single-video guide
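The call-count reduction above follows from two factors: entity and relationship extraction are combined into one prompt (halving calls per segment), and segments are grouped 10 per call. A minimal sketch of the grouping — `extract_batch` is a hypothetical stand-in for the repository's actual API call, not its real name:

```python
from typing import Callable, List


def batch_segments(segments: List[str], batch_size: int = 10) -> List[List[str]]:
    """Group transcript segments into fixed-size batches, one API call each."""
    return [segments[i:i + batch_size] for i in range(0, len(segments), batch_size)]


def extract_knowledge(segments: List[str],
                      extract_batch: Callable[[List[str]], dict],
                      batch_size: int = 10) -> List[dict]:
    """Issue one combined entity/relationship prompt per batch, instead of
    two separate calls (entities, then relationships) per segment."""
    return [extract_batch(batch) for batch in batch_segments(segments, batch_size)]


# ~1,100 segments x 2 calls each = ~2,200 calls unbatched;
# batched at 10 segments per combined call = 110 calls.
calls = len(batch_segments(["seg"] * 1100, 10))
print(calls)  # 110
```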

leo 2026-02-15 04:17 trunk
Commit 287a3bb0cdafafd28a255be1625194a8171f53636a764c42f73b0cf7e2454f0b
41 files changed +4 +7 -1 +149 +3 +1 +13 -11 -1 +4 +10 +120 +121 +58 +2 +11 -3 +17 -7 +216 +1 +196 +197 +270 +68 -59 +1 +167 -74 +8 +9 -4 +44 -7 +54 -4 +83 +76 +6 -4 +98 -17 +163 +3 -1 +153
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,4 @@
+# API Keys
+OPENAI_API_KEY=your_openai_api_key_here
+ANTHROPIC_API_KEY=your_anthropic_api_key_here
+GOOGLE_APPLICATION_CREDENTIALS=path_to_your_google_credentials.json
+7 -1
--- .gitignore
+++ .gitignore
@@ -1,11 +1,17 @@
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
+*.env*
 
-panopticon-435716-271902ced24f.json
+# Google Cloud service account keys
+panopticon-*.json
+planopticon-*.json
+*-*-*ced24f.json
+*service_account*.json
+*credentials*.json
 
 # C extensions
 *.so
 
 # Test data
 
ADDED README_new.md
+149
--- a/README_new.md
+++ b/README_new.md
@@ -0,0 +1,149 @@
+# PlanOpticon
+
+Comprehensive Video Analysis & Knowledge Extraction CLI
+
+## Overview
+
+PlanOpticon is an advanced AI-powered CLI tool that conducts thorough analysis of video content, extracting structured knowledge, diagrams, and actionable insights. Using state-of-the-art computer vision and natural language processing techniques, PlanOpticon transforms video assets into valuable, structured information.
+
+## Core Features
+
+- **Complete Transcription**: Full speech-to-text with speaker attribution and semantic segmentation
+- **Visual Element Extraction**: Automated recognition and digitization of diagrams, charts, whiteboards, and visual aids
+- **Action Item Detection**: Intelligent identification and prioritization of tasks, commitments, and follow-ups
+- **Knowledge Structure**: Organization of extracted content into searchable, related concepts
+- **Plan Generation**: Synthesis of extracted elements into cohesive action plans and summaries
+
+## Installation
+
+### Prerequisites
+
+- Python 3.9+
+- FFmpeg (for audio/video processing)
+- API keys for cloud services (OpenAI, Google Cloud, etc.)
+
+### Setup
+
+1. Clone the repository:
+
+```bash
+git clone https://github.com/yourusername/planopticon.git
+cd planopticon
+```
+
+2. Run the setup script, which creates a virtual environment and installs dependencies:
+
+```bash
+./scripts/setup.sh
+```
+
+3. Configure your API keys by editing the `.env` file created during setup.
+
+### Manual Installation
+
+If you prefer to set up manually:
+
+```bash
+# Create virtual environment
+python -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+
+# Install dependencies
+pip install -e .
+
+# Install optional GPU dependencies (if available)
+pip install -r requirements-gpu.txt    # For NVIDIA GPUs
+pip install -r requirements-apple.txt  # For Apple Silicon
+```
+
+## Usage
+
+PlanOpticon is designed as a command-line interface tool:
+
+```bash
+# Basic usage
+planopticon analyze --input video.mp4 --output analysis/
+
+# Specify processing depth
+planopticon analyze --input video.mp4 --depth comprehensive --output analysis/
+
+# Focus on specific extraction types
+planopticon analyze --input video.mp4 --focus "diagrams,action-items" --output analysis/
+
+# Process with GPU acceleration
+planopticon analyze --input video.mp4 --use-gpu --output analysis/
+```
+
+### Output Structure
+
+```
+analysis/
+├── transcript/
+│   ├── video_name.json   # Full transcription with timestamps and speakers
+│   ├── video_name.txt    # Plain text transcription
+│   └── video_name.srt    # Subtitle format
+├── frames/               # Extracted key frames
+│   ├── frame_0001.jpg
+│   └── frame_0002.jpg
+├── audio/                # Extracted audio
+│   └── video_name.wav
+├── diagrams/             # Extracted and digitized visual elements
+│   ├── diagram_001.svg
+│   └── whiteboard_001.svg
+└── cache/                # API response cache
+```
+
+## Development
+
+### Architecture
+
+PlanOpticon follows a modular pipeline architecture:
+
+```
+video_processor/
+├── extractors/   # Video and audio extraction
+├── api/          # Cloud API integrations
+├── analyzers/    # Content analysis components
+├── integrators/  # Knowledge integration
+├── utils/        # Common utilities
+└── cli/          # Command-line interface
+```
+
+### Code Standards
+
+- Follow PEP 8 style guidelines for all Python code
+- Write comprehensive docstrings using NumPy style
+- Include type hints consistently throughout the codebase
+- Maintain test coverage for key components
+
+### Testing
+
+Run tests with pytest:
+
+```bash
+pytest
+```
+
+## System Requirements
+
+- Python 3.9+
+- 8GB RAM minimum (16GB recommended)
+- 2GB disk space for models and dependencies
+- CUDA-compatible GPU (optional, for accelerated processing)
+- ARM64 or x86_64 architecture
+
+## License
+
+MIT License
+
+## Roadmap
+
+See [work_plan.md](work_plan.md) for detailed development roadmap and milestones.
+
+## Contributing
+
+Contributions are welcome! Please feel free to submit a Pull Request.
+
+## Contact
+
+For questions or contributions, please open an issue on GitHub or contact the maintainers at [email protected].
--- docs/architecture/pipeline.md
+++ docs/architecture/pipeline.md
@@ -12,10 +12,13 @@
     participant DiagramAnalyzer
     participant KnowledgeGraph
 
     CLI->>Pipeline: process_single_video()
     Pipeline->>FrameExtractor: extract_frames()
+    Note over FrameExtractor: Change detection + periodic capture (every 30s)
+    Pipeline->>Pipeline: filter_people_frames()
+    Note over Pipeline: OpenCV face detection removes webcam/people frames
     Pipeline->>AudioExtractor: extract_audio()
     Pipeline->>Provider: transcribe_audio()
     Pipeline->>DiagramAnalyzer: process_frames()
 
     loop Each frame
--- docs/cli-reference.md
+++ docs/cli-reference.md
@@ -15,10 +15,11 @@
 | `--depth` | `basic\|standard\|comprehensive` | `standard` | Processing depth |
 | `--focus` | TEXT | all | Comma-separated focus areas |
 | `--use-gpu` | FLAG | off | Enable GPU acceleration |
 | `--sampling-rate` | FLOAT | 0.5 | Frame sampling rate (fps) |
 | `--change-threshold` | FLOAT | 0.15 | Visual change threshold |
+| `--periodic-capture` | FLOAT | 30.0 | Capture a frame every N seconds regardless of change (0 to disable) |
 | `--title` | TEXT | auto | Report title |
 | `-p`, `--provider` | `auto\|openai\|anthropic\|gemini` | `auto` | API provider |
 | `--vision-model` | TEXT | auto | Override vision model |
 | `--chat-model` | TEXT | auto | Override chat model |
 
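How `--change-threshold` and `--periodic-capture` interact can be sketched as follows, with the defaults (0.15 and 30.0) taken from the table above. The function name `select_timestamps` and the `(timestamp, change_score)` input shape are hypothetical, not the tool's actual internals:

```python
def select_timestamps(diffs, change_threshold=0.15, periodic_every=30.0):
    """Pick capture timestamps from per-sample change scores.

    diffs: list of (timestamp_seconds, change_score) pairs in time order.
    A frame is kept when its change score crosses the threshold, or when
    `periodic_every` seconds have passed since the last kept frame — the
    fallback that catches slow-evolving content like document scrolling.
    """
    kept = []
    last_kept = float("-inf")
    for ts, score in diffs:
        if score >= change_threshold or (
            periodic_every > 0 and ts - last_kept >= periodic_every
        ):
            kept.append(ts)
            last_kept = ts
    return kept


# Frames sampled at 0.5 fps whose change scores all sit below the threshold
# are still captured every 30 seconds.
slow_scroll = [(t, 0.01) for t in range(0, 100, 2)]
print(select_timestamps(slow_scroll))  # [0, 30, 60, 90]
```

Setting `periodic_every=0` disables the fallback, matching the documented "0 to disable" behavior.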
--- docs/guide/single-video.md
+++ docs/guide/single-video.md
@@ -8,21 +8,22 @@
 
 ## What happens
 
 The pipeline runs these steps in order:
 
-1. **Frame extraction** — Samples frames from the video using change detection to avoid duplicates
-2. **Audio extraction** — Extracts audio track to WAV
-3. **Transcription** — Sends audio to speech-to-text (Whisper or Gemini)
-4. **Diagram detection** — Vision model classifies each frame as diagram/chart/whiteboard/screenshot/none
-5. **Diagram analysis** — High-confidence diagrams get full extraction (description, text, mermaid, chart data)
-6. **Screengrab fallback** — Medium-confidence frames are saved as captioned screenshots
-7. **Knowledge graph** — Extracts entities and relationships from transcript + diagrams
-8. **Key points** — LLM extracts main points and topics
-9. **Action items** — LLM finds tasks, commitments, and follow-ups
-10. **Reports** — Generates markdown, HTML, and PDF
-11. **Export** — Renders mermaid diagrams to SVG/PNG, reproduces charts
+1. **Frame extraction** — Samples frames using change detection for transitions plus periodic capture (every 30s) for slow-evolving content like document scrolling
+2. **People frame filtering** — OpenCV face detection automatically removes webcam/video conference frames, keeping only shared content (slides, documents, screen shares)
+3. **Audio extraction** — Extracts audio track to WAV
+4. **Transcription** — Sends audio to speech-to-text (Whisper or Gemini)
+5. **Diagram detection** — Vision model classifies each frame as diagram/chart/whiteboard/screenshot/none
+6. **Diagram analysis** — High-confidence diagrams get full extraction (description, text, mermaid, chart data)
+7. **Screengrab fallback** — Medium-confidence frames are saved as captioned screenshots
+8. **Knowledge graph** — Extracts entities and relationships from transcript + diagrams
+9. **Key points** — LLM extracts main points and topics
+10. **Action items** — LLM finds tasks, commitments, and follow-ups
+11. **Reports** — Generates markdown, HTML, and PDF
+12. **Export** — Renders mermaid diagrams to SVG/PNG, reproduces charts
 
 ## Processing depth
 
 ### `basic`
 - Transcription only
@@ -53,14 +54,15 @@
     "duration_seconds": 3600.0
   },
   "stats": {
     "duration_seconds": 45.2,
     "frames_extracted": 42,
+    "people_frames_filtered": 11,
     "diagrams_detected": 3,
     "screen_captures": 5
   },
   "key_points": [...],
   "action_items": [...],
   "diagrams": [...],
   "screen_captures": [...]
 }
 ```
--- pyproject.toml
+++ pyproject.toml
@@ -15,11 +15,10 @@
 keywords = ["video", "analysis", "ai", "knowledge-extraction", "transcription", "diagrams"]
 classifiers = [
     "Development Status :: 4 - Beta",
     "Intended Audience :: Developers",
     "Intended Audience :: Science/Research",
-    "License :: OSI Approved :: MIT License",
     "Operating System :: OS Independent",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",

ADDED requirements-apple.txt
ADDED requirements-gpu.txt
ADDED scripts/setup.sh
ADDED tests/__init__.py
ADDED tests/test_audio_extractor.py
ADDED tests/test_frame_extractor.py
--- a/requirements-apple.txt
+++ b/requirements-apple.txt
@@ -0,0 +1,4 @@
+# ARM-specific optimizations for macOS
+torch==2.0.0
+torchvision==0.15.0
+torchaudio==2.0.0
--- a/requirements-gpu.txt
+++ b/requirements-gpu.txt
@@ -0,0 +1,10 @@
+# NVIDIA GPU acceleration dependencies
+torch==2.0.0+cu118
+torchvision==0.15.0+cu118
+torchaudio==2.0.0+cu118
+# Additional CUDA dependencies
+opencv-contrib-python-headless>=4.7.0.72
+cupy-cuda11x>=12.0.0
+
+# Find packages at:
+# https://download.pytorch
--- a/scripts/setup.sh
+++ b/scripts/setup.sh
@@ -0,0 +1,120 @@
+#!/bin/bash
+# PlanOpticon setup script
+set -e
+
+# Detect operating system
+if [[ "$OSTYPE" == "darwin"* ]]; then
+    OS="macos"
+elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
+    OS="linux"
+else
+    echo "Unsupported operating system: $OSTYPE"
+    exit 1
+fi
+
+# Detect architecture
+ARCH=$(uname -m)
+if [[ "$ARCH" == "arm64" ]] || [[ "$ARCH" == "aarch64" ]]; then
+    ARCH="arm64"
+elif [[ "$ARCH" == "x86_64" ]]; then
+    ARCH="x86_64"
+else
+    echo "Unsupported architecture: $ARCH"
+    exit 1
+fi
+
+echo "Setting up PlanOpticon on $OS ($ARCH)..."
+
+# Check for Python
+if ! command -v python3 &> /dev/null; then
+    echo "Python 3 is required but not found."
+    if [[ "$OS" == "macos" ]]; then
+        echo "Please install Python 3 using Homebrew or from python.org."
+        echo "  brew install python"
+    elif [[ "$OS" == "linux" ]]; then
+        echo "Please install Python 3 using your package manager."
+        echo "  Ubuntu/Debian: sudo apt install python3 python3-pip python3-venv"
+        echo "  Fedora: sudo dnf install python3 python3-pip"
+    fi
+    exit 1
+fi
+
+# Check Python version
+PY_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
+PY_MAJOR=$(echo $PY_VERSION | cut -d. -f1)
+PY_MINOR=$(echo $PY_VERSION | cut -d. -f2)
+
+if [[ "$PY_MAJOR" -lt 3 ]] || [[ "$PY_MAJOR" -eq 3 && "$PY_MINOR" -lt 9 ]]; then
+    echo "Python 3.9 or higher is required, but found $PY_VERSION."
+    echo "Please upgrade your Python installation."
+    exit 1
+fi
+
+echo "Using Python $PY_VERSION"
+
+# Check for FFmpeg
+if ! command -v ffmpeg &> /dev/null; then
+    echo "FFmpeg is required but not found."
+    if [[ "$OS" == "macos" ]]; then
+        echo "Please install FFmpeg using Homebrew:"
+        echo "  brew install ffmpeg"
+    elif [[ "$OS" == "linux" ]]; then
+        echo "Please install FFmpeg using your package manager:"
+        echo "  Ubuntu/Debian: sudo apt install ffmpeg"
+        echo "  Fedora: sudo dnf install ffmpeg"
+    fi
+    exit 1
+fi
+
+echo "FFmpeg found"
+
+# Create and activate virtual environment
+if [[ -d "venv" ]]; then
+    echo "Virtual environment already exists"
+else
+    echo "Creating virtual environment..."
+    python3 -m venv venv
+fi
+
+# Determine activate script path
+if [[ "$OS" == "macos" ]] || [[ "$OS" == "linux" ]]; then
+    ACTIVATE="venv/bin/activate"
+fi
+
+echo "Activating virtual environment..."
+source "$ACTIVATE"
+
+# Upgrade pip
+echo "Upgrading pip..."
+pip install --upgrade pip
+
+# Install dependencies
+echo "Installing dependencies..."
+pip install -e .
+
+# Install optional GPU dependencies if available
+if [[ "$OS" == "macos" && "$ARCH" == "arm64" ]]; then
+    echo "Installing optional ARM-specific packages for macOS..."
+    pip install -r requirements-apple.txt 2>/dev/null || echo "No ARM-specific packages found or could not install them."
+elif [[ "$ARCH" == "x86_64" ]]; then
+    # Check for NVIDIA GPU
+    if [[ "$OS" == "linux" ]] && command -v nvidia-smi &> /dev/null; then
+        echo "NVIDIA GPU detected, installing GPU dependencies..."
+        pip install -r requirements-gpu.txt 2>/dev/null || echo "Could not install GPU packages."
+    fi
+fi
+
+# Create example .env file if it doesn't exist
+if [[ ! -f ".env" ]]; then
+    echo "Creating example .env file..."
+    cp .env.example .env
+    echo "Please edit the .env file to add your API keys."
+fi
+
+echo "Setup complete! PlanOpticon is ready to use."
+echo ""
+echo "To activate the virtual environment, run:"
+echo "  source \"$ACTIVATE\""
+echo ""
+echo "To run PlanOpticon, use:"
+echo "  planopticon --help"
--- a/scripts/setup.sh
+++ b/scripts/setup.sh
@@ -0,0 +1,120 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/scripts/setup.sh
+++ b/scripts/setup.sh
@@ -0,0 +1,120 @@
1 #!/bin/bash
2 # PlanOpticon setup script
3 set -e
4
5 # Detect operating system
6 if [[ "$OSTYPE" == "darwin"* ]]; then
7 OS="macos"
8 elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
9 OS="linux"
10 else
11 echo "Unsupported operating system: $OSTYPE"
12 exit 1
13 fi
14
15 # Detect architecture
16 ARCH=$(uname -m)
17 if [[ "$ARCH" == "arm64" ]] || [[ "$ARCH" == "aarch64" ]]; then
18 ARCH="arm64"
19 elif [[ "$ARCH" == "x86_64" ]]; then
20 ARCH="x86_64"
21 else
22 echo "Unsupported architecture: $ARCH"
23 exit 1
24 fi
25
26 echo "Setting up PlanOpticon on $OS ($ARCH)..."
27
28 # Check for Python
29 if ! command -v python3 &> /dev/null; then
30 echo "Python 3 is required but not found."
31 if [[ "$OS" == "macos" ]]; then
32 echo "Please install Python 3 using Homebrew or from python.org."
33 echo " brew install python"
34 elif [[ "$OS" == "linux" ]]; then
35 echo "Please install Python 3 using your package manager."
36 echo " Ubuntu/Debian: sudo apt install python3 python3-pip python3-venv"
37 echo " Fedora: sudo dnf install python3 python3-pip"
38 fi
39 exit 1
40 fi
41
42 # Check Python version
43 PY_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
44 PY_MAJOR=$(echo $PY_VERSION | cut -d. -f1)
45 PY_MINOR=$(echo $PY_VERSION | cut -d. -f2)
46
47 if [[ "$PY_MAJOR" -lt 3 ]] || [[ "$PY_MAJOR" -eq 3 && "$PY_MINOR" -lt 9 ]]; then
48 echo "Python 3.9 or higher is required, but found $PY_VERSION."
49 echo "Please upgrade your Python installation."
50 exit 1
51 fi
52
53 echo "Using Python $PY_VERSION"
54
55 # Check for FFmpeg
56 if ! command -v ffmpeg &> /dev/null; then
57 echo "FFmpeg is required but not found."
58 if [[ "$OS" == "macos" ]]; then
59 echo "Please install FFmpeg using Homebrew:"
60 echo " brew install ffmpeg"
61 elif [[ "$OS" == "linux" ]]; then
62 echo "Please install FFmpeg using your package manager:"
63 echo " Ubuntu/Debian: sudo apt install ffmpeg"
64 echo " Fedora: sudo dnf install ffmpeg"
65 fi
66 exit 1
67 fi
68
69 echo "FFmpeg found"
70
71 # Create and activate virtual environment
72 if [[ -d "venv" ]]; then
73 echo "Virtual environment already exists"
74 else
75 echo "Creating virtual environment..."
76 python3 -m venv venv
77 fi
78
79 # Determine activate script path
80 if [[ "$OS" == "macos" ]] || [[ "$OS" == "linux" ]]; then
81 ACTIVATE="venv/bin/activate"
82 fi
83
84 echo "Activating virtual environment..."
85 source "$ACTIVATE"
86
87 # Upgrade pip
88 echo "Upgrading pip..."
89 pip install --upgrade pip
90
91 # Install dependencies
92 echo "Installing dependencies..."
93 pip install -e .
94
95 # Install optional GPU dependencies if available
96 if [[ "$OS" == "macos" && "$ARCH" == "arm64" ]]; then
97 echo "Installing optional ARM-specific packages for macOS..."
98 pip install -r requirements-apple.txt 2>/dev/null || echo "No ARM-specific packages found or could not install them."
99 elif [[ "$ARCH" == "x86_64" ]]; then
100 # Check for NVIDIA GPU
101 if [[ "$OS" == "linux" ]] && command -v nvidia-smi &> /dev/null; then
102 echo "NVIDIA GPU detected, installing GPU dependencies..."
103 pip install -r requirements-gpu.txt 2>/dev/null || echo "Could not install GPU packages."
104 fi
105 fi
106
107 # Create example .env file if it doesn't exist
108 if [[ ! -f ".env" ]]; then
109 echo "Creating example .env file..."
110 cp .env.example .env
111 echo "Please edit the .env file to add your API keys."
112 fi
113
114 echo "Setup complete! PlanOpticon is ready to use."
115 echo ""
116 echo "To activate the virtual environment, run:"
117 echo " source \"$ACTIVATE\""
118 echo ""
119 echo "To run PlanOpticon, use:"
120 echo " planopticon --help"
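The Python version gate above (script lines 43-51) compares major/minor parts with nested `[[ ]]` tests. The same check can be factored into a reusable helper using only parameter expansion; a sketch, assuming bash (the `version_ge` name is illustrative, not part of the repo):

```shell
# version_ge MAJOR.MINOR — succeeds when the given version is >= 3.9.
# Pure parameter expansion, no cut/awk subprocesses needed.
version_ge() {
    local major="${1%%.*}" minor="${1#*.}"
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 9 ]; }
}

version_ge "3.11" && echo "3.11 ok" || echo "3.11 too old"
version_ge "3.8"  && echo "3.8 ok"  || echo "3.8 too old"
```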

--- a/tests/test_audio_extractor.py
+++ b/tests/test_audio_extractor.py
@@ -0,0 +1,121 @@
1 """Tests for the audio extractor module."""
2
3 import os
4 import tempfile
5 from pathlib import Path
6 from unittest.mock import MagicMock, patch
7
8 import numpy as np
9 import pytest
10
11 from video_processor.extractors.audio_extractor import AudioExtractor
12
13
14 class TestAudioExtractor:
15     """Test suite for AudioExtractor class."""
16
17     def test_init(self):
18         """Test initialization of AudioExtractor."""
19         # Default parameters
20         extractor = AudioExtractor()
21         assert extractor.sample_rate == 16000
22         assert extractor.mono is True
23
24         # Custom parameters
25         extractor = AudioExtractor(sample_rate=44100, mono=False)
26         assert extractor.sample_rate == 44100
27         assert extractor.mono is False
28
29     @patch("subprocess.run")
30     def test_extract_audio(self, mock_run):
31         """Test audio extraction from video."""
32         # Mock the subprocess.run call
33         mock_result = MagicMock()
34         mock_result.returncode = 0
35         mock_run.return_value = mock_result
36
37         with tempfile.TemporaryDirectory() as temp_dir:
38             # Create a dummy video file
39             video_path = Path(temp_dir) / "test_video.mp4"
40             with open(video_path, "wb") as f:
41                 f.write(b"dummy video content")
42
43             # Extract audio
44             extractor = AudioExtractor()
45
46             # Test with default output path
47             output_path = extractor.extract_audio(video_path)
48             assert output_path == video_path.with_suffix(".wav")
49
50             # Test with custom output path
51             custom_output = Path(temp_dir) / "custom_audio.wav"
52             output_path = extractor.extract_audio(video_path, custom_output)
53             assert output_path == custom_output
54
55             # Verify subprocess.run was called with correct arguments
56             mock_run.assert_called()
57             args, kwargs = mock_run.call_args
58             assert "ffmpeg" in args[0]
59             assert "-i" in args[0]
60             assert str(video_path) in args[0]
61
62     @patch("soundfile.info")
63     def test_get_audio_properties(self, mock_sf_info):
64         """Test getting audio properties."""
65         # Mock soundfile.info
66         mock_info = MagicMock()
67         mock_info.duration = 10.5
68         mock_info.samplerate = 16000
69         mock_info.channels = 1
70         mock_info.format = "WAV"
71         mock_info.subtype = "PCM_16"
72         mock_sf_info.return_value = mock_info
73
74         extractor = AudioExtractor()
75         extractor.get_audio_properties("audio.wav")
76         mock_sf_info.assert_called_once()
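For orientation, the ffmpeg invocation that `test_extract_audio` asserts on can be sketched as a small command builder. This is a hypothetical reconstruction: `build_ffmpeg_cmd` is not in the repo, and the real `AudioExtractor` internals are not shown in this diff.

```python
from pathlib import Path

def build_ffmpeg_cmd(video_path, output_path=None, sample_rate=16000, mono=True):
    """Build an ffmpeg argv list; default output is <video>.wav."""
    video_path = Path(video_path)
    output_path = Path(output_path) if output_path else video_path.with_suffix(".wav")
    cmd = ["ffmpeg", "-y", "-i", str(video_path), "-ar", str(sample_rate)]
    if mono:
        cmd += ["-ac", "1"]  # downmix to a single channel
    cmd.append(str(output_path))
    return cmd, output_path

cmd, out = build_ffmpeg_cmd("meeting.mp4")
# These are the same properties the test asserts on the mocked call:
assert "ffmpeg" in cmd and "-i" in cmd and "meeting.mp4" in cmd
assert out == Path("meeting.wav")
```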
--- a/tests/test_frame_extractor.py
+++ b/tests/test_frame_extractor.py
@@ -0,0 +1,58 @@
1 """Tests for the frame extractor module."""
2
3 import os
4 import tempfile
5 from pathlib import Path
6
7 import numpy as np
8 import pytest
9
10 from video_processor.extractors.frame_extractor import (
11     calculate_frame_difference,
12     extract_frames,
13     is_gpu_available,
14     save_frames,
15 )
16
17
18 # Create dummy test frames
19 @pytest.fixture
20 def dummy_frames():
21     # Create a list of dummy frames with different content
22     frames = []
23     for i in range(3):
24         # Create frame with different intensity for each
25         frame = np.ones((100, 100, 3), dtype=np.uint8) * (i * 50)
26         frames.append(frame)
27     return frames
28
29
30 def test_calculate_frame_difference():
31     """Test frame difference calculation."""
32     # Create two frames with some difference
33     frame1 = np.zeros((100, 100, 3), dtype=np.uint8)
34     frame2 = np.ones((100, 100, 3), dtype=np.uint8) * 128  # 50% intensity
35
36     # Calculate difference
37     diff = calculate_frame_difference(frame1, frame2)
38
39     # Expected difference is around 128/255 = 0.5
40     assert 0.45 <= diff <= 0.55
41
42
43 def test_is_gpu_available():
44     """Test GPU availability check."""
45     # This just tests that the function runs without error
46     # We don't assert the result because it depends on the system
47     result = is_gpu_available()
48     assert isinstance(result, bool)
49
50
51 def test_save_frames(dummy_frames):
52     """Test saving frames to disk."""
53     with tempfile.TemporaryDirectory() as temp_dir:
54         # Save frames
55         paths = save_frames(dummy_frames, temp_dir, "test_frame")
56
57         # Check that we got the correct number of paths
58         assert len(paths) == len(dummy_frames)
59
60         # Check that files were created
61         for path in paths:
62             assert os.path.exists(path)
63             assert os.path.getsize(path) > 0
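`test_calculate_frame_difference` expects roughly 0.5 for a uniform 128-intensity delta, which matches a mean absolute difference normalized by 255. A minimal sketch of such a metric follows; this is an assumption about the implementation, since the real `calculate_frame_difference` (which may also downsample or blur first) is not shown in this diff.

```python
import numpy as np

def mean_abs_difference(frame1, frame2):
    """Mean absolute pixel difference between two frames, normalized to [0, 1]."""
    # Cast up before subtracting so uint8 arithmetic cannot wrap around.
    diff = np.abs(frame1.astype(np.int16) - frame2.astype(np.int16))
    return float(diff.mean()) / 255.0

f1 = np.zeros((100, 100, 3), dtype=np.uint8)
f2 = np.ones((100, 100, 3), dtype=np.uint8) * 128
assert 0.45 <= mean_abs_difference(f1, f2) <= 0.55  # 128/255 ≈ 0.502
```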
--- tests/test_providers.py
+++ tests/test_providers.py
@@ -104,16 +104,18 @@
104 assert "anthropic/claude-sonnet-4-5-20250929" == used["chat"]
105
106
107 class TestDiscovery:
108 @patch("video_processor.providers.discovery._cached_models", None)
109 @patch.dict("os.environ", {}, clear=True)
110 def test_discover_skips_missing_keys(self):
111 from video_processor.providers.discovery import discover_available_models
112 # No API keys -> empty list, no errors
113 models = discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""})
114 assert models == []
115
116 @patch.dict("os.environ", {}, clear=True)
117 @patch("video_processor.providers.discovery._cached_models", None)
118 def test_discover_caches_results(self):
119 from video_processor.providers import discovery
120
121 models = discovery.discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""})
122
123 ADDED video_processor/__init__.py
124 ADDED video_processor/analyzers/__init__.py
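The two `@patch.dict("os.environ", {}, clear=True)` decorators added above isolate the discovery tests from the developer's machine: if discovery can fall back to environment variables, an exported real API key would otherwise make the "no keys" test behave differently per machine. The isolation behavior itself:

```python
import os
from unittest.mock import patch

os.environ["OPENAI_API_KEY"] = "sk-local-dev-key"  # simulate a developer machine

with patch.dict("os.environ", {}, clear=True):
    # Inside the patch the environment is empty, so discovery sees no keys.
    assert "OPENAI_API_KEY" not in os.environ

# On exit the original environment is restored untouched.
assert os.environ["OPENAI_API_KEY"] == "sk-local-dev-key"
del os.environ["OPENAI_API_KEY"]
```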

--- video_processor/analyzers/diagram_analyzer.py
+++ video_processor/analyzers/diagram_analyzer.py
@@ -3,26 +3,34 @@
3 import json
4 import logging
5 import shutil
6 from pathlib import Path
7 from typing import List, Optional, Tuple, Union
8
9 from tqdm import tqdm
10
11 from video_processor.models import DiagramResult, DiagramType, ScreenCapture
12 from video_processor.providers.manager import ProviderManager
13
14 logger = logging.getLogger(__name__)
15
16 # Classification prompt — returns JSON
17 _CLASSIFY_PROMPT = """\
18 Examine this image from a video recording. Your job is to identify ONLY shared content \
19 — slides, presentations, charts, diagrams, documents, screen shares, whiteboard content, \
20 architecture drawings, tables, or other structured visual information worth capturing.
21
22 IMPORTANT: If the image primarily shows a person, people, webcam feeds, faces, or a \
23 video conference participant view, return confidence 0.0. We are ONLY interested in \
24 shared/presented content, NOT people or camera views.
25
26 Return ONLY a JSON object (no markdown fences):
27 {
28 "is_diagram": true/false,
29 "diagram_type": "flowchart"|"sequence"|"architecture"|"whiteboard"|"chart"|"table"|"slide"|"screenshot"|"unknown",
30 "confidence": 0.0 to 1.0,
31 "content_type": "slide"|"diagram"|"document"|"screen_share"|"whiteboard"|"chart"|"person"|"other",
32 "brief_description": "one-sentence description of what you see"
33 }
34 """
35
36 # Single-pass analysis prompt — extracts everything in one call
@@ -137,11 +145,11 @@
145 diagrams: List[DiagramResult] = []
146 captures: List[ScreenCapture] = []
147 diagram_idx = 0
148 capture_idx = 0
149
150 for i, fp in enumerate(tqdm(frame_paths, desc="Analyzing frames", unit="frame")):
151 fp = Path(fp)
152 logger.info(f"Classifying frame {i}/{len(frame_paths)}: {fp.name}")
153
154 try:
155 classification = self.classify_frame(fp)
156
157 ADDED video_processor/cli/__init__.py
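The classification prompt asks for bare JSON ("no markdown fences"), but models sometimes wrap replies in fences anyway, so the analyzer presumably parses defensively. A sketch of such a tolerant parser; `parse_classification` is a hypothetical helper, not taken from this diff.

```python
import json

def parse_classification(reply: str) -> dict:
    """Parse a model reply as JSON, stripping markdown fences if present."""
    text = reply.strip()
    if text.startswith("```"):
        text = text.strip("`")
        # Drop an optional language tag like "json" on the first line.
        first_nl = text.find("\n")
        if first_nl != -1 and text[:first_nl].strip().isalpha():
            text = text[first_nl + 1:]
    return json.loads(text)

result = parse_classification('```json\n{"is_diagram": true, "confidence": 0.9}\n```')
assert result["is_diagram"] is True
assert result["confidence"] == 0.9
```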

--- video_processor/cli/commands.py
+++ video_processor/cli/commands.py
@@ -8,10 +8,11 @@
8 from pathlib import Path
9 from typing import List, Optional
10
11 import click
12 import colorlog
13 from tqdm import tqdm
14
15
16 def setup_logging(verbose: bool = False) -> None:
17 """Set up logging with color formatting."""
18 log_level = logging.DEBUG if verbose else logging.INFO
@@ -57,10 +58,11 @@
58 )
59 @click.option("--focus", type=str, help='Comma-separated focus areas (e.g., "diagrams,action-items")')
60 @click.option("--use-gpu", is_flag=True, help="Enable GPU acceleration if available")
61 @click.option("--sampling-rate", type=float, default=0.5, help="Frame sampling rate")
62 @click.option("--change-threshold", type=float, default=0.15, help="Visual change threshold")
63 @click.option("--periodic-capture", type=float, default=30.0, help="Capture a frame every N seconds regardless of change (0 to disable)")
64 @click.option("--title", type=str, help="Title for the analysis report")
65 @click.option(
66 "--provider",
67 "-p",
68 type=click.Choice(["auto", "openai", "anthropic", "gemini"]),
@@ -77,10 +79,11 @@
79 depth,
80 focus,
81 use_gpu,
82 sampling_rate,
83 change_threshold,
84 periodic_capture,
85 title,
86 provider,
87 vision_model,
88 chat_model,
89 ):
@@ -104,16 +107,19 @@
107 provider_manager=pm,
108 depth=depth,
109 focus_areas=focus_areas,
110 sampling_rate=sampling_rate,
111 change_threshold=change_threshold,
112 periodic_capture_seconds=periodic_capture,
113 use_gpu=use_gpu,
114 title=title,
115 )
116 click.echo(pm.usage.format_summary())
117 click.echo(f"\n Results: {output}/manifest.json")
118 except Exception as e:
119 logging.error(f"Error: {e}")
120 click.echo(pm.usage.format_summary())
121 if ctx.obj["verbose"]:
122 import traceback
123
124 traceback.print_exc()
125 sys.exit(1)
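The analyze command's except branch above prints the API-usage summary even when the pipeline fails, and only shows a traceback in verbose mode. A standalone sketch of that pattern (function and stub names here are illustrative, not from the commit):

```python
import sys
import traceback


def run_with_usage_summary(task, print_summary, verbose: bool = False) -> int:
    """Run a task; always emit the usage summary, on success or failure.

    Mirrors the analyze command's try/except: the API-usage report is
    printed even when the pipeline raises, and the traceback is shown
    only in verbose mode.
    """
    try:
        task()
        print_summary()
        return 0
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        print_summary()
        if verbose:
            traceback.print_exc()
        return 1
```

Printing the summary on both paths matters because a failed run may still have spent real API quota.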
@@ -150,12 +156,13 @@
156 default="local",
157 help="Video source (local directory, Google Drive, or Dropbox)",
158 )
159 @click.option("--folder-id", type=str, default=None, help="Google Drive folder ID")
160 @click.option("--folder-path", type=str, default=None, help="Cloud folder path")
161 @click.option("--recursive/--no-recursive", default=True, help="Recurse into subfolders (default: recursive)")
162 @click.pass_context
163 def batch(ctx, input_dir, output, depth, pattern, title, provider, vision_model, chat_model, source, folder_id, folder_path, recursive):
164 """Process a folder of videos in batch."""
165 from video_processor.integrators.knowledge_graph import KnowledgeGraph
166 from video_processor.integrators.plan_generator import PlanGenerator
167 from video_processor.models import BatchManifest, BatchVideoEntry
168 from video_processor.output_structure import (
@@ -180,11 +187,11 @@
187
188 cloud = GoogleDriveSource()
189 if not cloud.authenticate():
190 logging.error("Google Drive authentication failed")
191 sys.exit(1)
192 cloud_files = cloud.list_videos(folder_id=folder_id, folder_path=folder_path, patterns=patterns, recursive=recursive)
193 local_paths = cloud.download_all(cloud_files, download_dir)
194 elif source == "dropbox":
195 from video_processor.sources.dropbox_source import DropboxSource
196
197 cloud = DropboxSource()
@@ -202,14 +209,15 @@
209 if not input_dir:
210 logging.error("--input-dir is required for local source")
211 sys.exit(1)
212 input_dir = Path(input_dir)
213
214 # Find videos (rglob for recursive, glob for flat)
215 videos = []
216 glob_fn = input_dir.rglob if recursive else input_dir.glob
217 for pat in patterns:
218 videos.extend(sorted(glob_fn(pat)))
219 videos = sorted(set(videos))
220
221 if not videos:
222 logging.error(f"No videos found in {input_dir} matching {pattern}")
223 sys.exit(1)
@@ -219,11 +227,11 @@
227 dirs = create_batch_output_dirs(output, title)
228 manifests = []
229 entries = []
230 merged_kg = KnowledgeGraph()
231
232 for idx, video_path in enumerate(tqdm(videos, desc="Batch processing", unit="video")):
233 video_name = video_path.stem
234 video_output = dirs["videos"] / video_name
235 logging.info(f"Processing video {idx + 1}/{len(videos)}: {video_path.name}")
236
237 entry = BatchVideoEntry(
@@ -290,11 +298,13 @@
298 videos=entries,
299 batch_summary_md="batch_summary.md",
300 merged_knowledge_graph_json="knowledge_graph.json",
301 )
302 write_batch_manifest(batch_manifest, output)
303 click.echo(pm.usage.format_summary())
304 click.echo(f"\n Batch complete: {batch_manifest.completed_videos}/{batch_manifest.total_videos} succeeded")
305 click.echo(f" Results: {output}/batch_manifest.json")
306
307
308 @cli.command("list-models")
309 @click.pass_context
310 def list_models(ctx):
ADDED video_processor/cli/output_formatter.py
ADDED video_processor/extractors/__init__.py
ADDED video_processor/extractors/audio_extractor.py
ADDED video_processor/extractors/frame_extractor.py
ADDED video_processor/extractors/text_extractor.py
ADDED video_processor/integrators/__init__.py
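The batch command's discovery step above picks `rglob` or `glob` depending on `--recursive`, then deduplicates and sorts. A minimal standalone sketch (the `find_videos` name is illustrative):

```python
from pathlib import Path
from typing import List


def find_videos(input_dir: Path, patterns: List[str], recursive: bool = True) -> List[Path]:
    """Collect video files matching any pattern, deduplicated and sorted.

    Mirrors the commands.py logic: rglob recurses into subfolders,
    plain glob stays flat.
    """
    glob_fn = input_dir.rglob if recursive else input_dir.glob
    videos: List[Path] = []
    for pat in patterns:
        videos.extend(sorted(glob_fn(pat)))
    return sorted(set(videos))
```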
--- a/video_processor/cli/output_formatter.py
+++ b/video_processor/cli/output_formatter.py
@@ -0,0 +1,216 @@
"""Output formatting for PlanOpticon analysis results."""

import html
import json
import logging
import shutil
from pathlib import Path
from typing import Dict, List, Optional, Union

logger = logging.getLogger(__name__)


class OutputFormatter:
    """Formats and organizes output from video analysis."""

    def __init__(self, output_dir: Union[str, Path]):
        """
        Parameters
        ----------
        output_dir : str or Path
            Output directory for formatted content
        """
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)

    def organize_outputs(
        self,
        markdown_path: Union[str, Path],
        knowledge_graph_path: Union[str, Path],
        diagrams: List[Dict],
        frames_dir: Optional[Union[str, Path]] = None,
        transcript_path: Optional[Union[str, Path]] = None,
    ) -> Dict:
        """
        Organize outputs into a consistent structure.

        Parameters
        ----------
        markdown_path : str or Path
            Path to markdown analysis
        knowledge_graph_path : str or Path
            Path to knowledge graph JSON
        diagrams : list
            List of diagram analysis results
        frames_dir : str or Path, optional
            Directory with extracted frames
        transcript_path : str or Path, optional
            Path to transcript file

        Returns
        -------
        dict
            Dictionary with organized output paths
        """
        # Create output structure
        md_dir = self.output_dir / "markdown"
        diagrams_dir = self.output_dir / "diagrams"
        data_dir = self.output_dir / "data"

        md_dir.mkdir(exist_ok=True)
        diagrams_dir.mkdir(exist_ok=True)
        data_dir.mkdir(exist_ok=True)

        # Copy markdown file
        markdown_path = Path(markdown_path)
        md_output = md_dir / markdown_path.name
        shutil.copy2(markdown_path, md_output)

        # Copy knowledge graph
        kg_path = Path(knowledge_graph_path)
        kg_output = data_dir / kg_path.name
        shutil.copy2(kg_path, kg_output)

        # Copy diagram images if available
        diagram_images = []
        for diagram in diagrams:
            if "image_path" in diagram and diagram["image_path"]:
                img_path = Path(diagram["image_path"])
                if img_path.exists():
                    img_output = diagrams_dir / img_path.name
                    shutil.copy2(img_path, img_output)
                    diagram_images.append(img_output)

        return {
            "markdown": md_output,
            "knowledge_graph": kg_output,
            "diagram_images": diagram_images,
            "transcript": Path(transcript_path) if transcript_path else None,
        }

    def create_html_index(self, outputs: Dict) -> Path:
        """
        Create HTML index page for outputs.

        Parameters
        ----------
        outputs : dict
            Dictionary with organized output paths

        Returns
        -------
        Path
            Path to HTML index
        """
        esc = html.escape

        # Simple HTML index template
        lines = [
            "<!DOCTYPE html>",
            "<html>",
            "<head>",
            "  <title>PlanOpticon Analysis Results</title>",
            "  <style>",
            "    body { margin: 0; padding: 20px; line-height: 1.6; }",
            "    .container { max-width: 1200px; margin: 0 auto; }",
            "    h1 { color: #333; }",
            "    h2 { color: #555; margin-top: 30px; }",
            "    .section { margin-bottom: 30px; }",
            "    .files { display: flex; flex-wrap: wrap; }",
            "    .file-item { margin: 10px; text-align: center; }",
            "    .file-item img { max-width: 200px; max-height: 150px; object-fit: contain; }",
            "    .file-name { margin-top: 5px; font-size: 0.9em; }",
            "    a { color: #0066cc; text-decoration: none; }",
            "    a:hover { text-decoration: underline; }",
            "  </style>",
            "</head>",
            "<body>",
            "<div class='container'>",
            "  <h1>PlanOpticon Analysis Results</h1>",
            "",
        ]

        # Add markdown section
        if outputs.get("markdown"):
            md_path = Path(outputs["markdown"])
            md_rel = esc(str(md_path.relative_to(self.output_dir)))

            lines.append("  <div class='section'>")
            lines.append("    <h2>Analysis Report</h2>")
            lines.append(f"    <p><a href='{md_rel}' target='_blank'>View Analysis</a></p>")
            lines.append("  </div>")

        # Add diagrams section
        if outputs.get("diagram_images") and len(outputs["diagram_images"]) > 0:
            lines.append("  <div class='section'>")
            lines.append("    <h2>Diagrams</h2>")
            lines.append("    <div class='files'>")

            for img_path in outputs["diagram_images"]:
                img_path = Path(img_path)
                img_rel = esc(str(img_path.relative_to(self.output_dir)))
                img_name = esc(img_path.name)

                lines.append("      <div class='file-item'>")
                lines.append(f"        <a href='{img_rel}' target='_blank'>")
                lines.append(f"          <img src='{img_rel}' alt='Diagram'>")
                lines.append("        </a>")
                lines.append(f"        <div class='file-name'>{img_name}</div>")
                lines.append("      </div>")

            lines.append("    </div>")
            lines.append("  </div>")

        # Add data section
        if outputs.get("knowledge_graph"):
            data_rel = esc(str(Path(outputs["knowledge_graph"]).relative_to(self.output_dir)))
            lines.append("  <div class='section'>")
            lines.append("    <h2>Data</h2>")
            lines.append("    <ul>")
            lines.append(f"      <li><a href='{data_rel}' target='_blank'>knowledge_graph.json</a></li>")
            lines.append("    </ul>")
            lines.append("  </div>")

        # Close HTML
        lines.append("</div>")
        lines.append("</body>")
        lines.append("</html>")

        # Write HTML file
        index_path = self.output_dir / "index.html"
        with open(index_path, "w") as f:
            f.write("\n".join(lines))

        logger.info(f"Created HTML index at {index_path}")
        return index_path
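create_html_index builds links by taking each artifact's path relative to the output root and escaping it with `html.escape`. A self-contained sketch of one diagram entry (the `diagram_item_html` helper is hypothetical, for illustration):

```python
import html
from pathlib import Path


def diagram_item_html(img_path: Path, output_dir: Path) -> str:
    """Build one escaped <div class='file-item'> entry, as create_html_index does."""
    # relative_to keeps links working when the output folder moves
    rel = html.escape(str(img_path.relative_to(output_dir)))
    name = html.escape(img_path.name)
    return (
        "    <div class='file-item'>\n"
        f"      <a href='{rel}' target='_blank'><img src='{rel}' alt='Diagram'></a>\n"
        f"      <div class='file-name'>{name}</div>\n"
        "    </div>"
    )
```

Escaping both the href and the visible name guards against frame filenames containing `&` or angle brackets.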
--- a/video_processor/extractors/__init__.py
+++ b/video_processor/extractors/__init__.py
@@ -0,0 +1 @@
from v
--- a/video_processor/extractors/audio_extractor.py
+++ b/video_processor/extractors/audio_extractor.py
@@ -0,0 +1,196 @@
"""Audio extraction and processing module for video analysis."""

import logging
import os
import subprocess
from pathlib import Path
from typing import Dict, Optional, Tuple, Union

import librosa
import numpy as np
import soundfile as sf

logger = logging.getLogger(__name__)


class AudioExtractor:
    """Extract and process audio from video files."""

    def __init__(self, sample_rate: int = 16000, mono: bool = True):
        """
        Parameters
        ----------
        sample_rate : int
            Target sample rate for extracted audio
        mono : bool
            Whether to downmix audio to a single channel
        """
        self.sample_rate = sample_rate
        self.mono = mono

    def extract_audio(
        self,
        video_path: Union[str, Path],
        output_path: Optional[Union[str, Path]] = None,
        format: str = "wav",
    ) -> Path:
        """
        Extract audio from video file.

        Parameters
        ----------
        video_path : str or Path
            Path to video file
        output_path : str or Path, optional
            Path to save extracted audio (if None, saves alongside video)
        format : str
            Audio format to save (wav, mp3, etc.)

        Returns
        -------
        Path
            Path to extracted audio file
        """
        video_path = Path(video_path)
        if not video_path.exists():
            raise FileNotFoundError(f"Video file not found: {video_path}")

        # Generate output path if not provided
        if output_path is None:
            output_path = video_path.with_suffix(f".{format}")
        else:
            output_path = Path(output_path)

        # Ensure output directory exists
        output_path.parent.mkdir(parents=True, exist_ok=True)

        # Extract audio using ffmpeg
        cmd = [
            "ffmpeg",
            "-i", str(video_path),
            "-vn",  # No video
            "-acodec", "pcm_s16le",  # PCM 16-bit little-endian
            "-ar", str(self.sample_rate),
            "-ac", "1" if self.mono else "2",  # Channels (mono or stereo)
            "-y",  # Overwrite output
            str(output_path),
        ]
        try:
            subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
        except subprocess.CalledProcessError:
            logger.error(f"ffmpeg audio extraction failed for {video_path}")
            raise

        return output_path

    def load_audio(self, audio_path: Union[str, Path]) -> Tuple[np.ndarray, int]:
        """Load an audio file at the configured sample rate and channel layout."""
        audio_data, sr = librosa.load(str(audio_path), sr=self.sample_rate, mono=self.mono)
        return audio_data, sr

    def get_audio_properties(self, audio_path: Union[str, Path]) -> Dict:
        """Return basic properties of an audio file."""
        audio_path = Path(audio_path)
        if not audio_path.exists():
            raise FileNotFoundError(f"Audio file not found: {audio_path}")

        info = sf.info(str(audio_path))
        properties = {
            "duration": info.duration,
            "sample_rate": info.samplerate,
            "channels": info.channels,
            "format": info.format,
            "subtype": info.subtype,
        }
        return properties

    def save_segment(
        self,
        segment: np.ndarray,
        output_path: Union[str, Path],
        sample_rate: int,
    ) -> Path:
        """
        Save an audio segment to disk.

        Parameters
        ----------
        segment : np.ndarray
            Audio segment data
        output_path : str or Path
            Path to save segment
        sample_rate : int
            Sample rate of segment

        Returns
        -------
        Path
            Path to saved segment
        """
        output_path = Path(output_path)
        output_path.parent.mkdir(parents=True, exist_ok=True)

        sf.write(output_path, segment, sample_rate)
        return output_path

    def segment_audio(
        self,
        audio_data: np.ndarray,
        sample_rate: int,
        segment_length_ms: int = 30000,
        overlap_ms: int = 0,
    ) -> list:
        """
        Split audio into fixed-length segments.

        Parameters
        ----------
        audio_data : np.ndarray
            Audio data
        sample_rate : int
            Sample rate of audio
        segment_length_ms : int
            Length of segments in milliseconds
        overlap_ms : int
            Overlap between consecutive segments in milliseconds

        Returns
        -------
        list
            List of np.ndarray segments
        """
        segment_length_samples = int(sample_rate * segment_length_ms / 1000)
        overlap_samples = int(sample_rate * overlap_ms / 1000)

        # Calculate hop length
        hop_length = segment_length_samples - overlap_samples

        segments = []

        # Generate segments
        for i in range(0, len(audio_data), hop_length):
            end_idx = min(i + segment_length_samples, len(audio_data))
            segments.append(audio_data[i:end_idx])

        return segments
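segment_audio's arithmetic converts millisecond lengths to samples and strides by `segment_length - overlap`. The index math can be sketched on its own (the `segment_indices` name is illustrative):

```python
def segment_indices(n_samples: int, sample_rate: int,
                    segment_length_ms: int = 30000, overlap_ms: int = 0):
    """Yield (start, end) sample indices for fixed-length segments,
    following the hop-length arithmetic in segment_audio."""
    seg_len = int(sample_rate * segment_length_ms / 1000)
    overlap = int(sample_rate * overlap_ms / 1000)
    hop = seg_len - overlap  # stride between segment starts
    for start in range(0, n_samples, hop):
        yield start, min(start + seg_len, n_samples)
```

With zero overlap the segments tile the signal; with overlap the stride shrinks and consecutive windows share samples, and the final window is simply shorter.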
--- a/video_processor/extractors/frame_extractor.py
+++ b/video_processor/extractors/frame_extractor.py
@@ -0,0 +1,197 @@
"""Frame extraction module for video processing."""

import functools
import logging
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple, Union

import cv2
import numpy as np
from tqdm import tqdm

logger = logging.getLogger(__name__)

# Haar cascade for face detection — ships with OpenCV
_FACE_CASCADE_PATH = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
_FACE_CASCADE = None


def _get_face_cascade() -> cv2.CascadeClassifier:
    """Lazy-load the face cascade classifier."""
    global _FACE_CASCADE
    if _FACE_CASCADE is None:
        _FACE_CASCADE = cv2.CascadeClassifier(_FACE_CASCADE_PATH)
    return _FACE_CASCADE


def detect_faces(frame: np.ndarray) -> List[Tuple[int, int, int, int]]:
    """Detect faces in a frame using Haar cascade. Returns list of (x, y, w, h)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if len(frame.shape) == 3 else frame
    cascade = _get_face_cascade()
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
    return list(faces) if len(faces) > 0 else []


def is_people_frame(
    frame: np.ndarray,
    face_area_threshold: float = 0.03,
    min_face_size: int = 90,
) -> bool:
    """
    Determine if a frame is primarily showing people (webcam/video conference).

    Heuristics:
    1. Face detection — if significant faces occupy enough frame area
    2. Black bar detection — video conferences often have thick black bars
    3. Small faces with black bars — profile pictures in conference UI

    Faces smaller than min_face_size are ignored (sidebar thumbnails in screen shares).

    Parameters
    ----------
    frame : np.ndarray
        BGR image frame
    face_area_threshold : float
        Minimum ratio of total face area to frame area to classify as people frame
    min_face_size : int
        Minimum face width/height in pixels to count as a significant face

    Returns
    -------
    bool
        True if frame is primarily people/webcam content
    """
    try:
        h, w = frame.shape[:2]
        frame_area = h * w

        # Detect all faces
        all_faces = detect_faces(frame)

        # Filter out tiny ones (sidebar thumbnails)
        significant_faces = [(x, y, fw, fh) for (x, y, fw, fh) in all_faces if fw >= min_face_size]

        if significant_faces:
            total_face_area = sum(fw * fh for (_, _, fw, fh) in significant_faces)
            face_ratio = total_face_area / frame_area
            return face_ratio >= face_area_threshold
        return False
    except Exception:
        return False


def gpu_accelerated(func):
    """Decorator to use a GPU implementation when available."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if is_gpu_available() and not kwargs.get("disable_gpu"):
            # Remove the 'disable_gpu' kwarg before dispatching to the GPU path
            kwargs.pop("disable_gpu", None)
            return func_gpu(*args, **kwargs)
        # Remove 'disable_gpu' before the CPU fallback as well
        kwargs.pop("disable_gpu", None)
        return func(*args, **kwargs)

    return wrapper


def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """
    Calculate the difference between two frames.

    Parameters
    ----------
    prev_frame : np.ndarray
        Previous frame
    curr_frame : np.ndarray
        Current frame

    Returns
    -------
    float
        Difference score between 0 and 1
    """
    # Convert to grayscale
    if len(prev_frame.shape) == 3:
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    else:
        prev_gray = prev_frame

    if len(curr_frame.shape) == 3:
        curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    else:
        curr_gray = curr_frame

    # Calculate absolute difference
    diff = cv2.absdiff(prev_gray, curr_gray)

    # Normalize and return mean difference
    return np.mean(diff) / 255.0


@gpu_accelerated
def extract_frames(
    video_path: Union[str, Path],
    sampling_rate: float = 1.0,
    change_threshold: float = 0.15,
    periodic_capture_seconds: float = 30.0,
    max_frames: Optional[int] = None,
    resize_to: Optional[Tuple[int, int]] = None,
    max_memory_mb: int = 1024,
) -> List[np.ndarray]:
    """
    Extract frames from video based on visual change detection + periodic capture.

    Two capture strategies work together:
    1. Change detection: capture when visual difference exceeds threshold
       (catches transitions like webcam ↔ screen share)
    2. Periodic capture: capture every N seconds regardless of change
       (catches slow-evolving content like document scrolling)

    The downstream people filter removes any webcam frames captured periodically.

    Parameters
    ----------
    video_path : str or Path
        Path to video file
    sampling_rate : float
        Frame sampling rate (1.0 = every frame)
    change_threshold : float
        Threshold for detecting significant visual changes
    periodic_capture_seconds : float
        Capture a frame every N seconds regardless of change (0 to disable)
    max_frames : int, optional
        Maximum number of frames to extract
    resize_to : tuple of (width, height), optional
        Resize extracted frames to this size
    """
171
+ resize_to: Optional[Tuple[int, int]] = None,
172
+ max_memory_mb: int = 1024,
173
+) -> List[np.ndarray]:
174
+ """
175
+ Extract frames from video based on visual change detection + periodic capture.
176
+
177
+ Two capture strategieelif (
178
+ periodic_interval > 0 and (frame_idx - last_capture_: == 3 else frame
179
+threshold:
180
+ reason = "periodic"
181
+
182
+ if should_capture:
183
+ extracted_frames.app )
184
+ def save_frames(sidebar thumbnails in scr"Frame extraction module for videctiombnails in scf = calculate_frame_difference(prev_frame, frame)
185
+ if diff > change_threshold:
186
+ should_capture = True
187
+ reason reason = f"change={diff:.3f}"
188
+
189
+ # Periodic capture — even if change is small
190
+ elif (
191
+ last_capture_frame) >= periodic_interval
192
+ ):
193
+ should_capture = True
194
+ reason = "periodic"
195
+
196
+ if should_capture:
197
+ extracted_frames.app extracted_frames.ap
--- a/video_processor/extractors/frame_extractor.py
+++ b/video_processor/extractors/frame_extractor.py
@@ -0,0 +1,197 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/video_processor/extractors/frame_extractor.py
+++ b/video_processor/extractors/frame_extractor.py
@@ -0,0 +1,197 @@
1 """Frame extraction module for video processing."""
2
3 import functools
4 import logging
5 import sys
6 import tempfile
7 from pathlib import Path
8 from typing import List, Optional, Tuple, Union
9
10 import cv2
11 import numpy as np
12 from tqdm import tqdm
13
14 logger = logging.getLogger(__name__)
15
16 # Haar cascade for face detection — ships with OpenCV
17 _FACE_CASCADE_PATH = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
18 _FACE_CASCADE = None
19
20
21 def _get_face_cascade() -> cv2.CascadeClassifier:
22 """Lazy-load the face cascade classifier."""
23 global _FACE_CASCADE
24 if _FACE_CASCADE is None:
25 _FACE_CASCADE = cv2.CascadeClassifier(_FACE_CASCADE_PATH)
26 return _FACE_CASCADE
27
28
29 def detect_faces(frame: np.ndarray) -> List[Tuple[int, int, int, int]]:
30 """Detect faces in a frame using Haar cascade. Returns list of (x, y, w, h)."""
31 gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if len(frame.shape) == 3 else frame
32 cascade = _get_face_cascade()
33 faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(40, 40))
34 return list(faces) if len(faces) > 0 else []
35
36
37 def is_people_frame(
38 frame: np.ndarray,
39 face_area_threshold: float = 0.03,
40 min_face_size: int = 90,
41 ) -> bool:
42 """
43 Determine if a frame is primarily showing people (webcam/video conference).
44
45 Heuristics:
46 1. Face detection — if significant faces occupy enough frame area
47 2. Black bar detection — video conferences often have thick black bars
48 3. Small faces with black bars — profile pictures in conference UI
49
50 Faces smaller than min_face_size are ignored (sidebar thumbnails in screen shares).
51
52 Parameters
53 ----------
54 frame : np.ndarray
55 BGR image frame
56 face_area_threshold : float
57 Minimum ratio of total face area to frame area to classify as people frame
58 min_face_size : int
59 Minimum face width/height in pixels to count as a significant face
60
61 Returns
62 -------
63 bool
64 True if frame is primarily people/webcam content
65 """
66 h, w = frame.shape[:2]
67 frame_area = h * w
68
69 # Detect all faces
70 all_faces = detect_faces(frame)
71
72     # Keep only significant faces, ignoring tiny ones (sidebar thumbnails)
73 significant_faces = [(x, y, fw, fh) for (x, y, fw, fh) in all_faces if fw >= min_face_size]
74
75 if significant_faces:
76 total_face_area = sum(fw * fh for (_, _, fw, fh) in significant_faces)
77         face_ratio = total_face_area / frame_area
78         return face_ratio >= face_area_threshold
79     return False
80
81
82 def gpu_accelerated(func):
83     """Decorator that uses GPU-accelerated computation when available."""
84
85     @functools.wraps(func)
86     def wrapper(*args, **kwargs):
87         # Consume the 'disable_gpu' control kwarg; GPU use also requires is_gpu_available()
88         kwargs.pop("disable_gpu", None)
89         return func(*args, **kwargs)
90
91 return wrapper
92
93
94 def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
95 """
96 Calculate the difference between two frames.
97
98 Parameters
99 ----------
100 prev_frame : np.ndarray
101 Previous fr reason = Returns
102 -------
103 float
104 Difference score between 0 and 1
105 """
106 # Convert to grayscale
107     if len(prev_frame.shape) == 3:
108         prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
109     else:
110         prev_gray = prev_frame
111
112     if len(curr_frame.shape) == 3:
113         curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
114     else:
115         curr_gray = curr_frame
116
117 # Calculate absolute difference
118     diff = cv2.absdiff(prev_gray, curr_gray)
119
120 # Normalize and return mean difference
121
122 return np.mean(diff) / 255.0
123
124
125 @gpu_accelerated
126 def extract_frames(
127 video_path: Union[str, Path],
128 sampling_rate: float = 1.0,
129 change_threshold: float = 0.15,
130 periodic_capture_seconds: float = 30.0,
131     max_frames: Optional[int] = None,
132     resize_to: Optional[Tuple[int, int]] = None,
133     max_memory_mb: int = 1024,
134 ) -> List[np.ndarray]:
134 """
135 Extract frames from video based on visual change detection + periodic capture.
136
137 Two capture strategies work together:
138 1. Change detection: capture when visual difference exceeds threshold
139 (catches transitions like webcam ↔ screen share)
140 2. Periodic capture: capture every N seconds regardless of change
141 (catches slow-evolving content like document scrolling)
142
143 The downstream people filter removes any webcam frames captured periodically.
144
145 Parameters
146 ----------
147 video_path : str or Path
148 Path to video file
149 sampling_rate : float
150 Frame sampling rate (1.0 = every frame)
151 change_threshold : float
152 Threshold for detecting significant visual changes
153 periodic_capture_seconds : float
154 Capture a frame every N seconds regardless of change (0 to disable)
155 max_frames : int, optional
156 Maximum number of frames to extract
157 resize_to : tuple of (width, height), optional
158         Resize extracted frames to (width, height)
159     max_memory_mb : int
160         Memory budget (MB) for extracted frames
161
162     Returns
163     -------
164     list of np.ndarray
165         Extracted frames
166     """
167     cap = cv2.VideoCapture(str(video_path))
168     fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
169     periodic_interval = int(periodic_capture_seconds * fps) if periodic_capture_seconds > 0 else 0
170
171     extracted_frames: List[np.ndarray] = []
172     prev_frame = None
173     last_capture_frame = 0
174     frame_idx = -1
175
176     while True:
177         ret, frame = cap.read()
178         if not ret:
179             break
180         frame_idx += 1
181         if resize_to is not None:
182             frame = cv2.resize(frame, resize_to)
183
184         should_capture = False
185         reason = ""
186         if prev_frame is None:
187             should_capture, reason = True, "first"
188         else:
189             diff = calculate_frame_difference(prev_frame, frame)
190             if diff > change_threshold:
191                 should_capture, reason = True, f"change={diff:.3f}"
192             # Periodic capture — even if change is small
193             elif periodic_interval > 0 and (frame_idx - last_capture_frame) >= periodic_interval:
194                 should_capture, reason = True, "periodic"
195
196         if should_capture:
197             extracted_frames.append(frame)
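The change-detection score used above is just a normalized mean absolute pixel difference. A minimal standalone sketch of that scoring (numpy-only for illustration; the committed code uses `cv2.absdiff` on grayscale frames):

```python
import numpy as np

def calculate_frame_difference(prev_frame: np.ndarray, curr_frame: np.ndarray) -> float:
    """Mean absolute pixel difference, normalized to [0, 1]."""
    diff = np.abs(prev_frame.astype(np.int16) - curr_frame.astype(np.int16))
    return float(np.mean(diff)) / 255.0

# A black frame vs. a white frame is maximally different,
# well above the 0.15 capture threshold used in the diff.
black = np.zeros((4, 4), dtype=np.uint8)
white = np.full((4, 4), 255, dtype=np.uint8)
assert calculate_frame_difference(black, white) == 1.0
# Identical frames score 0 and would only be caught by periodic capture.
assert calculate_frame_difference(black, black) == 0.0
```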
--- a/video_processor/extractors/text_extractor.py
+++ b/video_processor/extractors/text_extractor.py
@@ -0,0 +1,270 @@
1 """Text extraction module for frames and diagrams."""
2
3 import logging
4 from pathlib import Path
5 from typing import Dict, List, Optional, Tuple, Union
6
7 import cv2
8 import numpy as np
9
10 logger = logging.getLogger(__name__)
11
12
13 class TextExtractor:
14     """Extract text from images, frames, and diagrams."""
15
16     def __init__(self, tesseract_path: Optional[str] = None):
17         """
18         Parameters
19         ----------
20         tesseract_path : str, optional
21             Path to tesseract executable for local OCR
22         """
23         self.tesseract_path = tesseract_path
24
25         # Check if we're using tesseract locally
26         self.use_local_ocr = False
27         if tesseract_path:
28             try:
29                 import pytesseract
30                 pytesseract.pytesseract.tesseract_cmd = tesseract_path
31                 self.use_local_ocr = True
32             except ImportError:
33                 logger.warning("pytesseract not installed, local OCR unavailable")
34
35     def preprocess_image(self, image: np.ndarray) -> np.ndarray:
36         """
37         Preprocess image for better text extraction.
38
39         Parameters
40         ----------
41         image : np.ndarray
42             Input image
43
44         Returns
45         -------
46         np.ndarray
47             Preprocessed image
48         """
49         # Convert to grayscale if not already
50         if len(image.shape) == 3:
51             gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
52         else:
53             gray = image
54
55         # Apply adaptive thresholding
56         thresh = cv2.adaptiveThreshold(
57             gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2
58         )
59
60         # Noise removal
61         kernel = np.ones((1, 1), np.uint8)
62         opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
63
64         # Invert back
65         result = cv2.bitwise_not(opening)
66
67         return result
68
69     def extract_text_local(self, image: np.ndarray) -> str:
70         """
71         Extract text from image using local OCR (Tesseract).
72
73         Parameters
74         ----------
75         image : np.ndarray
76             Input image
77
78         Returns
79         -------
80         str
81             Extracted text
82         """
83         if not self.use_local_ocr:
84             return ""
85
86         import pytesseract
87
88         processed = self.preprocess_image(image)
89         text = pytesseract.image_to_string(processed)
90         return text
91
92     def merge_boxes(self, boxes: List[Tuple[int, int, int, int]]) -> List[Tuple[int, int, int, int]]:
93         """Merge overlapping bounding boxes into larger regions."""
94         if not boxes:
95             return []
96
97         # Sort boxes by x coordinate
98         sorted_boxes = sorted(boxes, key=lambda b: b[0])
99         merged = []
100         current = list(sorted_boxes[0])
101         for box in sorted_boxes[1:]:
102             # Check if current box overlaps with the next one
103             if (
104                 current[0] <= box[0] + box[2]
105                 and box[0] <= current[0] + current[2]
106                 and current[1] <= box[1] + box[3]
107                 and box[1] <= current[1] + current[3]
108             ):
109                 # Calculate merged box
110                 x1 = min(current[0], box[0])
111                 y1 = min(current[1], box[1])
112                 x2 = max(current[0] + current[2], box[0] + box[2])
113                 y2 = max(current[1] + current[3], box[1] + box[3])
114
115                 # Update current box
116                 current = [x1, y1, x2 - x1, y2 - y1]
117             else:
118                 # Add current box to merged list and update current
119                 merged.append(tuple(current))
120                 current = list(box)
121
122         # Add the last box
123         merged.append(tuple(current))
124
125         return merged
126
127     def extract_text_from_regions(
128         self, image: np.ndarray, regions: List[Tuple[int, int, int, int]]
129     ) -> Dict[Tuple[int, int, int, int], str]:
130         """
131         Extract text from specified regions in image.
132
133         Parameters
134         ----------
135         image : np.ndarray
136             Input image
137         regions : list
138             List of (x, y, w, h) regions
139
140         Returns
141         -------
142         dict
143             Dictionary of {region: text}
144         """
145         results = {}
146         for region in regions:
147             x, y, w, h = region
148
149             # Extract region
150             roi = image[y : y + h, x : x + w]
151
152             # Skip empty regions
153             if roi.size == 0:
154                 continue
155
156             # Extract text
157             if self.use_local_ocr:
158                 text = self.extract_text_local(roi)
159             else:
160                 text = "API-based text extraction not yet implemented"
161
162             # Store non-empty results
163             if text.strip():
164                 results[region] = text.strip()
165
166         return results
167
168     def extract_text_from_image(self, image: np.ndarray, detect_regions: bool = True) -> str:
169         """
170         Extract text from a full image.
171
172         Parameters
173         ----------
174         image : np.ndarray
175             Input image
176         detect_regions : bool
177             Whether to detect text regions before OCR
178
179         Returns
180         -------
181         str
182             Extracted text
183         """
184         if self.use_local_ocr:
185             return self.extract_text_local(image)
186         return "API-based text extraction not yet implemented"
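The bounding-box merge step in this file can be exercised in isolation. A minimal standalone sketch of the overlap-merge sweep with illustrative data (the function name and inputs here are for demonstration, not the committed code):

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)

def merge_overlapping_boxes(boxes: List[Box]) -> List[Box]:
    """Merge overlapping boxes, sweeping left to right by x coordinate."""
    if not boxes:
        return []
    sorted_boxes = sorted(boxes, key=lambda b: b[0])
    merged: List[Box] = []
    current = list(sorted_boxes[0])
    for box in sorted_boxes[1:]:
        overlaps = (
            current[0] <= box[0] + box[2] and box[0] <= current[0] + current[2]
            and current[1] <= box[1] + box[3] and box[1] <= current[1] + current[3]
        )
        if overlaps:
            # Grow the current box to cover both
            x1 = min(current[0], box[0])
            y1 = min(current[1], box[1])
            x2 = max(current[0] + current[2], box[0] + box[2])
            y2 = max(current[1] + current[3], box[1] + box[3])
            current = [x1, y1, x2 - x1, y2 - y1]
        else:
            merged.append(tuple(current))
            current = list(box)
    merged.append(tuple(current))
    return merged

# Overlapping boxes collapse into one region; disjoint boxes survive.
assert merge_overlapping_boxes([(0, 0, 10, 10), (5, 5, 10, 10)]) == [(0, 0, 15, 15)]
assert merge_overlapping_boxes([(0, 0, 5, 5), (20, 0, 5, 5)]) == [(0, 0, 5, 5), (20, 0, 5, 5)]
```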
--- video_processor/integrators/knowledge_graph.py
+++ video_processor/integrators/knowledge_graph.py
@@ -2,10 +2,12 @@
 
 import json
 import logging
 from pathlib import Path
 from typing import Dict, List, Optional, Union
+
+from tqdm import tqdm
 
 from video_processor.models import Entity, KnowledgeGraphData, Relationship
 from video_processor.providers.manager import ProviderManager
 from video_processor.utils.json_parsing import parse_json_from_response
 
@@ -31,65 +33,57 @@
             [{"role": "user", "content": prompt}],
             max_tokens=4096,
             temperature=temperature,
         )
 
-    def extract_entities(self, text: str) -> List[Entity]:
-        """Extract entities from text using LLM, returning pydantic models."""
+    def extract_entities_and_relationships(self, text: str) -> tuple[List[Entity], List[Relationship]]:
+        """Extract entities and relationships in a single LLM call."""
         prompt = (
-            "Extract all notable entities (people, concepts, technologies, "
-            "organizations, time references) from the following content.\n\n"
+            "Extract all notable entities and relationships from the following content.\n\n"
             f"CONTENT:\n{text}\n\n"
-            'Return a JSON array of objects: '
-            '[{"name": "...", "type": "person|concept|technology|organization|time", '
-            '"description": "brief description"}]\n\n'
-            "Return ONLY the JSON array."
+            "Return a JSON object with two keys:\n"
+            '- "entities": array of {"name": "...", "type": "person|concept|technology|organization|time", "description": "brief description"}\n'
+            '- "relationships": array of {"source": "entity name", "target": "entity name", "type": "relationship description"}\n\n'
+            "Return ONLY the JSON object."
         )
         raw = self._chat(prompt)
         parsed = parse_json_from_response(raw)
 
         entities = []
-        if isinstance(parsed, list):
+        rels = []
+
+        if isinstance(parsed, dict):
+            for item in parsed.get("entities", []):
+                if isinstance(item, dict) and "name" in item:
+                    entities.append(Entity(
+                        name=item["name"],
+                        type=item.get("type", "concept"),
+                        descriptions=[item["description"]] if item.get("description") else [],
+                    ))
+            entity_names = {e.name for e in entities}
+            for item in parsed.get("relationships", []):
+                if isinstance(item, dict) and "source" in item and "target" in item:
+                    rels.append(Relationship(
+                        source=item["source"],
+                        target=item["target"],
+                        type=item.get("type", "related_to"),
+                    ))
+        elif isinstance(parsed, list):
+            # Fallback: if model returns a flat entity list
             for item in parsed:
                 if isinstance(item, dict) and "name" in item:
                     entities.append(Entity(
                         name=item["name"],
                         type=item.get("type", "concept"),
                         descriptions=[item["description"]] if item.get("description") else [],
                     ))
-        return entities
-
-    def extract_relationships(self, text: str, entities: List[Entity]) -> List[Relationship]:
-        """Extract relationships between entities using LLM."""
-        entity_names = ", ".join(e.name for e in entities)
-        prompt = (
-            "Given the following content and entities, identify relationships.\n\n"
-            f"CONTENT:\n{text}\n\n"
-            f"ENTITIES: {entity_names}\n\n"
-            'Return a JSON array: '
-            '[{"source": "entity A", "target": "entity B", '
-            '"type": "relationship description"}]\n\n'
-            "Return ONLY the JSON array."
-        )
-        raw = self._chat(prompt)
-        parsed = parse_json_from_response(raw)
-
-        rels = []
-        if isinstance(parsed, list):
-            for item in parsed:
-                if isinstance(item, dict) and "source" in item and "target" in item:
-                    rels.append(Relationship(
-                        source=item["source"],
-                        target=item["target"],
-                        type=item.get("type", "related_to"),
-                    ))
-        return rels
+
+        return entities, rels
 
     def add_content(self, text: str, source: str, timestamp: Optional[float] = None) -> None:
         """Add content to knowledge graph by extracting entities and relationships."""
-        entities = self.extract_entities(text)
-        relationships = self.extract_relationships(text, entities)
+        entities, relationships = self.extract_entities_and_relationships(text)
 
         for entity in entities:
             eid = entity.name
             if eid in self.nodes:
                 self.nodes[eid]["occurrences"].append({
@@ -120,38 +114,53 @@
                     "type": rel.type,
                     "content_source": source,
                     "timestamp": timestamp,
                 })
 
-    def process_transcript(self, transcript: Dict) -> None:
-        """Process transcript segments into knowledge graph."""
+    def process_transcript(self, transcript: Dict, batch_size: int = 10) -> None:
+        """Process transcript segments into knowledge graph, batching for efficiency."""
         if "segments" not in transcript:
             logger.warning("Transcript missing segments")
             return
 
-        for i, segment in enumerate(transcript["segments"]):
-            if "text" in segment:
-                source = f"transcript_segment_{i}"
-                timestamp = segment.get("start", None)
-                speaker = segment.get("speaker", None)
-
-                if speaker and speaker not in self.nodes:
-                    self.nodes[speaker] = {
-                        "id": speaker,
-                        "name": speaker,
-                        "type": "person",
-                        "descriptions": {"Speaker in transcript"},
-                        "occurrences": [],
-                    }
-                if speaker:
-                    source = f"{speaker}_segment_{i}"
-
-                self.add_content(segment["text"], source, timestamp)
+        segments = transcript["segments"]
+
+        # Register speakers first
+        for i, segment in enumerate(segments):
+            speaker = segment.get("speaker", None)
+            if speaker and speaker not in self.nodes:
+                self.nodes[speaker] = {
+                    "id": speaker,
+                    "name": speaker,
+                    "type": "person",
+                    "descriptions": {"Speaker in transcript"},
+                    "occurrences": [],
+                }
+
+        # Batch segments together for fewer API calls
+        batches = []
+        for start in range(0, len(segments), batch_size):
+            batches.append(segments[start:start + batch_size])
+
+        for batch in tqdm(batches, desc="Building knowledge graph", unit="batch"):
+            # Combine batch text
+            combined_text = " ".join(
+                seg["text"] for seg in batch if "text" in seg
+            )
+            if not combined_text.strip():
+                continue
+
+            # Use first segment's timestamp as batch timestamp
+            batch_start_idx = segments.index(batch[0])
+            timestamp = batch[0].get("start", None)
+            source = f"transcript_batch_{batch_start_idx}"
+
+            self.add_content(combined_text, source, timestamp)
 
     def process_diagrams(self, diagrams: List[Dict]) -> None:
         """Process diagram results into knowledge graph."""
-        for i, diagram in enumerate(diagrams):
+        for i, diagram in enumerate(tqdm(diagrams, desc="Processing diagrams for KG", unit="diag")):
             text_content = diagram.get("text_content", "")
             if text_content:
                 source = f"diagram_{i}"
                 self.add_content(text_content, source)
 
 
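The call-count reduction in the commit message follows directly from this refactor: the old code made two LLM calls per segment (entities, then relationships), while the new code makes one combined call per batch of 10 segments. A quick sanity check on the arithmetic (the segment count is illustrative, chosen to match the ~2200 figure in the commit message):

```python
import math

segments = 1100     # illustrative transcript size
batch_size = 10     # default in process_transcript

calls_before = segments * 2                      # extract_entities + extract_relationships per segment
calls_after = math.ceil(segments / batch_size)   # one combined call per batch

assert calls_before == 2200
assert calls_after == 110
```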
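A quick sanity check on the call-count reduction described in the commit message (~2200 → ~110). The segment count below is an assumption backed out of those numbers — two calls per segment before (entities, then relationships), one combined call per 10-segment batch after:

```python
n_segments = 1100        # assumed: ~1100 transcript segments for a long recording
batch_size = 10          # segments combined per knowledge-graph call

# Before: one entity-extraction call plus one relationship-extraction call per segment
old_calls = n_segments * 2

# After: one combined entities+relationships call per batch (ceiling division)
new_calls = (n_segments + batch_size - 1) // batch_size

print(old_calls, new_calls)  # 2200 110
```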
--- video_processor/models.py
+++ video_processor/models.py
@@ -121,10 +121,11 @@
     """Statistics about a processing run."""
     start_time: Optional[str] = Field(default=None, description="ISO format start time")
     end_time: Optional[str] = Field(default=None, description="ISO format end time")
     duration_seconds: Optional[float] = Field(default=None, description="Total processing time")
     frames_extracted: int = Field(default=0)
+    people_frames_filtered: int = Field(default=0)
     diagrams_detected: int = Field(default=0)
     screen_captures: int = Field(default=0)
     transcript_duration_seconds: Optional[float] = Field(default=None)
     models_used: Dict[str, str] = Field(
         default_factory=dict,
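The transcription step in the pipeline diff below emits an SRT track through a `_format_srt_time` helper that the hunk references but does not define. A plausible sketch of such a formatter (an assumption — the real helper lives outside this diff; SRT timestamps use `HH:MM:SS,mmm` with a comma before the milliseconds):

```python
def format_srt_time(seconds: float) -> str:
    # SRT timestamps are HH:MM:SS,mmm (comma, not dot, before milliseconds)
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

print(format_srt_time(3661.5))  # 01:01:01,500
```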
--- video_processor/pipeline.py
+++ video_processor/pipeline.py
@@ -4,14 +4,16 @@
 import logging
 import time
 from datetime import datetime
 from pathlib import Path
 from typing import Optional
+
+from tqdm import tqdm

 from video_processor.analyzers.diagram_analyzer import DiagramAnalyzer
 from video_processor.extractors.audio_extractor import AudioExtractor
-from video_processor.extractors.frame_extractor import extract_frames, save_frames
+from video_processor.extractors.frame_extractor import extract_frames, filter_people_frames, save_frames
 from video_processor.integrators.knowledge_graph import KnowledgeGraph
 from video_processor.integrators.plan_generator import PlanGenerator
 from video_processor.models import (
     ActionItem,
     KeyPoint,
@@ -32,10 +34,11 @@
     provider_manager: Optional[ProviderManager] = None,
     depth: str = "standard",
     focus_areas: Optional[list[str]] = None,
     sampling_rate: float = 0.5,
     change_threshold: float = 0.15,
+    periodic_capture_seconds: float = 30.0,
     use_gpu: bool = False,
     title: Optional[str] = None,
 ) -> VideoManifest:
     """
     Full pipeline: frames -> audio -> transcription -> diagrams -> KG -> report -> export.
@@ -56,108 +59,191 @@
     dirs = create_video_output_dirs(output_dir, video_name)

     logger.info(f"Processing: {input_path}")
     logger.info(f"Depth: {depth}, Focus: {focus_areas or 'all'}")

+    steps = [
+        "Extract frames",
+        "Extract audio",
+        "Transcribe",
+        "Analyze visuals",
+        "Build knowledge graph",
+        "Extract key points",
+        "Generate report",
+        "Export formats",
+    ]
+    pipeline_bar = tqdm(steps, desc="Pipeline", unit="step", position=0)
+
     # --- Step 1: Extract frames ---
-    logger.info("Extracting video frames...")
-    frames = extract_frames(
-        input_path,
-        sampling_rate=sampling_rate,
-        change_threshold=change_threshold,
-        disable_gpu=not use_gpu,
-    )
-    frame_paths = save_frames(frames, dirs["frames"], "frame")
-    logger.info(f"Extracted {len(frames)} frames")
+    pm.usage.start_step("Frame extraction")
+    pipeline_bar.set_description("Pipeline: extracting frames")
+    existing_frames = sorted(dirs["frames"].glob("frame_*.jpg"))
+    people_removed = 0
+    if existing_frames:
+        frame_paths = existing_frames
+        logger.info(f"Resuming: found {len(frame_paths)} frames on disk, skipping extraction")
+    else:
+        logger.info("Extracting video frames...")
+        frames = extract_frames(
+            input_path,
+            sampling_rate=sampling_rate,
+            change_threshold=change_threshold,
+            periodic_capture_seconds=periodic_capture_seconds,
+            disable_gpu=not use_gpu,
+        )
+        logger.info(f"Extracted {len(frames)} raw frames")
+
+        # Filter out people/webcam frames before saving
+        frames, people_removed = filter_people_frames(frames)
+        frame_paths = save_frames(frames, dirs["frames"], "frame")
+        logger.info(f"Saved {len(frames)} content frames ({people_removed} people frames filtered)")
+    pipeline_bar.update(1)

     # --- Step 2: Extract audio ---
-    logger.info("Extracting audio...")
+    pm.usage.start_step("Audio extraction")
+    pipeline_bar.set_description("Pipeline: extracting audio")
+    audio_path = dirs["root"] / "audio" / f"{video_name}.wav"
     audio_extractor = AudioExtractor()
-    audio_path = audio_extractor.extract_audio(
-        input_path, output_path=dirs["root"] / "audio" / f"{video_name}.wav"
-    )
+    if audio_path.exists():
+        logger.info(f"Resuming: found audio at {audio_path}, skipping extraction")
+    else:
+        logger.info("Extracting audio...")
+        audio_path = audio_extractor.extract_audio(input_path, output_path=audio_path)
     audio_props = audio_extractor.get_audio_properties(audio_path)
+    pipeline_bar.update(1)

     # --- Step 3: Transcribe ---
-    logger.info("Transcribing audio...")
-    transcription = pm.transcribe_audio(audio_path)
-    transcript_text = transcription.get("text", "")
-    segments = transcription.get("segments", [])
-
-    # Save transcript files
-    transcript_data = {
-        "text": transcript_text,
-        "segments": segments,
-        "duration": transcription.get("duration") or audio_props.get("duration"),
-        "language": transcription.get("language"),
-        "provider": transcription.get("provider"),
-        "model": transcription.get("model"),
-    }
+    pm.usage.start_step("Transcription")
+    pipeline_bar.set_description("Pipeline: transcribing audio")
     transcript_json = dirs["transcript"] / "transcript.json"
-    transcript_json.write_text(json.dumps(transcript_data, indent=2))
-
-    transcript_txt = dirs["transcript"] / "transcript.txt"
-    transcript_txt.write_text(transcript_text)
-
-    # SRT
-    transcript_srt = dirs["transcript"] / "transcript.srt"
-    srt_lines = []
-    for i, seg in enumerate(segments):
-        start = seg.get("start", 0)
-        end = seg.get("end", 0)
-        srt_lines.append(str(i + 1))
-        srt_lines.append(
-            f"{_format_srt_time(start)} --> {_format_srt_time(end)}"
-        )
-        srt_lines.append(seg.get("text", "").strip())
-        srt_lines.append("")
-    transcript_srt.write_text("\n".join(srt_lines))
+    if transcript_json.exists():
+        logger.info("Resuming: found transcript on disk, skipping transcription")
+        transcript_data = json.loads(transcript_json.read_text())
+        transcript_text = transcript_data.get("text", "")
+        segments = transcript_data.get("segments", [])
+    else:
+        logger.info("Transcribing audio...")
+        transcription = pm.transcribe_audio(audio_path)
+        transcript_text = transcription.get("text", "")
+        segments = transcription.get("segments", [])
+
+        # Save transcript files
+        transcript_data = {
+            "text": transcript_text,
+            "segments": segments,
+            "duration": transcription.get("duration") or audio_props.get("duration"),
+            "language": transcription.get("language"),
+            "provider": transcription.get("provider"),
+            "model": transcription.get("model"),
+        }
+        transcript_json.write_text(json.dumps(transcript_data, indent=2))
+
+        transcript_txt = dirs["transcript"] / "transcript.txt"
+        transcript_txt.write_text(transcript_text)
+
+        # SRT
+        transcript_srt = dirs["transcript"] / "transcript.srt"
+        srt_lines = []
+        for i, seg in enumerate(segments):
+            start = seg.get("start", 0)
+            end = seg.get("end", 0)
+            srt_lines.append(str(i + 1))
+            srt_lines.append(
+                f"{_format_srt_time(start)} --> {_format_srt_time(end)}"
+            )
+            srt_lines.append(seg.get("text", "").strip())
+            srt_lines.append("")
+        transcript_srt.write_text("\n".join(srt_lines))
+    pipeline_bar.update(1)

     # --- Step 4: Diagram extraction ---
+    pm.usage.start_step("Visual analysis")
+    pipeline_bar.set_description("Pipeline: analyzing visuals")
     diagrams = []
     screen_captures = []
-    if depth != "basic" and (not focus_areas or "diagrams" in focus_areas):
+    existing_diagrams = sorted(dirs["diagrams"].glob("diagram_*.json")) if dirs["diagrams"].exists() else []
+    if existing_diagrams:
+        logger.info(f"Resuming: found {len(existing_diagrams)} diagrams on disk, skipping analysis")
+        from video_processor.models import DiagramResult
+        for dj in existing_diagrams:
+            try:
+                diagrams.append(DiagramResult.model_validate_json(dj.read_text()))
+            except Exception as e:
+                logger.warning(f"Failed to load diagram {dj}: {e}")
+    elif depth != "basic" and (not focus_areas or "diagrams" in focus_areas):
         logger.info("Analyzing visual elements...")
         analyzer = DiagramAnalyzer(provider_manager=pm)
         max_frames = 10 if depth == "standard" else 20
-        subset = frame_paths[:min(max_frames, len(frame_paths))]
+        # Evenly sample across all frames rather than just taking the first N
+        if len(frame_paths) <= max_frames:
+            subset = frame_paths
+        else:
+            step = len(frame_paths) / max_frames
+            subset = [frame_paths[int(i * step)] for i in range(max_frames)]
         diagrams, screen_captures = analyzer.process_frames(
             subset, diagrams_dir=dirs["diagrams"], captures_dir=dirs["captures"]
         )
+    pipeline_bar.update(1)

     # --- Step 5: Knowledge graph ---
-    logger.info("Building knowledge graph...")
-    kg = KnowledgeGraph(provider_manager=pm)
-    kg.process_transcript(transcript_data)
-    if diagrams:
-        diagram_dicts = [d.model_dump() for d in diagrams]
-        kg.process_diagrams(diagram_dicts)
-    kg_path = kg.save(dirs["results"] / "knowledge_graph.json")
+    pm.usage.start_step("Knowledge graph")
+    pipeline_bar.set_description("Pipeline: building knowledge graph")
+    kg_json_path = dirs["results"] / "knowledge_graph.json"
+    if kg_json_path.exists():
+        logger.info("Resuming: found knowledge graph on disk, loading")
+        kg_data = json.loads(kg_json_path.read_text())
+        kg = KnowledgeGraph(provider_manager=pm)
+        kg = KnowledgeGraph.from_dict(kg_data)
+    else:
+        logger.info("Building knowledge graph...")
+        kg = KnowledgeGraph(provider_manager=pm)
+        kg.process_transcript(transcript_data)
+        if diagrams:
+            diagram_dicts = [d.model_dump() for d in diagrams]
+            kg.process_diagrams(diagram_dicts)
+        kg.save(kg_json_path)
+    pipeline_bar.update(1)

     # --- Step 6: Extract key points & action items ---
-    key_points = _extract_key_points(pm, transcript_text)
-    action_items = _extract_action_items(pm, transcript_text)
-
-    # Save structured data
+    pm.usage.start_step("Key points & actions")
+    pipeline_bar.set_description("Pipeline: extracting key points")
     kp_path = dirs["results"] / "key_points.json"
-    kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2))
-
     ai_path = dirs["results"] / "action_items.json"
-    ai_path.write_text(json.dumps([ai.model_dump() for ai in action_items], indent=2))
+    if kp_path.exists() and ai_path.exists():
+        logger.info("Resuming: found key points and action items on disk")
+        key_points = [
+            KeyPoint(**item) for item in json.loads(kp_path.read_text())
+        ]
+        action_items = [
+            ActionItem(**item) for item in json.loads(ai_path.read_text())
+        ]
+    else:
+        key_points = _extract_key_points(pm, transcript_text)
+        action_items = _extract_action_items(pm, transcript_text)
+
+        kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2))
+        ai_path.write_text(json.dumps([ai.model_dump() for ai in action_items], indent=2))
+    pipeline_bar.update(1)

     # --- Step 7: Generate markdown report ---
-    logger.info("Generating report...")
-    plan_gen = PlanGenerator(provider_manager=pm, knowledge_graph=kg)
+    pm.usage.start_step("Report generation")
+    pipeline_bar.set_description("Pipeline: generating report")
     md_path = dirs["results"] / "analysis.md"
-    plan_gen.generate_markdown(
-        transcript=transcript_data,
-        key_points=[kp.model_dump() for kp in key_points],
-        diagrams=[d.model_dump() for d in diagrams],
-        knowledge_graph=kg.to_dict(),
-        video_title=title,
-        output_path=md_path,
-    )
+    if md_path.exists():
+        logger.info("Resuming: found analysis report on disk, skipping generation")
+    else:
+        logger.info("Generating report...")
+        plan_gen = PlanGenerator(provider_manager=pm, knowledge_graph=kg)
+        plan_gen.generate_markdown(
+            transcript=transcript_data,
+            key_points=[kp.model_dump() for kp in key_points],
+            diagrams=[d.model_dump() for d in diagrams],
+            knowledge_graph=kg.to_dict(),
+            video_title=title,
+            output_path=md_path,
+        )
+    pipeline_bar.update(1)

     # --- Build manifest ---
     elapsed = time.time() - start_time
     manifest = VideoManifest(
         video=VideoMetadata(
@@ -166,11 +252,12 @@
             duration_seconds=audio_props.get("duration"),
         ),
         stats=ProcessingStats(
             start_time=datetime.now().isoformat(),
             duration_seconds=elapsed,
-            frames_extracted=len(frames),
+            frames_extracted=len(frame_paths),
+            people_frames_filtered=people_removed,
             diagrams_detected=len(diagrams),
             screen_captures=len(screen_captures),
             transcript_duration_seconds=audio_props.get("duration"),
             models_used=pm.get_models_used(),
         ),
@@ -187,13 +274,19 @@
         screen_captures=screen_captures,
         frame_paths=[f"frames/{Path(p).name}" for p in frame_paths],
     )

     # --- Step 8: Export all formats ---
-    logger.info("Exporting multi-format outputs...")
+    pm.usage.start_step("Export formats")
+    pipeline_bar.set_description("Pipeline: exporting formats")
     manifest = export_all_formats(output_dir, manifest)

+    pm.usage.end_step()
+    pipeline_bar.update(1)
+    pipeline_bar.set_description("Pipeline: complete")
+    pipeline_bar.close()
+
     # Write manifest
     write_video_manifest(manifest, output_dir)

     logger.info(f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, "
                 f"{len(screen_captures)} captures, {len(key_points)} key points, "
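The even-sampling arithmetic from Step 4 is worth isolating: stepping the index by `len(frames) / max_frames` spreads the analyzed subset across the whole video instead of clustering at the start. A standalone sketch (the names here are illustrative, not the pipeline's own):

```python
def sample_evenly(items, max_items):
    # Same index arithmetic as the diagram-analysis subset:
    # stride through the list so the picks span the full range
    if len(items) <= max_items:
        return list(items)
    step = len(items) / max_items
    return [items[int(i * step)] for i in range(max_items)]

frames = [f"frame_{i:04d}.jpg" for i in range(100)]
subset = sample_evenly(frames, 10)
print(subset[0], subset[-1])  # frame_0000.jpg frame_0090.jpg
```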
170 except Exception as e:
171 logger.warning(f"Failed to load diagram {dj}: {e}")
172 elif depth != "basic" and (not focus_areas or "diagrams" in focus_areas):
173 logger.info("Analyzing visual elements...")
174 analyzer = DiagramAnalyzer(provider_manager=pm)
175 max_frames = 10 if depth == "standard" else 20
176 # Evenly sample across all frames rather than just taking the first N
177 if len(frame_paths) <= max_frames:
178 subset = frame_paths
179 else:
180 step = len(frame_paths) / max_frames
181 subset = [frame_paths[int(i * step)] for i in range(max_frames)]
182 diagrams, screen_captures = analyzer.process_frames(
183 subset, diagrams_dir=dirs["diagrams"], captures_dir=dirs["captures"]
184 )
185 pipeline_bar.update(1)
186
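The even-sampling logic in step 4 generalizes to a small helper; a sketch using the same index arithmetic (the `even_sample` name is illustrative, not from the codebase):

```python
def even_sample(items: list, max_items: int) -> list:
    """Return up to max_items elements spread evenly across items."""
    if len(items) <= max_items:
        return items
    step = len(items) / max_items  # fractional stride keeps coverage uniform
    return [items[int(i * step)] for i in range(max_items)]
```

Because `int(i * step)` truncates and `i < max_items`, the largest sampled index stays strictly below `len(items)`, so no bounds check is needed.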
187 # --- Step 5: Knowledge graph ---
188 pm.usage.start_step("Knowledge graph")
189 pipeline_bar.set_description("Pipeline: building knowledge graph")
190 kg_json_path = dirs["results"] / "knowledge_graph.json"
191 if kg_json_path.exists():
192 logger.info("Resuming: found knowledge graph on disk, loading")
193 kg_data = json.loads(kg_json_path.read_text())
194 kg = KnowledgeGraph.from_dict(kg_data)
195 kg.provider_manager = pm  # re-attach: from_dict rebuilds the graph without a provider
196 else:
197 logger.info("Building knowledge graph...")
198 kg = KnowledgeGraph(provider_manager=pm)
199 kg.process_transcript(transcript_data)
200 if diagrams:
201 diagram_dicts = [d.model_dump() for d in diagrams]
202 kg.process_diagrams(diagram_dicts)
203 kg.save(kg_json_path)
204 pipeline_bar.update(1)
205
206 # --- Step 6: Extract key points & action items ---
207 pm.usage.start_step("Key points & actions")
208 pipeline_bar.set_description("Pipeline: extracting key points")
209 kp_path = dirs["results"] / "key_points.json"
210 ai_path = dirs["results"] / "action_items.json"
211 if kp_path.exists() and ai_path.exists():
212 logger.info("Resuming: found key points and action items on disk")
213 key_points = [
214 KeyPoint(**item) for item in json.loads(kp_path.read_text())
215 ]
216 action_items = [
217 ActionItem(**item) for item in json.loads(ai_path.read_text())
218 ]
219 else:
220 key_points = _extract_key_points(pm, transcript_text)
221 action_items = _extract_action_items(pm, transcript_text)
222
223 kp_path.write_text(json.dumps([kp.model_dump() for kp in key_points], indent=2))
224 ai_path.write_text(json.dumps([ai.model_dump() for ai in action_items], indent=2))
225 pipeline_bar.update(1)
226
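Each of the steps above repeats the same exists-on-disk-or-compute pattern. One way that pattern could be factored, sketched here as a hypothetical `checkpoint` helper that is not present in the codebase:

```python
import json
from pathlib import Path
from typing import Any, Callable

def checkpoint(
    path: Path,
    compute: Callable[[], Any],
    serialize: Callable[[Any], str] = lambda v: json.dumps(v, indent=2),
    deserialize: Callable[[str], Any] = json.loads,
) -> Any:
    """Load a cached result from disk, or compute it and persist it for resume."""
    if path.exists():
        return deserialize(path.read_text())
    result = compute()
    path.write_text(serialize(result))
    return result
```

On re-run, `compute` is skipped for any step whose output file already exists, which is exactly the resume behavior each step implements inline.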
227 # --- Step 7: Generate markdown report ---
228 pm.usage.start_step("Report generation")
229 pipeline_bar.set_description("Pipeline: generating report")
230 md_path = dirs["results"] / "analysis.md"
231 if md_path.exists():
232 logger.info("Resuming: found analysis report on disk, skipping generation")
233 else:
234 logger.info("Generating report...")
235 plan_gen = PlanGenerator(provider_manager=pm, knowledge_graph=kg)
236 plan_gen.generate_markdown(
237 transcript=transcript_data,
238 key_points=[kp.model_dump() for kp in key_points],
239 diagrams=[d.model_dump() for d in diagrams],
240 knowledge_graph=kg.to_dict(),
241 video_title=title,
242 output_path=md_path,
243 )
244 pipeline_bar.update(1)
245
246 # --- Build manifest ---
247 elapsed = time.time() - start_time
248 manifest = VideoManifest(
249 video=VideoMetadata(
@@ -166,11 +252,12 @@
252 duration_seconds=audio_props.get("duration"),
253 ),
254 stats=ProcessingStats(
255 start_time=datetime.now().isoformat(),
256 duration_seconds=elapsed,
257 frames_extracted=len(frame_paths),
258 people_frames_filtered=people_removed,
259 diagrams_detected=len(diagrams),
260 screen_captures=len(screen_captures),
261 transcript_duration_seconds=audio_props.get("duration"),
262 models_used=pm.get_models_used(),
263 ),
@@ -187,13 +274,19 @@
274 screen_captures=screen_captures,
275 frame_paths=[f"frames/{Path(p).name}" for p in frame_paths],
276 )
277
278 # --- Step 8: Export all formats ---
279 pm.usage.start_step("Export formats")
280 pipeline_bar.set_description("Pipeline: exporting formats")
281 manifest = export_all_formats(output_dir, manifest)
282
283 pm.usage.end_step()
284 pipeline_bar.update(1)
285 pipeline_bar.set_description("Pipeline: complete")
286 pipeline_bar.close()
287
288 # Write manifest
289 write_video_manifest(manifest, output_dir)
290
291 logger.info(f"Processing complete in {elapsed:.1f}s: {len(diagrams)} diagrams, "
292 f"{len(screen_captures)} captures, {len(key_points)} key points, "
293 f"{len(action_items)} action items")
--- a/video_processor/providers/anthropic_provider.py
+++ b/video_processor/providers/anthropic_provider.py
@@ -38,10 +38,14 @@
             model=model,
             messages=messages,
             max_tokens=max_tokens,
             temperature=temperature,
         )
+        self._last_usage = {
+            "input_tokens": getattr(response.usage, "input_tokens", 0),
+            "output_tokens": getattr(response.usage, "output_tokens", 0),
+        }
         return response.content[0].text

     def analyze_image(
         self,
         image_bytes: bytes,
@@ -69,10 +73,14 @@
                     ],
                 }
             ],
             max_tokens=max_tokens,
         )
+        self._last_usage = {
+            "input_tokens": getattr(response.usage, "input_tokens", 0),
+            "output_tokens": getattr(response.usage, "output_tokens", 0),
+        }
         return response.content[0].text

     def transcribe_audio(
         self,
         audio_path: str | Path,
--- a/video_processor/providers/discovery.py
+++ b/video_processor/providers/discovery.py
@@ -56,20 +56,25 @@
             logger.info(f"Discovered {len(models)} Anthropic models")
             all_models.extend(models)
         except Exception as e:
             logger.info(f"Anthropic discovery skipped: {e}")

-    # Gemini
-    if keys.get("gemini"):
+    # Gemini (API key or service account)
+    gemini_key = keys.get("gemini")
+    gemini_creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS", "")
+    if gemini_key or gemini_creds:
         try:
             from video_processor.providers.gemini_provider import GeminiProvider
-            provider = GeminiProvider(api_key=keys["gemini"])
+            provider = GeminiProvider(
+                api_key=gemini_key or None,
+                credentials_path=gemini_creds or None,
+            )
             models = provider.list_models()
             logger.info(f"Discovered {len(models)} Gemini models")
             all_models.extend(models)
         except Exception as e:
-            logger.info(f"Gemini discovery skipped: {e}")
+            logger.warning(f"Gemini discovery failed: {e}")

     # Sort by provider then id
     all_models.sort(key=lambda m: (m.provider, m.id))
     _cached_models = all_models
     logger.info(f"Total discovered models: {len(all_models)}")
--- a/video_processor/providers/gemini_provider.py
+++ b/video_processor/providers/gemini_provider.py
@@ -20,18 +20,43 @@
 class GeminiProvider(BaseProvider):
     """Google Gemini API provider via google-genai SDK."""

     provider_name = "gemini"

-    def __init__(self, api_key: Optional[str] = None):
+    def __init__(
+        self,
+        api_key: Optional[str] = None,
+        credentials_path: Optional[str] = None,
+    ):
         self.api_key = api_key or os.getenv("GEMINI_API_KEY")
-        if not self.api_key:
-            raise ValueError("GEMINI_API_KEY not set")
+        self.credentials_path = credentials_path or os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
+
+        if not self.api_key and not self.credentials_path:
+            raise ValueError(
+                "Neither GEMINI_API_KEY nor GOOGLE_APPLICATION_CREDENTIALS is set"
+            )
+
         try:
             from google import genai
             self._genai = genai
-            self.client = genai.Client(api_key=self.api_key)
+
+            if self.api_key:
+                self.client = genai.Client(api_key=self.api_key)
+            else:
+                # Service account → use Vertex AI mode
+                import json
+
+                with open(self.credentials_path) as f:
+                    sa_info = json.load(f)
+                project = sa_info.get("project_id", "")
+                location = os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")
+
+                self.client = genai.Client(
+                    vertexai=True,
+                    project=project,
+                    location=location,
+                )
         except ImportError:
             raise ImportError(
                 "google-genai package not installed. "
                 "Install with: pip install google-genai"
             )
@@ -61,10 +86,15 @@
             config=types.GenerateContentConfig(
                 max_output_tokens=max_tokens,
                 temperature=temperature,
             ),
         )
+        um = getattr(response, "usage_metadata", None)
+        self._last_usage = {
+            "input_tokens": getattr(um, "prompt_token_count", 0) if um else 0,
+            "output_tokens": getattr(um, "candidates_token_count", 0) if um else 0,
+        }
         return response.text or ""

     def analyze_image(
         self,
         image_bytes: bytes,
@@ -83,10 +113,15 @@
             ],
             config=types.GenerateContentConfig(
                 max_output_tokens=max_tokens,
             ),
         )
+        um = getattr(response, "usage_metadata", None)
+        self._last_usage = {
+            "input_tokens": getattr(um, "prompt_token_count", 0) if um else 0,
+            "output_tokens": getattr(um, "candidates_token_count", 0) if um else 0,
+        }
         return response.text or ""

     def transcribe_audio(
         self,
         audio_path: str | Path,
@@ -152,13 +187,15 @@
     def list_models(self) -> list[ModelInfo]:
         models = []
         try:
             for m in self.client.models.list():
                 mid = m.name or ""
-                # Strip "models/" prefix if present
-                if mid.startswith("models/"):
-                    mid = mid[7:]
+                # Strip prefix variants from different API modes
+                for prefix in ("models/", "publishers/google/models/"):
+                    if mid.startswith(prefix):
+                        mid = mid[len(prefix):]
+                        break
                 display = getattr(m, "display_name", mid) or mid

                 caps = []
                 mid_lower = mid.lower()
                 if "gemini" in mid_lower:
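The prefix-stripping loop in `list_models` exists because the consumer API and Vertex AI qualify model names differently. The same logic, isolated as a standalone sketch (the helper name is illustrative, not from the codebase):

```python
def strip_model_prefix(model_id: str) -> str:
    """Normalize model ids across Gemini API-key and Vertex AI modes."""
    # The consumer API returns "models/<id>"; Vertex AI returns
    # "publishers/google/models/<id>". Strip whichever prefix matches.
    for prefix in ("models/", "publishers/google/models/"):
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id
```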
--- a/video_processor/providers/manager.py
+++ b/video_processor/providers/manager.py
@@ -7,10 +7,11 @@

 from dotenv import load_dotenv

 from video_processor.providers.base import BaseProvider, ModelInfo
 from video_processor.providers.discovery import discover_available_models
+from video_processor.utils.usage_tracker import UsageTracker

 load_dotenv()
 logger = logging.getLogger(__name__)

 # Default model preference rankings (tried in order)
@@ -60,10 +61,11 @@
         auto : if True and no model specified, pick the best available
         """
         self.auto = auto
         self._providers: dict[str, BaseProvider] = {}
         self._available_models: Optional[list[ModelInfo]] = None
+        self.usage = UsageTracker()

         # If a single provider is forced, apply it
         if provider:
             self.vision_model = vision_model or self._default_for_provider(provider, "vision")
             self.chat_model = chat_model or self._default_for_provider(provider, "chat")
@@ -143,10 +145,22 @@
             raise RuntimeError(
                 f"No provider available for capability '{capability}'. "
                 "Set an API key for at least one provider."
             )

+    def _track(self, provider: BaseProvider, prov_name: str, model: str) -> None:
+        """Record usage from the last API call on a provider."""
+        last = getattr(provider, "_last_usage", None)
+        if last:
+            self.usage.record(
+                provider=prov_name,
+                model=model,
+                input_tokens=last.get("input_tokens", 0),
+                output_tokens=last.get("output_tokens", 0),
+            )
+            provider._last_usage = None
+
     # --- Public API ---

     def chat(
         self,
         messages: list[dict],
@@ -155,11 +169,13 @@
     ) -> str:
         """Send a chat completion to the best available provider."""
         prov_name, model = self._resolve_model(self.chat_model, "chat", _CHAT_PREFERENCES)
         logger.info(f"Chat: using {prov_name}/{model}")
         provider = self._get_provider(prov_name)
-        return provider.chat(messages, max_tokens=max_tokens, temperature=temperature, model=model)
+        result = provider.chat(messages, max_tokens=max_tokens, temperature=temperature, model=model)
+        self._track(provider, prov_name, model)
+        return result

     def analyze_image(
         self,
         image_bytes: bytes,
         prompt: str,
@@ -167,24 +183,58 @@
     ) -> str:
         """Analyze an image using the best available vision provider."""
         prov_name, model = self._resolve_model(self.vision_model, "vision", _VISION_PREFERENCES)
         logger.info(f"Vision: using {prov_name}/{model}")
         provider = self._get_provider(prov_name)
-        return provider.analyze_image(image_bytes, prompt, max_tokens=max_tokens, model=model)
+        result = provider.analyze_image(image_bytes, prompt, max_tokens=max_tokens, model=model)
+        self._track(provider, prov_name, model)
+        return result

     def transcribe_audio(
         self,
         audio_path: str | Path,
         language: Optional[str] = None,
     ) -> dict:
-        """Transcribe audio using the best available provider."""
+        """Transcribe audio using local Whisper if available, otherwise API."""
+        # Prefer local Whisper — no file size limits, no API costs
+        if not self.transcription_model or self.transcription_model.startswith("whisper-local"):
+            try:
+                from video_processor.providers.whisper_local import WhisperLocal
+
+                if WhisperLocal.is_available():
+                    # Parse model size from "whisper-local:large" or default to "large"
+                    size = "large"
+                    if self.transcription_model and ":" in self.transcription_model:
+                        size = self.transcription_model.split(":", 1)[1]
+                    if not hasattr(self, "_whisper_local"):
+                        self._whisper_local = WhisperLocal(model_size=size)
+                    logger.info(f"Transcription: using local whisper-{size}")
+                    result = self._whisper_local.transcribe(audio_path, language=language)
+                    duration = result.get("duration") or 0
+                    self.usage.record(
+                        provider="local",
+                        model=f"whisper-{size}",
+                        audio_minutes=duration / 60 if duration else 0,
+                    )
+                    return result
+            except ImportError:
+                pass
+
+        # Fall back to API-based transcription
         prov_name, model = self._resolve_model(
             self.transcription_model, "audio", _TRANSCRIPTION_PREFERENCES
         )
         logger.info(f"Transcription: using {prov_name}/{model}")
         provider = self._get_provider(prov_name)
-        return provider.transcribe_audio(audio_path, language=language, model=model)
+        result = provider.transcribe_audio(audio_path, language=language, model=model)
+        duration = result.get("duration") or 0
+        self.usage.record(
+            provider=prov_name,
+            model=model,
+            audio_minutes=duration / 60 if duration else 0,
+        )
+        return result

     def get_models_used(self) -> dict[str, str]:
         """Return a dict mapping capability to 'provider/model' for tracking."""
         result = {}
         for cap, explicit, prefs in [
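The `UsageTracker` imported by the manager isn't shown in this commit. A minimal sketch of the interface the manager relies on (`record`, `start_step`, `end_step`); this is an assumption about the shape of `utils/usage_tracker`, not its actual implementation:

```python
from collections import defaultdict

class UsageTracker:
    """Accumulate per-model token and audio-minute usage (sketch)."""

    def __init__(self):
        self.current_step = None
        self.totals = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0, "audio_minutes": 0.0}
        )

    def start_step(self, name):
        # The pipeline labels each stage before making API calls
        self.current_step = name

    def end_step(self):
        self.current_step = None

    def record(self, provider, model, input_tokens=0, output_tokens=0, audio_minutes=0.0):
        # Keyed by "provider/model" to match get_models_used() reporting
        t = self.totals[f"{provider}/{model}"]
        t["input_tokens"] += input_tokens
        t["output_tokens"] += output_tokens
        t["audio_minutes"] += audio_minutes
```

The `_track` helper in the diff feeds this from each provider's `_last_usage` dict, so per-call token counts aggregate by model across the whole run.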
--- video_processor/providers/openai_provider.py
+++ video_processor/providers/openai_provider.py
@@ -42,10 +42,14 @@
             model=model,
             messages=messages,
             max_tokens=max_tokens,
             temperature=temperature,
         )
+        self._last_usage = {
+            "input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0,
+            "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0,
+        }
         return response.choices[0].message.content or ""

     def analyze_image(
         self,
         image_bytes: bytes,
@@ -69,19 +73,42 @@
                 ],
             }
         ],
             max_tokens=max_tokens,
         )
+        self._last_usage = {
+            "input_tokens": getattr(response.usage, "prompt_tokens", 0) if response.usage else 0,
+            "output_tokens": getattr(response.usage, "completion_tokens", 0) if response.usage else 0,
+        }
         return response.choices[0].message.content or ""

+    # Whisper API limit is 25MB
+    _MAX_FILE_SIZE = 25 * 1024 * 1024
+
     def transcribe_audio(
         self,
         audio_path: str | Path,
         language: Optional[str] = None,
         model: Optional[str] = None,
     ) -> dict:
         model = model or "whisper-1"
+        audio_path = Path(audio_path)
+        file_size = audio_path.stat().st_size
+
+        if file_size <= self._MAX_FILE_SIZE:
+            return self._transcribe_single(audio_path, language, model)
+
+        # File too large — split into chunks and transcribe each
+        logger.info(
+            f"Audio file {file_size / 1024 / 1024:.1f}MB exceeds Whisper 25MB limit, chunking..."
+        )
+        return self._transcribe_chunked(audio_path, language, model)
+
+    def _transcribe_single(
+        self, audio_path: Path, language: Optional[str], model: str
+    ) -> dict:
+        """Transcribe a single audio file."""
         with open(audio_path, "rb") as f:
             kwargs = {"model": model, "file": f}
             if language:
                 kwargs["language"] = language
             response = self.client.audio.transcriptions.create(
@@ -100,10 +127,66 @@
             "language": getattr(response, "language", language),
             "duration": getattr(response, "duration", None),
             "provider": "openai",
             "model": model,
         }
+
+    def _transcribe_chunked(
+        self, audio_path: Path, language: Optional[str], model: str
+    ) -> dict:
+        """Split audio into chunks under 25MB and transcribe each."""
+        import tempfile
+        from video_processor.extractors.audio_extractor import AudioExtractor
+
+        extractor = AudioExtractor()
+        audio_data, sr = extractor.load_audio(audio_path)
+        total_duration = len(audio_data) / sr
+
+        # Calculate chunk duration to stay under 25MB
+        # WAV: 16-bit mono = 2 bytes/sample, plus header overhead
+        bytes_per_second = sr * 2
+        max_seconds = self._MAX_FILE_SIZE // bytes_per_second
+        # Use 80% of max to leave headroom
+        chunk_ms = int(max_seconds * 0.8 * 1000)
+
+        segments_data = extractor.segment_audio(audio_data, sr, segment_length_ms=chunk_ms)
+        logger.info(f"Split into {len(segments_data)} chunks of ~{chunk_ms / 1000:.0f}s each")
+
+        all_text = []
+        all_segments = []
+        time_offset = 0.0
+        detected_language = language
+
+        with tempfile.TemporaryDirectory() as tmpdir:
+            for i, chunk in enumerate(segments_data):
+                chunk_path = Path(tmpdir) / f"chunk_{i:03d}.wav"
+                extractor.save_segment(chunk, chunk_path, sr)
+
+                logger.info(f"Transcribing chunk {i + 1}/{len(segments_data)}...")
+                result = self._transcribe_single(chunk_path, language, model)
+
+                all_text.append(result["text"])
+                for seg in result.get("segments", []):
+                    all_segments.append({
+                        "start": seg["start"] + time_offset,
+                        "end": seg["end"] + time_offset,
+                        "text": seg["text"],
+                    })
+
+                if not detected_language and result.get("language"):
+                    detected_language = result["language"]
+
+                time_offset += len(chunk) / sr
+
+        return {
+            "text": " ".join(all_text),
+            "segments": all_segments,
+            "language": detected_language,
+            "duration": total_duration,
+            "provider": "openai",
+            "model": model,
+        }

     def list_models(self) -> list[ModelInfo]:
         models = []
         try:
             for m in self.client.models.list():

ADDED video_processor/providers/whisper_local.py
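The chunk-size arithmetic in `_transcribe_chunked` can be checked standalone. The helper names below are local to this sketch; only the 25MB limit, the 16-bit mono WAV assumption, and the 80% headroom factor come from the diff above.

```python
MAX_FILE_SIZE = 25 * 1024 * 1024  # Whisper API upload limit


def chunk_length_ms(sample_rate: int) -> int:
    """Longest chunk (in ms) that stays safely under the upload limit."""
    # 16-bit mono PCM: 2 bytes per sample
    bytes_per_second = sample_rate * 2
    max_seconds = MAX_FILE_SIZE // bytes_per_second
    # Keep 20% headroom for the WAV header and safety margin
    return int(max_seconds * 0.8 * 1000)


def chunk_payload_bytes(sample_rate: int) -> int:
    """Approximate PCM payload size of one chunk at this sample rate."""
    return (chunk_length_ms(sample_rate) // 1000) * sample_rate * 2
```

At 16kHz mono this yields chunks of roughly 655 seconds (about 21MB of PCM), comfortably under the 25MB ceiling even after the WAV header is added.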
--- a/video_processor/providers/whisper_local.py
+++ b/video_processor/providers/whisper_local.py
@@ -0,0 +1,76 @@
+"""Local Whisper transcription provider — runs on-device with GPU acceleration."""
+
+import logging
+from pathlib import Path
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+# Model size → approximate VRAM/RAM usage
+_MODEL_SIZES = {
+    "tiny": "~1GB",
+    "base": "~1GB",
+    "small": "~2GB",
+    "medium": "~5GB",
+    "large": "~10GB",
+    "turbo": "~6GB",
+}
+
+
+class WhisperLocal:
+    """
+    Local Whisper transcription using openai-whisper.
+
+    Uses MPS (Apple Silicon) or CUDA when available, falls back to CPU.
+    No file size limits — processes audio directly on device.
+    """
+
+    def __init__(self, model_size: str = "large", device: Optional[str] = None):
+        """
+        Initialize local Whisper.
+
+        Parameters
+        ----------
+        model_size : str
+            Whisper model size: tiny, base, small, medium, large, turbo
+        device : str, optional
+            Force device: 'mps', 'cuda', 'cpu'. Auto-detects if None.
+        """
+        self.model_size = model_size
+        self._model = None
+
+        if device:
+            self.device = device
+        else:
+            self.device = self._detect_device()
+
+        logger.info(
+            f"WhisperLocal: model={model_size} ({_MODEL_SIZES.get(model_size, '?')}), "
+            f"device={self.device}"
+        )
+
+    @staticmethod
+    def _detect_device() -> str:
+        """Auto-detect the best available device."""
+        try:
+            import torch
+
+            if torch.cuda.is_available():
+                return "cuda"
+            if torch.backends.mps.is_available():
+                return "mps"
+        except ImportError:
+            pass
+        return "cpu"
+
+    def _load_model(self):
+        """Lazy-load the Whisper model."""
+        if self._model is not None:
+            return
+
+        try:
+            import whisper
+        except ImportError:
+            raise ImportError("openai-whisper not installed. Run: pip install openai-whisper")
+
+        logger.info(f"Loading Whisper {self.model_size} model on {self.device}...")
+
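The device-priority order in `_detect_device` (CUDA, then MPS, then CPU) can be exercised as a standalone function. This is a sketch of the same logic outside the class; like the original, it degrades to CPU when torch is not installed.

```python
def detect_device() -> str:
    # Same priority as WhisperLocal._detect_device: CUDA > MPS > CPU.
    # ImportError is caught so machines without torch still get "cpu".
    try:
        import torch

        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
    except ImportError:
        pass
    return "cpu"
```

Checking CUDA before MPS is safe because no machine exposes both; on Apple Silicon `torch.cuda.is_available()` is simply False and the MPS branch is reached.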
--- video_processor/sources/base.py
+++ video_processor/sources/base.py
@@ -50,17 +50,19 @@
     def download_all(
         self,
         files: List[SourceFile],
         destination_dir: Path,
     ) -> List[Path]:
-        """Download multiple files to a directory."""
+        """Download multiple files to a directory, preserving subfolder structure."""
         destination_dir.mkdir(parents=True, exist_ok=True)
         paths = []
         for f in files:
-            dest = destination_dir / f.name
+            # Use path (with subfolder) if available, otherwise just name
+            relative = f.path if f.path else f.name
+            dest = destination_dir / relative
             try:
                 local_path = self.download(f, dest)
                 paths.append(local_path)
-                logger.info(f"Downloaded: {f.name}")
+                logger.info(f"Downloaded: {relative}")
             except Exception as e:
-                logger.error(f"Failed to download {f.name}: {e}")
+                logger.error(f"Failed to download {relative}: {e}")
         return paths
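The destination-resolution change in `download_all` reduces to one Path join. The helper below is hypothetical, written only to show how a relative path with subfolders maps into the destination directory; `pathlib` handles the nesting for free.

```python
from pathlib import Path
from typing import Optional


def resolve_destination(destination_dir: str, name: str, path: Optional[str] = None) -> Path:
    # Mirrors download_all: prefer the subfolder-preserving path when present,
    # otherwise fall back to the bare filename.
    relative = path if path else name
    return Path(destination_dir) / relative
```

Joining `Path("downloads") / "week1/a.mp4"` yields `downloads/week1/a.mp4`, so callers only need a `mkdir(parents=True)` on the parent before writing.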
--- video_processor/sources/google_drive.py
+++ video_processor/sources/google_drive.py
@@ -187,31 +187,81 @@
     def list_videos(
         self,
         folder_id: Optional[str] = None,
         folder_path: Optional[str] = None,
         patterns: Optional[List[str]] = None,
+        recursive: bool = True,
     ) -> List[SourceFile]:
-        """List video files in a Google Drive folder."""
+        """
+        List video files in a Google Drive folder.
+
+        Parameters
+        ----------
+        folder_id : str, optional
+            Google Drive folder ID.
+        folder_path : str, optional
+            Not used for Google Drive (folder_id is canonical).
+        patterns : list[str], optional
+            File extension patterns like ['*.mp4', '*.mkv'].
+        recursive : bool
+            If True, recurse into subfolders (default: True).
+        """
         if not self.service:
             raise RuntimeError("Not authenticated. Call authenticate() first.")

-        # Build query
+        files: List[SourceFile] = []
+        self._list_folder(
+            folder_id=folder_id,
+            prefix="",
+            patterns=patterns,
+            recursive=recursive,
+            out=files,
+        )
+
+        logger.info(f"Found {len(files)} videos in Google Drive")
+        return files
+
+    def _list_folder(
+        self,
+        folder_id: Optional[str],
+        prefix: str,
+        patterns: Optional[List[str]],
+        recursive: bool,
+        out: List[SourceFile],
+    ) -> None:
+        """List videos in a single folder, optionally recursing into subfolders."""
+        # List video files
+        self._list_files_in_folder(folder_id, prefix, patterns, out)
+
+        # Recurse into subfolders
+        if recursive:
+            subfolders = self._list_subfolders(folder_id)
+            for sf_id, sf_name in subfolders:
+                sub_prefix = f"{prefix}{sf_name}/" if prefix else f"{sf_name}/"
+                logger.debug(f"Recursing into subfolder: {sub_prefix}")
+                self._list_folder(sf_id, sub_prefix, patterns, recursive, out)
+
+    def _list_files_in_folder(
+        self,
+        folder_id: Optional[str],
+        prefix: str,
+        patterns: Optional[List[str]],
+        out: List[SourceFile],
+    ) -> None:
+        """List video files in a single folder (non-recursive)."""
         query_parts = []

         if folder_id:
             query_parts.append(f"'{folder_id}' in parents")

-        # Filter for video MIME types
         mime_conditions = " or ".join(
             f"mimeType='{mt}'" for mt in VIDEO_MIME_TYPES
         )
         query_parts.append(f"({mime_conditions})")
         query_parts.append("trashed=false")

         query = " and ".join(query_parts)
-
-        files = []
         page_token = None

         while True:
             response = (
                 self.service.files()
@@ -224,34 +274,65 @@
                 )
                 .execute()
             )

             for f in response.get("files", []):
-                # Apply pattern filtering if specified
-                if patterns:
-                    name = f.get("name", "")
-                    if not any(
-                        name.endswith(p.replace("*", "")) for p in patterns
-                    ):
-                        continue
-
-                files.append(
+                name = f.get("name", "")
+                if patterns and not any(
+                    name.endswith(p.replace("*", "")) for p in patterns
+                ):
+                    continue
+
+                out.append(
                     SourceFile(
-                        name=f["name"],
+                        name=name,
                         id=f["id"],
                         size_bytes=int(f.get("size", 0)) if f.get("size") else None,
                         mime_type=f.get("mimeType"),
                         modified_at=f.get("modifiedTime"),
+                        path=f"{prefix}{name}" if prefix else name,
                     )
                 )

             page_token = response.get("nextPageToken")
             if not page_token:
                 break

-        logger.info(f"Found {len(files)} videos in Google Drive")
-        return files
+    def _list_subfolders(self, parent_id: Optional[str]) -> List[tuple]:
+        """List immediate subfolders of a folder. Returns list of (id, name)."""
+        query_parts = [
+            "mimeType='application/vnd.google-apps.folder'",
+            "trashed=false",
+        ]
+        if parent_id:
+            query_parts.append(f"'{parent_id}' in parents")
+
+        query = " and ".join(query_parts)
+        subfolders = []
+        page_token = None
+
+        while True:
+            response = (
+                self.service.files()
+                .list(
+                    q=query,
+                    spaces="drive",
+                    fields="nextPageToken, files(id, name)",
+                    pageToken=page_token,
+                    pageSize=100,
+                )
+                .execute()
+            )
+
+            for f in response.get("files", []):
+                subfolders.append((f["id"], f["name"]))
+
+            page_token = response.get("nextPageToken")
+            if not page_token:
+                break
+
+        return sorted(subfolders, key=lambda x: x[1])

     def download(self, file: SourceFile, destination: Path) -> Path:
         """Download a file from Google Drive."""
         if not self.service:
             raise RuntimeError("Not authenticated. Call authenticate() first.")

ADDED video_processor/utils/__init__.py
ADDED video_processor/utils/api_cache.py

No diff available

--- a/video_processor/utils/api_cache.py
+++ b/video_processor/utils/api_cache.py
@@ -0,0 +1,163 @@
+"""Caching system for API responses to reduce API calls and costs."""
+
+import hashlib
+import json
+import logging
+import os
+import time
+from pathlib import Path
+from typing import Any, Optional, Union
+
+logger = logging.getLogger(__name__)
+
+
+class APICache:
+    """Disk-based API response cache."""
+
+    def __init__(
+        self,
+        cache_dir: Union[str, Path],
+        namespace: str,
+        ttl: int = 86400,  # 24 hours in seconds
+    ):
+        """
+        Parameters
+        ----------
+        cache_dir : str or Path
+            Directory for cache files
+        namespace : str
+            Cache namespace for organizing cache files
+        ttl : int
+            Time-to-live for cache entries in seconds
+        """
+        self.cache_dir = Path(cache_dir)
+        self.namespace = namespace
+        self.ttl = ttl
+
+        # Ensure namespace directory exists
+        self.namespace_dir = self.cache_dir / namespace
+        self.namespace_dir.mkdir(parents=True, exist_ok=True)
+
+        logger.debug(f"Initialized API cache in {self.namespace_dir}")
+
+    def get_cache_path(self, key: str) -> Path:
+        """
+        Get path to cache file for key.
+
+        Returns
+        -------
+        Path
+            Path to cache file
+        """
+        # Hash the key to ensure valid filename
+        hashed_key = hashlib.md5(key.encode()).hexdigest()
+        return self.namespace_dir / f"{hashed_key}.json"
+
+    def get(self, key: str) -> Optional[Any]:
+        """
+        Get value from cache.
+
+        Returns
+        -------
+        object or None
+            Cached value if available and not expired, None otherwise
+        """
+        cache_path = self.get_cache_path(key)
+
+        # Check if cache file exists
+        if not cache_path.exists():
+            return None
+
+        try:
+            # Read cache file
+            with open(cache_path, "r", encoding="utf-8") as f:
+                cache_data = json.load(f)
+
+            # Check if cache entry is expired
+            timestamp = cache_data.get("timestamp", 0)
+            now = time.time()
+
+            if now - timestamp > self.ttl:
+                logger.debug(f"Cache entry expired for key: {key}")
+                return None
+
+            return cache_data.get("value")
+        except Exception as e:
+            logger.warning(f"Failed to read cache entry: {e}")
+            return None
+
+    def set(self, key: str, value: Any) -> bool:
+        """Store value in cache. Returns True on success, False otherwise."""
+        cache_path = self.get_cache_path(key)
+        cache_data = {
+            "timestamp": time.time(),
+            "value": value,
+        }
+        try:
+            with open(cache_path, "w", encoding="utf-8") as f:
+                json.dump(cache_data, f)
+            return True
+        except Exception as e:
+            logger.warning(f"Failed to write cache entry: {e}")
+            return False
+
+    def clear(self, older_than: Optional[int] = None) -> int:
+        """
+        Clear all cache entries or entries older than specified time.
+
+        Parameters
+        ----------
+        older_than : int, optional
+            Clear entries older than this many seconds
+
+        Returns
+        -------
+        int
+            Number of entries cleared
+        """
+        count = 0
+        now = time.time()
+        for cache_file in self.namespace_dir.glob("*.json"):
+            try:
+                # Check file age if criteria provided
+                if older_than is not None:
+                    file_age = now - os.path.getmtime(cache_file)
+                    if file_age <= older_than:
+                        continue
+                # Remove file
+                os.remove(cache_file)
+                count += 1
+            except OSError as e:
+                logger.warning(f"Failed to remove cache file: {e}")
+        logger.info(f"Cleared {count} cache entries from {self.namespace}")
+        return count
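The cache derives filenames by MD5-hashing the key, so arbitrary strings (prompts, URLs) map to stable, filesystem-safe names. A minimal sketch of just that mapping, with a hypothetical free function in place of the class method:

```python
import hashlib
from pathlib import Path


def cache_path(namespace_dir: str, key: str) -> Path:
    # Hash the key so long or slash-containing keys become valid filenames;
    # MD5 is fine here because the hash is an index, not a security boundary.
    hashed = hashlib.md5(key.encode()).hexdigest()
    return Path(namespace_dir) / f"{hashed}.json"
```

Because the mapping is deterministic, a re-run of the pipeline with an identical prompt hits the same file and skips the API call.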
102 "value": value
103 }
104 ensure valid filename
105 hashed_key = hashlib.md5(key.encode()).hexdigest()
106 return self.namespace_dir / f"{hashed_key}.json"
107
108 def get(self, ke y: str) -> Optional[Any]:
109 """
110 Get value from cache.
111
112
113 except E"""Caching system for Am for API responses to redu -------
114 object or None
115 Cached value if available and not expired, No ne otherwise
116 """
117 cache_path = self.get_cache_path(key)
118
119 der than this Get path to cache file fo try:
120 # Read cache file
121 with open(cache_path, "r", encoding="utf he_path, "r", encoding="utf-8") as f:
122 cache_data = json.load(f)
123
124 # Check if cache entry is expired
125 timestamp = cache_data.get("timestamp", 0)
126 now = time.time()
127
128 if now - timestamp > self.ttl:
129 logger.deb
130 return False
131
132 def clear(self, older_than: Optional[int] = None) -> int:
133 """
134 Clear all cache entries or entries older than specified time.
135 ne otherwise
136 older_than : int, optional
137 Clear entries older than this many seconds
138 der than this many seconds
139
140 Returns
141 -------
142 int
143 Number of entries cleared
144 """
145 count = 0 self.
146 for cache_file in self.namespace_dir_file in self.namespace_dir.glob("*.json"):
147 try:
148 # Check file age if criteria provided
149 if older_than is not None:
150 file_age = now - os.path.getmtime(cache_file)
151 if file_age <= older_than:
152 p", 0)
153 # Remove file
154 os.remove(cache_file)
155 count += 1
156 e.time()
157
158 if now - timestamp > self.ttl:
159 logger logger.warning
160 xists
161 logger.info(f"Cleared {count} cache entries from {self.namespace}")
162 return count
163 ce}"
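The cache above keys entries by an MD5 hash of the request and expires them after `ttl` seconds. A minimal standalone sketch of that round-trip (it re-implements the hash/TTL logic inline rather than importing the module, and the key strings are illustrative):

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path

cache_dir = Path(tempfile.mkdtemp()) / "anthropic"  # namespace dir, as in the diff
cache_dir.mkdir(parents=True, exist_ok=True)
TTL = 86400  # 24 hours, the default in the diff

def cache_path(key: str) -> Path:
    # Hash the key so arbitrary prompt text becomes a valid filename
    return cache_dir / f"{hashlib.md5(key.encode()).hexdigest()}.json"

def put(key: str, value) -> None:
    payload = {"timestamp": time.time(), "value": value}
    cache_path(key).write_text(json.dumps(payload), encoding="utf-8")

def get(key: str):
    p = cache_path(key)
    if not p.exists():
        return None
    data = json.loads(p.read_text(encoding="utf-8"))
    if time.time() - data.get("timestamp", 0) > TTL:
        return None  # expired
    return data.get("value")

put("summarize:segment-3", {"summary": "ok"})
print(get("summarize:segment-3"))  # {'summary': 'ok'}
print(get("never-cached"))         # None
```

Hashing the key also means cache hits survive re-runs as long as the prompt text is byte-identical, which is what makes checkpoint/resume cheap.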
--- video_processor/utils/export.py
+++ video_processor/utils/export.py
@@ -2,10 +2,12 @@
 
 import json
 import logging
 from pathlib import Path
 from typing import Optional
+
+from tqdm import tqdm
 
 from video_processor.models import DiagramResult, VideoManifest
 from video_processor.utils.rendering import render_mermaid, reproduce_chart
 
 logger = logging.getLogger(__name__)
@@ -153,11 +155,11 @@
     Updates manifest with output file paths and returns it.
     """
     output_dir = Path(output_dir)
 
     # Render mermaid diagrams to SVG/PNG
-    for i, diagram in enumerate(manifest.diagrams):
+    for i, diagram in enumerate(tqdm(manifest.diagrams, desc="Rendering diagrams", unit="diag") if manifest.diagrams else []):
         if diagram.mermaid:
             diagrams_dir = output_dir / "diagrams"
             prefix = f"diagram_{i}"
             paths = render_mermaid(diagram.mermaid, diagrams_dir, prefix)
             if "svg" in paths:

ADDED video_processor/utils/usage_tracker.py
ADDED video_processor/utils/visualization.py
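The export change wraps the diagram loop in `tqdm` while guarding against an empty list. The wrapper is a plain iterator passthrough, so `enumerate` indices are unchanged; a dependency-free sketch of the same pattern, with `fake_tqdm` as a hypothetical no-op stand-in for `tqdm`:

```python
def fake_tqdm(iterable, desc="", unit=""):
    # Stand-in for tqdm: yields items unchanged, so enumerate indices are preserved
    for item in iterable:
        yield item

diagrams = ["flowchart TD", "sequenceDiagram", None]  # illustrative mermaid sources
rendered = []
for i, mermaid in enumerate(fake_tqdm(diagrams, desc="Rendering diagrams", unit="diag") if diagrams else []):
    if mermaid:
        rendered.append(f"diagram_{i}.svg")

print(rendered)  # ['diagram_0.svg', 'diagram_1.svg']
```

The `if ... else []` guard matters because wrapping an empty list in `tqdm` would still draw a zero-length progress bar.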
--- a/video_processor/utils/usage_tracker.py
+++ b/video_processor/utils/usage_tracker.py
@@ -0,0 +1,153 @@
+"""Usage tracking and cost estimation for API calls."""
+
+import time
+from dataclasses import dataclass, field
+from typing import Optional
+
+
+# Cost per million tokens (USD) — updated Feb 2025
+_MODEL_PRICING = {
+    # Anthropic
+    "claude-sonnet-4-5-20250929": {"input": 3.00, "output": 15.00},
+    "claude-haiku-3-5-20241022": {"input": 0.80, "output": 4.00},
+    # OpenAI
+    "gpt-4o": {"input": 2.50, "output": 10.00},
+    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
+    "gpt-4.1": {"input": 2.00, "output": 8.00},
+    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
+    "gpt-4.1-nano": {"input": 0.10, "output": 0.40},
+    # Google Gemini
+    "gemini-2.5-flash": {"input": 0.15, "output": 0.60},
+    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
+    "gemini-2.0-flash": {"input": 0.10, "output": 0.40},
+    # Whisper
+    "whisper-1": {"per_minute": 0.006},
+}
+
+
+@dataclass
+class ModelUsage:
+    """Accumulated usage for a single model."""
+
+    provider: str = ""
+    model: str = ""
+    calls: int = 0
+    input_tokens: int = 0
+    output_tokens: int = 0
+    audio_minutes: float = 0.0
+
+    @property
+    def total_tokens(self) -> int:
+        return self.input_tokens + self.output_tokens
+
+    @property
+    def estimated_cost(self) -> float:
+        pricing = _MODEL_PRICING.get(self.model)
+        if not pricing:
+            # Try partial match
+            for key, p in _MODEL_PRICING.items():
+                if key in self.model or self.model in key:
+                    pricing = p
+                    break
+        if not pricing:
+            return 0.0
+        if "per_minute" in pricing:
+            return self.audio_minutes * pricing["per_minute"]
+        return (
+            self.input_tokens * pricing.get("input", 0) / 1_000_000
+            + self.output_tokens * pricing.get("output", 0) / 1_000_000
+        )
+
+
+@dataclass
+class StepTiming:
+    """Timing for a single pipeline step."""
+
+    name: str = ""
+    seconds: float = 0.0
+
+
+class UsageTracker:
+    """Aggregates per-model usage and renders a summary report."""
+
+    def __init__(self) -> None:
+        self._models: dict = {}
+
+    @property
+    def total_input_tokens(self) -> int:
+        return sum(u.input_tokens for u in self._models.values())
+
+    @property
+    def total_output_tokens(self) -> int:
+        return sum(u.output_tokens for u in self._models.values())
+
+    @property
+    def total_tokens(self) -> int:
+        return self.total_input_tokens + self.total_output_tokens
+
+    @property
+    def total_cost(self) -> float:
+        return sum(u.estimated_cost for u in self._models.values())
+
+    def summary(self) -> str:
+        lines = ["=" * 60, " API Usage Summary"]
+        lines.append(
+            f" Total tokens: {self.total_tokens:,} "
+            f"({self.total_input_tokens:,} in / {self.total_output_tokens:,} out)"
+        )
+        lines.append("")
+        lines.append(f"  {'Model':<35} {'Calls':>6} {'In Tok':>8} {'Out Tok':>8} {'Cost':>8}")
+        lines.append(f"  {'-'*35} {'-'*6} {'-'*8} {'-'*8} {'-'*8}")
+        for key in sorted(self._models.keys()):
+            u = self._models[key]
+            cost_str = f"${u.estimated_cost:.4f}" if u.estimated_cost > 0 else "free"
+            if u.audio_minutes > 0:
+                lines.append(
+                    f"  {key:<35} {u.calls:>6} {u.audio_minutes:>7.1f}m {'-':>8} {cost_str:>8}"
+                )
+            else:
+                lines.append(
+                    f"  {key:<35} {u.calls:>6} {u.input_tokens:>8,} {u.output_tokens:>8,} {cost_str:>8}"
+                )
+
+        lines.append(f"\n  Estimated total cost: ${self.total_cost:.4f}")
+
+        lines.append("=" * 60)
+        return "\n".join(lines)
+
+
+def _fmt_duration(seconds: float) -> str:
+    """Format seconds as human-readable duration."""
+    if seconds < 60:
+        return f"{seconds:.1f}s"
+    m = int(seconds // 60)
+    s = seconds % 60
+    if m < 60:
+        return f"{m}m {s:.0f}s"
+    h = m // 60
+    m = m % 60
+    return f"{h}h {m}m {s:.0f}s"
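Cost estimation multiplies token counts by per-million-token rates, with a substring fallback for dated model names and per-minute pricing for Whisper. A self-contained sketch of that lookup (two rates copied from the table above; `estimate` is an illustrative free function, not the dataclass property):

```python
_MODEL_PRICING = {
    # Two representative rows from the pricing table (USD per 1M tokens)
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "whisper-1": {"per_minute": 0.006},
}

def estimate(model: str, input_tokens: int = 0, output_tokens: int = 0,
             audio_minutes: float = 0.0) -> float:
    # Exact lookup first, then the same substring fallback the property uses
    pricing = _MODEL_PRICING.get(model)
    if not pricing:
        for key, p in _MODEL_PRICING.items():
            if key in model or model in key:
                pricing = p
                break
    if not pricing:
        return 0.0
    if "per_minute" in pricing:
        return audio_minutes * pricing["per_minute"]
    return (input_tokens * pricing.get("input", 0) / 1_000_000
            + output_tokens * pricing.get("output", 0) / 1_000_000)

# A dated model name still resolves via the "gpt-4o" substring fallback
print(round(estimate("gpt-4o-2024-08-06", 120_000, 8_000), 4))  # 0.38
print(round(estimate("whisper-1", audio_minutes=90), 4))        # 0.54
```

The fallback keeps cost reporting working when providers append date suffixes to model names, at the risk of matching the wrong row if two keys overlap.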
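`_fmt_duration` buckets seconds into `s`, `m s`, and `h m s` forms; a standalone copy of the same logic for quick verification:

```python
def fmt_duration(seconds: float) -> str:
    # Same bucketing as _fmt_duration in the diff
    if seconds < 60:
        return f"{seconds:.1f}s"
    m = int(seconds // 60)
    s = seconds % 60
    if m < 60:
        return f"{m}m {s:.0f}s"
    h = m // 60
    m = m % 60
    return f"{h}h {m}m {s:.0f}s"

print(fmt_duration(42.5))   # 42.5s
print(fmt_duration(125))    # 2m 5s
print(fmt_duration(3725))   # 1h 2m 5s
```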
