PlanOpticon

Merge pull request #18 from ConflictHQ/add-ollama-provider

Add Ollama provider for fully offline video analysis

noreply 2026-02-16 22:51 trunk merge
Commit a0146a58f34f11146e6932d285f8050f8ef0074867974ad0841941f1583a4290
+3 -3
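In short: with this change, a fully offline run looks roughly like the following, pulling together the commands from the docs added in this diff (model choices are illustrative; any Ollama chat/vision pair should work):

```bash
# Start Ollama and pull one chat model and one vision model
ollama serve
ollama pull llama3.2
ollama pull llava

# No API keys needed; PlanOpticon auto-detects the local server
planopticon analyze -i video.mp4 -o ./out --provider ollama
```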
--- README.md
+++ README.md
@@ -6,15 +6,15 @@
 [![License](https://img.shields.io/github/license/ConflictHQ/PlanOpticon)](LICENSE)
 [![Docs](https://img.shields.io/badge/docs-planopticon.dev-blue)](https://planopticon.dev)
 
 **AI-powered video analysis and knowledge extraction.**
 
-PlanOpticon processes video recordings into structured knowledge — transcripts, diagrams, action items, key points, and knowledge graphs. It auto-discovers available models across OpenAI, Anthropic, and Gemini, and produces rich multi-format output.
+PlanOpticon processes video recordings into structured knowledge — transcripts, diagrams, action items, key points, and knowledge graphs. It auto-discovers available models across OpenAI, Anthropic, Gemini, and Ollama, and produces rich multi-format output.
 
 ## Features
 
-- **Multi-provider AI** — Auto-discovers and routes to the best available model across OpenAI, Anthropic, and Google Gemini
+- **Multi-provider AI** — Auto-discovers and routes to the best available model across OpenAI, Anthropic, Google Gemini, and Ollama (fully offline)
 - **Smart frame extraction** — Change detection for transitions + periodic capture for slow-evolving content (document scrolling, screen shares)
 - **People frame filtering** — OpenCV face detection automatically removes webcam/video conference frames, keeping only shared content
 - **Diagram extraction** — Vision model classification detects flowcharts, architecture diagrams, charts, and whiteboards
 - **Knowledge graphs** — Extracts entities and relationships, builds and merges knowledge graphs across videos
 - **Action item detection** — Finds commitments, tasks, and follow-ups with assignees and deadlines
@@ -65,11 +65,11 @@
 
 ### Requirements
 
 - Python 3.10+
 - FFmpeg (`brew install ffmpeg` / `apt install ffmpeg`)
-- At least one API key: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY`
+- At least one API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY`) **or** [Ollama](https://ollama.com) running locally
 
 ## Output Structure
 
 ```
 output/
--- docs/architecture/overview.md
+++ docs/architecture/overview.md
@@ -43,10 +43,11 @@
 ├── providers/                 # AI provider abstraction
 │   ├── base.py                # BaseProvider ABC
 │   ├── openai_provider.py
 │   ├── anthropic_provider.py
 │   ├── gemini_provider.py
+│   ├── ollama_provider.py     # Local Ollama (offline)
 │   ├── discovery.py           # Auto-model-discovery
 │   └── manager.py             # ProviderManager routing
 ├── utils/
 │   ├── json_parsing.py        # Robust LLM JSON parsing
 │   ├── rendering.py           # Mermaid + chart rendering
@@ -60,8 +61,8 @@
 
 ## Key design decisions
 
 - **Pydantic everywhere** — All structured data uses pydantic models for validation and serialization
 - **Manifest-driven** — Every run produces `manifest.json` as the single source of truth
-- **Provider abstraction** — Single `ProviderManager` wraps OpenAI, Anthropic, Gemini behind a common interface
+- **Provider abstraction** — Single `ProviderManager` wraps OpenAI, Anthropic, Gemini, and Ollama behind a common interface
 - **No hardcoded models** — Model lists come from API discovery
 - **Screengrab fallback** — When extraction fails, save the frame as a captioned screenshot
--- docs/architecture/providers.md
+++ docs/architecture/providers.md
@@ -9,40 +9,73 @@
 | Provider | Chat | Vision | Transcription |
 |----------|------|--------|--------------|
 | OpenAI | GPT-4o, GPT-4 | GPT-4o | Whisper-1 |
 | Anthropic | Claude Sonnet/Opus | Claude Sonnet/Opus | — |
 | Google Gemini | Gemini Flash/Pro | Gemini Flash/Pro | Gemini Flash |
+| Ollama (local) | Any installed model | llava, moondream, etc. | — (use local Whisper) |
+
+## Ollama (offline mode)
+
+[Ollama](https://ollama.com) enables fully offline operation with no API keys required. PlanOpticon connects via Ollama's OpenAI-compatible API.
+
+```bash
+# Install and start Ollama
+ollama serve
+
+# Pull a chat model
+ollama pull llama3.2
+
+# Pull a vision model (for diagram analysis)
+ollama pull llava
+```
+
+PlanOpticon auto-detects Ollama when it's running. To force Ollama:
+
+```bash
+planopticon analyze -i video.mp4 -o ./out --provider ollama
+```
+
+Configure a non-default host via `OLLAMA_HOST`:
+
+```bash
+export OLLAMA_HOST=http://192.168.1.100:11434
+```
 
 ## Auto-discovery
 
-On startup, `ProviderManager` checks which API keys are configured and queries each provider's API to discover available models:
+On startup, `ProviderManager` checks which API keys are configured, queries each provider's API, and checks for a running Ollama server to discover available models:
 
 ```python
 from video_processor.providers.manager import ProviderManager
 
 pm = ProviderManager()
-# Automatically discovers models from all configured providers
+# Automatically discovers models from all configured providers + Ollama
 ```
 
 ## Routing preferences
 
 Each task type has a default preference order:
 
 | Task | Preference |
 |------|-----------|
-| Vision | Gemini Flash → GPT-4o → Claude Sonnet |
-| Chat | Claude Sonnet → GPT-4o → Gemini Flash |
-| Transcription | Whisper-1 → Gemini Flash |
+| Vision | Gemini Flash → GPT-4o → Claude Sonnet → Ollama |
+| Chat | Claude Sonnet → GPT-4o → Gemini Flash → Ollama |
+| Transcription | Local Whisper → Whisper-1 → Gemini Flash |
+
+Ollama acts as the last-resort fallback — if no cloud API keys are set but Ollama is running, it is used automatically.
 
 ## Manual override
 
 ```python
 pm = ProviderManager(
     vision_model="gpt-4o",
     chat_model="claude-sonnet-4-5-20250929",
     provider="openai",  # Force a specific provider
 )
+
+# Or use Ollama for fully offline processing
+pm = ProviderManager(provider="ollama")
 ```
 
 ## BaseProvider interface
 
 All providers implement:
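A note on the OpenAI-compatible connection the new doc mentions: Ollama serves an OpenAI-style API under `/v1`, so the provider can reuse the standard `openai` client. A minimal sketch of that connection, assuming the conventional `/v1` endpoint and a placeholder key (Ollama ignores the key, but the client requires one); this is not the PR's exact implementation:

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server.
host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
client = OpenAI(base_url=f"{host}/v1", api_key="ollama")  # key value is unused

resp = client.chat.completions.create(
    model="llama3.2",  # any model pulled into the local Ollama instance
    messages=[{"role": "user", "content": "Summarize: ..."}],
)
print(resp.choices[0].message.content)
```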
--- docs/cli-reference.md
+++ docs/cli-reference.md
@@ -17,11 +17,11 @@
 | `--use-gpu` | FLAG | off | Enable GPU acceleration |
 | `--sampling-rate` | FLOAT | 0.5 | Frame sampling rate (fps) |
 | `--change-threshold` | FLOAT | 0.15 | Visual change threshold |
 | `--periodic-capture` | FLOAT | 30.0 | Capture a frame every N seconds regardless of change (0 to disable) |
 | `--title` | TEXT | auto | Report title |
-| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini` | `auto` | API provider |
+| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini\|ollama` | `auto` | API provider |
 | `--vision-model` | TEXT | auto | Override vision model |
 | `--chat-model` | TEXT | auto | Override chat model |
 
 ---
 
@@ -38,11 +38,11 @@
 | `-i`, `--input-dir` | PATH | *required* | Directory containing videos |
 | `-o`, `--output` | PATH | *required* | Output directory |
 | `--depth` | `basic\|standard\|comprehensive` | `standard` | Processing depth |
 | `--pattern` | TEXT | `*.mp4,*.mkv,*.avi,*.mov,*.webm` | File glob patterns |
 | `--title` | TEXT | `Batch Processing Results` | Batch title |
-| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini` | `auto` | API provider |
+| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini\|ollama` | `auto` | API provider |
 | `--vision-model` | TEXT | auto | Override vision model |
 | `--chat-model` | TEXT | auto | Override chat model |
 | `--source` | `local\|gdrive\|dropbox` | `local` | Video source |
 | `--folder-id` | TEXT | none | Google Drive folder ID |
 | `--folder-path` | TEXT | none | Cloud folder path |
@@ -90,11 +90,11 @@
 |--------|------|---------|-------------|
 | `-i`, `--input` | PATH | *required* | Input video file path |
 | `-o`, `--output` | PATH | *required* | Output directory |
 | `--depth` | `basic\|standard\|comprehensive` | `standard` | Initial processing depth (agent may adapt) |
 | `--title` | TEXT | auto | Report title |
-| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini` | `auto` | API provider |
+| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini\|ollama` | `auto` | API provider |
 | `--vision-model` | TEXT | auto | Override vision model |
 | `--chat-model` | TEXT | auto | Override chat model |
 
 ---
 
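The new `ollama` choice applies uniformly to `analyze`, `batch`, and the agent command per the tables above. For example, combining documented flags:

```bash
# Batch-process a directory of recordings fully offline
planopticon batch -i ./videos -o ./out --depth standard -p ollama
```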
--- docs/getting-started/configuration.md
+++ docs/getting-started/configuration.md
@@ -5,22 +5,25 @@
 | Variable | Description |
 |----------|-------------|
 | `OPENAI_API_KEY` | OpenAI API key |
 | `ANTHROPIC_API_KEY` | Anthropic API key |
 | `GEMINI_API_KEY` | Google Gemini API key |
+| `OLLAMA_HOST` | Ollama server URL (default: `http://localhost:11434`) |
 | `GOOGLE_APPLICATION_CREDENTIALS` | Path to Google service account JSON (for Drive) |
 | `CACHE_DIR` | Directory for API response caching |
 
 ## Provider routing
 
 PlanOpticon auto-discovers available models and routes each task to the best option:
 
 | Task | Default preference |
 |------|--------------------|
-| Vision (diagrams) | Gemini Flash > GPT-4o > Claude Sonnet |
-| Chat (analysis) | Claude Sonnet > GPT-4o > Gemini Flash |
-| Transcription | Whisper-1 > Gemini Flash |
+| Vision (diagrams) | Gemini Flash > GPT-4o > Claude Sonnet > Ollama |
+| Chat (analysis) | Claude Sonnet > GPT-4o > Gemini Flash > Ollama |
+| Transcription | Local Whisper > Whisper-1 > Gemini Flash |
+
+If no cloud API keys are configured, PlanOpticon automatically falls back to Ollama when a local server is running. This enables fully offline operation when paired with local Whisper for transcription.
 
 Override with `--provider`, `--vision-model`, or `--chat-model` flags.
 
 ## Frame sampling
 
--- docs/getting-started/installation.md
+++ docs/getting-started/installation.md
@@ -62,11 +62,15 @@
 
 Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH.
 
 ## API keys
 
-You need at least one AI provider API key. Set them as environment variables:
+You need at least one AI provider API key **or** a running Ollama server.
+
+### Cloud providers
+
+Set API keys as environment variables:
 
 ```bash
 export OPENAI_API_KEY="sk-..."
 export ANTHROPIC_API_KEY="sk-ant-..."
 export GEMINI_API_KEY="AI..."
@@ -78,6 +82,21 @@
 OPENAI_API_KEY=sk-...
 ANTHROPIC_API_KEY=sk-ant-...
 GEMINI_API_KEY=AI...
 ```
 
+### Ollama (fully offline)
+
+No API keys needed — just install and run [Ollama](https://ollama.com):
+
+```bash
+# Install Ollama, then pull models
+ollama pull llama3.2   # Chat/analysis
+ollama pull llava      # Vision (diagram detection)
+
+# Start the server (if not already running)
+ollama serve
+```
+
+PlanOpticon auto-detects Ollama and uses it as a fallback when no cloud API keys are set. For a fully offline pipeline, pair Ollama with local Whisper transcription (`pip install planopticon[gpu]`).
+
 PlanOpticon will automatically discover which providers are available and route to the best model for each task.
--- docs/getting-started/quickstart.md
+++ docs/getting-started/quickstart.md
@@ -35,10 +35,13 @@
 # Auto-detect best available (default)
 planopticon analyze -i video.mp4 -o ./out
 
 # Force a specific provider
 planopticon analyze -i video.mp4 -o ./out --provider openai
+
+# Use Ollama for fully offline processing (no API keys needed)
+planopticon analyze -i video.mp4 -o ./out --provider ollama
 
 # Override specific models
 planopticon analyze -i video.mp4 -o ./out \
     --vision-model gpt-4o \
     --chat-model claude-sonnet-4-5-20250929
--- tests/test_providers.py
+++ tests/test_providers.py
@@ -1,8 +1,10 @@
 """Tests for the provider abstraction layer."""
 
 from unittest.mock import MagicMock, patch
+
+import pytest
 
 from video_processor.providers.base import BaseProvider, ModelInfo
 from video_processor.providers.manager import ProviderManager
 
 
@@ -107,21 +109,27 @@
         assert "anthropic/claude-sonnet-4-5-20250929" == used["chat"]
 
 
 class TestDiscovery:
     @patch("video_processor.providers.discovery._cached_models", None)
+    @patch(
+        "video_processor.providers.ollama_provider.OllamaProvider.is_available", return_value=False
+    )
     @patch.dict("os.environ", {}, clear=True)
-    def test_discover_skips_missing_keys(self):
+    def test_discover_skips_missing_keys(self, mock_ollama):
         from video_processor.providers.discovery import discover_available_models
 
-        # No API keys -> empty list, no errors
+        # No API keys and no Ollama -> empty list, no errors
         models = discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""})
         assert models == []
 
     @patch.dict("os.environ", {}, clear=True)
+    @patch(
+        "video_processor.providers.ollama_provider.OllamaProvider.is_available", return_value=False
+    )
     @patch("video_processor.providers.discovery._cached_models", None)
-    def test_discover_caches_results(self):
+    def test_discover_caches_results(self, mock_ollama):
         from video_processor.providers import discovery
 
         models = discovery.discover_available_models(
             api_keys={"openai": "", "anthropic": "", "gemini": ""}
         )
@@ -131,5 +139,85 @@
         assert models2 == []  # Still cached empty result
 
         # Force refresh
         discovery.clear_discovery_cache()
         # Would try to connect with real key, so skip that test
+
+
+class TestOllamaProvider:
+    @patch("video_processor.providers.ollama_provider.requests")
+    def test_is_available_when_running(self, mock_requests):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 200
+        mock_requests.get.return_value = mock_resp
+
+        from video_processor.providers.ollama_provider import OllamaProvider
+
+        assert OllamaProvider.is_available()
+
+    @patch("video_processor.providers.ollama_provider.requests")
+    def test_is_available_when_not_running(self, mock_requests):
+        mock_requests.get.side_effect = ConnectionError
+
+        from video_processor.providers.ollama_provider import OllamaProvider
+
+        assert not OllamaProvider.is_available()
+
+    @patch("video_processor.providers.ollama_provider.requests")
+    @patch("video_processor.providers.ollama_provider.OpenAI")
+    def test_transcribe_raises(self, mock_openai, mock_requests):
+        from video_processor.providers.ollama_provider import OllamaProvider
+
+        provider = OllamaProvider()
+        with pytest.raises(NotImplementedError):
+            provider.transcribe_audio("/tmp/test.wav")
+
+    @patch("video_processor.providers.ollama_provider.requests")
+    @patch("video_processor.providers.ollama_provider.OpenAI")
+    def test_list_models(self, mock_openai, mock_requests):
+        mock_resp = MagicMock()
+        mock_resp.status_code = 200
+        mock_resp.json.return_value = {
+            "models": [
+                {"name": "llama3.2:latest", "details": {"family": "llama"}},
+                {"name": "llava:13b", "details": {"family": "llava"}},
+            ]
+        }
+        mock_requests.get.return_value = mock_resp
+
+        from video_processor.providers.ollama_provider import OllamaProvider
+
+        provider = OllamaProvider()
+        models = provider.list_models()
+        assert len(models) == 2
+        assert models[0].provider == "ollama"
+
+        # llava should have vision capability
+        llava = [m for m in models if "llava" in m.id][0]
+        assert "vision" in llava.capabilities
+
+        # llama should have only chat
+        llama = [m for m in models if "llama" in m.id][0]
+        assert "chat" in llama.capabilities
+        assert "vision" not in llama.capabilities
+
+    def test_provider_for_model_ollama_via_discovery(self):
+        mgr = ProviderManager()
+        mgr._available_models = [
+            ModelInfo(id="llama3.2:latest", provider="ollama", capabilities=["chat"]),
+        ]
+        assert mgr._provider_for_model("llama3.2:latest") == "ollama"
+
+    def test_provider_for_model_ollama_fuzzy_tag(self):
+        mgr = ProviderManager()
+        mgr._available_models = [
+            ModelInfo(id="llama3.2:latest", provider="ollama", capabilities=["chat"]),
+        ]
+        # Should match "llama3.2" to "llama3.2:latest" via prefix
+        assert mgr._provider_for_model("llama3.2") == "ollama"
+
+    def test_init_forced_provider_ollama(self):
+        mgr = ProviderManager(provider="ollama")
+        # Ollama defaults are empty (resolved dynamically)
+        assert mgr.vision_model == ""
+        assert mgr.chat_model == ""
+        assert mgr.transcription_model == ""
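To run just the new and updated provider tests (plain pytest selection, nothing project-specific assumed):

```bash
pytest tests/test_providers.py -k "Ollama or Discovery" -v
```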
--- video_processor/cli/commands.py
+++ video_processor/cli/commands.py
@@ -73,11 +73,11 @@
 )
 @click.option("--title", type=str, help="Title for the analysis report")
 @click.option(
     "--provider",
     "-p",
-    type=click.Choice(["auto", "openai", "anthropic", "gemini"]),
+    type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]),
     default="auto",
     help="API provider",
 )
 @click.option("--vision-model", type=str, default=None, help="Override model for vision tasks")
 @click.option("--chat-model", type=str, default=None, help="Override model for LLM/chat tasks")
@@ -154,11 +154,11 @@
 )
 @click.option("--title", type=str, default="Batch Processing Results", help="Batch title")
 @click.option(
     "--provider",
     "-p",
-    type=click.Choice(["auto", "openai", "anthropic", "gemini"]),
+    type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]),
     default="auto",
     help="API provider",
 )
 @click.option("--vision-model", type=str, default=None, help="Override model for vision tasks")
 @click.option("--chat-model", type=str, default=None, help="Override model for LLM/chat tasks")
@@ -343,12 +343,14 @@
     """Discover and display available models from all configured providers."""
     from video_processor.providers.discovery import discover_available_models
 
     models = discover_available_models(force_refresh=True)
     if not models:
-        click.echo("No models discovered. Check that at least one API key is set:")
-        click.echo("  OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY")
+        click.echo(
+            "No models discovered. Check that at least one API key is set or Ollama is running:"
+        )
+        click.echo("  OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, or `ollama serve`")
         return
 
     by_provider: dict[str, list] = {}
     for m in models:
         by_provider.setdefault(m.provider, []).append(m)
@@ -417,11 +419,11 @@
 )
 @click.option("--title", type=str, help="Title for the analysis report")
 @click.option(
     "--provider",
     "-p",
-    type=click.Choice(["auto", "openai", "anthropic", "gemini"]),
+    type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]),
     default="auto",
     help="API provider",
 )
 @click.option("--vision-model", type=str, default=None, help="Override model for vision tasks")
 @click.option("--chat-model", type=str, default=None, help="Override model for LLM/chat tasks")
@@ -511,11 +513,11 @@
         type=click.Choice(["basic", "standard", "comprehensive"]),
         default="standard",
     )
     provider = click.prompt(
         " Provider",
-        type=click.Choice(["auto", "openai", "anthropic", "gemini"]),
+        type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]),
         default="auto",
     )
     ctx.invoke(
         analyze,
         input=input_path,
@@ -540,11 +542,11 @@
         type=click.Choice(["basic", "standard", "comprehensive"]),
         default="standard",
     )
     provider = click.prompt(
         " Provider",
-        type=click.Choice(["auto", "openai", "anthropic", "gemini"]),
+        type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]),
         default="auto",
     )
     ctx.invoke(
         batch,
         input_dir=input_dir,
--- video_processor/providers/discovery.py
+++ video_processor/providers/discovery.py
@@ -75,10 +75,22 @@
         logger.info(f"Discovered {len(models)} Gemini models")
         all_models.extend(models)
     except Exception as e:
         logger.warning(f"Gemini discovery failed: {e}")
 
+    # Ollama (local, no API key needed)
+    try:
+        from video_processor.providers.ollama_provider import OllamaProvider
+
+        if OllamaProvider.is_available():
+            provider = OllamaProvider()
+            models = provider.list_models()
+            logger.info(f"Discovered {len(models)} Ollama models")
+            all_models.extend(models)
+    except Exception as e:
+        logger.info(f"Ollama discovery skipped: {e}")
+
     # Sort by provider then id
     all_models.sort(key=lambda m: (m.provider, m.id))
     _cached_models = all_models
     logger.info(f"Total discovered models: {len(all_models)}")
     return all_models
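Discovery results are cached after the first call (via `_cached_models`); the `models` CLI command shown earlier passes `force_refresh=True` to bypass the cache. A quick interactive check, using only names visible in this diff:

```python
from video_processor.providers.discovery import (
    clear_discovery_cache,
    discover_available_models,
)

# First call queries each configured cloud API plus any local Ollama
# server, then caches the combined, sorted list.
models = discover_available_models()
print([f"{m.provider}/{m.id}" for m in models])

# Pick up newly pulled Ollama models by dropping the cache
clear_discovery_cache()
models = discover_available_models(force_refresh=True)
```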
--- video_processor/providers/manager.py
+++ video_processor/providers/manager.py
@@ -91,10 +91,15 @@
             "gemini": {
                 "chat": "gemini-2.5-flash",
                 "vision": "gemini-2.5-flash",
                 "audio": "gemini-2.5-flash",
             },
+            "ollama": {
+                "chat": "",
+                "vision": "",
+                "audio": "",
+            },
         }
         return defaults.get(provider, {}).get(capability, "")
 
     def _get_provider(self, provider_name: str) -> BaseProvider:
         """Lazily initialize and cache a provider instance."""
@@ -109,10 +114,14 @@
             self._providers[provider_name] = AnthropicProvider()
         elif provider_name == "gemini":
             from video_processor.providers.gemini_provider import GeminiProvider
 
             self._providers[provider_name] = GeminiProvider()
+        elif provider_name == "ollama":
+            from video_processor.providers.ollama_provider import OllamaProvider
+
+            self._providers[provider_name] = OllamaProvider()
         else:
             raise ValueError(f"Unknown provider: {provider_name}")
         return self._providers[provider_name]
 
     def _provider_for_model(self, model_id: str) -> str:
@@ -127,15 +136,18 @@
             return "openai"
         if model_id.startswith("claude-"):
             return "anthropic"
         if model_id.startswith("gemini-"):
             return "gemini"
-        # Try discovery
+        # Try discovery (exact match, then prefix match for ollama name:tag format)
         models = self._get_available_models()
         for m in models:
             if m.id == model_id:
                 return m.provider
+        for m in models:
+            if m.id.startswith(model_id + ":"):
+                return m.provider
         raise ValueError(f"Cannot determine provider for model: {model_id}")
 
     def _get_available_models(self) -> list[ModelInfo]:
         if self._available_models is None:
             self._available_models = discover_available_models()
@@ -159,14 +171,27 @@
             try:
                 self._get_provider(prov)
                 return prov, model
             except (ValueError, ImportError):
                 continue
+
+        # Fallback: try Ollama if available (no API key needed)
+        try:
+            from video_processor.providers.ollama_provider import OllamaProvider
+
+            if OllamaProvider.is_available():
+                provider = self._get_provider("ollama")
+                models = provider.list_models()
+                for m in models:
+                    if capability in m.capabilities:
+                        return "ollama", m.id
+        except Exception:
+            pass
 
         raise RuntimeError(
             f"No provider available for capability '{capability}'. "
-            "Set an API key for at least one provider."
+            "Set an API key for at least one provider, or start Ollama."
         )
 
     def _track(self, provider: BaseProvider, prov_name: str, model: str) -> None:
         """Record usage from the last API call on a provider."""
         last = getattr(provider, "_last_usage", None)
ADDED video_processor/providers/ollama_provider.py
--- a/video_processor/providers/ollama_provider.py
+++ b/video_processor/providers/ollama_provider.py
@@ -0,0 +1,109 @@
1
+"""Ollama provider implementation using OpenAI-compatible API."""
2
+
3
+import base64
4
+import logging
5
+import os
6
+from pathlib import Path
7
+from typing import Optional
8
+
9
+import requests
10
+from openai import OpenAI
11
+
12
+from video_processor.providers.base import BaseProvider, ModelInfo
13
+
14
+logger = logging.getLogger(__name__)
15
+
16
+# Known vision-capable model families (base name before the colon/tag)
17
+_VISION_FAMILIES = {
18
+ "llava",
19
+ "llava-llama3",
20
+ "llava-phi3",
21
+ "llama3.2-vision",
22
+ "moondream",
23
+ "bakllava",
24
+ "minicpm-v",
25
+ "deepseek-vl",
26
+ "internvl2",
27
+}
28
+
29
+DEFAULT_HOST = "http://localhost:11434"
30
+
31
+
32
+class OllamaProvider(BaseProvider):
33
+ """Ollama local LLM provider via OpenAI-compatible API."""
34
+
35
+ provider_name = model = model or self._dprompt_tokens", 0) self.host ULT_HOST)
36
+ model = model or secmethod
37
+ def is_available(host: Optional[str] = None) -> bool:
38
+ """Check if an Ollama server is running and reachable."""
39
+ host = host or os.getenv("OLLAMA_HOST", DEFAULT_HOST)
40
+ try:
41
+ resp = requests.get(f"{host}/api/tags", timeout=3)
42
+ return resp.status_code == 200
43
+ except Exception:
44
+ return False
45
+
46
+ @property
47
+ def _default_model(self) -> str:
48
+ models = self._get_models()
49
+ for m in models:
50
+ if "chat" in m.capabilities:
51
+ return m.id
52
+ return "llama3.2:latest"
53
+
54
+ @property
55
+ def _default_vision_model(self) -> Optional[str]:
56
+ models = self._get_models()
57
+ for m in models:
58
+ if "vision" in m.capabilities:
59
+ return m.id
60
+ return None
61
+
62
+ def _get_models(self) -> list[ModelInfo]:
63
+ if self._models_cache is None:
64
+ self._models_cache = self.list_models()
65
+ return self._models_cache
66
+
67
+ def chat(
68
+ self,
69
+ messages: list[ model = model or self._dprompt_tokens", 0) self.host ULT_HOST)
70
+ model = model or setions.create(
71
+ model=model,
72
+ messages=messages,
73
+ max_tokens=max_tokens,
74
+ temperature=temperature,
75
+ )
76
+ self._last_usage = {
77
+ "input_tokens": (getattr(response.usage, "prompt_tokens", 0) or 0)
78
+ if response.usage
79
+ else 0,
80
+ "output_tokens": (getattr(response.usage, "completion_tokens", 0) or 0)
81
+ if response.usage
82
+ else 0,
83
+ }
84
+ return response.choices[0].message.content or ""
85
+
86
+ def analyze_image(
87
+ self,
88
+ image_bytes: bytes,
89
+ prompt: str,
90
+ max_tokens: int = 4096,
91
+ model: Optional[str] = None,
92
+ ) -> str:
93
+ model = model or self._default_vision_model
94
+ if not model:
95
+ raise RuntimeError(
96
+ "No Ollama vision model available. Install a multimodal model: ollama pull llava"
97
+ )
98
+ b64 = base64.b64encode(image_bytes).decode()
99
+ response = self.client.chat.completions.create(
100
+ model=model,
101
+ messages=[
102
+ {
103
+ "role": "user",
104
+ "content": [
105
+ {"type": "text", "text": prompt},
106
+ {
107
+ "type": "image_url",
108
+ "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
109
+
--- a/video_processor/providers/ollama_provider.py
+++ b/video_processor/providers/ollama_provider.py
@@ -0,0 +1,109 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
--- a/video_processor/providers/ollama_provider.py
+++ b/video_processor/providers/ollama_provider.py
@@ -0,0 +1,109 @@
1 """Ollama provider implementation using OpenAI-compatible API."""
2
3 import base64
4 import logging
5 import os
6 from pathlib import Path
7 from typing import Optional
8
9 import requests
10 from openai import OpenAI
11
12 from video_processor.providers.base import BaseProvider, ModelInfo
13
14 logger = logging.getLogger(__name__)
15
16 # Known vision-capable model families (base name before the colon/tag)
17 _VISION_FAMILIES = {
18 "llava",
19 "llava-llama3",
20 "llava-phi3",
21 "llama3.2-vision",
22 "moondream",
23 "bakllava",
24 "minicpm-v",
25 "deepseek-vl",
26 "internvl2",
27 }
28
29 DEFAULT_HOST = "http://localhost:11434"
30
31
32 class OllamaProvider(BaseProvider):
33 """Ollama local LLM provider via OpenAI-compatible API."""
34
+    provider_name = "ollama"
+
+    def __init__(self, host: Optional[str] = None):
+        self.host = host or os.getenv("OLLAMA_HOST", DEFAULT_HOST)
+        # Ollama exposes an OpenAI-compatible endpoint under /v1; the key is unused.
+        self.client = OpenAI(base_url=f"{self.host}/v1", api_key="ollama")
+        self._models_cache: Optional[list[ModelInfo]] = None
+
+    @staticmethod
+    def is_available(host: Optional[str] = None) -> bool:
+        """Check if an Ollama server is running and reachable."""
+        host = host or os.getenv("OLLAMA_HOST", DEFAULT_HOST)
+        try:
+            resp = requests.get(f"{host}/api/tags", timeout=3)
+            return resp.status_code == 200
+        except Exception:
+            return False
+
+    @property
+    def _default_model(self) -> str:
+        models = self._get_models()
+        for m in models:
+            if "chat" in m.capabilities:
+                return m.id
+        return "llama3.2:latest"
+
+    @property
+    def _default_vision_model(self) -> Optional[str]:
+        models = self._get_models()
+        for m in models:
+            if "vision" in m.capabilities:
+                return m.id
+        return None
+
+    def _get_models(self) -> list[ModelInfo]:
+        if self._models_cache is None:
+            self._models_cache = self.list_models()
+        return self._models_cache
+
+    def chat(
+        self,
+        messages: list[dict],
+        max_tokens: int = 4096,
+        temperature: float = 0.0,
+        model: Optional[str] = None,
+    ) -> str:
+        model = model or self._default_model
+        response = self.client.chat.completions.create(
+            model=model,
+            messages=messages,
+            max_tokens=max_tokens,
+            temperature=temperature,
+        )
+        self._last_usage = {
+            "input_tokens": (getattr(response.usage, "prompt_tokens", 0) or 0)
+            if response.usage
+            else 0,
+            "output_tokens": (getattr(response.usage, "completion_tokens", 0) or 0)
+            if response.usage
+            else 0,
+        }
+        return response.choices[0].message.content or ""
+
+    def analyze_image(
+        self,
+        image_bytes: bytes,
+        prompt: str,
+        max_tokens: int = 4096,
+        model: Optional[str] = None,
+    ) -> str:
+        model = model or self._default_vision_model
+        if not model:
+            raise RuntimeError(
+                "No Ollama vision model available. Install a multimodal model: ollama pull llava"
+            )
+        b64 = base64.b64encode(image_bytes).decode()
+        response = self.client.chat.completions.create(
+            model=model,
+            messages=[
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": prompt},
+                        {
+                            "type": "image_url",
+                            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
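
The rendered diff truncates at line 109, before the vision message is closed and before `list_models`, which `_get_models`, both default-model properties, and the manager's fallback all call. The sketch below is one plausible shape for that listing logic, assuming Ollama's documented `/api/tags` endpoint and mirroring the `_VISION_FAMILIES` heuristic above; the function name and plain-dict return shape are illustrative, not the patch's `ModelInfo`.

```python
import os

import requests

DEFAULT_HOST = "http://localhost:11434"

# Mirrors the _VISION_FAMILIES heuristic from the diff (subset shown).
VISION_FAMILIES = {"llava", "llama3.2-vision", "moondream", "bakllava", "minicpm-v"}


def list_installed_models(host: str | None = None) -> list[dict]:
    """Query Ollama's /api/tags listing and classify vision-capable families."""
    host = host or os.getenv("OLLAMA_HOST", DEFAULT_HOST)
    resp = requests.get(f"{host}/api/tags", timeout=3)
    resp.raise_for_status()
    models = []
    for entry in resp.json().get("models", []):
        name = entry["name"]              # Ollama ids are "name:tag"
        family = name.split(":", 1)[0]    # classify by base family name
        caps = ["chat"]
        if family in VISION_FAMILIES:
            caps.append("vision")
        models.append({"id": name, "provider": "ollama", "capabilities": caps})
    return models
```

Since `is_available()` probes the same `/api/tags` endpoint with a three-second timeout, a machine without Ollama fails fast and the manager falls back to the keyed providers.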

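Taken together, the two files give PlanOpticon a fully offline path. A minimal usage sketch, assuming a running Ollama daemon with at least one chat model pulled; the prompt text is illustrative:

```python
from video_processor.providers.ollama_provider import OllamaProvider

# Guard with the availability probe so the example degrades gracefully.
if OllamaProvider.is_available():
    provider = OllamaProvider()
    reply = provider.chat(
        messages=[{"role": "user", "content": "List the action items from this standup."}],
        max_tokens=256,
    )
    print(reply)
else:
    print("Ollama is not running; the manager falls back to keyed providers.")
```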