PlanOpticon
Add Ollama provider for fully offline video analysis

Adds Ollama as a fourth LLM provider, enabling fully offline operation with no API keys required. Uses Ollama's OpenAI-compatible API (zero new dependencies). Auto-detects a running Ollama server and falls back to it when no cloud API keys are configured.

- New OllamaProvider class with chat, vision, and model discovery
- ProviderManager auto-fallback to Ollama as last resort
- Vision model detection for llava, moondream, etc.
- CLI --provider ollama option on all commands
- Updated docs, architecture guide, and CLI reference
- 7 new tests for Ollama provider integration
Commit
139b64b39f1bd0daec84e957d55b085a9c1c6eea19c65cf11bcf58f5dc7437cd
Parent
8ade7827de290d0…
12 files changed

~ README.md                                      +3   -3
~ docs/architecture/overview.md                  +2   -1
~ docs/architecture/providers.md                 +38  -5
~ docs/cli-reference.md                          +3   -3
~ docs/getting-started/configuration.md          +6   -3
~ docs/getting-started/installation.md           +20  -1
~ docs/getting-started/quickstart.md             +3
~ tests/test_providers.py                        +91  -3
~ video_processor/cli/commands.py                +9   -7
~ video_processor/providers/discovery.py         +12
~ video_processor/providers/manager.py           +27  -2
+ video_processor/providers/ollama_provider.py   +109
README.md  +3 -3

--- README.md
+++ README.md
@@ -6,15 +6,15 @@
 [](LICENSE)
 [](https://planopticon.dev)
 
 **AI-powered video analysis and knowledge extraction.**
 
-PlanOpticon processes video recordings into structured knowledge — transcripts, diagrams, action items, key points, and knowledge graphs. It auto-discovers available models across OpenAI, Anthropic, and Gemini, and produces rich multi-format output.
+PlanOpticon processes video recordings into structured knowledge — transcripts, diagrams, action items, key points, and knowledge graphs. It auto-discovers available models across OpenAI, Anthropic, Gemini, and Ollama, and produces rich multi-format output.
 
 ## Features
 
-- **Multi-provider AI** — Auto-discovers and routes to the best available model across OpenAI, Anthropic, and Google Gemini
+- **Multi-provider AI** — Auto-discovers and routes to the best available model across OpenAI, Anthropic, Google Gemini, and Ollama (fully offline)
 - **Smart frame extraction** — Change detection for transitions + periodic capture for slow-evolving content (document scrolling, screen shares)
 - **People frame filtering** — OpenCV face detection automatically removes webcam/video conference frames, keeping only shared content
 - **Diagram extraction** — Vision model classification detects flowcharts, architecture diagrams, charts, and whiteboards
 - **Knowledge graphs** — Extracts entities and relationships, builds and merges knowledge graphs across videos
 - **Action item detection** — Finds commitments, tasks, and follow-ups with assignees and deadlines
@@ -65,11 +65,11 @@
 
 ### Requirements
 
 - Python 3.10+
 - FFmpeg (`brew install ffmpeg` / `apt install ffmpeg`)
-- At least one API key: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY`
+- At least one API key (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or `GEMINI_API_KEY`) **or** [Ollama](https://ollama.com) running locally
 
 ## Output Structure
 
 ```
 output/
docs/architecture/overview.md  +2 -1

--- docs/architecture/overview.md
+++ docs/architecture/overview.md
@@ -43,10 +43,11 @@
 ├── providers/              # AI provider abstraction
 │   ├── base.py             # BaseProvider ABC
 │   ├── openai_provider.py
 │   ├── anthropic_provider.py
 │   ├── gemini_provider.py
+│   ├── ollama_provider.py  # Local Ollama (offline)
 │   ├── discovery.py        # Auto-model-discovery
 │   └── manager.py          # ProviderManager routing
 ├── utils/
 │   ├── json_parsing.py     # Robust LLM JSON parsing
 │   ├── rendering.py        # Mermaid + chart rendering
@@ -60,8 +61,8 @@
 
 ## Key design decisions
 
 - **Pydantic everywhere** — All structured data uses pydantic models for validation and serialization
 - **Manifest-driven** — Every run produces `manifest.json` as the single source of truth
-- **Provider abstraction** — Single `ProviderManager` wraps OpenAI, Anthropic, Gemini behind a common interface
+- **Provider abstraction** — Single `ProviderManager` wraps OpenAI, Anthropic, Gemini, and Ollama behind a common interface
 - **No hardcoded models** — Model lists come from API discovery
 - **Screengrab fallback** — When extraction fails, save the frame as a captioned screenshot
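The diff above names the pieces of the provider layer but does not show `base.py` itself. A minimal sketch of what the `BaseProvider` ABC plausibly looks like; the method names (`chat`, `vision`, `discover_models`) are assumptions drawn from the commit description, not the actual source:

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Common interface wrapped by ProviderManager (method names assumed)."""

    name: str

    @abstractmethod
    def chat(self, messages: list[dict]) -> str:
        """Send a chat conversation, return the model's text reply."""

    @abstractmethod
    def vision(self, prompt: str, image_path: str) -> str:
        """Describe or classify an image (used for diagram extraction)."""

    @abstractmethod
    def discover_models(self) -> list[str]:
        """List the models this provider currently offers."""


class EchoProvider(BaseProvider):
    """Toy provider, included only to show the ABC is satisfiable."""

    name = "echo"

    def chat(self, messages):
        return messages[-1]["content"]

    def vision(self, prompt, image_path):
        return f"{prompt}: {image_path}"

    def discover_models(self):
        return ["echo-1"]
```

A new provider such as Ollama then only needs to implement these three methods to plug into the existing routing.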
docs/architecture/providers.md  +38 -5

--- docs/architecture/providers.md
+++ docs/architecture/providers.md
@@ -9,40 +9,73 @@
 | Provider | Chat | Vision | Transcription |
 |----------|------|--------|--------------|
 | OpenAI | GPT-4o, GPT-4 | GPT-4o | Whisper-1 |
 | Anthropic | Claude Sonnet/Opus | Claude Sonnet/Opus | — |
 | Google Gemini | Gemini Flash/Pro | Gemini Flash/Pro | Gemini Flash |
+| Ollama (local) | Any installed model | llava, moondream, etc. | — (use local Whisper) |
+
+## Ollama (offline mode)
+
+[Ollama](https://ollama.com) enables fully offline operation with no API keys required. PlanOpticon connects via Ollama's OpenAI-compatible API.
+
+```bash
+# Install and start Ollama
+ollama serve
+
+# Pull a chat model
+ollama pull llama3.2
+
+# Pull a vision model (for diagram analysis)
+ollama pull llava
+```
+
+PlanOpticon auto-detects Ollama when it's running. To force Ollama:
+
+```bash
+planopticon analyze -i video.mp4 -o ./out --provider ollama
+```
+
+Configure a non-default host via `OLLAMA_HOST`:
+
+```bash
+export OLLAMA_HOST=http://192.168.1.100:11434
+```
 
 ## Auto-discovery
 
-On startup, `ProviderManager` checks which API keys are configured and queries each provider's API to discover available models:
+On startup, `ProviderManager` checks which API keys are configured, queries each provider's API, and checks for a running Ollama server to discover available models:
 
 ```python
 from video_processor.providers.manager import ProviderManager
 
 pm = ProviderManager()
-# Automatically discovers models from all configured providers
+# Automatically discovers models from all configured providers + Ollama
 ```
 
 ## Routing preferences
 
 Each task type has a default preference order:
 
 | Task | Preference |
 |------|-----------|
-| Vision | Gemini Flash → GPT-4o → Claude Sonnet |
-| Chat | Claude Sonnet → GPT-4o → Gemini Flash |
-| Transcription | Whisper-1 → Gemini Flash |
+| Vision | Gemini Flash → GPT-4o → Claude Sonnet → Ollama |
+| Chat | Claude Sonnet → GPT-4o → Gemini Flash → Ollama |
+| Transcription | Local Whisper → Whisper-1 → Gemini Flash |
+
+Ollama acts as the last-resort fallback — if no cloud API keys are set but Ollama is running, it is used automatically.
 
 ## Manual override
 
 ```python
 pm = ProviderManager(
     vision_model="gpt-4o",
     chat_model="claude-sonnet-4-5-20250929",
     provider="openai",  # Force a specific provider
 )
+
+# Or use Ollama for fully offline processing
+pm = ProviderManager(provider="ollama")
 ```
 
 ## BaseProvider interface
 
 All providers implement:
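The new `ollama_provider.py` itself is not shown in this view. A stdlib-only sketch of the two mechanisms the commit describes: building a request against Ollama's OpenAI-compatible chat endpoint, and name-based vision-model detection for models like llava and moondream. The helper names and the hint list are illustrative assumptions; only the `/v1/chat/completions` path and the `OLLAMA_HOST` default of `http://localhost:11434` reflect Ollama's documented behavior:

```python
import json
import os

# Heuristic list of vision-capable model name fragments (assumed, not from the source)
VISION_HINTS = ("llava", "moondream", "bakllava", "llama3.2-vision", "minicpm-v")


def build_ollama_chat_request(model: str, messages: list[dict]) -> tuple[str, bytes]:
    """Build (url, json_body) for Ollama's OpenAI-compatible chat endpoint."""
    host = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    url = host.rstrip("/") + "/v1/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return url, body


def is_vision_model(name: str) -> bool:
    """Guess whether an installed Ollama model can handle images, by name."""
    return any(hint in name.lower() for hint in VISION_HINTS)
```

Actually sending the request is then one `urllib.request.urlopen` call against a running server, with the reply parsed from `choices[0].message.content` exactly as with any OpenAI-compatible backend, which is why no new dependency is needed.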
docs/cli-reference.md  +3 -3

--- docs/cli-reference.md
+++ docs/cli-reference.md
@@ -17,11 +17,11 @@
 | `--use-gpu` | FLAG | off | Enable GPU acceleration |
 | `--sampling-rate` | FLOAT | 0.5 | Frame sampling rate (fps) |
 | `--change-threshold` | FLOAT | 0.15 | Visual change threshold |
 | `--periodic-capture` | FLOAT | 30.0 | Capture a frame every N seconds regardless of change (0 to disable) |
 | `--title` | TEXT | auto | Report title |
-| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini` | `auto` | API provider |
+| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini\|ollama` | `auto` | API provider |
 | `--vision-model` | TEXT | auto | Override vision model |
 | `--chat-model` | TEXT | auto | Override chat model |
 
 ---
 
@@ -38,11 +38,11 @@
 | `-i`, `--input-dir` | PATH | *required* | Directory containing videos |
 | `-o`, `--output` | PATH | *required* | Output directory |
 | `--depth` | `basic\|standard\|comprehensive` | `standard` | Processing depth |
 | `--pattern` | TEXT | `*.mp4,*.mkv,*.avi,*.mov,*.webm` | File glob patterns |
 | `--title` | TEXT | `Batch Processing Results` | Batch title |
-| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini` | `auto` | API provider |
+| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini\|ollama` | `auto` | API provider |
 | `--vision-model` | TEXT | auto | Override vision model |
 | `--chat-model` | TEXT | auto | Override chat model |
 | `--source` | `local\|gdrive\|dropbox` | `local` | Video source |
 | `--folder-id` | TEXT | none | Google Drive folder ID |
 | `--folder-path` | TEXT | none | Cloud folder path |
@@ -90,11 +90,11 @@
 |--------|------|---------|-------------|
 | `-i`, `--input` | PATH | *required* | Input video file path |
 | `-o`, `--output` | PATH | *required* | Output directory |
 | `--depth` | `basic\|standard\|comprehensive` | `standard` | Initial processing depth (agent may adapt) |
 | `--title` | TEXT | auto | Report title |
-| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini` | `auto` | API provider |
+| `-p`, `--provider` | `auto\|openai\|anthropic\|gemini\|ollama` | `auto` | API provider |
 | `--vision-model` | TEXT | auto | Override vision model |
 | `--chat-model` | TEXT | auto | Override chat model |
 
 ---
docs/getting-started/configuration.md  +6 -3

--- docs/getting-started/configuration.md
+++ docs/getting-started/configuration.md
@@ -5,22 +5,25 @@
 | Variable | Description |
 |----------|-------------|
 | `OPENAI_API_KEY` | OpenAI API key |
 | `ANTHROPIC_API_KEY` | Anthropic API key |
 | `GEMINI_API_KEY` | Google Gemini API key |
+| `OLLAMA_HOST` | Ollama server URL (default: `http://localhost:11434`) |
 | `GOOGLE_APPLICATION_CREDENTIALS` | Path to Google service account JSON (for Drive) |
 | `CACHE_DIR` | Directory for API response caching |
 
 ## Provider routing
 
 PlanOpticon auto-discovers available models and routes each task to the best option:
 
 | Task | Default preference |
 |------|--------------------|
-| Vision (diagrams) | Gemini Flash > GPT-4o > Claude Sonnet |
-| Chat (analysis) | Claude Sonnet > GPT-4o > Gemini Flash |
-| Transcription | Whisper-1 > Gemini Flash |
+| Vision (diagrams) | Gemini Flash > GPT-4o > Claude Sonnet > Ollama |
+| Chat (analysis) | Claude Sonnet > GPT-4o > Gemini Flash > Ollama |
+| Transcription | Local Whisper > Whisper-1 > Gemini Flash |
+
+If no cloud API keys are configured, PlanOpticon automatically falls back to Ollama when a local server is running. This enables fully offline operation when paired with local Whisper for transcription.
 
 Override with `--provider`, `--vision-model`, or `--chat-model` flags.
 
 ## Frame sampling
 
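The preference-order routing and last-resort fallback in the configuration diff above reduce to a first-match scan over the preference tables. The real `ProviderManager` logic is not shown in this commit view, so the function name, task keys, and provider labels below are hypothetical, a sketch only:

```python
from typing import Optional

# Preference orders mirroring the routing table (labels assumed)
PREFERENCES = {
    "vision": ["gemini", "openai", "anthropic", "ollama"],
    "chat": ["anthropic", "openai", "gemini", "ollama"],
}


def pick_provider(task: str, available: set[str]) -> Optional[str]:
    """Return the first available provider in the task's preference order.

    `available` would hold providers with configured API keys, plus
    "ollama" when a local server was detected. Because "ollama" sits
    last in every list, it is chosen exactly when nothing else is.
    """
    for provider in PREFERENCES[task]:
        if provider in available:
            return provider
    return None
```

With no cloud keys set, `available` is just `{"ollama"}` and every task routes there, which is the offline-fallback behavior the diff describes.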
docs/getting-started/installation.md  +20 -1

--- docs/getting-started/installation.md
+++ docs/getting-started/installation.md
@@ -62,11 +62,15 @@
 
 Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH.
 
 ## API keys
 
-You need at least one AI provider API key. Set them as environment variables:
+You need at least one AI provider API key **or** a running Ollama server.
+
+### Cloud providers
+
+Set API keys as environment variables:
 
 ```bash
 export OPENAI_API_KEY="sk-..."
 export ANTHROPIC_API_KEY="sk-ant-..."
 export GEMINI_API_KEY="AI..."
@@ -78,6 +82,21 @@
 OPENAI_API_KEY=sk-...
 ANTHROPIC_API_KEY=sk-ant-...
 GEMINI_API_KEY=AI...
 ```
 
+### Ollama (fully offline)
+
+No API keys needed — just install and run [Ollama](https://ollama.com):
+
+```bash
+# Install Ollama, then pull models
+ollama pull llama3.2   # Chat/analysis
+ollama pull llava      # Vision (diagram detection)
+
+# Start the server (if not already running)
+ollama serve
+```
+
+PlanOpticon auto-detects Ollama and uses it as a fallback when no cloud API keys are set. For a fully offline pipeline, pair Ollama with local Whisper transcription (`pip install planopticon[gpu]`).
+
 PlanOpticon will automatically discover which providers are available and route to the best model for each task.
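The auto-detection mentioned in the installation notes above presumably amounts to probing the local server. A stdlib-only sketch, with the caveat that the probe PlanOpticon actually uses is not shown in this commit; `/api/tags` is Ollama's standard model-listing route and makes a cheap liveness check:

```python
import urllib.error
import urllib.request


def ollama_running(host: str = "http://localhost:11434", timeout: float = 1.0) -> bool:
    """Return True if an Ollama server answers at `host`.

    Any response from /api/tags (which lists installed models) counts as
    alive; connection errors and timeouts mean no usable server.
    """
    try:
        with urllib.request.urlopen(host.rstrip("/") + "/api/tags", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False
```

Running such a check once at startup is enough to decide whether "ollama" belongs in the set of available providers.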
| --- docs/getting-started/quickstart.md | ||
| +++ docs/getting-started/quickstart.md | ||
| @@ -35,10 +35,13 @@ | ||
| 35 | 35 | # Auto-detect best available (default) |
| 36 | 36 | planopticon analyze -i video.mp4 -o ./out |
| 37 | 37 | |
| 38 | 38 | # Force a specific provider |
| 39 | 39 | planopticon analyze -i video.mp4 -o ./out --provider openai |
| 40 | + | |
| 41 | +# Use Ollama for fully offline processing (no API keys needed) | |
| 42 | +planopticon analyze -i video.mp4 -o ./out --provider ollama | |
| 40 | 43 | |
| 41 | 44 | # Override specific models |
| 42 | 45 | planopticon analyze -i video.mp4 -o ./out \ |
| 43 | 46 | --vision-model gpt-4o \ |
| 44 | 47 | --chat-model claude-sonnet-4-5-20250929 |
| 45 | 48 |
+91
-3
| --- tests/test_providers.py | ||
| +++ tests/test_providers.py | ||
| @@ -1,8 +1,10 @@ | ||
| 1 | 1 | """Tests for the provider abstraction layer.""" |
| 2 | 2 | |
| 3 | 3 | from unittest.mock import MagicMock, patch |
| 4 | + | |
| 5 | +import pytest | |
| 4 | 6 | |
| 5 | 7 | from video_processor.providers.base import BaseProvider, ModelInfo |
| 6 | 8 | from video_processor.providers.manager import ProviderManager |
| 7 | 9 | |
| 8 | 10 | |
| @@ -107,21 +109,27 @@ | ||
| 107 | 109 | assert "anthropic/claude-sonnet-4-5-20250929" == used["chat"] |
| 108 | 110 | |
| 109 | 111 | |
| 110 | 112 | class TestDiscovery: |
| 111 | 113 | @patch("video_processor.providers.discovery._cached_models", None) |
| 114 | + @patch( | |
| 115 | + "video_processor.providers.ollama_provider.OllamaProvider.is_available", return_value=False | |
| 116 | + ) | |
| 112 | 117 | @patch.dict("os.environ", {}, clear=True) |
| 113 | - def test_discover_skips_missing_keys(self): | |
| 118 | + def test_discover_skips_missing_keys(self, mock_ollama): | |
| 114 | 119 | from video_processor.providers.discovery import discover_available_models |
| 115 | 120 | |
| 116 | - # No API keys -> empty list, no errors | |
| 121 | + # No API keys and no Ollama -> empty list, no errors | |
| 117 | 122 | models = discover_available_models(api_keys={"openai": "", "anthropic": "", "gemini": ""}) |
| 118 | 123 | assert models == [] |
| 119 | 124 | |
| 120 | 125 | @patch.dict("os.environ", {}, clear=True) |
| 126 | + @patch( | |
| 127 | + "video_processor.providers.ollama_provider.OllamaProvider.is_available", return_value=False | |
| 128 | + ) | |
| 121 | 129 | @patch("video_processor.providers.discovery._cached_models", None) |
| 122 | - def test_discover_caches_results(self): | |
| 130 | + def test_discover_caches_results(self, mock_ollama): | |
| 123 | 131 | from video_processor.providers import discovery |
| 124 | 132 | |
| 125 | 133 | models = discovery.discover_available_models( |
| 126 | 134 | api_keys={"openai": "", "anthropic": "", "gemini": ""} |
| 127 | 135 | ) |
| @@ -131,5 +139,85 @@ | ||
| 131 | 139 | assert models2 == [] # Still cached empty result |
| 132 | 140 | |
| 133 | 141 | # Force refresh |
| 134 | 142 | discovery.clear_discovery_cache() |
| 135 | 143 | # Would try to connect with real key, so skip that test |
| 144 | + | |
| 145 | + | |
| 146 | +class TestOllamaProvider: | |
| 147 | + @patch("video_processor.providers.ollama_provider.requests") | |
| 148 | + def test_is_available_when_running(self, mock_requests): | |
| 149 | + mock_resp = MagicMock() | |
| 150 | + mock_resp.status_code = 200 | |
| 151 | + mock_requests.get.return_value = mock_resp | |
| 152 | + | |
| 153 | + from video_processor.providers.ollama_provider import OllamaProvider | |
| 154 | + | |
| 155 | + assert OllamaProvider.is_available() | |
| 156 | + | |
| 157 | + @patch("video_processor.providers.ollama_provider.requests") | |
| 158 | + def test_is_available_when_not_running(self, mock_requests): | |
| 159 | + mock_requests.get.side_effect = ConnectionError | |
| 160 | + | |
| 161 | + from video_processor.providers.ollama_provider import OllamaProvider | |
| 162 | + | |
| 163 | + assert not OllamaProvider.is_available() | |
| 164 | + | |
| 165 | + @patch("video_processor.providers.ollama_provider.requests") | |
| 166 | + @patch("video_processor.providers.ollama_provider.OpenAI") | |
| 167 | + def test_transcribe_raises(self, mock_openai, mock_requests): | |
| 168 | + from video_processor.providers.ollama_provider import OllamaProvider | |
| 169 | + | |
| 170 | + provider = OllamaProvider() | |
| 171 | + with pytest.raises(NotImplementedError): | |
| 172 | + provider.transcribe_audio("/tmp/test.wav") | |
| 173 | + | |
| 174 | + @patch("video_processor.providers.ollama_provider.requests") | |
| 175 | + @patch("video_processor.providers.ollama_provider.OpenAI") | |
| 176 | + def test_list_models(self, mock_openai, mock_requests): | |
| 177 | + mock_resp = MagicMock() | |
| 178 | + mock_resp.status_code = 200 | |
| 179 | + mock_resp.json.return_value = { | |
| 180 | + "models": [ | |
| 181 | + {"name": "llama3.2:latest", "details": {"family": "llama"}}, | |
| 182 | + {"name": "llava:13b", "details": {"family": "llava"}}, | |
| 183 | + ] | |
| 184 | + } | |
| 185 | + mock_requests.get.return_value = mock_resp | |
| 186 | + | |
| 187 | + from video_processor.providers.ollama_provider import OllamaProvider | |
| 188 | + | |
| 189 | + provider = OllamaProvider() | |
| 190 | + models = provider.list_models() | |
| 191 | + assert len(models) == 2 | |
| 192 | + assert models[0].provider == "ollama" | |
| 193 | + | |
| 194 | + # llava should have vision capability | |
| 195 | + llava = [m for m in models if "llava" in m.id][0] | |
| 196 | + assert "vision" in llava.capabilities | |
| 197 | + | |
| 198 | + # llama should have only chat | |
| 199 | + llama = [m for m in models if "llama" in m.id][0] | |
| 200 | + assert "chat" in llama.capabilities | |
| 201 | + assert "vision" not in llama.capabilities | |
| 202 | + | |
| 203 | + def test_provider_for_model_ollama_via_discovery(self): | |
| 204 | + mgr = ProviderManager() | |
| 205 | + mgr._available_models = [ | |
| 206 | + ModelInfo(id="llama3.2:latest", provider="ollama", capabilities=["chat"]), | |
| 207 | + ] | |
| 208 | + assert mgr._provider_for_model("llama3.2:latest") == "ollama" | |
| 209 | + | |
| 210 | + def test_provider_for_model_ollama_fuzzy_tag(self): | |
| 211 | + mgr = ProviderManager() | |
| 212 | + mgr._available_models = [ | |
| 213 | + ModelInfo(id="llama3.2:latest", provider="ollama", capabilities=["chat"]), | |
| 214 | + ] | |
| 215 | + # Should match "llama3.2" to "llama3.2:latest" via prefix | |
| 216 | + assert mgr._provider_for_model("llama3.2") == "ollama" | |
| 217 | + | |
| 218 | + def test_init_forced_provider_ollama(self): | |
| 219 | + mgr = ProviderManager(provider="ollama") | |
| 220 | + # Ollama defaults are empty (resolved dynamically) | |
| 221 | + assert mgr.vision_model == "" | |
| 222 | + assert mgr.chat_model == "" | |
| 223 | + assert mgr.transcription_model == "" | |
| 136 | 224 |
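The capability assertions in `test_list_models` above rely on name-based vision detection. A self-contained sketch of that logic (the helper name and the exact family set are assumptions; the provider's real detection may cover more models):

```python
# Families of Ollama models commonly shipped with vision support.
VISION_FAMILIES = {"llava", "bakllava", "moondream", "llama3.2-vision"}

def capabilities_for(model_name: str) -> list[str]:
    """Infer capabilities from an Ollama model name like 'llava:13b'."""
    base = model_name.split(":", 1)[0].lower()  # strip the ':tag' suffix
    caps = ["chat"]
    if any(family in base for family in VISION_FAMILIES):
        caps.append("vision")
    return caps
```

Stripping the `:tag` suffix first means `llava:13b` and `llava:latest` classify identically, which matches how the tests treat tagged model names.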
+9
-7
| --- video_processor/cli/commands.py | ||
| +++ video_processor/cli/commands.py | ||
| @@ -73,11 +73,11 @@ | ||
| 73 | 73 | ) |
| 74 | 74 | @click.option("--title", type=str, help="Title for the analysis report") |
| 75 | 75 | @click.option( |
| 76 | 76 | "--provider", |
| 77 | 77 | "-p", |
| 78 | - type=click.Choice(["auto", "openai", "anthropic", "gemini"]), | |
| 78 | + type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]), | |
| 79 | 79 | default="auto", |
| 80 | 80 | help="API provider", |
| 81 | 81 | ) |
| 82 | 82 | @click.option("--vision-model", type=str, default=None, help="Override model for vision tasks") |
| 83 | 83 | @click.option("--chat-model", type=str, default=None, help="Override model for LLM/chat tasks") |
| @@ -154,11 +154,11 @@ | ||
| 154 | 154 | ) |
| 155 | 155 | @click.option("--title", type=str, default="Batch Processing Results", help="Batch title") |
| 156 | 156 | @click.option( |
| 157 | 157 | "--provider", |
| 158 | 158 | "-p", |
| 159 | - type=click.Choice(["auto", "openai", "anthropic", "gemini"]), | |
| 159 | + type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]), | |
| 160 | 160 | default="auto", |
| 161 | 161 | help="API provider", |
| 162 | 162 | ) |
| 163 | 163 | @click.option("--vision-model", type=str, default=None, help="Override model for vision tasks") |
| 164 | 164 | @click.option("--chat-model", type=str, default=None, help="Override model for LLM/chat tasks") |
| @@ -343,12 +343,14 @@ | ||
| 343 | 343 | """Discover and display available models from all configured providers.""" |
| 344 | 344 | from video_processor.providers.discovery import discover_available_models |
| 345 | 345 | |
| 346 | 346 | models = discover_available_models(force_refresh=True) |
| 347 | 347 | if not models: |
| 348 | - click.echo("No models discovered. Check that at least one API key is set:") | |
| 349 | - click.echo(" OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY") | |
| 348 | + click.echo( | |
| 349 | + "No models discovered. Check that at least one API key is set or Ollama is running:" | |
| 350 | + ) | |
| 351 | + click.echo(" OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, or `ollama serve`") | |
| 350 | 352 | return |
| 351 | 353 | |
| 352 | 354 | by_provider: dict[str, list] = {} |
| 353 | 355 | for m in models: |
| 354 | 356 | by_provider.setdefault(m.provider, []).append(m) |
| @@ -417,11 +419,11 @@ | ||
| 417 | 419 | ) |
| 418 | 420 | @click.option("--title", type=str, help="Title for the analysis report") |
| 419 | 421 | @click.option( |
| 420 | 422 | "--provider", |
| 421 | 423 | "-p", |
| 422 | - type=click.Choice(["auto", "openai", "anthropic", "gemini"]), | |
| 424 | + type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]), | |
| 423 | 425 | default="auto", |
| 424 | 426 | help="API provider", |
| 425 | 427 | ) |
| 426 | 428 | @click.option("--vision-model", type=str, default=None, help="Override model for vision tasks") |
| 427 | 429 | @click.option("--chat-model", type=str, default=None, help="Override model for LLM/chat tasks") |
| @@ -511,11 +513,11 @@ | ||
| 511 | 513 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 512 | 514 | default="standard", |
| 513 | 515 | ) |
| 514 | 516 | provider = click.prompt( |
| 515 | 517 | " Provider", |
| 516 | - type=click.Choice(["auto", "openai", "anthropic", "gemini"]), | |
| 518 | + type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]), | |
| 517 | 519 | default="auto", |
| 518 | 520 | ) |
| 519 | 521 | ctx.invoke( |
| 520 | 522 | analyze, |
| 521 | 523 | input=input_path, |
| @@ -540,11 +542,11 @@ | ||
| 540 | 542 | type=click.Choice(["basic", "standard", "comprehensive"]), |
| 541 | 543 | default="standard", |
| 542 | 544 | ) |
| 543 | 545 | provider = click.prompt( |
| 544 | 546 | " Provider", |
| 545 | - type=click.Choice(["auto", "openai", "anthropic", "gemini"]), | |
| 547 | + type=click.Choice(["auto", "openai", "anthropic", "gemini", "ollama"]), | |
| 546 | 548 | default="auto", |
| 547 | 549 | ) |
| 548 | 550 | ctx.invoke( |
| 549 | 551 | batch, |
| 550 | 552 | input_dir=input_dir, |
| 551 | 553 |
| --- video_processor/providers/discovery.py | ||
| +++ video_processor/providers/discovery.py | ||
| @@ -75,10 +75,22 @@ | ||
| 75 | 75 | logger.info(f"Discovered {len(models)} Gemini models") |
| 76 | 76 | all_models.extend(models) |
| 77 | 77 | except Exception as e: |
| 78 | 78 | logger.warning(f"Gemini discovery failed: {e}") |
| 79 | 79 | |
| 80 | + # Ollama (local, no API key needed) | |
| 81 | + try: | |
| 82 | + from video_processor.providers.ollama_provider import OllamaProvider | |
| 83 | + | |
| 84 | + if OllamaProvider.is_available(): | |
| 85 | + provider = OllamaProvider() | |
| 86 | + models = provider.list_models() | |
| 87 | + logger.info(f"Discovered {len(models)} Ollama models") | |
| 88 | + all_models.extend(models) | |
| 89 | + except Exception as e: | |
| 90 | + logger.info(f"Ollama discovery skipped: {e}") | |
| 91 | + | |
| 80 | 92 | # Sort by provider then id |
| 81 | 93 | all_models.sort(key=lambda m: (m.provider, m.id)) |
| 82 | 94 | _cached_models = all_models |
| 83 | 95 | logger.info(f"Total discovered models: {len(all_models)}") |
| 84 | 96 | return all_models |
| 85 | 97 |
+27
-2
| --- video_processor/providers/manager.py | ||
| +++ video_processor/providers/manager.py | ||
| @@ -91,10 +91,15 @@ | ||
| 91 | 91 | "gemini": { |
| 92 | 92 | "chat": "gemini-2.5-flash", |
| 93 | 93 | "vision": "gemini-2.5-flash", |
| 94 | 94 | "audio": "gemini-2.5-flash", |
| 95 | 95 | }, |
| 96 | + "ollama": { | |
| 97 | + "chat": "", | |
| 98 | + "vision": "", | |
| 99 | + "audio": "", | |
| 100 | + }, | |
| 96 | 101 | } |
| 97 | 102 | return defaults.get(provider, {}).get(capability, "") |
| 98 | 103 | |
| 99 | 104 | def _get_provider(self, provider_name: str) -> BaseProvider: |
| 100 | 105 | """Lazily initialize and cache a provider instance.""" |
| @@ -109,10 +114,14 @@ | ||
| 109 | 114 | self._providers[provider_name] = AnthropicProvider() |
| 110 | 115 | elif provider_name == "gemini": |
| 111 | 116 | from video_processor.providers.gemini_provider import GeminiProvider |
| 112 | 117 | |
| 113 | 118 | self._providers[provider_name] = GeminiProvider() |
| 119 | + elif provider_name == "ollama": | |
| 120 | + from video_processor.providers.ollama_provider import OllamaProvider | |
| 121 | + | |
| 122 | + self._providers[provider_name] = OllamaProvider() | |
| 114 | 123 | else: |
| 115 | 124 | raise ValueError(f"Unknown provider: {provider_name}") |
| 116 | 125 | return self._providers[provider_name] |
| 117 | 126 | |
| 118 | 127 | def _provider_for_model(self, model_id: str) -> str: |
| @@ -127,15 +136,18 @@ | ||
| 127 | 136 | return "openai" |
| 128 | 137 | if model_id.startswith("claude-"): |
| 129 | 138 | return "anthropic" |
| 130 | 139 | if model_id.startswith("gemini-"): |
| 131 | 140 | return "gemini" |
| 132 | - # Try discovery | |
| 141 | + # Try discovery (exact match, then prefix match for ollama name:tag format) | |
| 133 | 142 | models = self._get_available_models() |
| 134 | 143 | for m in models: |
| 135 | 144 | if m.id == model_id: |
| 136 | 145 | return m.provider |
| 146 | + for m in models: | |
| 147 | + if m.id.startswith(model_id + ":"): | |
| 148 | + return m.provider | |
| 137 | 149 | raise ValueError(f"Cannot determine provider for model: {model_id}") |
| 138 | 150 | |
| 139 | 151 | def _get_available_models(self) -> list[ModelInfo]: |
| 140 | 152 | if self._available_models is None: |
| 141 | 153 | self._available_models = discover_available_models() |
| @@ -159,14 +171,27 @@ | ||
| 159 | 171 | try: |
| 160 | 172 | self._get_provider(prov) |
| 161 | 173 | return prov, model |
| 162 | 174 | except (ValueError, ImportError): |
| 163 | 175 | continue |
| 176 | + | |
| 177 | + # Fallback: try Ollama if available (no API key needed) | |
| 178 | + try: | |
| 179 | + from video_processor.providers.ollama_provider import OllamaProvider | |
| 180 | + | |
| 181 | + if OllamaProvider.is_available(): | |
| 182 | + provider = self._get_provider("ollama") | |
| 183 | + models = provider.list_models() | |
| 184 | + for m in models: | |
| 185 | + if capability in m.capabilities: | |
| 186 | + return "ollama", m.id | |
| 187 | + except Exception: | |
| 188 | + pass | |
| 164 | 189 | |
| 165 | 190 | raise RuntimeError( |
| 166 | 191 | f"No provider available for capability '{capability}'. " |
| 167 | - "Set an API key for at least one provider." | |
| 192 | + "Set an API key for at least one provider, or start Ollama." | |
| 168 | 193 | ) |
| 169 | 194 | |
| 170 | 195 | def _track(self, provider: BaseProvider, prov_name: str, model: str) -> None: |
| 171 | 196 | """Record usage from the last API call on a provider.""" |
| 172 | 197 | last = getattr(provider, "_last_usage", None) |
| 173 | 198 | |
ADDED video_processor/providers/ollama_provider.py
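The new provider is only selected when a local server actually answers. A stdlib analogue of the `is_available` probe in the file below (the real code uses `requests`; the `is_ollama_available` name is illustrative):

```python
import urllib.request

def is_ollama_available(host: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Probe Ollama's tags endpoint; any failure counts as 'not available'."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

print(is_ollama_available("http://127.0.0.1:1", timeout=1.0))  # False: nothing listening
```

Swallowing all exceptions is deliberate here: a refused connection, a timeout, or a non-HTTP response should all simply disable the Ollama fallback rather than crash provider selection.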
| --- a/video_processor/providers/ollama_provider.py | ||
| +++ b/video_processor/providers/ollama_provider.py | ||
| @@ -0,0 +1,109 @@ | ||
| 1 | +"""Ollama provider implementation using OpenAI-compatible API.""" | |
| 2 | + | |
| 3 | +import base64 | |
| 4 | +import logging | |
| 5 | +import os | |
| 6 | +from pathlib import Path | |
| 7 | +from typing import Optional | |
| 8 | + | |
| 9 | +import requests | |
| 10 | +from openai import OpenAI | |
| 11 | + | |
| 12 | +from video_processor.providers.base import BaseProvider, ModelInfo | |
| 13 | + | |
| 14 | +logger = logging.getLogger(__name__) | |
| 15 | + | |
| 16 | +# Known vision-capable model families (base name before the colon/tag) | |
| 17 | +_VISION_FAMILIES = { | |
| 18 | + "llava", | |
| 19 | + "llava-llama3", | |
| 20 | + "llava-phi3", | |
| 21 | + "llama3.2-vision", | |
| 22 | + "moondream", | |
| 23 | + "bakllava", | |
| 24 | + "minicpm-v", | |
| 25 | + "deepseek-vl", | |
| 26 | + "internvl2", | |
| 27 | +} | |
| 28 | + | |
| 29 | +DEFAULT_HOST = "http://localhost:11434" | |
| 30 | + | |
| 31 | + | |
| 32 | +class OllamaProvider(BaseProvider): | |
| 33 | + """Ollama local LLM provider via OpenAI-compatible API.""" | |
| 34 | + | |
|  35  | +    provider_name = "ollama" | |
|      | + | |
|      | +    def __init__(self, host: Optional[str] = None) -> None: | |
|      | +        """Point an OpenAI client at the local Ollama server.""" | |
|      | +        self.host = host or os.getenv("OLLAMA_HOST", DEFAULT_HOST) | |
|      | +        # Ollama's /v1 endpoint is OpenAI-compatible; the API key is ignored | |
|      | +        self.client = OpenAI(base_url=f"{self.host}/v1", api_key="ollama") | |
|      | +        self._models_cache: Optional[list[ModelInfo]] = None | |
|      | +        self._last_usage: dict = {} | |
|      | + | |
|  36  | +    @staticmethod | |
|  37  | +    def is_available(host: Optional[str] = None) -> bool: | |
| 38 | + """Check if an Ollama server is running and reachable.""" | |
| 39 | + host = host or os.getenv("OLLAMA_HOST", DEFAULT_HOST) | |
| 40 | + try: | |
| 41 | + resp = requests.get(f"{host}/api/tags", timeout=3) | |
| 42 | + return resp.status_code == 200 | |
| 43 | + except Exception: | |
| 44 | + return False | |
| 45 | + | |
| 46 | + @property | |
| 47 | + def _default_model(self) -> str: | |
| 48 | + models = self._get_models() | |
| 49 | + for m in models: | |
| 50 | + if "chat" in m.capabilities: | |
| 51 | + return m.id | |
| 52 | + return "llama3.2:latest" | |
| 53 | + | |
| 54 | + @property | |
| 55 | + def _default_vision_model(self) -> Optional[str]: | |
| 56 | + models = self._get_models() | |
| 57 | + for m in models: | |
| 58 | + if "vision" in m.capabilities: | |
| 59 | + return m.id | |
| 60 | + return None | |
| 61 | + | |
| 62 | + def _get_models(self) -> list[ModelInfo]: | |
| 63 | + if self._models_cache is None: | |
| 64 | + self._models_cache = self.list_models() | |
| 65 | + return self._models_cache | |
| 66 | + | |
| 67 | + def chat( | |
| 68 | + self, | |
|  69  | +        messages: list[dict], | |
|      | +        max_tokens: int = 4096, | |
|      | +        temperature: float = 0.7, | |
|      | +        model: Optional[str] = None, | |
|      | +    ) -> str: | |
|      | +        model = model or self._default_model | |
|  70  | +        response = self.client.chat.completions.create( | |
| 71 | + model=model, | |
| 72 | + messages=messages, | |
| 73 | + max_tokens=max_tokens, | |
| 74 | + temperature=temperature, | |
| 75 | + ) | |
| 76 | + self._last_usage = { | |
| 77 | + "input_tokens": (getattr(response.usage, "prompt_tokens", 0) or 0) | |
| 78 | + if response.usage | |
| 79 | + else 0, | |
| 80 | + "output_tokens": (getattr(response.usage, "completion_tokens", 0) or 0) | |
| 81 | + if response.usage | |
| 82 | + else 0, | |
| 83 | + } | |
| 84 | + return response.choices[0].message.content or "" | |
| 85 | + | |
| 86 | + def analyze_image( | |
| 87 | + self, | |
| 88 | + image_bytes: bytes, | |
| 89 | + prompt: str, | |
| 90 | + max_tokens: int = 4096, | |
| 91 | + model: Optional[str] = None, | |
| 92 | + ) -> str: | |
| 93 | + model = model or self._default_vision_model | |
| 94 | + if not model: | |
| 95 | + raise RuntimeError( | |
| 96 | + "No Ollama vision model available. Install a multimodal model: ollama pull llava" | |
| 97 | + ) | |
| 98 | + b64 = base64.b64encode(image_bytes).decode() | |
| 99 | + response = self.client.chat.completions.create( | |
| 100 | + model=model, | |
| 101 | + messages=[ | |
| 102 | + { | |
| 103 | + "role": "user", | |
| 104 | + "content": [ | |
| 105 | + {"type": "text", "text": prompt}, | |
| 106 | + { | |
| 107 | + "type": "image_url", | |
| 108 | + "image_url": {"url": f"data:image/jpeg;base64,{b64}"}, | |
| 109 | + |
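The `_VISION_FAMILIES` set above keys on a model's base name — the part of the Ollama id before the `:` tag. A sketch of how that classification presumably works (the `is_vision_model` helper is an assumed name; the actual detection lives in the model-discovery code not shown in this chunk):

```python
_VISION_FAMILIES = {
    "llava", "llava-llama3", "llava-phi3", "llama3.2-vision",
    "moondream", "bakllava", "minicpm-v", "deepseek-vl", "internvl2",
}

def is_vision_model(model_id: str) -> bool:
    """Classify an Ollama id like 'llava:13b' by its base family name."""
    base = model_id.split(":", 1)[0].lower()
    return base in _VISION_FAMILIES

print(is_vision_model("llava:13b"))        # True
print(is_vision_model("llama3.2:latest"))  # False
```

Matching on the base family rather than the full id means every tag of a vision family (`llava:7b`, `llava:13b`, `llava:latest`) is detected without enumerating tags.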