PlanOpticon

planopticon / docs / architecture / providers.md

Source Rendered

Blame History Raw 119 lines

1	`# Provider System`
2
3	`## Overview`
4
5	`PlanOpticon supports multiple AI providers through a unified abstraction layer. Default models favor cost-effective options (Haiku, GPT-4o-mini, Gemini Flash) for routine tasks, with more capable models available when needed.`
6
7	`## Supported providers`
8
9	`\| Provider \| Chat \| Vision \| Transcription \| Env Variable \|`
10	`\|----------\|------\|--------\|--------------\|--------------\|`
11	\| OpenAI \| GPT-4o-mini, GPT-4o \| GPT-4o-mini, GPT-4o \| Whisper-1 \| `OPENAI_API_KEY` \|
12	\| Anthropic \| Claude Haiku, Sonnet, Opus \| Claude Haiku, Sonnet, Opus \| — \| `ANTHROPIC_API_KEY` \|
13	\| Google Gemini \| Gemini Flash, Pro \| Gemini Flash, Pro \| Gemini Flash \| `GEMINI_API_KEY` \|
14	\| Azure OpenAI \| GPT-4o-mini, GPT-4o \| GPT-4o-mini, GPT-4o \| Whisper-1 \| `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT` \|
15	\| Together AI \| Llama, Mixtral, etc. \| Llava \| — \| `TOGETHER_API_KEY` \|
16	\| Fireworks AI \| Llama, Mixtral, etc. \| Llava \| — \| `FIREWORKS_API_KEY` \|
17	\| Cerebras \| Llama (fast inference) \| — \| — \| `CEREBRAS_API_KEY` \|
18	\| xAI \| Grok \| Grok \| — \| `XAI_API_KEY` \|
19	\| Ollama (local) \| Any installed model \| llava, moondream, etc. \| — (use local Whisper) \| `OLLAMA_HOST` \|
20
21	`## Default models`
22
23	`PlanOpticon defaults to cheap, fast models for cost efficiency:`
24
25	`\| Task \| Default model \|`
26	`\|------\|--------------\|`
27	`\| Vision (diagrams) \| Gemini Flash \|`
28	`\| Chat (analysis) \| Claude Haiku \|`
29	`\| Transcription \| Local Whisper (fallback: Whisper-1) \|`
30
31	Use `--vision-model` and `--chat-model` to override with more capable models when needed (e.g., `--chat-model claude-sonnet-4-20250514` for complex analysis).
32
33	`## Ollama (offline mode)`
34
35	`[Ollama](https://ollama.com) enables fully offline operation with no API keys required. PlanOpticon connects via Ollama's OpenAI-compatible API.`
36
37	```bash
38	`# Install and start Ollama`
39	`ollama serve`
40
41	`# Pull a chat model`
42	`ollama pull llama3.2`
43
44	`# Pull a vision model (for diagram analysis)`
45	`ollama pull llava`
46	```
47
48	`PlanOpticon auto-detects Ollama when it's running. To force Ollama:`
49
50	```bash
51	`planopticon analyze -i video.mp4 -o ./out --provider ollama`
52	```
53
54	Configure a non-default host via `OLLAMA_HOST`:
55
56	```bash
57	`export OLLAMA_HOST=http://192.168.1.100:11434`
58	```
59
60	`## Auto-discovery`
61
62	On startup, `ProviderManager` checks which API keys are configured, queries each provider's API, and checks for a running Ollama server to discover available models:
63
64	```python
65	`from video_processor.providers.manager import ProviderManager`
66
67	`pm = ProviderManager()`
68	`# Automatically discovers models from all configured providers + Ollama`
69	```
70
71	`## Routing preferences`
72
73	`Each task type has a default preference order (cheapest first):`
74
75	`\| Task \| Preference \|`
76	`\|------\|-----------\|`
77	`\| Vision \| Gemini Flash → GPT-4o-mini → Claude Haiku → Ollama \|`
78	`\| Chat \| Claude Haiku → GPT-4o-mini → Gemini Flash → Ollama \|`
79	`\| Transcription \| Local Whisper → Whisper-1 → Gemini Flash \|`
80
81	`Ollama acts as the last-resort fallback -- if no cloud API keys are set but Ollama is running, it is used automatically.`
82
83	`## Manual override`
84
85	```python
86	`pm = ProviderManager(`
87	`vision_model="gpt-4o",`
88	`chat_model="claude-sonnet-4-20250514",`
89	`provider="openai", # Force a specific provider`
90	`)`
91
92	`# Use a cheap model for bulk processing`
93	`pm = ProviderManager(`
94	`chat_model="claude-haiku-3-5-20241022",`
95	`vision_model="gemini-2.0-flash",`
96	`)`
97
98	`# Or use Ollama for fully offline processing`
99	`pm = ProviderManager(provider="ollama")`
100
101	`# Use Azure OpenAI`
102	`pm = ProviderManager(provider="azure")`
103
104	`# Use Together AI for open-source models`
105	`pm = ProviderManager(provider="together", chat_model="meta-llama/Llama-3.3-70B-Instruct-Turbo")`
106	```
107
108	`## BaseProvider interface`
109
110	`All providers implement:`
111
112	```python
113	`class BaseProvider(ABC):`
114	`def chat(messages, max_tokens, temperature) -> str`
115	`def analyze_image(image_path, prompt, max_tokens) -> str`
116	`def transcribe_audio(audio_path) -> dict`
117	`def list_models() -> List[ModelInfo]`
118	```
119

PlanOpticon

Keyboard Shortcuts