# Providers API Reference

::: video_processor.providers.base

::: video_processor.providers.manager

::: video_processor.providers.discovery

---

## Overview

The provider system abstracts LLM API calls behind a unified interface. It supports multiple providers (OpenAI, Anthropic, Gemini, Ollama, and OpenAI-compatible services), automatic model discovery, capability-based routing, and usage tracking.
**Key components:**

- **`BaseProvider`** -- abstract interface that all providers implement
- **`ProviderRegistry`** -- global registry mapping provider names to classes
- **`ProviderManager`** -- high-level router that picks the best provider for each task
- **`discover_available_models()`** -- scans all configured providers for available models

---
## BaseProvider (ABC)

```python
from video_processor.providers.base import BaseProvider
```

Abstract base class that all provider implementations must subclass. Defines the four core capabilities: chat, vision, audio transcription, and model listing.

**Class attribute:**

| Attribute | Type | Description |
|---|---|---|
| `provider_name` | `str` | Identifier for this provider (e.g., `"openai"`, `"anthropic"`) |
### chat()

```python
def chat(
    self,
    messages: list[dict],
    max_tokens: int = 4096,
    temperature: float = 0.7,
    model: Optional[str] = None,
) -> str
```

Send a chat completion request.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `messages` | `list[dict]` | *required* | OpenAI-format message list (`role`, `content`) |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `temperature` | `float` | `0.7` | Sampling temperature |
| `model` | `Optional[str]` | `None` | Override model ID |

**Returns:** `str` -- the assistant's text response.
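
As a concrete illustration of the `messages` format, a minimal two-turn list looks like this (the content strings are placeholders, not from the library):

```python
# OpenAI-format message list as accepted by chat(): each entry is a dict
# with a "role" ("system", "user", or "assistant") and a "content" string.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the key decisions from this meeting."},
]

roles = [m["role"] for m in messages]
```
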
### analyze_image()

```python
def analyze_image(
    self,
    image_bytes: bytes,
    prompt: str,
    max_tokens: int = 4096,
    model: Optional[str] = None,
) -> str
```

Analyze an image with a text prompt using a vision-capable model.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `image_bytes` | `bytes` | *required* | Raw image data (JPEG, PNG, etc.) |
| `prompt` | `str` | *required* | Analysis instructions |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `model` | `Optional[str]` | `None` | Override model ID |

**Returns:** `str` -- the assistant's analysis text.
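
Implementations typically wrap the raw bytes in a base64 data URL before sending them to the model. A self-contained sketch of that encoding step (the helper name and MIME-type default are illustrative, not part of this API):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a base64 data URL, the shape most
    # OpenAI-compatible vision endpoints expect for inline images.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

url = to_data_url(b"\x89PNG\r\n", "image/png")
```
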
### transcribe_audio()

```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    model: Optional[str] = None,
) -> dict
```

Transcribe an audio file.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio_path` | `str \| Path` | *required* | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint (ISO 639-1 code) |
| `model` | `Optional[str]` | `None` | Override model ID |

**Returns:** `dict` -- transcription result with keys `text`, `segments`, `duration`, etc.
### list_models()

```python
def list_models(self) -> list[ModelInfo]
```

Discover available models from this provider's API.

**Returns:** `list[ModelInfo]` -- available models with capability metadata.
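
Putting the interface together, a toy subclass might look like the following. This is a sketch only: a stand-in `BaseProvider` is included so it runs standalone, and `EchoProvider` simply echoes the last user message rather than calling any API.

```python
from typing import Optional

class BaseProvider:
    # Stand-in for video_processor.providers.base.BaseProvider so this
    # sketch runs standalone; the real ABC declares the four methods above.
    provider_name: str = ""

class EchoProvider(BaseProvider):
    # Toy provider that "answers" with the last user message.
    provider_name = "echo"

    def chat(self, messages: list[dict], max_tokens: int = 4096,
             temperature: float = 0.7, model: Optional[str] = None) -> str:
        user_turns = [m["content"] for m in messages if m["role"] == "user"]
        return user_turns[-1] if user_turns else ""

provider = EchoProvider()
reply = provider.chat([{"role": "user", "content": "hello"}])
```

A real subclass would also implement `analyze_image()`, `transcribe_audio()`, and `list_models()`.
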
---

## ModelInfo

```python
from video_processor.providers.base import ModelInfo
```

Pydantic model describing an available model from a provider.

| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | *required* | Model identifier (e.g., `"gpt-4o"`, `"claude-haiku-4-5-20251001"`) |
| `provider` | `str` | *required* | Provider name (e.g., `"openai"`, `"anthropic"`, `"gemini"`) |
| `display_name` | `str` | `""` | Human-readable display name |
| `capabilities` | `List[str]` | `[]` | Model capabilities: `"chat"`, `"vision"`, `"audio"`, `"embedding"` |

```json
{
  "id": "gpt-4o",
  "provider": "openai",
  "display_name": "GPT-4o",
  "capabilities": ["chat", "vision"]
}
```
---

## ProviderRegistry

```python
from video_processor.providers.base import ProviderRegistry
```

Class-level registry for provider classes. Providers register themselves with metadata on import. This registry is used internally by `ProviderManager` but can also be used directly for introspection.
### register()

```python
@classmethod
def register(
    cls,
    name: str,
    provider_class: type,
    env_var: str = "",
    model_prefixes: Optional[List[str]] = None,
    default_models: Optional[Dict[str, str]] = None,
) -> None
```

Register a provider class with its metadata. Called by each provider module at import time.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | Provider name (e.g., `"openai"`) |
| `provider_class` | `type` | *required* | The provider class |
| `env_var` | `str` | `""` | Environment variable for the API key |
| `model_prefixes` | `Optional[List[str]]` | `None` | Model ID prefixes for auto-detection (e.g., `["gpt-", "o1-"]`) |
| `default_models` | `Optional[Dict[str, str]]` | `None` | Default models per capability (e.g., `{"chat": "gpt-4o", "vision": "gpt-4o"}`) |
### get()

```python
@classmethod
def get(cls, name: str) -> type
```

Return the provider class for a given name. Raises `ValueError` if the provider is not registered.
### get_by_model()

```python
@classmethod
def get_by_model(cls, model_id: str) -> Optional[str]
```

Return the provider name for a model ID based on prefix matching. Returns `None` if no match is found.
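
The matching logic can be sketched standalone. The prefix table below is an assumption for illustration; the real prefixes are whatever each provider module passes to `ProviderRegistry.register()`:

```python
# Assumed prefix table; real prefixes are supplied at register() time.
MODEL_PREFIXES = {
    "openai": ["gpt-", "o1-"],
    "anthropic": ["claude-"],
    "gemini": ["gemini-"],
}

def provider_for(model_id: str):
    # First provider whose registered prefix matches the model ID;
    # None mirrors get_by_model()'s behavior when nothing matches.
    for name, prefixes in MODEL_PREFIXES.items():
        if any(model_id.startswith(p) for p in prefixes):
            return name
    return None
```
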
### get_default_models()

```python
@classmethod
def get_default_models(cls, name: str) -> Dict[str, str]
```

Return the default models dict for a provider, mapping capability names to model IDs.
### available()

```python
@classmethod
def available(cls) -> List[str]
```

Return names of providers whose required environment variable is set (or providers with no env var requirement, like Ollama).
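
A minimal sketch of that availability check, with an assumed env-var table (the real mapping comes from `register()`, and the real Ollama check probes the local server rather than trusting a blank entry):

```python
import os

# Assumed mapping for illustration; "" means no key is required.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": "",
}

def available(env=None):
    # A provider is "available" when its env var is set, or when it
    # declares no env var at all.
    env = os.environ if env is None else env
    return [name for name, var in ENV_VARS.items() if not var or env.get(var)]

ready = available(env={"OPENAI_API_KEY": "sk-test"})
```
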
### all_registered()

```python
@classmethod
def all_registered(cls) -> Dict[str, Dict]
```

Return all registered providers and their metadata dictionaries.
---

## OpenAICompatibleProvider

```python
from video_processor.providers.base import OpenAICompatibleProvider
```

Base class for providers using OpenAI-compatible APIs (Together, Fireworks, Cerebras, xAI, Azure). Implements `chat()`, `analyze_image()`, and `list_models()` using the OpenAI client library. `transcribe_audio()` raises `NotImplementedError` by default.

**Constructor:**

```python
def __init__(self, api_key: Optional[str] = None, base_url: Optional[str] = None)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `Optional[str]` | `None` | API key (falls back to the environment variable named by `self.env_var`) |
| `base_url` | `Optional[str]` | `None` | API base URL (falls back to the `self.base_url` class attribute) |

**Subclass attributes to override:**

| Attribute | Description |
|---|---|
| `provider_name` | Provider identifier string |
| `base_url` | Default API base URL |
| `env_var` | Environment variable name for the API key |

**Usage tracking:** After each `chat()` or `analyze_image()` call, the provider stores token counts in `self._last_usage` as `{"input_tokens": int, "output_tokens": int}`. This is consumed by `ProviderManager._track()`.
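
The bookkeeping side of that handoff can be sketched as a simple accumulator. The `UsageTracker` below is illustrative only -- the real tracker's interface may differ -- but the `_last_usage` dict shape matches the one documented above:

```python
class UsageTracker:
    # Minimal accumulator in the spirit of the manager's tracker;
    # this sketch only sums token counts.
    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def add(self, last_usage: dict) -> None:
        # last_usage mirrors the {"input_tokens": int, "output_tokens": int}
        # dict a provider stores in self._last_usage after each call.
        self.input_tokens += last_usage.get("input_tokens", 0)
        self.output_tokens += last_usage.get("output_tokens", 0)

tracker = UsageTracker()
tracker.add({"input_tokens": 120, "output_tokens": 45})
tracker.add({"input_tokens": 80, "output_tokens": 30})
```
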
---

## ProviderManager

```python
from video_processor.providers.manager import ProviderManager
```

High-level router that selects the best available provider and model for each API call. Supports explicit model selection, a forced provider, or automatic selection based on discovered capabilities.

### Constructor

```python
def __init__(
    self,
    vision_model: Optional[str] = None,
    chat_model: Optional[str] = None,
    transcription_model: Optional[str] = None,
    provider: Optional[str] = None,
    auto: bool = True,
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `vision_model` | `Optional[str]` | `None` | Override model for vision tasks (e.g., `"gpt-4o"`) |
| `chat_model` | `Optional[str]` | `None` | Override model for chat/LLM tasks |
| `transcription_model` | `Optional[str]` | `None` | Override model for transcription |
| `provider` | `Optional[str]` | `None` | Force all tasks to a single provider |
| `auto` | `bool` | `True` | If `True` and no model is specified, pick the best available |

**Attributes:**

| Attribute | Type | Description |
|---|---|---|
| `usage` | `UsageTracker` | Tracks token counts and API costs across all calls |
### Auto-selection preferences

When `auto=True` and no explicit model is set, providers are tried in this order:

**Vision:** Gemini (`gemini-2.5-flash`) > OpenAI (`gpt-4o-mini`) > Anthropic (`claude-haiku-4-5-20251001`)

**Chat:** Anthropic (`claude-haiku-4-5-20251001`) > OpenAI (`gpt-4o-mini`) > Gemini (`gemini-2.5-flash`)

**Transcription:** OpenAI (`whisper-1`) > Gemini (`gemini-2.5-flash`)

If no API-key-based provider is available, Ollama is tried as a fallback.
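
The preference walk above amounts to "first available wins". A standalone sketch, using the documented vision order (the function name and the set-based availability check are illustrative):

```python
# Preference order for vision, as documented above.
VISION_PREFS = [
    ("gemini", "gemini-2.5-flash"),
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-haiku-4-5-20251001"),
]

def pick(prefs, available):
    # Return the first (provider, model) pair whose provider is available;
    # None signals that the Ollama fallback should be tried.
    for provider, model in prefs:
        if provider in available:
            return (provider, model)
    return None

choice = pick(VISION_PREFS, available={"openai", "anthropic"})
```
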
### chat()

```python
def chat(
    self,
    messages: list[dict],
    max_tokens: int = 4096,
    temperature: float = 0.7,
) -> str
```

Send a chat completion to the best available provider. Automatically resolves which provider and model to use.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `messages` | `list[dict]` | *required* | OpenAI-format messages |
| `max_tokens` | `int` | `4096` | Maximum response tokens |
| `temperature` | `float` | `0.7` | Sampling temperature |

**Returns:** `str` -- assistant response text.

**Raises:** `RuntimeError` if no provider is available for the `chat` capability.
### analyze_image()

```python
def analyze_image(
    self,
    image_bytes: bytes,
    prompt: str,
    max_tokens: int = 4096,
) -> str
```

Analyze an image using the best available vision provider.

**Returns:** `str` -- analysis text.

**Raises:** `RuntimeError` if no provider is available for the `vision` capability.
### transcribe_audio()

```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    speaker_hints: Optional[list[str]] = None,
) -> dict
```

Transcribe audio. Prefers local Whisper (no file size limits, no API costs) when available, falling back to API-based transcription.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio_path` | `str \| Path` | *required* | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint |
| `speaker_hints` | `Optional[list[str]]` | `None` | Speaker names for better recognition |

**Returns:** `dict` -- transcription result with `text`, `segments`, `duration`.

**Local Whisper:** If `transcription_model` is unset or starts with `"whisper-local"`, the manager tries local Whisper first. Use `"whisper-local:large"` to specify a model size.
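
One way the `"whisper-local:<size>"` convention could be parsed, sketched standalone. The function name and the `"base"` default size are hypothetical; only the `whisper-local` prefix and `:` separator come from the documentation above:

```python
def parse_local_whisper(model):
    # Return a Whisper size for local transcription, or None when the
    # string names an API model (e.g. "whisper-1") instead.
    if model is None:
        return "base"  # hypothetical default size when unset
    if not model.startswith("whisper-local"):
        return None
    _, _, size = model.partition(":")
    return size or "base"
```
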
### get_models_used()

```python
def get_models_used(self) -> dict[str, str]
```

Return a dict mapping each capability to a `"provider/model"` string for tracking purposes.

```python
pm = ProviderManager()
print(pm.get_models_used())
# {"vision": "gemini/gemini-2.5-flash", "chat": "anthropic/claude-haiku-4-5-20251001", ...}
```
### Usage examples

```python
from video_processor.providers.manager import ProviderManager

# Auto-select best providers
pm = ProviderManager()

# Force everything through one provider
pm = ProviderManager(provider="openai")

# Explicit model selection
pm = ProviderManager(
    vision_model="gpt-4o",
    chat_model="claude-haiku-4-5-20251001",
    transcription_model="whisper-local:large",
)

# Chat completion
response = pm.chat([
    {"role": "user", "content": "Summarize this meeting transcript..."}
])

# Image analysis
with open("diagram.png", "rb") as f:
    analysis = pm.analyze_image(f.read(), "Describe this architecture diagram")

# Transcription with speaker hints
result = pm.transcribe_audio(
    "meeting.mp3",
    language="en",
    speaker_hints=["Alice", "Bob", "Charlie"],
)

# Check usage
print(pm.usage.summary())
```
---

## discover_available_models()

```python
from video_processor.providers.discovery import discover_available_models
```

```python
def discover_available_models(
    api_keys: Optional[dict[str, str]] = None,
    force_refresh: bool = False,
) -> list[ModelInfo]
```

Discover available models from all configured providers. For each provider with a valid API key, calls `list_models()` and returns a unified, sorted list.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_keys` | `Optional[dict[str, str]]` | `None` | Override API keys (defaults to environment variables) |
| `force_refresh` | `bool` | `False` | Force re-discovery, ignoring the session cache |

**Returns:** `list[ModelInfo]` -- all discovered models, sorted by provider then model ID.

**Caching:** Results are cached for the session. Use `force_refresh=True` or `clear_discovery_cache()` to refresh.

```python
from video_processor.providers.discovery import (
    discover_available_models,
    clear_discovery_cache,
)

# Discover models using environment variables
models = discover_available_models()
for m in models:
    print(f"{m.provider}/{m.id} - {m.capabilities}")

# Force refresh
models = discover_available_models(force_refresh=True)

# Override API keys
models = discover_available_models(api_keys={
    "openai": "sk-...",
    "anthropic": "sk-ant-...",
})

# Clear cache
clear_discovery_cache()
```
### clear_discovery_cache()

```python
def clear_discovery_cache() -> None
```

Clear the cached model list, forcing the next `discover_available_models()` call to re-query providers.
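
The session-cache behavior described above can be sketched with a module-level variable. All names and internals here are illustrative -- the `fetch` parameter stands in for the actual provider queries:

```python
_cache = None  # session-level cache, as in the discovery module's contract

def discover(force_refresh=False, fetch=lambda: ["openai/gpt-4o-mini"]):
    # Query providers only on the first call or when a refresh is forced.
    global _cache
    if _cache is None or force_refresh:
        _cache = fetch()
    return _cache

def clear_cache():
    global _cache
    _cache = None

first = discover()
cached = discover(fetch=lambda: ["other/model"])  # served from cache
clear_cache()
refreshed = discover(fetch=lambda: ["other/model"])
```
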
---

## Built-in Providers

The following providers are registered automatically when the provider system initializes:

| Provider | Environment Variable | Capabilities | Default Chat Model |
|---|---|---|---|
| `openai` | `OPENAI_API_KEY` | chat, vision, audio | `gpt-4o-mini` |
| `anthropic` | `ANTHROPIC_API_KEY` | chat, vision | `claude-haiku-4-5-20251001` |
| `gemini` | `GEMINI_API_KEY` | chat, vision, audio | `gemini-2.5-flash` |
| `ollama` | *(none -- checks server)* | chat, vision | *(depends on installed models)* |
| `together` | `TOGETHER_API_KEY` | chat | *(varies)* |
| `fireworks` | `FIREWORKS_API_KEY` | chat | *(varies)* |
| `cerebras` | `CEREBRAS_API_KEY` | chat | *(varies)* |
| `xai` | `XAI_API_KEY` | chat | *(varies)* |
| `azure` | `AZURE_OPENAI_API_KEY` | chat, vision | *(varies)* |
