# Providers API Reference

::: video_processor.providers.base

::: video_processor.providers.manager

::: video_processor.providers.discovery

---

## Overview

The provider system abstracts LLM API calls behind a unified interface. It supports multiple providers (OpenAI, Anthropic, Gemini, Ollama, and OpenAI-compatible services), automatic model discovery, capability-based routing, and usage tracking.
**Key components:**

- **`BaseProvider`** -- abstract interface that all providers implement
- **`ProviderRegistry`** -- global registry mapping provider names to classes
- **`ProviderManager`** -- high-level router that picks the best provider for each task
- **`discover_available_models()`** -- scans all configured providers for available models

---
## BaseProvider (ABC)

```python
from video_processor.providers.base import BaseProvider
```

Abstract base class that all provider implementations must subclass. Defines the four core capabilities: chat, vision, audio transcription, and model listing.

**Class attribute:**

| Attribute | Type | Description |
|---|---|---|
| `provider_name` | `str` | Identifier for this provider (e.g., `"openai"`, `"anthropic"`) |
### chat()

```python
def chat(
    self,
    messages: list[dict],
    max_tokens: int = 4096,
    temperature: float = 0.7,
    model: Optional[str] = None,
) -> str
```

Send a chat completion request.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `messages` | `list[dict]` | *required* | OpenAI-format message list (`role`, `content`) |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `temperature` | `float` | `0.7` | Sampling temperature |
| `model` | `Optional[str]` | `None` | Override model ID |

**Returns:** `str` -- the assistant's text response.
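
As a concrete illustration of the `messages` format, a minimal two-turn list looks like this (the content strings are placeholders, not from the library):

```python
# OpenAI-format message list as accepted by chat(): each entry is a dict
# with a "role" ("system", "user", or "assistant") and a "content" string.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize the key decisions from this meeting."},
]

roles = [m["role"] for m in messages]
```
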
### analyze_image()

```python
def analyze_image(
    self,
    image_bytes: bytes,
    prompt: str,
    max_tokens: int = 4096,
    model: Optional[str] = None,
) -> str
```

Analyze an image with a text prompt using a vision-capable model.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `image_bytes` | `bytes` | *required* | Raw image data (JPEG, PNG, etc.) |
| `prompt` | `str` | *required* | Analysis instructions |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `model` | `Optional[str]` | `None` | Override model ID |

**Returns:** `str` -- the assistant's analysis text.
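
Implementations typically wrap the raw bytes in a base64 data URL before sending them to the model. A self-contained sketch of that encoding step (the helper name and MIME-type default are illustrative, not part of this API):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    # Encode raw image bytes as a base64 data URL, the shape most
    # OpenAI-compatible vision endpoints expect for inline images.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

url = to_data_url(b"\x89PNG\r\n", "image/png")
```
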
### transcribe_audio()

```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    model: Optional[str] = None,
) -> dict
```

Transcribe an audio file.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio_path` | `str \| Path` | *required* | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint (ISO 639-1 code) |
| `model` | `Optional[str]` | `None` | Override model ID |

**Returns:** `dict` -- transcription result with keys `text`, `segments`, `duration`, etc.
### list_models()

```python
def list_models(self) -> list[ModelInfo]
```

Discover available models from this provider's API.

**Returns:** `list[ModelInfo]` -- available models with capability metadata.
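
Putting the interface together, a toy subclass might look like the following. This is a sketch only: a stand-in `BaseProvider` is included so it runs standalone, and `EchoProvider` simply echoes the last user message rather than calling any API.

```python
from typing import Optional

class BaseProvider:
    # Stand-in for video_processor.providers.base.BaseProvider so this
    # sketch runs standalone; the real ABC declares the four methods above.
    provider_name: str = ""

class EchoProvider(BaseProvider):
    # Toy provider that "answers" with the last user message.
    provider_name = "echo"

    def chat(self, messages: list[dict], max_tokens: int = 4096,
             temperature: float = 0.7, model: Optional[str] = None) -> str:
        user_turns = [m["content"] for m in messages if m["role"] == "user"]
        return user_turns[-1] if user_turns else ""

provider = EchoProvider()
reply = provider.chat([{"role": "user", "content": "hello"}])
```

A real subclass would also implement `analyze_image()`, `transcribe_audio()`, and `list_models()`.
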
---

## ModelInfo

```python
from video_processor.providers.base import ModelInfo
```

Pydantic model describing an available model from a provider.

| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | *required* | Model identifier (e.g., `"gpt-4o"`, `"claude-haiku-4-5-20251001"`) |
| `provider` | `str` | *required* | Provider name (e.g., `"openai"`, `"anthropic"`, `"gemini"`) |
| `display_name` | `str` | `""` | Human-readable display name |
| `capabilities` | `List[str]` | `[]` | Model capabilities: `"chat"`, `"vision"`, `"audio"`, `"embedding"` |

```json
{
  "id": "gpt-4o",
  "provider": "openai",
  "display_name": "GPT-4o",
  "capabilities": ["chat", "vision"]
}
```
---

## ProviderRegistry

```python
from video_processor.providers.base import ProviderRegistry
```

Class-level registry for provider classes. Providers register themselves with metadata on import. This registry is used internally by `ProviderManager` but can also be used directly for introspection.
### register()

```python
@classmethod
def register(
    cls,
    name: str,
    provider_class: type,
    env_var: str = "",
    model_prefixes: Optional[List[str]] = None,
    default_models: Optional[Dict[str, str]] = None,
) -> None
```

Register a provider class with its metadata. Called by each provider module at import time.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | Provider name (e.g., `"openai"`) |
| `provider_class` | `type` | *required* | The provider class |
| `env_var` | `str` | `""` | Environment variable for the API key |
| `model_prefixes` | `Optional[List[str]]` | `None` | Model ID prefixes for auto-detection (e.g., `["gpt-", "o1-"]`) |
| `default_models` | `Optional[Dict[str, str]]` | `None` | Default models per capability (e.g., `{"chat": "gpt-4o", "vision": "gpt-4o"}`) |
### get()

```python
@classmethod
def get(cls, name: str) -> type
```

Return the provider class for a given name. Raises `ValueError` if the provider is not registered.
### get_by_model()

```python
@classmethod
def get_by_model(cls, model_id: str) -> Optional[str]
```

Return the provider name for a model ID based on prefix matching. Returns `None` if no match is found.
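
The matching logic can be sketched standalone. The prefix table below is an assumption for illustration; the real prefixes are whatever each provider module passes to `ProviderRegistry.register()`:

```python
# Assumed prefix table; real prefixes are supplied at register() time.
MODEL_PREFIXES = {
    "openai": ["gpt-", "o1-"],
    "anthropic": ["claude-"],
    "gemini": ["gemini-"],
}

def provider_for(model_id: str):
    # First provider whose registered prefix matches the model ID;
    # None mirrors get_by_model()'s behavior when nothing matches.
    for name, prefixes in MODEL_PREFIXES.items():
        if any(model_id.startswith(p) for p in prefixes):
            return name
    return None
```
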
### get_default_models()

```python
@classmethod
def get_default_models(cls, name: str) -> Dict[str, str]
```

Return the default models dict for a provider, mapping capability names to model IDs.
### available()

```python
@classmethod
def available(cls) -> List[str]
```

Return names of providers whose required environment variable is set (or providers with no env var requirement, like Ollama).
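
A minimal sketch of that availability check, with an assumed env-var table (the real mapping comes from `register()`, and the real Ollama check probes the local server rather than trusting a blank entry):

```python
import os

# Assumed mapping for illustration; "" means no key is required.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": "",
}

def available(env=None):
    # A provider is "available" when its env var is set, or when it
    # declares no env var at all.
    env = os.environ if env is None else env
    return [name for name, var in ENV_VARS.items() if not var or env.get(var)]

ready = available(env={"OPENAI_API_KEY": "sk-test"})
```
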
### all_registered()

```python
@classmethod
def all_registered(cls) -> Dict[str, Dict]
```

Return all registered providers and their metadata dictionaries.
---

## OpenAICompatibleProvider

```python
from video_processor.providers.base import OpenAICompatibleProvider
```

Base class for providers using OpenAI-compatible APIs (Together, Fireworks, Cerebras, xAI, Azure). Implements `chat()`, `analyze_image()`, and `list_models()` using the OpenAI client library. `transcribe_audio()` raises `NotImplementedError` by default.

**Constructor:**

```python
def __init__(self, api_key: Optional[str] = None, base_url: Optional[str] = None)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `Optional[str]` | `None` | API key (falls back to the environment variable named by `self.env_var`) |
| `base_url` | `Optional[str]` | `None` | API base URL (falls back to the `self.base_url` class attribute) |

**Subclass attributes to override:**

| Attribute | Description |
|---|---|
| `provider_name` | Provider identifier string |
| `base_url` | Default API base URL |
| `env_var` | Environment variable name for the API key |

**Usage tracking:** After each `chat()` or `analyze_image()` call, the provider stores token counts in `self._last_usage` as `{"input_tokens": int, "output_tokens": int}`. This is consumed by `ProviderManager._track()`.
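
The bookkeeping side of that handoff can be sketched as a simple accumulator. The `UsageTracker` below is illustrative only -- the real tracker's interface may differ -- but the `_last_usage` dict shape matches the one documented above:

```python
class UsageTracker:
    # Minimal accumulator in the spirit of the manager's tracker;
    # this sketch only sums token counts.
    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def add(self, last_usage: dict) -> None:
        # last_usage mirrors the {"input_tokens": int, "output_tokens": int}
        # dict a provider stores in self._last_usage after each call.
        self.input_tokens += last_usage.get("input_tokens", 0)
        self.output_tokens += last_usage.get("output_tokens", 0)

tracker = UsageTracker()
tracker.add({"input_tokens": 120, "output_tokens": 45})
tracker.add({"input_tokens": 80, "output_tokens": 30})
```
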
---

## ProviderManager

```python
from video_processor.providers.manager import ProviderManager
```

High-level router that selects the best available provider and model for each API call. Supports explicit model selection, a forced provider, or automatic selection based on discovered capabilities.

### Constructor

```python
def __init__(
    self,
    vision_model: Optional[str] = None,
    chat_model: Optional[str] = None,
    transcription_model: Optional[str] = None,
    provider: Optional[str] = None,
    auto: bool = True,
)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `vision_model` | `Optional[str]` | `None` | Override model for vision tasks (e.g., `"gpt-4o"`) |
| `chat_model` | `Optional[str]` | `None` | Override model for chat/LLM tasks |
| `transcription_model` | `Optional[str]` | `None` | Override model for transcription |
| `provider` | `Optional[str]` | `None` | Force all tasks to a single provider |
| `auto` | `bool` | `True` | If `True` and no model is specified, pick the best available |

**Attributes:**

| Attribute | Type | Description |
|---|---|---|
| `usage` | `UsageTracker` | Tracks token counts and API costs across all calls |
### Auto-selection preferences

When `auto=True` and no explicit model is set, providers are tried in this order:

**Vision:** Gemini (`gemini-2.5-flash`) > OpenAI (`gpt-4o-mini`) > Anthropic (`claude-haiku-4-5-20251001`)

**Chat:** Anthropic (`claude-haiku-4-5-20251001`) > OpenAI (`gpt-4o-mini`) > Gemini (`gemini-2.5-flash`)

**Transcription:** OpenAI (`whisper-1`) > Gemini (`gemini-2.5-flash`)

If no API-key-based provider is available, Ollama is tried as a fallback.
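
The preference walk above amounts to "first available wins". A standalone sketch, using the documented vision order (the function name and the set-based availability check are illustrative):

```python
# Preference order for vision, as documented above.
VISION_PREFS = [
    ("gemini", "gemini-2.5-flash"),
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-haiku-4-5-20251001"),
]

def pick(prefs, available):
    # Return the first (provider, model) pair whose provider is available;
    # None signals that the Ollama fallback should be tried.
    for provider, model in prefs:
        if provider in available:
            return (provider, model)
    return None

choice = pick(VISION_PREFS, available={"openai", "anthropic"})
```
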
### chat()

```python
def chat(
    self,
    messages: list[dict],
    max_tokens: int = 4096,
    temperature: float = 0.7,
) -> str
```

Send a chat completion to the best available provider. Automatically resolves which provider and model to use.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `messages` | `list[dict]` | *required* | OpenAI-format messages |
| `max_tokens` | `int` | `4096` | Maximum response tokens |
| `temperature` | `float` | `0.7` | Sampling temperature |

**Returns:** `str` -- assistant response text.

**Raises:** `RuntimeError` if no provider is available for the `chat` capability.
### analyze_image()

```python
def analyze_image(
    self,
    image_bytes: bytes,
    prompt: str,
    max_tokens: int = 4096,
) -> str
```

Analyze an image using the best available vision provider.

**Returns:** `str` -- analysis text.

**Raises:** `RuntimeError` if no provider is available for the `vision` capability.
### transcribe_audio()

```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    speaker_hints: Optional[list[str]] = None,
) -> dict
```

Transcribe audio. Prefers local Whisper (no file size limits, no API costs) when available, falling back to API-based transcription.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio_path` | `str \| Path` | *required* | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint |
| `speaker_hints` | `Optional[list[str]]` | `None` | Speaker names for better recognition |

**Returns:** `dict` -- transcription result with `text`, `segments`, `duration`.

**Local Whisper:** If `transcription_model` is unset or starts with `"whisper-local"`, the manager tries local Whisper first. Use `"whisper-local:large"` to specify a model size.
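
One way the `"whisper-local:<size>"` convention could be parsed, sketched standalone. The function name and the `"base"` default size are hypothetical; only the `whisper-local` prefix and `:` separator come from the documentation above:

```python
def parse_local_whisper(model):
    # Return a Whisper size for local transcription, or None when the
    # string names an API model (e.g. "whisper-1") instead.
    if model is None:
        return "base"  # hypothetical default size when unset
    if not model.startswith("whisper-local"):
        return None
    _, _, size = model.partition(":")
    return size or "base"
```
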
### get_models_used()

```python
def get_models_used(self) -> dict[str, str]
```

Return a dict mapping each capability to a `"provider/model"` string for tracking purposes.

```python
pm = ProviderManager()
print(pm.get_models_used())
# {"vision": "gemini/gemini-2.5-flash", "chat": "anthropic/claude-haiku-4-5-20251001", ...}
```
### Usage examples

```python
from video_processor.providers.manager import ProviderManager

# Auto-select best providers
pm = ProviderManager()

# Force everything through one provider
pm = ProviderManager(provider="openai")

# Explicit model selection
pm = ProviderManager(
    vision_model="gpt-4o",
    chat_model="claude-haiku-4-5-20251001",
    transcription_model="whisper-local:large",
)

# Chat completion
response = pm.chat([
    {"role": "user", "content": "Summarize this meeting transcript..."}
])

# Image analysis
with open("diagram.png", "rb") as f:
    analysis = pm.analyze_image(f.read(), "Describe this architecture diagram")

# Transcription with speaker hints
result = pm.transcribe_audio(
    "meeting.mp3",
    language="en",
    speaker_hints=["Alice", "Bob", "Charlie"],
)

# Check usage
print(pm.usage.summary())
```
---

## discover_available_models()

```python
from video_processor.providers.discovery import discover_available_models
```

```python
def discover_available_models(
    api_keys: Optional[dict[str, str]] = None,
    force_refresh: bool = False,
) -> list[ModelInfo]
```

Discover available models from all configured providers. For each provider with a valid API key, calls `list_models()` and returns a unified, sorted list.

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_keys` | `Optional[dict[str, str]]` | `None` | Override API keys (defaults to environment variables) |
| `force_refresh` | `bool` | `False` | Force re-discovery, ignoring the session cache |

**Returns:** `list[ModelInfo]` -- all discovered models, sorted by provider then model ID.

**Caching:** Results are cached for the session. Use `force_refresh=True` or `clear_discovery_cache()` to refresh.

```python
from video_processor.providers.discovery import (
    discover_available_models,
    clear_discovery_cache,
)

# Discover models using environment variables
models = discover_available_models()
for m in models:
    print(f"{m.provider}/{m.id} - {m.capabilities}")

# Force refresh
models = discover_available_models(force_refresh=True)

# Override API keys
models = discover_available_models(api_keys={
    "openai": "sk-...",
    "anthropic": "sk-ant-...",
})

# Clear cache
clear_discovery_cache()
```
### clear_discovery_cache()

```python
def clear_discovery_cache() -> None
```

Clear the cached model list, forcing the next `discover_available_models()` call to re-query providers.
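
The session-cache behavior described above can be sketched with a module-level variable. All names and internals here are illustrative -- the `fetch` parameter stands in for the actual provider queries:

```python
_cache = None  # session-level cache, as in the discovery module's contract

def discover(force_refresh=False, fetch=lambda: ["openai/gpt-4o-mini"]):
    # Query providers only on the first call or when a refresh is forced.
    global _cache
    if _cache is None or force_refresh:
        _cache = fetch()
    return _cache

def clear_cache():
    global _cache
    _cache = None

first = discover()
cached = discover(fetch=lambda: ["other/model"])  # served from cache
clear_cache()
refreshed = discover(fetch=lambda: ["other/model"])
```
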
---

## Built-in Providers

The following providers are registered automatically when the provider system initializes:

| Provider | Environment Variable | Capabilities | Default Chat Model |
|---|---|---|---|
| `openai` | `OPENAI_API_KEY` | chat, vision, audio | `gpt-4o-mini` |
| `anthropic` | `ANTHROPIC_API_KEY` | chat, vision | `claude-haiku-4-5-20251001` |
| `gemini` | `GEMINI_API_KEY` | chat, vision, audio | `gemini-2.5-flash` |
| `ollama` | *(none -- checks server)* | chat, vision | *(depends on installed models)* |
| `together` | `TOGETHER_API_KEY` | chat | *(varies)* |
| `fireworks` | `FIREWORKS_API_KEY` | chat | *(varies)* |
| `cerebras` | `CEREBRAS_API_KEY` | chat | *(varies)* |
| `xai` | `XAI_API_KEY` | chat | *(varies)* |
| `azure` | `AZURE_OPENAI_API_KEY` | chat, vision | *(varies)* |
