Providers API Reference¶
video_processor.providers.base¶
Abstract base class, registry, and shared types for provider implementations.
BaseProvider¶
Bases: ABC
Abstract base for all provider implementations.
analyze_image(image_bytes, prompt, max_tokens=4096, model=None) abstractmethod¶
Analyze an image with a prompt. Returns the assistant text.
chat(messages, max_tokens=4096, temperature=0.7, model=None) abstractmethod¶
Send a chat completion request. Returns the assistant text.
list_models() abstractmethod¶
Discover available models from this provider's API. Returns a list of ModelInfo.
transcribe_audio(audio_path, language=None, model=None) abstractmethod¶
Transcribe an audio file. Returns a dict with 'text', 'segments', etc.
ModelInfo¶
Bases: BaseModel
Information about an available model.
OpenAICompatibleProvider¶
Bases: BaseProvider
Base for providers using OpenAI-compatible APIs. Suitable for Together, Fireworks, Cerebras, xAI, Azure, and similar services.
ProviderRegistry¶
Registry for provider classes. Providers register themselves with metadata.
all_registered() classmethod¶
Return all registered providers and their metadata.
available() classmethod¶
Return names of providers whose env var is set (or that have no env var requirement).
get(name) classmethod¶
Return the provider class for a given name.
get_by_model(model_id) classmethod¶
Return the provider name for a model ID based on prefix matching.
get_default_models(name) classmethod¶
Return the default models dict for a provider.
register(name, provider_class, env_var='', model_prefixes=None, default_models=None) classmethod¶
Register a provider class with its metadata.
video_processor.providers.manager¶
ProviderManager -- a unified interface for routing API calls to the best available provider.
ProviderManager¶
Routes API calls to the best available provider/model. Supports explicit model selection or auto-routing based on discovered available models.
__init__(vision_model=None, chat_model=None, transcription_model=None, provider=None, auto=True)¶
Initialize the ProviderManager.
Parameters¶

- vision_model : override model for vision tasks (e.g., 'gpt-4o')
- chat_model : override model for chat/LLM tasks
- transcription_model : override model for transcription
- provider : force all tasks to a single provider ('openai', 'anthropic', 'gemini')
- auto : if True and no model specified, pick the best available
analyze_image(image_bytes, prompt, max_tokens=4096)¶
Analyze an image using the best available vision provider.
chat(messages, max_tokens=4096, temperature=0.7)¶
Send a chat completion to the best available provider.
get_models_used()¶
Return a dict mapping capability to 'provider/model' for tracking.
transcribe_audio(audio_path, language=None, speaker_hints=None)¶
Transcribe audio using local Whisper if available, otherwise the API.
video_processor.providers.discovery¶
Auto-discover available models across providers.
clear_discovery_cache()¶
Clear the cached model list so the next discovery call re-queries providers.
discover_available_models(api_keys=None, force_refresh=False)¶
Discover available models from all configured providers. For each provider with a valid API key, calls list_models() and returns a unified list. Results are cached for the session.
Overview¶
The provider system abstracts LLM API calls behind a unified interface. It supports multiple providers (OpenAI, Anthropic, Gemini, Ollama, and OpenAI-compatible services), automatic model discovery, capability-based routing, and usage tracking.
Key components:
- `BaseProvider` -- abstract interface that all providers implement
- `ProviderRegistry` -- global registry mapping provider names to classes
- `ProviderManager` -- high-level router that picks the best provider for each task
- `discover_available_models()` -- scans all configured providers for available models
BaseProvider (ABC)¶
Abstract base class that all provider implementations must subclass. Defines the four core capabilities: chat, vision, audio transcription, and model listing.
Class attribute:
| Attribute | Type | Description |
|---|---|---|
| `provider_name` | `str` | Identifier for this provider (e.g., `"openai"`, `"anthropic"`) |
chat()¶
```python
def chat(
    self,
    messages: list[dict],
    max_tokens: int = 4096,
    temperature: float = 0.7,
    model: Optional[str] = None,
) -> str
```
Send a chat completion request.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `messages` | `list[dict]` | required | OpenAI-format message list (role, content) |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `temperature` | `float` | `0.7` | Sampling temperature |
| `model` | `Optional[str]` | `None` | Override model ID |
Returns: str -- the assistant's text response.
analyze_image()¶
```python
def analyze_image(
    self,
    image_bytes: bytes,
    prompt: str,
    max_tokens: int = 4096,
    model: Optional[str] = None,
) -> str
```
Analyze an image with a text prompt using a vision-capable model.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `image_bytes` | `bytes` | required | Raw image data (JPEG, PNG, etc.) |
| `prompt` | `str` | required | Analysis instructions |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `model` | `Optional[str]` | `None` | Override model ID |
Returns: str -- the assistant's analysis text.
transcribe_audio()¶
```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    model: Optional[str] = None,
) -> dict
```
Transcribe an audio file.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio_path` | `str \| Path` | required | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint (ISO 639-1 code) |
| `model` | `Optional[str]` | `None` | Override model ID |
Returns: dict -- transcription result with keys text, segments, duration, etc.
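The returned dict is convenient for post-processing. A minimal sketch of flattening it into timestamped lines -- note that only the `text`, `segments`, and `duration` keys are documented above; the per-segment `start`/`end`/`text` fields shown here are assumed Whisper-style output:

```python
# Hypothetical transcription result; segment field names are assumptions
# beyond the documented top-level keys.
result = {
    "text": "Hello everyone. Let's begin.",
    "duration": 4.2,
    "segments": [
        {"start": 0.0, "end": 1.8, "text": "Hello everyone."},
        {"start": 1.9, "end": 4.2, "text": "Let's begin."},
    ],
}

# Render each segment as "[start-end] text" with zero-padded timestamps.
lines = [
    f"[{seg['start']:06.1f}-{seg['end']:06.1f}] {seg['text']}"
    for seg in result["segments"]
]
print("\n".join(lines))
# [0000.0-0001.8] Hello everyone.
# [0001.9-0004.2] Let's begin.
```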
list_models()¶
Discover available models from this provider's API.
Returns: list[ModelInfo] -- available models with capability metadata.
ModelInfo¶
Pydantic model describing an available model from a provider.
| Field | Type | Default | Description |
|---|---|---|---|
| `id` | `str` | required | Model identifier (e.g., `"gpt-4o"`, `"claude-haiku-4-5-20251001"`) |
| `provider` | `str` | required | Provider name (e.g., `"openai"`, `"anthropic"`, `"gemini"`) |
| `display_name` | `str` | `""` | Human-readable display name |
| `capabilities` | `List[str]` | `[]` | Model capabilities: `"chat"`, `"vision"`, `"audio"`, `"embedding"` |
```json
{
  "id": "gpt-4o",
  "provider": "openai",
  "display_name": "GPT-4o",
  "capabilities": ["chat", "vision"]
}
```
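A common use of `ModelInfo` is filtering discovered models by capability. The sketch below uses a plain dataclass stand-in with the same fields (the real class is a Pydantic model):

```python
from dataclasses import dataclass, field

# Stand-in for the Pydantic ModelInfo described above (same field names).
@dataclass
class ModelInfo:
    id: str
    provider: str
    display_name: str = ""
    capabilities: list[str] = field(default_factory=list)

# Hypothetical discovery output for illustration.
models = [
    ModelInfo("gpt-4o", "openai", "GPT-4o", ["chat", "vision"]),
    ModelInfo("whisper-1", "openai", "Whisper", ["audio"]),
    ModelInfo("gemini-2.5-flash", "gemini", "Gemini 2.5 Flash",
              ["chat", "vision", "audio"]),
]

# Keep only models that can handle vision tasks.
vision_models = [m.id for m in models if "vision" in m.capabilities]
print(vision_models)  # ['gpt-4o', 'gemini-2.5-flash']
```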
ProviderRegistry¶
Class-level registry for provider classes. Providers register themselves with metadata on import. This registry is used internally by ProviderManager but can also be used directly for introspection.
register()¶
```python
@classmethod
def register(
    cls,
    name: str,
    provider_class: type,
    env_var: str = "",
    model_prefixes: Optional[List[str]] = None,
    default_models: Optional[Dict[str, str]] = None,
) -> None
```
Register a provider class with its metadata. Called by each provider module at import time.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | required | Provider name (e.g., `"openai"`) |
| `provider_class` | `type` | required | The provider class |
| `env_var` | `str` | `""` | Environment variable for the API key |
| `model_prefixes` | `Optional[List[str]]` | `None` | Model ID prefixes for auto-detection (e.g., `["gpt-", "o1-"]`) |
| `default_models` | `Optional[Dict[str, str]]` | `None` | Default models per capability (e.g., `{"chat": "gpt-4o", "vision": "gpt-4o"}`) |
get()¶
Return the provider class for a given name. Raises ValueError if the provider is not registered.
get_by_model()¶
Return the provider name for a model ID based on prefix matching. Returns None if no match is found.
get_default_models()¶
Return the default models dict for a provider, mapping capability names to model IDs.
available()¶
Return names of providers whose required environment variable is set (or providers with no env var requirement, like Ollama).
all_registered()¶
Return all registered providers and their metadata dictionaries.
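The registry pattern described above can be sketched as follows. This is a simplified stand-in, not the package's actual implementation -- `MiniRegistry` and the toy registrations are illustrative:

```python
import os
from typing import Optional

class MiniRegistry:
    """Toy registry illustrating the ProviderRegistry pattern."""
    _providers: dict = {}

    @classmethod
    def register(cls, name, provider_class, env_var="",
                 model_prefixes=None, default_models=None):
        cls._providers[name] = {
            "class": provider_class,
            "env_var": env_var,
            "model_prefixes": model_prefixes or [],
            "default_models": default_models or {},
        }

    @classmethod
    def get_by_model(cls, model_id: str) -> Optional[str]:
        # First registered provider with a matching prefix wins.
        for name, meta in cls._providers.items():
            if any(model_id.startswith(p) for p in meta["model_prefixes"]):
                return name
        return None

    @classmethod
    def available(cls) -> list[str]:
        # A provider with no env_var requirement always counts as available.
        return [n for n, m in cls._providers.items()
                if not m["env_var"] or os.environ.get(m["env_var"])]

MiniRegistry.register("openai", object, env_var="OPENAI_API_KEY",
                      model_prefixes=["gpt-", "o1-"],
                      default_models={"chat": "gpt-4o-mini"})
MiniRegistry.register("ollama", object)  # no env var requirement

print(MiniRegistry.get_by_model("gpt-4o"))       # openai
print("ollama" in MiniRegistry.available())      # True
```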
OpenAICompatibleProvider¶
Base class for providers using OpenAI-compatible APIs (Together, Fireworks, Cerebras, xAI, Azure). Implements chat(), analyze_image(), and list_models() using the OpenAI client library. transcribe_audio() raises NotImplementedError by default.
Constructor:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `Optional[str]` | `None` | API key (falls back to the `self.env_var` environment variable) |
| `base_url` | `Optional[str]` | `None` | API base URL (falls back to the `self.base_url` class attribute) |
Subclass attributes to override:
| Attribute | Description |
|---|---|
| `provider_name` | Provider identifier string |
| `base_url` | Default API base URL |
| `env_var` | Environment variable name for the API key |
Usage tracking: After each chat() or analyze_image() call, the provider stores token counts in self._last_usage as {"input_tokens": int, "output_tokens": int}. This is consumed by ProviderManager._track().
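The `_last_usage` handoff can be sketched like this. The `{"input_tokens", "output_tokens"}` shape is documented above; `ToyProvider` and `ToyTracker` are simplified stand-ins for a provider and `UsageTracker`:

```python
class ToyProvider:
    """Stand-in provider that records usage after each call."""
    def chat(self, messages):
        # A real provider would read these counts from the API response.
        self._last_usage = {"input_tokens": 120, "output_tokens": 45}
        return "ok"

class ToyTracker:
    """Stand-in for UsageTracker: accumulates token counts."""
    def __init__(self):
        self.totals = {"input_tokens": 0, "output_tokens": 0}

    def track(self, usage):
        for key, count in usage.items():
            self.totals[key] += count

provider = ToyProvider()
tracker = ToyTracker()
provider.chat([{"role": "user", "content": "hi"}])
tracker.track(provider._last_usage)
print(tracker.totals)  # {'input_tokens': 120, 'output_tokens': 45}
```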
ProviderManager¶
High-level router that selects the best available provider and model for each API call. Supports explicit model selection, forced provider, or automatic selection based on discovered capabilities.
Constructor¶
```python
def __init__(
    self,
    vision_model: Optional[str] = None,
    chat_model: Optional[str] = None,
    transcription_model: Optional[str] = None,
    provider: Optional[str] = None,
    auto: bool = True,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `vision_model` | `Optional[str]` | `None` | Override model for vision tasks (e.g., `"gpt-4o"`) |
| `chat_model` | `Optional[str]` | `None` | Override model for chat/LLM tasks |
| `transcription_model` | `Optional[str]` | `None` | Override model for transcription |
| `provider` | `Optional[str]` | `None` | Force all tasks to a single provider |
| `auto` | `bool` | `True` | If True and no model specified, pick the best available |
Attributes:
| Attribute | Type | Description |
|---|---|---|
| `usage` | `UsageTracker` | Tracks token counts and API costs across all calls |
Auto-selection preferences¶
When auto=True and no explicit model is set, providers are tried in this order:
Vision: Gemini (gemini-2.5-flash) > OpenAI (gpt-4o-mini) > Anthropic (claude-haiku-4-5-20251001)
Chat: Anthropic (claude-haiku-4-5-20251001) > OpenAI (gpt-4o-mini) > Gemini (gemini-2.5-flash)
Transcription: OpenAI (whisper-1) > Gemini (gemini-2.5-flash)
If no API-key-based provider is available, Ollama is tried as a fallback.
chat()¶
Send a chat completion to the best available provider. Automatically resolves which provider and model to use.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `messages` | `list[dict]` | required | OpenAI-format messages |
| `max_tokens` | `int` | `4096` | Maximum response tokens |
| `temperature` | `float` | `0.7` | Sampling temperature |
Returns: str -- assistant response text.
Raises: RuntimeError if no provider is available for the chat capability.
analyze_image()¶
Analyze an image using the best available vision provider.
Returns: str -- analysis text.
Raises: RuntimeError if no provider is available for the vision capability.
transcribe_audio()¶
```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    speaker_hints: Optional[list[str]] = None,
) -> dict
```
Transcribe audio. Prefers local Whisper (no file size limits, no API costs) when available, falling back to API-based transcription.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `audio_path` | `str \| Path` | required | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint |
| `speaker_hints` | `Optional[list[str]]` | `None` | Speaker names for better recognition |
Returns: dict -- transcription result with text, segments, duration.
Local Whisper: If transcription_model is unset or starts with "whisper-local", the manager tries local Whisper first. Use "whisper-local:large" to specify a model size.
get_models_used()¶
Return a dict mapping capability to "provider/model" string for tracking purposes.
```python
pm = ProviderManager()
print(pm.get_models_used())
# {"vision": "gemini/gemini-2.5-flash", "chat": "anthropic/claude-haiku-4-5-20251001", ...}
```
Usage examples¶
```python
from video_processor.providers.manager import ProviderManager

# Auto-select best providers
pm = ProviderManager()

# Force everything through one provider
pm = ProviderManager(provider="openai")

# Explicit model selection
pm = ProviderManager(
    vision_model="gpt-4o",
    chat_model="claude-haiku-4-5-20251001",
    transcription_model="whisper-local:large",
)

# Chat completion
response = pm.chat([
    {"role": "user", "content": "Summarize this meeting transcript..."}
])

# Image analysis
with open("diagram.png", "rb") as f:
    analysis = pm.analyze_image(f.read(), "Describe this architecture diagram")

# Transcription with speaker hints
result = pm.transcribe_audio(
    "meeting.mp3",
    language="en",
    speaker_hints=["Alice", "Bob", "Charlie"],
)

# Check usage
print(pm.usage.summary())
```
discover_available_models()¶
```python
def discover_available_models(
    api_keys: Optional[dict[str, str]] = None,
    force_refresh: bool = False,
) -> list[ModelInfo]
```
Discover available models from all configured providers. For each provider with a valid API key, calls list_models() and returns a unified, sorted list.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_keys` | `Optional[dict[str, str]]` | `None` | Override API keys (defaults to environment variables) |
| `force_refresh` | `bool` | `False` | Force re-discovery, ignoring the session cache |
Returns: list[ModelInfo] -- all discovered models, sorted by provider then model ID.
Caching: Results are cached for the session. Use force_refresh=True or clear_discovery_cache() to refresh.
```python
from video_processor.providers.discovery import (
    discover_available_models,
    clear_discovery_cache,
)

# Discover models using environment variables
models = discover_available_models()
for m in models:
    print(f"{m.provider}/{m.id} - {m.capabilities}")

# Force refresh
models = discover_available_models(force_refresh=True)

# Override API keys
models = discover_available_models(api_keys={
    "openai": "sk-...",
    "anthropic": "sk-ant-...",
})

# Clear cache
clear_discovery_cache()
```
clear_discovery_cache()¶
Clear the cached model list, forcing the next discover_available_models() call to re-query providers.
Built-in Providers¶
The following providers are registered automatically when the provider system initializes:
| Provider | Environment Variable | Capabilities | Default Chat Model |
|---|---|---|---|
| `openai` | `OPENAI_API_KEY` | chat, vision, audio | `gpt-4o-mini` |
| `anthropic` | `ANTHROPIC_API_KEY` | chat, vision | `claude-haiku-4-5-20251001` |
| `gemini` | `GEMINI_API_KEY` | chat, vision, audio | `gemini-2.5-flash` |
| `ollama` | (none -- checks server) | chat, vision | (depends on installed models) |
| `together` | `TOGETHER_API_KEY` | chat | (varies) |
| `fireworks` | `FIREWORKS_API_KEY` | chat | (varies) |
| `cerebras` | `CEREBRAS_API_KEY` | chat | (varies) |
| `xai` | `XAI_API_KEY` | chat | (varies) |
| `azure` | `AZURE_OPENAI_API_KEY` | chat, vision | (varies) |