
# Providers API Reference

::: video_processor.providers.base

::: video_processor.providers.manager

::: video_processor.providers.discovery


## Overview

The provider system abstracts LLM API calls behind a unified interface. It supports multiple providers (OpenAI, Anthropic, Gemini, Ollama, and OpenAI-compatible services), automatic model discovery, capability-based routing, and usage tracking.

Key components:

- `BaseProvider` -- abstract interface that all providers implement
- `ProviderRegistry` -- global registry mapping provider names to classes
- `ProviderManager` -- high-level router that picks the best provider for each task
- `discover_available_models()` -- scans all configured providers for available models

## BaseProvider (ABC)

```python
from video_processor.providers.base import BaseProvider
```

Abstract base class that all provider implementations must subclass. Defines the four core capabilities: chat, vision, audio transcription, and model listing.

Class attribute:

| Attribute | Type | Description |
| --- | --- | --- |
| `provider_name` | `str` | Identifier for this provider (e.g., `"openai"`, `"anthropic"`) |

### chat()

```python
def chat(
    self,
    messages: list[dict],
    max_tokens: int = 4096,
    temperature: float = 0.7,
    model: Optional[str] = None,
) -> str
```

Send a chat completion request.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `messages` | `list[dict]` | required | OpenAI-format message list (`role`, `content`) |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `temperature` | `float` | `0.7` | Sampling temperature |
| `model` | `Optional[str]` | `None` | Override model ID |

Returns: `str` -- the assistant's text response.

### analyze_image()

```python
def analyze_image(
    self,
    image_bytes: bytes,
    prompt: str,
    max_tokens: int = 4096,
    model: Optional[str] = None,
) -> str
```

Analyze an image with a text prompt using a vision-capable model.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `image_bytes` | `bytes` | required | Raw image data (JPEG, PNG, etc.) |
| `prompt` | `str` | required | Analysis instructions |
| `max_tokens` | `int` | `4096` | Maximum tokens in the response |
| `model` | `Optional[str]` | `None` | Override model ID |

Returns: `str` -- the assistant's analysis text.

### transcribe_audio()

```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    model: Optional[str] = None,
) -> dict
```

Transcribe an audio file.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `audio_path` | `str \| Path` | required | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint (ISO 639-1 code) |
| `model` | `Optional[str]` | `None` | Override model ID |

Returns: `dict` -- transcription result with keys `text`, `segments`, `duration`, etc.

### list_models()

```python
def list_models(self) -> list[ModelInfo]
```

Discover available models from this provider's API.

Returns: `list[ModelInfo]` -- available models with capability metadata.


## ModelInfo

```python
from video_processor.providers.base import ModelInfo
```

Pydantic model describing an available model from a provider.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `id` | `str` | required | Model identifier (e.g., `"gpt-4o"`, `"claude-haiku-4-5-20251001"`) |
| `provider` | `str` | required | Provider name (e.g., `"openai"`, `"anthropic"`, `"gemini"`) |
| `display_name` | `str` | `""` | Human-readable display name |
| `capabilities` | `List[str]` | `[]` | Model capabilities: `"chat"`, `"vision"`, `"audio"`, `"embedding"` |

Example:

```json
{
  "id": "gpt-4o",
  "provider": "openai",
  "display_name": "GPT-4o",
  "capabilities": ["chat", "vision"]
}
```

## ProviderRegistry

```python
from video_processor.providers.base import ProviderRegistry
```

Class-level registry for provider classes. Providers register themselves with metadata on import. This registry is used internally by ProviderManager but can also be used directly for introspection.

### register()

```python
@classmethod
def register(
    cls,
    name: str,
    provider_class: type,
    env_var: str = "",
    model_prefixes: Optional[List[str]] = None,
    default_models: Optional[Dict[str, str]] = None,
) -> None
```

Register a provider class with its metadata. Called by each provider module at import time.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | required | Provider name (e.g., `"openai"`) |
| `provider_class` | `type` | required | The provider class |
| `env_var` | `str` | `""` | Environment variable for the API key |
| `model_prefixes` | `Optional[List[str]]` | `None` | Model ID prefixes for auto-detection (e.g., `["gpt-", "o1-"]`) |
| `default_models` | `Optional[Dict[str, str]]` | `None` | Default models per capability (e.g., `{"chat": "gpt-4o", "vision": "gpt-4o"}`) |

### get()

```python
@classmethod
def get(cls, name: str) -> type
```

Return the provider class for a given name. Raises `ValueError` if the provider is not registered.

### get_by_model()

```python
@classmethod
def get_by_model(cls, model_id: str) -> Optional[str]
```

Return the provider name for a model ID based on prefix matching. Returns `None` if no match is found.
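
The prefix matching can be sketched as follows; the prefix table below is illustrative sample data in the shape that `register()` receives, not the library's actual registry contents:

```python
from typing import Optional

# Example prefix metadata, mirroring the model_prefixes argument
# of ProviderRegistry.register(). Invented for illustration.
MODEL_PREFIXES = {
    "openai": ["gpt-", "o1-"],
    "anthropic": ["claude-"],
    "gemini": ["gemini-"],
}


def provider_for_model(model_id: str) -> Optional[str]:
    """Return the first provider whose prefix matches, else None."""
    for name, prefixes in MODEL_PREFIXES.items():
        if any(model_id.startswith(p) for p in prefixes):
            return name
    return None
```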

### get_default_models()

```python
@classmethod
def get_default_models(cls, name: str) -> Dict[str, str]
```

Return the default models dict for a provider, mapping capability names to model IDs.

### available()

```python
@classmethod
def available(cls) -> List[str]
```

Return names of providers whose required environment variable is set (or providers with no env var requirement, like Ollama).

### all_registered()

```python
@classmethod
def all_registered(cls) -> Dict[str, Dict]
```

Return all registered providers and their metadata dictionaries.


## OpenAICompatibleProvider

```python
from video_processor.providers.base import OpenAICompatibleProvider
```

Base class for providers using OpenAI-compatible APIs (Together, Fireworks, Cerebras, xAI, Azure). Implements `chat()`, `analyze_image()`, and `list_models()` using the OpenAI client library; `transcribe_audio()` raises `NotImplementedError` by default.

Constructor:

```python
def __init__(self, api_key: Optional[str] = None, base_url: Optional[str] = None)
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `Optional[str]` | `None` | API key (falls back to the `env_var` environment variable) |
| `base_url` | `Optional[str]` | `None` | API base URL (falls back to the `base_url` class attribute) |

Subclass attributes to override:

| Attribute | Description |
| --- | --- |
| `provider_name` | Provider identifier string |
| `base_url` | Default API base URL |
| `env_var` | Environment variable name for the API key |

Usage tracking: After each `chat()` or `analyze_image()` call, the provider stores token counts in `self._last_usage` as `{"input_tokens": int, "output_tokens": int}`. This is consumed by `ProviderManager._track()`.
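
A subclass therefore only needs to set those three attributes. The sketch below illustrates this with an invented "acme" service; `OpenAICompatibleProviderSketch` is a simplified stand-in for the real base class, showing only the documented key/URL fallback behavior:

```python
import os
from typing import Optional


# Stand-in for video_processor.providers.base.OpenAICompatibleProvider,
# reduced to the constructor fallback behavior described above.
class OpenAICompatibleProviderSketch:
    provider_name = ""
    base_url = ""
    env_var = ""

    def __init__(self, api_key: Optional[str] = None,
                 base_url: Optional[str] = None):
        # Explicit argument wins; otherwise fall back to the environment
        # variable / class-level default, as documented.
        self.api_key = api_key or os.environ.get(self.env_var, "")
        self.base_url = base_url or type(self).base_url


class AcmeProvider(OpenAICompatibleProviderSketch):
    # "acme" and its URL are invented for illustration.
    provider_name = "acme"
    base_url = "https://api.acme.example/v1"
    env_var = "ACME_API_KEY"
```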


## ProviderManager

```python
from video_processor.providers.manager import ProviderManager
```

High-level router that selects the best available provider and model for each API call. Supports explicit model selection, a forced provider, or automatic selection based on discovered capabilities.

### Constructor

```python
def __init__(
    self,
    vision_model: Optional[str] = None,
    chat_model: Optional[str] = None,
    transcription_model: Optional[str] = None,
    provider: Optional[str] = None,
    auto: bool = True,
)
```

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `vision_model` | `Optional[str]` | `None` | Override model for vision tasks (e.g., `"gpt-4o"`) |
| `chat_model` | `Optional[str]` | `None` | Override model for chat/LLM tasks |
| `transcription_model` | `Optional[str]` | `None` | Override model for transcription |
| `provider` | `Optional[str]` | `None` | Force all tasks through a single provider |
| `auto` | `bool` | `True` | If True and no model is specified, pick the best available |

Attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| `usage` | `UsageTracker` | Tracks token counts and API costs across all calls |

### Auto-selection preferences

When `auto=True` and no explicit model is set, providers are tried in this order:

- Vision: Gemini (`gemini-2.5-flash`) > OpenAI (`gpt-4o-mini`) > Anthropic (`claude-haiku-4-5-20251001`)
- Chat: Anthropic (`claude-haiku-4-5-20251001`) > OpenAI (`gpt-4o-mini`) > Gemini (`gemini-2.5-flash`)
- Transcription: OpenAI (`whisper-1`) > Gemini (`gemini-2.5-flash`)

If no API-key-based provider is available, Ollama is tried as a fallback.
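
Under the assumption that "available" means the provider's API-key environment variable is set, the selection order can be sketched as:

```python
import os

# Preference order per capability, taken from the list above.
# (provider, env var, default model) triples.
PREFERENCES = {
    "vision": [("gemini", "GEMINI_API_KEY", "gemini-2.5-flash"),
               ("openai", "OPENAI_API_KEY", "gpt-4o-mini"),
               ("anthropic", "ANTHROPIC_API_KEY", "claude-haiku-4-5-20251001")],
    "chat": [("anthropic", "ANTHROPIC_API_KEY", "claude-haiku-4-5-20251001"),
             ("openai", "OPENAI_API_KEY", "gpt-4o-mini"),
             ("gemini", "GEMINI_API_KEY", "gemini-2.5-flash")],
    "transcription": [("openai", "OPENAI_API_KEY", "whisper-1"),
                      ("gemini", "GEMINI_API_KEY", "gemini-2.5-flash")],
}


def pick(capability: str, env=os.environ):
    """Return (provider, model) for the first available preference."""
    for provider, key, model in PREFERENCES[capability]:
        if env.get(key):
            return provider, model
    return "ollama", None  # documented fallback when no API key is set
```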

### chat()

```python
def chat(
    self,
    messages: list[dict],
    max_tokens: int = 4096,
    temperature: float = 0.7,
) -> str
```

Send a chat completion to the best available provider. Automatically resolves which provider and model to use.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `messages` | `list[dict]` | required | OpenAI-format messages |
| `max_tokens` | `int` | `4096` | Maximum response tokens |
| `temperature` | `float` | `0.7` | Sampling temperature |

Returns: `str` -- assistant response text.

Raises: `RuntimeError` if no provider is available for the chat capability.

### analyze_image()

```python
def analyze_image(
    self,
    image_bytes: bytes,
    prompt: str,
    max_tokens: int = 4096,
) -> str
```

Analyze an image using the best available vision provider.

Returns: `str` -- analysis text.

Raises: `RuntimeError` if no provider is available for the vision capability.

### transcribe_audio()

```python
def transcribe_audio(
    self,
    audio_path: str | Path,
    language: Optional[str] = None,
    speaker_hints: Optional[list[str]] = None,
) -> dict
```

Transcribe audio. Prefers local Whisper (no file size limits, no API costs) when available, falling back to API-based transcription.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `audio_path` | `str \| Path` | required | Path to the audio file |
| `language` | `Optional[str]` | `None` | Language hint |
| `speaker_hints` | `Optional[list[str]]` | `None` | Speaker names for better recognition |

Returns: `dict` -- transcription result with `text`, `segments`, `duration`.

Local Whisper: If `transcription_model` is unset or starts with `"whisper-local"`, the manager tries local Whisper first. Use `"whisper-local:large"` to specify a model size.

### get_models_used()

```python
def get_models_used(self) -> dict[str, str]
```

Return a dict mapping each capability to a `"provider/model"` string for tracking purposes.

```python
pm = ProviderManager()
print(pm.get_models_used())
# {"vision": "gemini/gemini-2.5-flash", "chat": "anthropic/claude-haiku-4-5-20251001", ...}
```

### Usage examples

```python
from video_processor.providers.manager import ProviderManager

# Auto-select best providers
pm = ProviderManager()

# Force everything through one provider
pm = ProviderManager(provider="openai")

# Explicit model selection
pm = ProviderManager(
    vision_model="gpt-4o",
    chat_model="claude-haiku-4-5-20251001",
    transcription_model="whisper-local:large",
)

# Chat completion
response = pm.chat([
    {"role": "user", "content": "Summarize this meeting transcript..."}
])

# Image analysis
with open("diagram.png", "rb") as f:
    analysis = pm.analyze_image(f.read(), "Describe this architecture diagram")

# Transcription with speaker hints
result = pm.transcribe_audio(
    "meeting.mp3",
    language="en",
    speaker_hints=["Alice", "Bob", "Charlie"],
)

# Check usage
print(pm.usage.summary())
```

## discover_available_models()

```python
from video_processor.providers.discovery import discover_available_models
```

```python
def discover_available_models(
    api_keys: Optional[dict[str, str]] = None,
    force_refresh: bool = False,
) -> list[ModelInfo]
```

Discover available models from all configured providers. For each provider with a valid API key, calls `list_models()` and returns a unified, sorted list.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_keys` | `Optional[dict[str, str]]` | `None` | Override API keys (defaults to environment variables) |
| `force_refresh` | `bool` | `False` | Force re-discovery, ignoring the session cache |

Returns: `list[ModelInfo]` -- all discovered models, sorted by provider then model ID.

Caching: Results are cached for the session. Use `force_refresh=True` or `clear_discovery_cache()` to refresh.

```python
from video_processor.providers.discovery import (
    discover_available_models,
    clear_discovery_cache,
)

# Discover models using environment variables
models = discover_available_models()
for m in models:
    print(f"{m.provider}/{m.id} - {m.capabilities}")

# Force refresh
models = discover_available_models(force_refresh=True)

# Override API keys
models = discover_available_models(api_keys={
    "openai": "sk-...",
    "anthropic": "sk-ant-...",
})

# Clear cache
clear_discovery_cache()
```

## clear_discovery_cache()

```python
def clear_discovery_cache() -> None
```

Clear the cached model list, forcing the next `discover_available_models()` call to re-query providers.


## Built-in Providers

The following providers are registered automatically when the provider system initializes:

| Provider | Environment Variable | Capabilities | Default Chat Model |
| --- | --- | --- | --- |
| `openai` | `OPENAI_API_KEY` | chat, vision, audio | `gpt-4o-mini` |
| `anthropic` | `ANTHROPIC_API_KEY` | chat, vision | `claude-haiku-4-5-20251001` |
| `gemini` | `GEMINI_API_KEY` | chat, vision, audio | `gemini-2.5-flash` |
| `ollama` | (none -- checks server) | chat, vision | (depends on installed models) |
| `together` | `TOGETHER_API_KEY` | chat | (varies) |
| `fireworks` | `FIREWORKS_API_KEY` | chat | (varies) |
| `cerebras` | `CEREBRAS_API_KEY` | chat | (varies) |
| `xai` | `XAI_API_KEY` | chat | (varies) |
| `azure` | `AZURE_OPENAI_API_KEY` | chat, vision | (varies) |
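
A rough sketch of the availability check this table implies, mirroring what `ProviderRegistry.available()` is described as doing (the real code also probes the local Ollama server, which has no environment variable):

```python
import os

# Subset of the provider -> env var mapping from the table above.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "together": "TOGETHER_API_KEY",
}


def available_providers(env=os.environ):
    """Providers whose API-key environment variable is set."""
    return [name for name, var in ENV_VARS.items() if env.get(var)]
```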
