# Sources API Reference

::: video_processor.sources.base

---

## Overview

The sources module provides a unified interface for fetching content from cloud services, local applications, and the web. All sources implement the `BaseSource` abstract class, providing consistent `authenticate()`, `list_videos()`, and `download()` methods.

Sources are lazy-loaded to avoid pulling in optional dependencies at import time. You can import any source directly from `video_processor.sources` and the correct module will be loaded on demand.

---

## BaseSource (ABC)

```python
from video_processor.sources import BaseSource
```

Abstract base class that all source integrations implement. Defines the standard three-step workflow: authenticate, list, download.

### authenticate()

```python
@abstractmethod
def authenticate(self) -> bool
```

Authenticate with the cloud provider or service. Uses the auth strategy defined for the source (OAuth, API key, local access, etc.).

**Returns:** `bool` -- `True` on successful authentication, `False` on failure.

### list_videos()

```python
@abstractmethod
def list_videos(
    self,
    folder_id: Optional[str] = None,
    folder_path: Optional[str] = None,
    patterns: Optional[List[str]] = None,
) -> List[SourceFile]
```

List available video files (or other content, depending on the source).

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `folder_id` | `Optional[str]` | `None` | Provider-specific folder/container identifier |
| `folder_path` | `Optional[str]` | `None` | Path within the source (e.g., folder name) |
| `patterns` | `Optional[List[str]]` | `None` | File name glob patterns to filter results |

**Returns:** `List[SourceFile]` -- available files matching the criteria.
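The `patterns` parameter takes file name glob patterns. How each source applies them is not specified here; the following is a minimal sketch of one plausible approach using the standard library's `fnmatch` module. The helper name `matches_patterns` and the case-insensitive comparison are assumptions made for illustration, not part of the documented API.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def matches_patterns(name: str, patterns: Optional[List[str]]) -> bool:
    """Return True when no patterns are given or at least one pattern matches.

    Hypothetical helper: lower-casing both sides makes the match
    case-insensitive on every platform, which is an assumption here.
    """
    if not patterns:
        return True
    lowered = name.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


names = ["standup.mp4", "notes.txt", "demo.WEBM"]
selected = [n for n in names if matches_patterns(n, ["*.mp4", "*.webm"])]
print(selected)  # ['standup.mp4', 'demo.WEBM']
```

Passing `None` (the default) keeps every file, which mirrors the default value in the signature above.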

### download()

```python
@abstractmethod
def download(
    self,
    file: SourceFile,
    destination: Path,
) -> Path
```

Download a single file to a local path.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `file` | `SourceFile` | File descriptor from `list_videos()` |
| `destination` | `Path` | Local destination path |

**Returns:** `Path` -- the local path where the file was saved.

### download_all()

```python
def download_all(
    self,
    files: List[SourceFile],
    destination_dir: Path,
) -> List[Path]
```

Download multiple files to a directory, preserving subfolder structure from `SourceFile.path`. This is a concrete method provided by the base class.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `files` | `List[SourceFile]` | Files to download |
| `destination_dir` | `Path` | Base directory for downloads (created if needed) |

**Returns:** `List[Path]` -- local paths of successfully downloaded files. Failed downloads are logged and skipped.

---

## SourceFile

```python
from video_processor.sources import SourceFile
```

Pydantic model describing a file available in a source. Only `name` and `id` are required; the remaining fields default to `None` when the provider does not report them.

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | File name |
| `id` | `str` | *required* | Provider-specific file identifier |
| `size_bytes` | `Optional[int]` | `None` | File size in bytes |
| `mime_type` | `Optional[str]` | `None` | MIME type (e.g., `"video/mp4"`) |
| `modified_at` | `Optional[str]` | `None` | Last modified timestamp |
| `path` | `Optional[str]` | `None` | Path within the source folder (used for subfolder structure in `download_all`) |

```json
{
  "name": "sprint-review-2026-03-01.mp4",
  "id": "abc123def456",
  "size_bytes": 524288000,
  "mime_type": "video/mp4",
  "modified_at": "2026-03-01T14:30:00Z",
  "path": "recordings/march/sprint-review-2026-03-01.mp4"
}
```
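To illustrate how the `path` field can drive the subfolder layout in `download_all`, here is a rough sketch of the described behavior: create the base directory, mirror `path` beneath it, and log and skip failures. It is not the library's implementation. `FileStub`, `download_all_sketch`, and `fake_download` are hypothetical stand-ins so the example runs on its own, and falling back to `name` when `path` is missing is an assumption.

```python
import logging
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class FileStub:
    """Stand-in for SourceFile with only the fields used in this sketch."""
    name: str
    id: str
    path: Optional[str] = None


def download_all_sketch(
    files: List[FileStub],
    destination_dir: Path,
    download: Callable[[FileStub, Path], Path],
) -> List[Path]:
    """Download each file under destination_dir, skipping failures."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for file in files:
        # Assumption: fall back to the bare name when no path is reported.
        target = destination_dir / (file.path or file.name)
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            saved.append(download(file, target))
        except Exception as exc:
            logger.warning("Skipping %s: %s", file.name, exc)
    return saved


def fake_download(file: FileStub, destination: Path) -> Path:
    """Hypothetical downloader: writes a placeholder, fails for id 'bad'."""
    if file.id == "bad":
        raise OSError("simulated network failure")
    destination.write_text(f"contents of {file.name}")
    return destination


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "downloads"
    files = [
        FileStub("a.mp4", "1", "recordings/march/a.mp4"),
        FileStub("b.mp4", "bad"),
        FileStub("c.mp4", "3"),
    ]
    paths = download_all_sketch(files, base, fake_download)
    print([p.relative_to(base).as_posix() for p in paths])
    # ['recordings/march/a.mp4', 'c.mp4']
```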

---

## Lazy Loading Pattern

All sources are lazy-loaded via `__getattr__` in the package `__init__.py`. This means importing `video_processor.sources` does not pull in any external dependencies (e.g., `google-auth`, `msal`, `notion-client`). The actual module is loaded only when you access the class.

```python
# Importing the package alone loads no source modules or optional dependencies
import video_processor.sources

# Accessing a class triggers __getattr__, which imports the zoom_source
# module (and its dependencies) at this point
from video_processor.sources import ZoomSource

source = ZoomSource()
```
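For readers unfamiliar with module-level `__getattr__` (PEP 562), the mechanism can be sketched as follows. This is an illustrative stand-in, not the contents of the real `__init__.py`: the package name `lazy_pkg`, the `_LAZY` registry, and the standard-library targets are all assumptions chosen so the example runs without optional dependencies.

```python
import importlib
import sys
import types

# Hypothetical registry: exported name -> (module path, attribute name).
# Standard-library modules stand in for the real source modules.
_LAZY = {
    "OrderedDict": ("collections", "OrderedDict"),
    "Fraction": ("fractions", "Fraction"),
}

lazy_pkg = types.ModuleType("lazy_pkg")


def _lazy_getattr(name: str):
    """Resolve a registered name on first access and cache it on the module."""
    try:
        module_path, attr = _LAZY[name]
    except KeyError:
        raise AttributeError(
            f"module 'lazy_pkg' has no attribute {name!r}"
        ) from None
    value = getattr(importlib.import_module(module_path), attr)
    # Cache the result so later lookups bypass __getattr__ entirely.
    setattr(lazy_pkg, name, value)
    return value


# Python consults __getattr__ in the module namespace only when normal
# attribute lookup fails, so unresolved names are routed to the loader.
lazy_pkg.__getattr__ = _lazy_getattr
sys.modules["lazy_pkg"] = lazy_pkg

# The target module is imported here, at the point of attribute access.
from lazy_pkg import Fraction

print(Fraction(1, 2) + Fraction(1, 4))  # 3/4
```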

---

## Available Sources

### Cloud Recordings

Sources for fetching recorded meetings from video conferencing platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Zoom | `ZoomSource` | OAuth / Server-to-Server | List and download Zoom cloud recordings |
| Google Meet | `MeetRecordingSource` | OAuth (Google) | List and download Google Meet recordings from Drive |
| Microsoft Teams | `TeamsRecordingSource` | OAuth (Microsoft) | List and download Teams meeting recordings |

### Cloud Storage and Workspace

Sources for accessing files stored in cloud platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Google Drive | `GoogleDriveSource` | OAuth (Google) | Files from Google Drive |
| Google Workspace | `GWSSource` | OAuth (Google) | Google Docs, Sheets, Slides |
| Microsoft 365 | `M365Source` | OAuth (Microsoft) | OneDrive, SharePoint files |
| Notion | `NotionSource` | OAuth / API key | Notion pages and databases |
| GitHub | `GitHubSource` | OAuth / API token | Repository files, issues, discussions |
| Dropbox | `DropboxSource` | OAuth / access token | *(via auth config)* |

### Notes Applications

Sources for local and cloud-based note-taking apps.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Apple Notes | `AppleNotesSource` | Local (macOS) | Notes from Apple Notes.app |
| Obsidian | `ObsidianSource` | Local filesystem | Markdown files from Obsidian vaults |
| Logseq | `LogseqSource` | Local filesystem | Pages from Logseq graphs |
| OneNote | `OneNoteSource` | OAuth (Microsoft) | Microsoft OneNote notebooks |
| Google Keep | `GoogleKeepSource` | OAuth (Google) | Google Keep notes |

### Web and Content

Sources for fetching content from the web.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| YouTube | `YouTubeSource` | API key / OAuth | YouTube video metadata and transcripts |
| Web | `WebSource` | None | General web page content extraction |
| RSS | `RSSSource` | None | RSS/Atom feed entries |
| Podcast | `PodcastSource` | None | Podcast episodes from RSS feeds |
| arXiv | `ArxivSource` | None | Academic papers from arXiv |
| Hacker News | `HackerNewsSource` | None | Hacker News posts and comments |
| Reddit | `RedditSource` | API credentials | Reddit posts and comments |
| Twitter/X | `TwitterSource` | API credentials | Tweets and threads |

---

## Auth Integration

Most sources use PlanOpticon's unified auth system (see [Auth API](auth.md)). A typical source implementation follows this pattern:

```python
from video_processor.auth import get_auth_manager
from video_processor.sources import BaseSource, SourceFile

class MySource(BaseSource):
    def __init__(self):
        self._token = None

    def authenticate(self) -> bool:
        # Look up the auth manager registered for this service
        manager = get_auth_manager("my_service")
        if manager:
            token = manager.get_token()
            if token:
                self._token = token
                return True
        return False

    def list_videos(self, **kwargs) -> list[SourceFile]:
        if not self._token:
            raise RuntimeError("Not authenticated. Call authenticate() first.")
        # Use self._token to call the API
        ...

    # download() must also be implemented; it is omitted here for brevity
```

---

## Usage Examples

### Listing and downloading Zoom recordings

```python
from pathlib import Path
from video_processor.sources import ZoomSource

source = ZoomSource()
if source.authenticate():
    recordings = source.list_videos()
    for rec in recordings:
        print(f"{rec.name} ({rec.size_bytes} bytes)")

    # Download all to a local directory
    paths = source.download_all(recordings, Path("./downloads"))
```
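Because `size_bytes` is optional, the f-string above prints `None bytes` for sources that do not report a size. A small formatting helper can handle that case; `format_size` is a hypothetical name used for illustration and is not part of the package.

```python
from typing import Optional


def format_size(size_bytes: Optional[int]) -> str:
    """Render a byte count in binary units, tolerating a missing size."""
    if size_bytes is None:
        return "size unknown"
    size = float(size_bytes)
    units = ("B", "KiB", "MiB", "GiB", "TiB")
    index = 0
    # Step up one unit at a time until the value drops below 1024.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"


print(format_size(524288000))  # 500.0 MiB
print(format_size(None))       # size unknown
```

In the loop above, `print(f"{rec.name} ({format_size(rec.size_bytes)})")` would then read cleanly for every source.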

### Fetching from multiple sources

```python
from pathlib import Path
from video_processor.sources import GoogleDriveSource, NotionSource

# Google Drive
gdrive = GoogleDriveSource()
if gdrive.authenticate():
    files = gdrive.list_videos(
        folder_path="Meeting Recordings",
        patterns=["*.mp4", "*.webm"],
    )
    gdrive.download_all(files, Path("./drive-downloads"))

# Notion
notion = NotionSource()
if notion.authenticate():
    pages = notion.list_videos()  # Lists Notion pages
    for page in pages:
        print(f"Page: {page.name}")
```

### YouTube content

```python
from video_processor.sources import YouTubeSource

yt = YouTubeSource()
if yt.authenticate():
    videos = yt.list_videos(folder_path="https://youtube.com/playlist?list=...")
    for v in videos:
        print(f"{v.name} - {v.id}")
```