# Sources API Reference

::: video_processor.sources.base

---

## Overview

The sources module provides a unified interface for fetching content from cloud services, local applications, and the web. All sources implement the `BaseSource` abstract class, providing consistent `authenticate()`, `list_videos()`, and `download()` methods.

Sources are lazy-loaded to avoid pulling in optional dependencies at import time. You can import any source directly from `video_processor.sources` and the correct module will be loaded on demand.

---

## BaseSource (ABC)

```python
from video_processor.sources import BaseSource
```

Abstract base class that all source integrations implement. Defines the standard three-step workflow: authenticate, list, download.

### authenticate()

```python
@abstractmethod
def authenticate(self) -> bool
```

Authenticate with the cloud provider or service. Uses the auth strategy defined for the source (OAuth, API key, local access, etc.).

**Returns:** `bool` -- `True` on successful authentication, `False` on failure.

### list_videos()

```python
@abstractmethod
def list_videos(
    self,
    folder_id: Optional[str] = None,
    folder_path: Optional[str] = None,
    patterns: Optional[List[str]] = None,
) -> List[SourceFile]
```

List available video files (or other content, depending on the source).

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `folder_id` | `Optional[str]` | `None` | Provider-specific folder/container identifier |
| `folder_path` | `Optional[str]` | `None` | Path within the source (e.g., folder name) |
| `patterns` | `Optional[List[str]]` | `None` | File name glob patterns to filter results |

**Returns:** `List[SourceFile]` -- available files matching the criteria.

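As a rough illustration of what glob filtering with `patterns` might look like, the sketch below uses the standard-library `fnmatch` module. It is not taken from PlanOpticon's code: `matches_patterns` is a hypothetical helper, and both the case-insensitive comparison and the choice to match against the file name only are assumptions.

```python
from fnmatch import fnmatch
from typing import List, Optional


def matches_patterns(name: str, patterns: Optional[List[str]]) -> bool:
    """Return True if `name` matches any glob pattern, or if no patterns are given."""
    if not patterns:
        return True
    # Assumption: compare case-insensitively so "*.webm" also matches "Demo.WEBM".
    return any(fnmatch(name.lower(), pattern.lower()) for pattern in patterns)


names = ["standup.mp4", "notes.txt", "Demo.WEBM"]
selected = [n for n in names if matches_patterns(n, ["*.mp4", "*.webm"])]
print(selected)  # ['standup.mp4', 'Demo.WEBM']
```

Whether a given source matches against the file name or the full path, and whether matching is case-sensitive, is not stated in this reference, so check the specific source before relying on either.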
### download()

```python
@abstractmethod
def download(
    self,
    file: SourceFile,
    destination: Path,
) -> Path
```

Download a single file to a local path.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `file` | `SourceFile` | File descriptor from `list_videos()` |
| `destination` | `Path` | Local destination path |

**Returns:** `Path` -- the local path where the file was saved.

### download_all()

```python
def download_all(
    self,
    files: List[SourceFile],
    destination_dir: Path,
) -> List[Path]
```

Download multiple files to a directory, preserving subfolder structure from `SourceFile.path`. This is a concrete method provided by the base class.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `files` | `List[SourceFile]` | Files to download |
| `destination_dir` | `Path` | Base directory for downloads (created if needed) |

**Returns:** `List[Path]` -- local paths of successfully downloaded files. Failed downloads are logged and skipped.

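The behavior described for `download_all()` (mirror `SourceFile.path` under the destination, log and skip failures) could look roughly like the following. This is a minimal sketch under stated assumptions, not PlanOpticon's implementation: `FileStub`, `download_all_sketch`, and `fake_download` are hypothetical names, and falling back to `name` when `path` is unset is a guess.

```python
import logging
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class FileStub:
    """Hypothetical stand-in for SourceFile with only the fields used here."""
    name: str
    id: str
    path: Optional[str] = None


def download_all_sketch(
    download: Callable[[FileStub, Path], Path],
    files: List[FileStub],
    destination_dir: Path,
) -> List[Path]:
    """Download each file, mirroring `path` under destination_dir and skipping failures."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for f in files:
        # Assumption: fall back to the bare file name when no path is reported.
        target = destination_dir / (f.path or f.name)
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            saved.append(download(f, target))
        except Exception as exc:  # log the failure and continue with the rest
            logger.warning("Skipping %s: %s", f.name, exc)
    return saved


def fake_download(f: FileStub, dest: Path) -> Path:
    """Pretend download: writes a placeholder file and fails for one specific id."""
    if f.id == "bad":
        raise IOError("simulated network error")
    dest.write_text("placeholder")
    return dest


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "downloads"
    files = [
        FileStub("a.mp4", "1", "recordings/march/a.mp4"),
        FileStub("b.mp4", "bad"),
        FileStub("c.mp4", "3"),
    ]
    saved = download_all_sketch(fake_download, files, base)
    relative = [p.relative_to(base).as_posix() for p in saved]
    print(relative)  # ['recordings/march/a.mp4', 'c.mp4']
```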
---

## SourceFile

```python
from video_processor.sources import SourceFile
```

Pydantic model describing a file available from a source.

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | File name |
| `id` | `str` | *required* | Provider-specific file identifier |
| `size_bytes` | `Optional[int]` | `None` | File size in bytes |
| `mime_type` | `Optional[str]` | `None` | MIME type (e.g., `"video/mp4"`) |
| `modified_at` | `Optional[str]` | `None` | Last modified timestamp |
| `path` | `Optional[str]` | `None` | Path within the source folder (used for subfolder structure in `download_all`) |

```json
{
  "name": "sprint-review-2026-03-01.mp4",
  "id": "abc123def456",
  "size_bytes": 524288000,
  "mime_type": "video/mp4",
  "modified_at": "2026-03-01T14:30:00Z",
  "path": "recordings/march/sprint-review-2026-03-01.mp4"
}
```

---

## Lazy Loading Pattern

All sources are lazy-loaded via `__getattr__` in the package `__init__.py`. This means importing `video_processor.sources` does not pull in any external dependencies (e.g., `google-auth`, `msal`, `notion-client`). The actual module is loaded only when you access the class.

```python
# Importing the package is cheap -- no source modules or optional dependencies are loaded
import video_processor.sources

# Accessing a class, including via `from ... import`, triggers `__getattr__`,
# which loads the zoom_source module (and its dependencies)
from video_processor.sources import ZoomSource

source = ZoomSource()
```

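Lazy loading of this kind is commonly built on a module-level `__getattr__` (PEP 562). The sketch below is an assumption about how such a package might be wired, not PlanOpticon's actual `__init__.py`: `_LAZY_NAMES`, `_lazy_getattr`, and `demo_sources` are hypothetical names, and standard-library modules stand in for the optional source modules so the example runs without extra installs.

```python
import importlib
import types

# Hypothetical registry: public name -> module that defines it.
# Standard-library modules stand in for the optional source modules.
_LAZY_NAMES = {
    "Fraction": "fractions",
    "Decimal": "decimal",
}

loaded = []  # records which modules were imported on demand


def _lazy_getattr(name: str):
    """Module-level __getattr__ (PEP 562): import the owning module on first access."""
    module_name = _LAZY_NAMES.get(name)
    if module_name is None:
        raise AttributeError(f"module 'demo_sources' has no attribute {name!r}")
    module = importlib.import_module(module_name)
    loaded.append(module_name)
    value = getattr(module, name)
    # Cache the result on the module so later lookups bypass __getattr__.
    setattr(demo_sources, name, value)
    return value


# Build a throwaway module object instead of a real package __init__.py.
demo_sources = types.ModuleType("demo_sources")
demo_sources.__getattr__ = _lazy_getattr

print(loaded)                       # [] -- nothing imported on demand yet
fraction_cls = demo_sources.Fraction
print(loaded)                       # ['fractions']
demo_sources.Fraction               # second access is served from the cache
print(loaded)                       # still ['fractions']
```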
---

## Available Sources

### Cloud Recordings

Sources for fetching recorded meetings from video conferencing platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Zoom | `ZoomSource` | OAuth / Server-to-Server | List and download Zoom cloud recordings |
| Google Meet | `MeetRecordingSource` | OAuth (Google) | List and download Google Meet recordings from Drive |
| Microsoft Teams | `TeamsRecordingSource` | OAuth (Microsoft) | List and download Teams meeting recordings |

### Cloud Storage and Workspace

Sources for accessing files stored in cloud platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Google Drive | `GoogleDriveSource` | OAuth (Google) | Files from Google Drive |
| Google Workspace | `GWSSource` | OAuth (Google) | Google Docs, Sheets, Slides |
| Microsoft 365 | `M365Source` | OAuth (Microsoft) | OneDrive, SharePoint files |
| Notion | `NotionSource` | OAuth / API key | Notion pages and databases |
| GitHub | `GitHubSource` | OAuth / API token | Repository files, issues, discussions |
| Dropbox | `DropboxSource` | OAuth / access token | *(via auth config)* |

### Notes Applications

Sources for local and cloud-based note-taking apps.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Apple Notes | `AppleNotesSource` | Local (macOS) | Notes from Apple Notes.app |
| Obsidian | `ObsidianSource` | Local filesystem | Markdown files from Obsidian vaults |
| Logseq | `LogseqSource` | Local filesystem | Pages from Logseq graphs |
| OneNote | `OneNoteSource` | OAuth (Microsoft) | Microsoft OneNote notebooks |
| Google Keep | `GoogleKeepSource` | OAuth (Google) | Google Keep notes |

### Web and Content

Sources for fetching content from the web.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| YouTube | `YouTubeSource` | API key / OAuth | YouTube video metadata and transcripts |
| Web | `WebSource` | None | General web page content extraction |
| RSS | `RSSSource` | None | RSS/Atom feed entries |
| Podcast | `PodcastSource` | None | Podcast episodes from RSS feeds |
| arXiv | `ArxivSource` | None | Academic papers from arXiv |
| Hacker News | `HackerNewsSource` | None | Hacker News posts and comments |
| Reddit | `RedditSource` | API credentials | Reddit posts and comments |
| Twitter/X | `TwitterSource` | API credentials | Tweets and threads |

---

## Auth Integration

Most sources use PlanOpticon's unified auth system (see [Auth API](auth.md)). The typical pattern within a source implementation:

```python
from video_processor.auth import get_auth_manager
from video_processor.sources import BaseSource, SourceFile


class MySource(BaseSource):
    def __init__(self):
        self._token = None

    def authenticate(self) -> bool:
        manager = get_auth_manager("my_service")
        if manager:
            token = manager.get_token()
            if token:
                self._token = token
                return True
        return False

    def list_videos(self, **kwargs) -> list[SourceFile]:
        if not self._token:
            raise RuntimeError("Not authenticated. Call authenticate() first.")
        # Use self._token to call the API
        ...

    # download() is abstract too and must be implemented before MySource can be instantiated
```

---

## Usage Examples

### Listing and downloading Zoom recordings

```python
from pathlib import Path
from video_processor.sources import ZoomSource

source = ZoomSource()
if source.authenticate():
    recordings = source.list_videos()
    for rec in recordings:
        print(f"{rec.name} ({rec.size_bytes} bytes)")

    # Download all to a local directory
    paths = source.download_all(recordings, Path("./downloads"))
```

### Fetching from multiple sources

```python
from pathlib import Path
from video_processor.sources import GoogleDriveSource, NotionSource

# Google Drive
gdrive = GoogleDriveSource()
if gdrive.authenticate():
    files = gdrive.list_videos(
        folder_path="Meeting Recordings",
        patterns=["*.mp4", "*.webm"],
    )
    gdrive.download_all(files, Path("./drive-downloads"))

# Notion
notion = NotionSource()
if notion.authenticate():
    pages = notion.list_videos()  # Lists Notion pages
    for page in pages:
        print(f"Page: {page.name}")
```

### YouTube content

```python
from video_processor.sources import YouTubeSource

yt = YouTubeSource()
if yt.authenticate():
    videos = yt.list_videos(folder_path="https://youtube.com/playlist?list=...")
    for v in videos:
        print(f"{v.name} - {v.id}")
```