PlanOpticon

# Sources API Reference

::: video_processor.sources.base

---

## Overview

The sources module provides a unified interface for fetching content from cloud services, local applications, and the web. All sources implement the `BaseSource` abstract class, providing consistent `authenticate()`, `list_videos()`, and `download()` methods.

Sources are lazy-loaded to avoid pulling in optional dependencies at import time. You can import any source directly from `video_processor.sources` and the correct module will be loaded on demand.

---

## BaseSource (ABC)

```python
from video_processor.sources import BaseSource
```

Abstract base class that all source integrations implement. Defines the standard three-step workflow: authenticate, list, download.

### authenticate()

```python
@abstractmethod
def authenticate(self) -> bool
```

Authenticate with the cloud provider or service. Uses the auth strategy defined for the source (OAuth, API key, local access, etc.).

Returns: `bool` -- `True` on successful authentication, `False` on failure.

### list_videos()

```python
@abstractmethod
def list_videos(
    self,
    folder_id: Optional[str] = None,
    folder_path: Optional[str] = None,
    patterns: Optional[List[str]] = None,
) -> List[SourceFile]
```

List available video files (or other content, depending on the source).

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `folder_id` | `Optional[str]` | `None` | Provider-specific folder/container identifier |
| `folder_path` | `Optional[str]` | `None` | Path within the source (e.g., folder name) |
| `patterns` | `Optional[List[str]]` | `None` | File name glob patterns to filter results |

Returns: `List[SourceFile]` -- available files matching the criteria.
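
The `patterns` parameter is described as a list of file name glob patterns. How each source applies them is not specified here, so the following is only a minimal sketch of one plausible approach using the standard library's `fnmatch`. The helper name `matches_patterns` and the case-insensitive comparison are assumptions for illustration, not part of the documented API.

```python
from fnmatch import fnmatch
from typing import List, Optional


def matches_patterns(name: str, patterns: Optional[List[str]]) -> bool:
    """Return True if `name` matches any glob pattern, or if no patterns are given."""
    if not patterns:
        return True
    # Lowercase both sides so "DEMO.WEBM" still matches "*.webm" on any platform.
    lowered = name.lower()
    return any(fnmatch(lowered, pattern.lower()) for pattern in patterns)


names = ["standup.mp4", "notes.txt", "DEMO.WEBM"]
videos = [n for n in names if matches_patterns(n, ["*.mp4", "*.webm"])]
print(videos)
```

Treating `None` (or an empty list) as "no filter" mirrors the `None` default in the signature above; a real source might instead fall back to a built-in list of video extensions.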

### download()

```python
@abstractmethod
def download(
    self,
    file: SourceFile,
    destination: Path,
) -> Path
```

Download a single file to a local path.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `file` | `SourceFile` | File descriptor from `list_videos()` |
| `destination` | `Path` | Local destination path |

Returns: `Path` -- the local path where the file was saved.

### download_all()

```python
def download_all(
    self,
    files: List[SourceFile],
    destination_dir: Path,
) -> List[Path]
```

Download multiple files to a directory, preserving subfolder structure from `SourceFile.path`. This is a concrete method provided by the base class.

Parameters:

| Parameter | Type | Description |
|---|---|---|
| `files` | `List[SourceFile]` | Files to download |
| `destination_dir` | `Path` | Base directory for downloads (created if needed) |

Returns: `List[Path]` -- local paths of successfully downloaded files. Failed downloads are logged and skipped.
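
To make the described behaviour concrete (create the directory, keep the subfolder structure from `path`, log and skip failures), here is a self-contained sketch. It does not reproduce the library's actual implementation: `FileStub`, `SourceSketch`, and `InMemorySource` are hypothetical stand-ins for `SourceFile`, `BaseSource`, and a real source, and the fallback to `file.name` when `path` is missing is an assumption.

```python
import logging
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class FileStub:
    """Hypothetical stand-in for SourceFile (only the fields used here)."""
    name: str
    id: str
    path: Optional[str] = None


class SourceSketch(ABC):
    """Hypothetical stand-in for BaseSource, reduced to the download methods."""

    @abstractmethod
    def download(self, file: FileStub, destination: Path) -> Path: ...

    def download_all(self, files: List[FileStub], destination_dir: Path) -> List[Path]:
        destination_dir.mkdir(parents=True, exist_ok=True)
        saved: List[Path] = []
        for file in files:
            # Assumption: fall back to the bare file name when no path is set.
            target = destination_dir / (file.path or file.name)
            try:
                target.parent.mkdir(parents=True, exist_ok=True)
                saved.append(self.download(file, target))
            except Exception as exc:  # log and skip, as the reference describes
                logger.warning("Skipping %s: %s", file.name, exc)
        return saved


class InMemorySource(SourceSketch):
    """Toy source that 'downloads' bytes from a dict keyed by file id."""

    def __init__(self, blobs: Dict[str, bytes]):
        self._blobs = blobs

    def download(self, file: FileStub, destination: Path) -> Path:
        destination.write_bytes(self._blobs[file.id])  # KeyError if id is unknown
        return destination


files = [
    FileStub(name="a.mp4", id="1", path="recordings/march/a.mp4"),
    FileStub(name="b.mp4", id="missing"),
]
with tempfile.TemporaryDirectory() as tmp:
    saved = InMemorySource({"1": b"video"}).download_all(files, Path(tmp))
    relative = [p.relative_to(tmp).as_posix() for p in saved]
```

In this sketch the second file fails with a `KeyError`, is logged, and is left out of the returned list, which matches the "logged and skipped" wording above.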

---

## SourceFile

```python
from video_processor.sources import SourceFile
```

Pydantic model describing a file available in a cloud source.

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | required | File name |
| `id` | `str` | required | Provider-specific file identifier |
| `size_bytes` | `Optional[int]` | `None` | File size in bytes |
| `mime_type` | `Optional[str]` | `None` | MIME type (e.g., `"video/mp4"`) |
| `modified_at` | `Optional[str]` | `None` | Last modified timestamp |
| `path` | `Optional[str]` | `None` | Path within the source folder (used for subfolder structure in `download_all`) |

```json
{
  "name": "sprint-review-2026-03-01.mp4",
  "id": "abc123def456",
  "size_bytes": 524288000,
  "mime_type": "video/mp4",
  "modified_at": "2026-03-01T14:30:00Z",
  "path": "recordings/march/sprint-review-2026-03-01.mp4"
}
```
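
As a rough illustration of which fields are required and which default to `None`, the sketch below mirrors the table with a plain dataclass and loads a partial record. `SourceFileSketch` is a hypothetical stand-in so the example runs without third-party packages; the real `SourceFile` is a Pydantic model and additionally validates field types.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceFileSketch:
    """Plain-dataclass mirror of the documented fields (the real model is Pydantic)."""
    name: str
    id: str
    size_bytes: Optional[int] = None
    mime_type: Optional[str] = None
    modified_at: Optional[str] = None
    path: Optional[str] = None


# Only the two required fields plus a couple of optional ones are supplied.
record = json.loads("""
{
  "name": "sprint-review-2026-03-01.mp4",
  "id": "abc123def456",
  "size_bytes": 524288000,
  "mime_type": "video/mp4"
}
""")
file = SourceFileSketch(**record)
size_mib = file.size_bytes / (1024 * 1024)
print(f"{file.name}: {size_mib:.0f} MiB, path={file.path}")
```

Because every field other than `name` and `id` is optional, code that consumes these objects should be prepared for `None` in `size_bytes`, `modified_at`, and `path`.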

---

## Lazy Loading Pattern

All sources are lazy-loaded via `__getattr__` in the package `__init__.py`. This means importing `video_processor.sources` does not pull in any external dependencies (e.g., `google-auth`, `msal`, `notion-client`). The actual module is loaded only when you access the class.

```python
# Importing the package is instant -- no source modules or dependencies are loaded
import video_processor.sources

# Accessing the class triggers __getattr__, so the Zoom module
# (and whatever it imports at module level) is loaded here
from video_processor.sources import ZoomSource

source = ZoomSource()
```
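
For readers unfamiliar with module-level `__getattr__` (PEP 562), the following sketch shows the general mechanism in a single runnable file. It is not the package's actual `__init__.py`: the `_LAZY_ATTRS` registry is hypothetical, and standard-library classes stand in for the real source classes so that the example has no external dependencies.

```python
import importlib
import types

# Hypothetical registry: public name -> (module path, attribute name).
# Standard-library targets stand in for the real source modules.
_LAZY_ATTRS = {
    "Fraction": ("fractions", "Fraction"),
    "Decimal": ("decimal", "Decimal"),
}


def _lazy_getattr(name: str):
    """Module-level __getattr__ (PEP 562): import the target on first access."""
    try:
        module_path, attr = _LAZY_ATTRS[name]
    except KeyError:
        raise AttributeError(f"module has no attribute {name!r}") from None
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# Build a throwaway module so the sketch runs as one file. In a real package,
# a function named __getattr__ would simply be defined in __init__.py.
demo = types.ModuleType("lazy_demo")
demo.__getattr__ = _lazy_getattr

fraction_cls = demo.Fraction  # attribute access triggers the import
print(fraction_cls(1, 2) + fraction_cls(1, 2))
```

Python only calls the module's `__getattr__` when normal attribute lookup fails, so names that are defined eagerly in the package are unaffected by this hook.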

---

## Available Sources

### Cloud Recordings

Sources for fetching recorded meetings from video conferencing platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Zoom | `ZoomSource` | OAuth / Server-to-Server | List and download Zoom cloud recordings |
| Google Meet | `MeetRecordingSource` | OAuth (Google) | List and download Google Meet recordings from Drive |
| Microsoft Teams | `TeamsRecordingSource` | OAuth (Microsoft) | List and download Teams meeting recordings |

### Cloud Storage and Workspace

Sources for accessing files stored in cloud platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Google Drive | `GoogleDriveSource` | OAuth (Google) | Files from Google Drive |
| Google Workspace | `GWSSource` | OAuth (Google) | Google Docs, Sheets, Slides |
| Microsoft 365 | `M365Source` | OAuth (Microsoft) | OneDrive, SharePoint files |
| Notion | `NotionSource` | OAuth / API key | Notion pages and databases |
| GitHub | `GitHubSource` | OAuth / API token | Repository files, issues, discussions |
| Dropbox | `DropboxSource` | OAuth / access token | (via auth config) |

### Notes Applications

Sources for local and cloud-based note-taking apps.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Apple Notes | `AppleNotesSource` | Local (macOS) | Notes from Apple Notes.app |
| Obsidian | `ObsidianSource` | Local filesystem | Markdown files from Obsidian vaults |
| Logseq | `LogseqSource` | Local filesystem | Pages from Logseq graphs |
| OneNote | `OneNoteSource` | OAuth (Microsoft) | Microsoft OneNote notebooks |
| Google Keep | `GoogleKeepSource` | OAuth (Google) | Google Keep notes |

### Web and Content

Sources for fetching content from the web.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| YouTube | `YouTubeSource` | API key / OAuth | YouTube video metadata and transcripts |
| Web | `WebSource` | None | General web page content extraction |
| RSS | `RSSSource` | None | RSS/Atom feed entries |
| Podcast | `PodcastSource` | None | Podcast episodes from RSS feeds |
| arXiv | `ArxivSource` | None | Academic papers from arXiv |
| Hacker News | `HackerNewsSource` | None | Hacker News posts and comments |
| Reddit | `RedditSource` | API credentials | Reddit posts and comments |
| Twitter/X | `TwitterSource` | API credentials | Tweets and threads |

---

## Auth Integration

Most sources use PlanOpticon's unified auth system (see [Auth API](auth.md)). The typical pattern within a source implementation:

```python
from video_processor.auth import get_auth_manager
from video_processor.sources import BaseSource, SourceFile


class MySource(BaseSource):
    def __init__(self):
        self._token = None

    def authenticate(self) -> bool:
        manager = get_auth_manager("my_service")
        if manager:
            token = manager.get_token()
            if token:
                self._token = token
                return True
        return False

    def list_videos(self, **kwargs) -> list[SourceFile]:
        if not self._token:
            raise RuntimeError("Not authenticated. Call authenticate() first.")
        # Use self._token to call the API
        ...

    # download() is also abstract and must be implemented
    # before MySource can be instantiated
```

---

## Usage Examples

### Listing and downloading Zoom recordings

```python
from pathlib import Path
from video_processor.sources import ZoomSource

source = ZoomSource()
if source.authenticate():
    recordings = source.list_videos()
    for rec in recordings:
        print(f"{rec.name} ({rec.size_bytes} bytes)")

    # Download all to a local directory
    paths = source.download_all(recordings, Path("./downloads"))
```

### Fetching from multiple sources

```python
from pathlib import Path
from video_processor.sources import GoogleDriveSource, NotionSource

# Google Drive
gdrive = GoogleDriveSource()
if gdrive.authenticate():
    files = gdrive.list_videos(
        folder_path="Meeting Recordings",
        # Glob patterns, as described for the `patterns` parameter
        patterns=["*.mp4", "*.webm"],
    )
    gdrive.download_all(files, Path("./drive-downloads"))

# Notion
notion = NotionSource()
if notion.authenticate():
    pages = notion.list_videos()  # Lists Notion pages
    for page in pages:
        print(f"Page: {page.name}")
```

### YouTube content

```python
from video_processor.sources import YouTubeSource

yt = YouTubeSource()
if yt.authenticate():
    videos = yt.list_videos(folder_path="https://youtube.com/playlist?list=...")
    for v in videos:
        print(f"{v.name} - {v.id}")
```