# Sources API Reference

::: video_processor.sources.base

---

## Overview

The sources module provides a unified interface for fetching content from cloud services, local applications, and the web. Every source implements the `BaseSource` abstract class, which defines consistent `authenticate()`, `list_videos()`, and `download()` methods.

Sources are lazy-loaded to avoid pulling in optional dependencies at import time. You can import any source directly from `video_processor.sources`, and the correct module will be loaded on demand.

---

## BaseSource (ABC)

```python
from video_processor.sources import BaseSource
```

Abstract base class that all source integrations implement. Defines the standard three-step workflow: authenticate, list, download.

### authenticate()

```python
@abstractmethod
def authenticate(self) -> bool
```

Authenticate with the cloud provider or service. Uses the auth strategy defined for the source (OAuth, API key, local access, etc.).

**Returns:** `bool` -- `True` on successful authentication, `False` on failure.

### list_videos()

```python
@abstractmethod
def list_videos(
    self,
    folder_id: Optional[str] = None,
    folder_path: Optional[str] = None,
    patterns: Optional[List[str]] = None,
) -> List[SourceFile]
```

List available video files (or other content, depending on the source).

**Parameters:**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `folder_id` | `Optional[str]` | `None` | Provider-specific folder/container identifier |
| `folder_path` | `Optional[str]` | `None` | Path within the source (e.g., folder name) |
| `patterns` | `Optional[List[str]]` | `None` | File name glob patterns to filter results |

**Returns:** `List[SourceFile]` -- available files matching the criteria.

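As an illustration of how `patterns` could narrow a listing, the sketch below filters file names with the standard library's `fnmatch`. The helper `filter_by_patterns` is hypothetical, and the case-insensitive comparison is an assumption; individual sources may apply their patterns differently.

```python
from fnmatch import fnmatch
from typing import List, Optional


def filter_by_patterns(names: List[str], patterns: Optional[List[str]] = None) -> List[str]:
    """Keep names matching at least one glob pattern (all names if none are given)."""
    if not patterns:
        return list(names)
    # Lower-casing both sides makes "*.mp4" match "CLIP.MP4" too (an assumption).
    return [
        name
        for name in names
        if any(fnmatch(name.lower(), pattern.lower()) for pattern in patterns)
    ]


names = ["standup.mp4", "notes.txt", "Demo.WEBM"]
print(filter_by_patterns(names, ["*.mp4", "*.webm"]))  # ['standup.mp4', 'Demo.WEBM']
```
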
### download()

```python
@abstractmethod
def download(
    self,
    file: SourceFile,
    destination: Path,
) -> Path
```

Download a single file to a local path.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `file` | `SourceFile` | File descriptor from `list_videos()` |
| `destination` | `Path` | Local destination path |

**Returns:** `Path` -- the local path where the file was saved.

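To make the three-step workflow concrete, here is a minimal sketch of a source backed by a local folder. `LocalFolderSource` is hypothetical and not part of PlanOpticon. The `BaseSource` and `SourceFile` classes below are simplified stand-ins defined inline so the example runs without the package installed; the real classes carry more fields and behavior.

```python
import shutil
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class SourceFile:  # stand-in for video_processor.sources.SourceFile
    name: str
    id: str
    path: Optional[str] = None


class BaseSource(ABC):  # stand-in exposing only the three abstract methods
    @abstractmethod
    def authenticate(self) -> bool: ...

    @abstractmethod
    def list_videos(self, folder_id=None, folder_path=None, patterns=None) -> List[SourceFile]: ...

    @abstractmethod
    def download(self, file: SourceFile, destination: Path) -> Path: ...


class LocalFolderSource(BaseSource):  # hypothetical example source
    def __init__(self, root: Path):
        self.root = Path(root)

    def authenticate(self) -> bool:
        # "Local access" style auth: succeed only if the folder exists.
        return self.root.is_dir()

    def list_videos(self, folder_id=None, folder_path=None, patterns=None) -> List[SourceFile]:
        base = self.root / folder_path if folder_path else self.root
        globs = patterns or ["*"]
        found = {p for g in globs for p in base.glob(g) if p.is_file()}
        return [
            SourceFile(name=p.name, id=str(p), path=p.relative_to(self.root).as_posix())
            for p in sorted(found)
        ]

    def download(self, file: SourceFile, destination: Path) -> Path:
        destination = Path(destination)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(file.id, destination)  # the id holds the absolute source path
        return destination


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "src"
    root.mkdir()
    (root / "demo.mp4").write_bytes(b"fake video")

    source = LocalFolderSource(root)
    if source.authenticate():
        files = source.list_videos(patterns=["*.mp4"])
        saved = source.download(files[0], Path(tmp) / "out" / files[0].name)
        print(saved.name, saved.read_bytes())  # demo.mp4 b'fake video'
```
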
### download_all()

```python
def download_all(
    self,
    files: List[SourceFile],
    destination_dir: Path,
) -> List[Path]
```

Download multiple files to a directory, preserving subfolder structure from `SourceFile.path`. This is a concrete method provided by the base class.

**Parameters:**

| Parameter | Type | Description |
|---|---|---|
| `files` | `List[SourceFile]` | Files to download |
| `destination_dir` | `Path` | Base directory for downloads (created if needed) |

**Returns:** `List[Path]` -- local paths of successfully downloaded files. Failed downloads are logged and skipped.

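The behavior described above (create the directory, preserve subfolders, log and skip failures) can be sketched as a standalone function. This is not the base class implementation: `download_all_sketch` and `fake_download` are hypothetical, and `(name, path)` tuples stand in for `SourceFile` objects so the example runs on its own.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)

# (name, path) pairs stand in for SourceFile objects in this sketch.
FileRef = Tuple[str, Optional[str]]


def download_all_sketch(
    files: List[FileRef],
    destination_dir: Path,
    download: Callable[[FileRef, Path], Path],
) -> List[Path]:
    """Download each file under destination_dir, skipping the ones that fail."""
    destination_dir = Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)  # created if needed
    saved: List[Path] = []
    for file in files:
        name, rel_path = file
        # Use the source-relative path when present so subfolders are preserved.
        target = destination_dir / (rel_path or name)
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            saved.append(download(file, target))
        except Exception as exc:  # log and continue with the remaining files
            logger.warning("Skipping %s: %s", name, exc)
    return saved


def fake_download(file: FileRef, target: Path) -> Path:
    """Hypothetical downloader: writes a placeholder, fails for 'broken.mp4'."""
    if file[0] == "broken.mp4":
        raise OSError("simulated network error")
    target.write_bytes(b"placeholder")
    return target


with tempfile.TemporaryDirectory() as tmp:
    files = [("a.mp4", "recordings/march/a.mp4"), ("broken.mp4", None), ("b.mp4", None)]
    saved = download_all_sketch(files, Path(tmp), fake_download)
    print([p.relative_to(tmp).as_posix() for p in saved])
    # ['recordings/march/a.mp4', 'b.mp4']
```
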
---

## SourceFile

```python
from video_processor.sources import SourceFile
```

Pydantic model describing a file available in a source.

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | *required* | File name |
| `id` | `str` | *required* | Provider-specific file identifier |
| `size_bytes` | `Optional[int]` | `None` | File size in bytes |
| `mime_type` | `Optional[str]` | `None` | MIME type (e.g., `"video/mp4"`) |
| `modified_at` | `Optional[str]` | `None` | Last modified timestamp |
| `path` | `Optional[str]` | `None` | Path within the source folder (used for subfolder structure in `download_all`) |

```json
{
  "name": "sprint-review-2026-03-01.mp4",
  "id": "abc123def456",
  "size_bytes": 524288000,
  "mime_type": "video/mp4",
  "modified_at": "2026-03-01T14:30:00Z",
  "path": "recordings/march/sprint-review-2026-03-01.mp4"
}
```

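The sketch below shows how the required and optional fields behave when a record is built from JSON. To stay dependency-free it uses a dataclass stand-in, `SourceFileSketch`, rather than the real Pydantic model, so validation behavior will differ. It also assumes `modified_at` is an ISO 8601 string, as in the sample above; the format is not otherwise specified here.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SourceFileSketch:  # dataclass stand-in; the real model is Pydantic
    name: str
    id: str
    size_bytes: Optional[int] = None
    mime_type: Optional[str] = None
    modified_at: Optional[str] = None
    path: Optional[str] = None


payload = """
{
  "name": "sprint-review-2026-03-01.mp4",
  "id": "abc123def456",
  "size_bytes": 524288000,
  "modified_at": "2026-03-01T14:30:00Z"
}
"""
record = SourceFileSketch(**json.loads(payload))

# modified_at stays a plain string; convert it only when a datetime is needed.
# Replacing "Z" keeps fromisoformat() working on Python versions before 3.11.
modified = datetime.fromisoformat(record.modified_at.replace("Z", "+00:00"))
size_mib = record.size_bytes / (1024 * 1024)

print(record.mime_type)  # None: omitted optional fields fall back to their default
print(modified.year, size_mib)  # 2026 500.0
```
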
---

## Lazy Loading Pattern

All sources are lazy-loaded via `__getattr__` in the package `__init__.py`. This means importing `video_processor.sources` does not pull in any external dependencies (e.g., `google-auth`, `msal`, `notion-client`). The actual module is loaded only when you access the class.

```python
# Importing the package alone does not load any source module
import video_processor.sources as sources

# First access to the name triggers __getattr__, which loads
# the zoom_source module (and its dependencies)
source = sources.ZoomSource()
```

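The mechanism can be illustrated with a small, self-contained sketch of a module-level `__getattr__` (PEP 562). Everything here is hypothetical: `demo_sources` stands in for the package, `_LAZY_EXPORTS` is an invented registry, and standard-library modules stand in for the real source modules. PlanOpticon's actual `__init__.py` may be organized differently.

```python
import importlib
import types

# Hypothetical registry: exported name -> module that defines it.
# Standard-library modules stand in for the real source modules here.
_LAZY_EXPORTS = {"Fraction": "fractions", "JSONDecoder": "json"}

demo_sources = types.ModuleType("demo_sources")  # stand-in for the package


def _lazy_getattr(name: str):
    """Module-level __getattr__ (PEP 562): import on first attribute access."""
    if name not in _LAZY_EXPORTS:
        raise AttributeError(f"module 'demo_sources' has no attribute {name!r}")
    module = importlib.import_module(_LAZY_EXPORTS[name])
    value = getattr(module, name)
    setattr(demo_sources, name, value)  # cache so later lookups skip this hook
    return value


# In a real package this function would simply be named __getattr__ in __init__.py.
demo_sources.__getattr__ = _lazy_getattr

print("Fraction" in vars(demo_sources))  # False: nothing resolved yet
print(demo_sources.Fraction(1, 2))       # 1/2, resolved on first access
print("Fraction" in vars(demo_sources))  # True: cached on the module
```
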
---

## Available Sources

### Cloud Recordings

Sources for fetching recorded meetings from video conferencing platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Zoom | `ZoomSource` | OAuth / Server-to-Server | List and download Zoom cloud recordings |
| Google Meet | `MeetRecordingSource` | OAuth (Google) | List and download Google Meet recordings from Drive |
| Microsoft Teams | `TeamsRecordingSource` | OAuth (Microsoft) | List and download Teams meeting recordings |

### Cloud Storage and Workspace

Sources for accessing files stored in cloud platforms.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Google Drive | `GoogleDriveSource` | OAuth (Google) | Files from Google Drive |
| Google Workspace | `GWSSource` | OAuth (Google) | Google Docs, Sheets, Slides |
| Microsoft 365 | `M365Source` | OAuth (Microsoft) | OneDrive, SharePoint files |
| Notion | `NotionSource` | OAuth / API key | Notion pages and databases |
| GitHub | `GitHubSource` | OAuth / API token | Repository files, issues, discussions |
| Dropbox | `DropboxSource` | OAuth / access token | *(via auth config)* |

### Notes Applications

Sources for local and cloud-based note-taking apps.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| Apple Notes | `AppleNotesSource` | Local (macOS) | Notes from Apple Notes.app |
| Obsidian | `ObsidianSource` | Local filesystem | Markdown files from Obsidian vaults |
| Logseq | `LogseqSource` | Local filesystem | Pages from Logseq graphs |
| OneNote | `OneNoteSource` | OAuth (Microsoft) | Microsoft OneNote notebooks |
| Google Keep | `GoogleKeepSource` | OAuth (Google) | Google Keep notes |

### Web and Content

Sources for fetching content from the web.

| Source | Class | Auth Method | Description |
|---|---|---|---|
| YouTube | `YouTubeSource` | API key / OAuth | YouTube video metadata and transcripts |
| Web | `WebSource` | None | General web page content extraction |
| RSS | `RSSSource` | None | RSS/Atom feed entries |
| Podcast | `PodcastSource` | None | Podcast episodes from RSS feeds |
| arXiv | `ArxivSource` | None | Academic papers from arXiv |
| Hacker News | `HackerNewsSource` | None | Hacker News posts and comments |
| Reddit | `RedditSource` | API credentials | Reddit posts and comments |
| Twitter/X | `TwitterSource` | API credentials | Tweets and threads |

---

## Auth Integration

Most sources use PlanOpticon's unified auth system (see [Auth API](auth.md)). The typical pattern within a source implementation is:

```python
from video_processor.auth import get_auth_manager
from video_processor.sources import BaseSource, SourceFile


class MySource(BaseSource):
    def __init__(self):
        self._token = None

    def authenticate(self) -> bool:
        manager = get_auth_manager("my_service")
        if manager:
            token = manager.get_token()
            if token:
                self._token = token
                return True
        return False

    def list_videos(self, **kwargs) -> list[SourceFile]:
        if not self._token:
            raise RuntimeError("Not authenticated. Call authenticate() first.")
        # Use self._token to call the API
        ...

    # download() is abstract as well and must be implemented; omitted here for brevity
```

---

## Usage Examples

### Listing and downloading Zoom recordings

```python
from pathlib import Path
from video_processor.sources import ZoomSource

source = ZoomSource()
if source.authenticate():
    recordings = source.list_videos()
    for rec in recordings:
        print(f"{rec.name} ({rec.size_bytes} bytes)")

    # Download all to a local directory
    paths = source.download_all(recordings, Path("./downloads"))
```

### Fetching from multiple sources

```python
from pathlib import Path
from video_processor.sources import GoogleDriveSource, NotionSource

# Google Drive
gdrive = GoogleDriveSource()
if gdrive.authenticate():
    files = gdrive.list_videos(
        folder_path="Meeting Recordings",
        patterns=["*.mp4", "*.webm"],
    )
    gdrive.download_all(files, Path("./drive-downloads"))

# Notion
notion = NotionSource()
if notion.authenticate():
    pages = notion.list_videos()  # Lists Notion pages
    for page in pages:
        print(f"Page: {page.name}")
```

### YouTube content

```python
from video_processor.sources import YouTubeSource

yt = YouTubeSource()
if yt.authenticate():
    videos = yt.list_videos(folder_path="https://youtube.com/playlist?list=...")
    for v in videos:
        print(f"{v.name} - {v.id}")
```
