# Document Ingestion

Document ingestion lets you process files -- PDFs, Markdown, and plaintext -- into a knowledge graph. PlanOpticon extracts text from documents, chunks it into manageable pieces, runs LLM-powered entity and relationship extraction, and stores the results in a FalkorDB knowledge graph. This is the same knowledge graph format produced by video analysis, so you can combine video and document insights in a single graph.

## Supported formats

| Extension | Processor | Description |
|-----------|-----------|-------------|
| `.pdf` | `PdfProcessor` | Extracts text page by page using pymupdf or pdfplumber |
| `.md`, `.markdown` | `MarkdownProcessor` | Splits on headings into sections |
| `.txt`, `.text`, `.log`, `.csv` | `PlaintextProcessor` | Splits on paragraph boundaries |

Additional formats can be added by implementing the `DocumentProcessor` base class and registering it (see [Extending with custom processors](#extending-with-custom-processors) below).

## CLI usage

### `planopticon ingest`

```
planopticon ingest INPUT_PATH [OPTIONS]
```

**Arguments:**

| Argument | Description |
|----------|-------------|
| `INPUT_PATH` | Path to a file or directory to ingest (must exist) |

**Options:**

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| `--output` | `-o` | Current directory | Output directory for the knowledge graph |
| `--db-path` | | None | Path to an existing `knowledge_graph.db` to merge into |
| `--recursive / --no-recursive` | `-r` | `--recursive` | Recurse into subdirectories (directory ingestion only) |
| `--provider` | `-p` | `auto` | LLM provider for entity extraction (`openai`, `anthropic`, `gemini`, `ollama`, `azure`, `together`, `fireworks`, `cerebras`, `xai`) |
| `--chat-model` | | None | Override the model used for LLM entity extraction |

### Single file ingestion

Process a single document and create a new knowledge graph:

```bash
planopticon ingest spec.md
```

This creates `knowledge_graph.db` and `knowledge_graph.json` in the current directory.

Specify an output directory:

```bash
planopticon ingest report.pdf -o ./results
```

This creates `./results/knowledge_graph.db` and `./results/knowledge_graph.json`.

### Directory ingestion

Process all supported files in a directory:

```bash
planopticon ingest ./docs/
```

By default, this recurses into subdirectories. To process only the top-level directory:

```bash
planopticon ingest ./docs/ --no-recursive
```

PlanOpticon automatically filters for supported file extensions. Unsupported files are silently skipped.

### Merging into an existing knowledge graph

To add document content to an existing knowledge graph (e.g., one created from video analysis), use `--db-path`:

```bash
# First, analyze a video
planopticon analyze meeting.mp4 -o ./results

# Then, ingest supplementary documents into the same graph
planopticon ingest ./meeting-notes/ --db-path ./results/knowledge_graph.db
```

The ingested entities and relationships are merged with the existing graph. Duplicate entities are consolidated automatically by the knowledge graph engine.

### Choosing an LLM provider

Entity and relationship extraction requires an LLM. By default, PlanOpticon auto-detects available providers based on your environment variables. You can override this:

```bash
# Use Anthropic for extraction
planopticon ingest docs/ -p anthropic

# Use a specific model
planopticon ingest docs/ -p openai --chat-model gpt-4o

# Use a local Ollama model
planopticon ingest docs/ -p ollama --chat-model llama3
```

### Output

After ingestion, PlanOpticon prints a summary:

```
Knowledge graph: ./knowledge_graph.db
spec.md: 12 chunks
architecture.md: 8 chunks
requirements.txt: 3 chunks

Ingestion complete:
  Files processed: 3
  Total chunks: 23
  Entities extracted: 47
  Relationships: 31
  Knowledge graph: ./knowledge_graph.db
```

Both `.db` (SQLite/FalkorDB) and `.json` formats are saved automatically.

## How each processor works

### PDF processor

The `PdfProcessor` extracts text from PDF files on a per-page basis. It tries two extraction libraries in order:

1. **pymupdf** (preferred) -- Fast, reliable text extraction. Install with `pip install pymupdf`.
2. **pdfplumber** (fallback) -- Alternative extractor. Install with `pip install pdfplumber`.

If neither library is installed, the processor raises an `ImportError` with installation instructions.

Each page becomes a separate `DocumentChunk` with:

- `text`: The extracted text content of the page
- `page`: The 1-based page number
- `metadata.extraction_method`: Which library was used (`pymupdf` or `pdfplumber`)

To install PDF support:

```bash
pip install 'planopticon[pdf]'
# or
pip install pymupdf
# or
pip install pdfplumber
```
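
The backend selection above boils down to "first installed library wins". A minimal sketch of that logic (the helper name `pick_pdf_backend` and the `installed` override are illustrative, not part of PlanOpticon's API):

```python
import importlib.util

# Preference order for PDF text extraction: pymupdf first, then pdfplumber.
# pymupdf's import name is "fitz".
PDF_BACKENDS = [("fitz", "pymupdf"), ("pdfplumber", "pdfplumber")]


def pick_pdf_backend(installed=None):
    """Return the name of the extraction backend that would be used.

    `installed` overrides the real import check so the selection logic
    can be demonstrated without either library present.
    """
    for module, label in PDF_BACKENDS:
        if installed is not None:
            present = module in installed
        else:
            present = importlib.util.find_spec(module) is not None
        if present:
            return label
    raise ImportError(
        "PDF support requires pymupdf or pdfplumber: pip install pymupdf"
    )
```

The backend chosen here corresponds to the `metadata.extraction_method` value recorded on each chunk.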

### Markdown processor

The `MarkdownProcessor` splits Markdown files on heading boundaries (lines starting with `#` through `######`). Each heading and its content until the next heading becomes a separate chunk.

**Splitting behavior:**

- If the file contains headings, each heading section becomes a chunk. The `section` field records the heading text.
- Content before the first heading is captured as a `(preamble)` chunk.
- If the file contains no headings, it falls back to paragraph-based chunking (same as plaintext).

For example, a file with this structure:

```markdown
Some intro text.

# Architecture

The system uses a microservices architecture...

## Components

There are three main components...

# Deployment

Deployment is handled via...
```

Produces four chunks: `(preamble)`, `Architecture`, `Components`, and `Deployment`.
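
The splitting rules above can be sketched in a few lines. `split_markdown` is a hypothetical helper, not the library's actual code, and details such as how the real processor treats empty sections may differ:

```python
import re

# ATX headings: one to six '#' characters followed by the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown(text: str) -> list[tuple[str, str]]:
    """Split markdown into (section, body) pairs; content before the
    first heading is collected under the '(preamble)' section."""
    sections: list[tuple[str, str]] = []
    current, lines = "(preamble)", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if "".join(lines).strip():  # flush the previous section
                sections.append((current, "\n".join(lines).strip()))
            current, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    if "".join(lines).strip():
        sections.append((current, "\n".join(lines).strip()))
    return sections
```

Running this over the example file above yields the same four sections, with each section name becoming the chunk's `section` field.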

### Plaintext processor

The `PlaintextProcessor` handles `.txt`, `.text`, `.log`, and `.csv` files. It splits text on paragraph boundaries (double newlines) and groups paragraphs into chunks with a configurable maximum size.

**Chunking parameters:**

| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_chunk_size` | 2000 characters | Maximum size of each chunk |
| `overlap` | 200 characters | Number of characters from the end of one chunk to repeat at the start of the next |

The overlap ensures that entities or context spanning a paragraph boundary are not lost. Chunks are created by accumulating paragraphs until the next paragraph would exceed `max_chunk_size`, at which point the current chunk is flushed and a new one begins.
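
That accumulate-and-flush loop can be sketched as follows, using the defaults from the table above (`chunk_paragraphs` is illustrative, not the library's implementation):

```python
def chunk_paragraphs(text: str, max_chunk_size: int = 2000,
                     overlap: int = 200) -> list[str]:
    """Group paragraphs into chunks, carrying `overlap` trailing
    characters of each flushed chunk into the start of the next."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = (current + "\n\n" + para) if current else para
        if current and len(candidate) > max_chunk_size:
            chunks.append(current)  # flush the full chunk
            # seed the next chunk with the tail of the previous one
            current = current[-overlap:] + "\n\n" + para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With three 900-character paragraphs and the defaults, the first two paragraphs fit in one chunk and the third starts a new chunk seeded with the previous chunk's last 200 characters.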

## The ingestion pipeline

Document ingestion follows this pipeline:

```
File on disk
      |
      v
Processor selection (by file extension)
      |
      v
Text extraction (PDF pages / Markdown sections / plaintext paragraphs)
      |
      v
DocumentChunk objects (text + metadata)
      |
      v
Source registration (provenance tracking in the KG)
      |
      v
KG content addition (LLM entity/relationship extraction per chunk)
      |
      v
Knowledge graph storage (.db + .json)
```

### Step 1: Processor selection

PlanOpticon maintains a registry of processors keyed by file extension. When you call `ingest_file()`, it looks up the appropriate processor using `get_processor(path)`. If no processor is registered for the file extension, a `ValueError` is raised.
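
Conceptually, the registry is a mapping from lowercase extension to processor class. A minimal sketch (the names `_REGISTRY`, `register`, and `lookup` are illustrative; the real registry presumably lives in `video_processor.processors.base` alongside `register_processor` and `get_processor`):

```python
from pathlib import Path

# extension -> processor class, e.g. {".pdf": PdfProcessor, ...}
_REGISTRY: dict[str, type] = {}


def register(extensions: list[str], processor_cls: type) -> None:
    """Register a processor class for each extension (case-insensitive)."""
    for ext in extensions:
        _REGISTRY[ext.lower()] = processor_cls


def lookup(path: Path):
    """Instantiate the processor for a path, or raise ValueError."""
    processor_cls = _REGISTRY.get(path.suffix.lower())
    if processor_cls is None:
        raise ValueError(f"No processor registered for {path.suffix!r}")
    return processor_cls()
```

Keying on the lowercased suffix is what makes `report.PDF` and `report.pdf` resolve to the same processor.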

### Step 2: Text extraction

The selected processor reads the file and produces a list of `DocumentChunk` objects. Each chunk contains:

| Field | Type | Description |
|-------|------|-------------|
| `text` | `str` | The extracted text content |
| `source_file` | `str` | Path to the source file |
| `chunk_index` | `int` | Sequential index of this chunk within the file |
| `page` | `Optional[int]` | Page number (PDF only, 1-based) |
| `section` | `Optional[str]` | Section heading (Markdown only) |
| `metadata` | `Dict[str, Any]` | Additional metadata (e.g., extraction method) |
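
The field table maps naturally onto a dataclass; the following is a sketch of the shape (the real class definition may differ in ordering and defaults):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DocumentChunk:
    text: str                     # extracted text content
    source_file: str              # path to the source file
    chunk_index: int              # sequential index within the file
    page: Optional[int] = None    # PDFs only, 1-based
    section: Optional[str] = None # Markdown only
    metadata: Dict[str, Any] = field(default_factory=dict)
```

Only `text`, `source_file`, and `chunk_index` are always populated; `page` and `section` are mutually exclusive in practice, since a chunk comes from exactly one processor.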

### Step 3: Source registration

Each ingested file is registered as a source in the knowledge graph with provenance metadata:

- `source_id`: A SHA-256 hash of the absolute file path (first 12 characters), unless you provide a custom ID
- `source_type`: Always `"document"`
- `title`: The file stem (filename without extension)
- `path`: The file path
- `mime_type`: Detected MIME type
- `ingested_at`: ISO-8601 timestamp
- `metadata`: Chunk count and file extension
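
The default `source_id` derivation can be reproduced in a few lines (the UTF-8 encoding of the path is an assumption; the hash-then-truncate scheme is as documented above):

```python
import hashlib
from pathlib import Path


def default_source_id(path: Path) -> str:
    """First 12 hex characters of the SHA-256 of the absolute path."""
    absolute = str(path.resolve())
    return hashlib.sha256(absolute.encode("utf-8")).hexdigest()[:12]
```

Because the ID depends only on the absolute path, re-ingesting the same file maps to the same source, while the same filename in a different directory gets a distinct ID.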

### Step 4: Entity and relationship extraction

Each chunk's text is passed to `knowledge_graph.add_content()`, which uses the configured LLM provider to extract entities and relationships. The content source is tagged with the document name and either the page number or section name:

- `document:report.pdf:page:3`
- `document:spec.md:section:Architecture`
- `document:notes.txt` (no page or section)
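
The tag format is mechanical enough to sketch (`content_source_tag` is a hypothetical helper; in practice a chunk carries either a page or a section, never both):

```python
from typing import Optional


def content_source_tag(filename: str, page: Optional[int] = None,
                       section: Optional[str] = None) -> str:
    """Build the provenance tag attached to a chunk's extracted content."""
    tag = f"document:{filename}"
    if page is not None:
        return f"{tag}:page:{page}"
    if section is not None:
        return f"{tag}:section:{section}"
    return tag
```

These tags are what let you trace an entity in the graph back to the exact page or section it came from.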

### Step 5: Storage

The knowledge graph is saved in both `.db` (SQLite-backed FalkorDB) and `.json` formats.

## Combining with video analysis

A common workflow is to analyze a video recording and then ingest related documents into the same knowledge graph:

```bash
# Step 1: Analyze the meeting recording
planopticon analyze meeting-recording.mp4 -o ./project-kg

# Step 2: Ingest the meeting agenda
planopticon ingest agenda.md --db-path ./project-kg/knowledge_graph.db

# Step 3: Ingest the project spec
planopticon ingest project-spec.pdf --db-path ./project-kg/knowledge_graph.db

# Step 4: Ingest a whole docs folder
planopticon ingest ./reference-docs/ --db-path ./project-kg/knowledge_graph.db

# Step 5: Query the combined graph
planopticon query --db-path ./project-kg/knowledge_graph.db
```

The resulting knowledge graph contains entities and relationships from all sources -- video transcripts, meeting agendas, specs, and reference documents -- with full provenance tracking so you can trace any entity back to its source.

## Python API

### Ingesting a single file

```python
from pathlib import Path
from video_processor.integrators.knowledge_graph import KnowledgeGraph
from video_processor.processors.ingest import ingest_file

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
chunk_count = ingest_file(Path("document.pdf"), kg)
print(f"Processed {chunk_count} chunks")

kg.save(Path("knowledge_graph.db"))
```

### Ingesting a directory

```python
from pathlib import Path
from video_processor.integrators.knowledge_graph import KnowledgeGraph
from video_processor.processors.ingest import ingest_directory

kg = KnowledgeGraph(db_path=Path("knowledge_graph.db"))
results = ingest_directory(
    Path("./docs"),
    kg,
    recursive=True,
    extensions=[".md", ".pdf"],  # Optional: filter by extension
)

for filepath, chunks in results.items():
    print(f" {filepath}: {chunks} chunks")

kg.save(Path("knowledge_graph.db"))
```

### Listing supported extensions

```python
from video_processor.processors.base import list_supported_extensions

extensions = list_supported_extensions()
print(extensions)
# ['.csv', '.log', '.markdown', '.md', '.pdf', '.text', '.txt']
```

### Working with processors directly

```python
from pathlib import Path
from video_processor.processors.base import get_processor

processor = get_processor(Path("report.pdf"))
if processor:
    chunks = processor.process(Path("report.pdf"))
    for chunk in chunks:
        print(f"Page {chunk.page}: {chunk.text[:100]}...")
```

## Extending with custom processors

To add support for a new file format, implement the `DocumentProcessor` abstract class and register it:

```python
from pathlib import Path
from typing import List
from video_processor.processors.base import (
    DocumentChunk,
    DocumentProcessor,
    register_processor,
)


class HtmlProcessor(DocumentProcessor):
    supported_extensions = [".html", ".htm"]

    def can_process(self, path: Path) -> bool:
        return path.suffix.lower() in self.supported_extensions

    def process(self, path: Path) -> List[DocumentChunk]:
        from bs4 import BeautifulSoup

        soup = BeautifulSoup(path.read_text(), "html.parser")
        text = soup.get_text(separator="\n")
        return [
            DocumentChunk(
                text=text,
                source_file=str(path),
                chunk_index=0,
            )
        ]


register_processor(HtmlProcessor.supported_extensions, HtmlProcessor)
```

After registration, `planopticon ingest` will automatically handle `.html` and `.htm` files.

## Companion REPL

Inside the interactive companion REPL, you can ingest files using the `/ingest` command:

```
> /ingest ./meeting-notes.md
Ingested meeting-notes.md: 5 chunks
```

This adds content to the currently loaded knowledge graph.

## Common workflows

### Build a project knowledge base from scratch

```bash
# Ingest all project docs
planopticon ingest ./project-docs/ -o ./knowledge-base

# Query what was captured
planopticon query --db-path ./knowledge-base/knowledge_graph.db

# Export as an Obsidian vault
planopticon export obsidian ./knowledge-base/knowledge_graph.db -o ./vault
```

### Incrementally build a knowledge graph

```bash
# Start with initial docs
planopticon ingest ./sprint-1-docs/ -o ./kg

# Add more docs over time
planopticon ingest ./sprint-2-docs/ --db-path ./kg/knowledge_graph.db
planopticon ingest ./sprint-3-docs/ --db-path ./kg/knowledge_graph.db

# The graph grows with each ingestion
planopticon query --db-path ./kg/knowledge_graph.db stats
```

### Ingest from Google Workspace or Microsoft 365

PlanOpticon provides integrated commands that fetch cloud documents and ingest them in one step:

```bash
# Google Workspace
planopticon gws ingest --folder-id FOLDER_ID -o ./results

# Microsoft 365 / SharePoint
planopticon m365 ingest --web-url https://contoso.sharepoint.com/sites/proj \
  --folder-url /sites/proj/Shared\ Documents
```

These commands handle authentication, document download, text extraction, and knowledge graph creation automatically.
