How it works
- You upload documents or add website URLs in the dashboard.
- Content is chunked, embedded with OpenAI's text-embedding-3-small model (1536 dimensions), and stored in PostgreSQL with pgvector.
- At query time, the agent retrieves the top 3 most similar chunks (similarity threshold: 0.4) and includes them as context.
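The retrieval step above can be sketched in pure Python. This is a minimal illustration of top-k retrieval with a similarity cutoff, not the production implementation, which runs the cosine-distance query inside PostgreSQL via pgvector; the function and variable names are hypothetical.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_vec: list[float], chunks, top_k: int = 3, threshold: float = 0.4):
    """Score every chunk against the query, keep the top_k results,
    and drop anything below the similarity threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(reverse=True)
    return [(score, text) for score, text in scored[:top_k] if score >= threshold]
```

The same numbers apply in production: only the three best-matching chunks are returned, and matches scoring below 0.4 are discarded even if fewer than three remain.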
Supported file types
PDF
Upload PDF documents up to 10 MB. Text is extracted and chunked automatically. Scanned PDFs are supported when they include an embedded text layer.
Word (DOCX)
Microsoft Word documents are parsed with full formatting support. Tables and lists are preserved as text.
Plain text (TXT)
Raw text files are chunked by paragraph boundaries.
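Paragraph-boundary chunking amounts to splitting on blank lines. A minimal sketch (the actual chunker may also enforce size limits):

```python
import re

def chunk_by_paragraph(text: str) -> list[str]:
    """Split plain text into chunks at blank-line (paragraph) boundaries."""
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if p.strip()]
```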
Markdown (MD)
Markdown files are parsed with heading-aware chunking so sections stay together.
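Heading-aware chunking means a new chunk starts at each heading, so a heading travels with its body text. A simplified sketch of the idea (the real parser may handle nesting and size limits differently):

```python
def chunk_markdown(text: str) -> list[str]:
    """Group markdown lines into chunks that each begin at a heading,
    keeping a section's heading and body in the same chunk."""
    chunks: list[list[str]] = []
    for line in text.splitlines():
        if line.startswith("#") or not chunks:
            chunks.append([])  # start a new chunk at every heading
        chunks[-1].append(line)
    return ["\n".join(c).strip() for c in chunks if "".join(c).strip()]
```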
CSV
CSV files are converted to structured text. Each row becomes a retrievable unit.
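One common way to turn rows into retrievable text is to pair each value with its column header; a sketch of that approach (the exact output format used by the product may differ):

```python
import csv
import io

def csv_to_units(csv_text: str) -> list[str]:
    """Render each CSV row as a 'header: value' text unit for retrieval."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["; ".join(f"{k}: {v}" for k, v in row.items()) for row in reader]
```

Keeping the header with every value means a single retrieved row is self-describing, without needing the rest of the file as context.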
Website crawling
You can also add website URLs as knowledge sources. The crawler:
- Fetches the page content and extracts readable text
- Follows internal links to crawl related pages
- Re-crawls on demand when you trigger a refresh from the dashboard
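The first two steps, extracting readable text and collecting internal links, can be sketched with Python's standard-library HTML parser. This is an illustrative simplification (the class name, skip list, and "same host = internal" rule are assumptions, not the product's actual crawler):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class PageExtractor(HTMLParser):
    """Collect readable text and same-host links from one HTML page."""
    SKIP = {"script", "style", "nav", "footer"}  # tags with no readable text

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.text_parts: list[str] = []
        self.internal_links: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                absolute = urljoin(self.base_url, href)
                # treat links on the same host as internal and crawlable
                if urlparse(absolute).netloc == urlparse(self.base_url).netloc:
                    self.internal_links.append(absolute)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())
```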
Add a website source
Go to Knowledge in the dashboard sidebar and click Add Source. Select Website and enter the URL.
Wait for ingestion
The crawler fetches and processes the pages. Progress is shown in the source list.
Limits
| Resource | Limit |
|---|---|
| Sources per project | 20 |
| Files per upload | 15 |
| Max file size | 10 MB |
| Supported formats | PDF, DOCX, TXT, MD, CSV |
The knowledge base uses cosine similarity search. If the agent isn’t finding relevant content, try breaking large documents into smaller, topic-focused files.