Production RAG, without the stack.
Document ingestion, chunking, embeddings, vector storage, retrieval, and grounded generation — all behind one OpenAI-compatible API. No infra to build.
Every layer of the stack
PyMuPDF extraction, semantic chunking, BGE-M3 embeddings, pgvector storage, top-k retrieval, context injection, citations. Managed end-to-end.
Distributed, not centralized
Embeddings + inference run on Lexora's distributed worker network. You get hosted reliability without GPU rental bills.
Bring any model
Use Qwen3 for fast answers, Kimi for long context, DeepSeek for reasoning. The same KB powers all of them.
Free to build, pay to serve
Knowledge base creation is free. You pay only when chat completions actually run. Perfect for prototyping RAG products.
Lexora vs DIY RAG stack
Side-by-side breakdown of what matters.
How it works
Ingest
POST PDFs / TXT to /v1/kb/{kb_id}/files. Up to 50 MB per file.
Chunk + Embed
1000-token chunks with 150-token overlap. BGE-M3 1024-dim embeddings.
Store
Vectors stored in pgvector under your account. Free up to plan limit.
Retrieve
Pass kb_id to /v1/chat/completions. Top-5 chunks injected as system context.
Generate
Any model — Qwen3, DeepSeek, Kimi — produces a grounded answer with citations.
Ship RAG, don't build it.
$1 signup credit. Free knowledge bases. Pay-per-token inference. Everything you need to launch a RAG product this week.
Related