The Inference Layer
Owned by the Network
Route LLM inference to distributed consumer GPUs. 50% cheaper than centralized APIs. Every token streams directly from the worker node that generated it.
AI Developers
Drop-in OpenAI replacement. Half the cost.
- OpenAI-compatible /v1/chat/completions endpoint
- SSE token streaming
- Pay per token, no minimums
- $1.00 free credit on signup
GPU Providers
Monetize idle VRAM while you sleep.
- Python CLI worker: one-command start
- Auto-routes jobs to your GPU
- Paid per token generated on-chain
Provider onboarding is invite-only right now. Email us your GPU specs to join the waitlist.
chirantan@lexoratechnologies.com
Open-weight models,
live on the network.
Drop-in OpenAI-compatible IDs. Live models are available now at beta pricing; pipeline models are deploying soon.
Llama 3.2 3B Instruct
meta-llama/Llama-3.2-3B-Instruct
Fastest model on the network. Ideal for high-throughput chat, classification, and low-latency integrations.
FLUX.1 schnell
black-forest-labs/FLUX.1-schnell
Guidance-free 4-step image generation. Best speed-to-quality ratio for real-time image apps.
Llama 3.1 8B Instruct
meta-llama/Llama-3.1-8B-Instruct
Balanced performance and cost. Strong on reasoning, code, and long-context tasks.
Llama 3.1 70B Instruct
meta-llama/Llama-3.1-70B-Instruct
Flagship open-weight LLM. Near-GPT-4 quality at a fraction of the cost. Routes to cluster-grade nodes.
FLUX.1 dev
black-forest-labs/FLUX.1-dev
Guidance-distilled FLUX for higher prompt adherence. 1024 px output, tunable CFG scale.
Stable Diffusion XL
stabilityai/stable-diffusion-xl-base-1.0
Industry-standard SDXL base model. Wide ecosystem of LoRAs, ControlNets, and refiners.
Up to 95% cheaper than centralized providers
Beta pricing: $0.04 / 1M tokens for 3B · $0.002 / image for FLUX schnell.
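To make the beta rates concrete, here is a minimal sketch that estimates a bill from token and image counts. The two rate constants come from the pricing line above; the `Usage` shape and the `estimateCostUsd` helper are hypothetical names used only for illustration, and actual billing may differ.

```typescript
// Beta rates quoted on this page; all names below are illustrative assumptions.
const USD_PER_MILLION_TOKENS_3B = 0.04; // Llama 3.2 3B Instruct
const USD_PER_IMAGE_FLUX_SCHNELL = 0.002; // FLUX.1 schnell

interface Usage {
  tokens3B: number; // tokens generated by the 3B model
  fluxSchnellImages: number; // images generated by FLUX.1 schnell
}

// Linear per-token and per-image pricing, with no minimums.
function estimateCostUsd(usage: Usage): number {
  const tokenCost = (usage.tokens3B / 1_000_000) * USD_PER_MILLION_TOKENS_3B;
  const imageCost = usage.fluxSchnellImages * USD_PER_IMAGE_FLUX_SCHNELL;
  return tokenCost + imageCost;
}

// 5M tokens plus 100 images
const estimate = estimateCostUsd({ tokens3B: 5_000_000, fluxSchnellImages: 100 });
console.log(`$${estimate.toFixed(2)}`); // prints "$0.40"
```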
Full pricing details and model specs
View pricing page
Production infrastructure,
decentralized edge.
NestJS orchestrator, Redis node registry, PostgreSQL job ledger, and a Python vLLM worker, all open source and self-hostable.
OpenAI-Compatible API
Drop-in replacement for the OpenAI SDK. Change two lines, the base URL and the API key, and cut your inference bill in half.
Global Node Network
Jobs are routed to the best-ranked node by reputation score, available VRAM, and network latency, in real time.
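One way to picture this ranking is a weighted score over the three inputs named above. The sketch below is an assumption, not the orchestrator's actual formula: the weights, the `NodeInfo` fields, and the `scoreNode` and `pickBestNode` helpers are all hypothetical.

```typescript
interface NodeInfo {
  id: string;
  reputation: number; // assumed range 0..1, higher is better
  freeVramGb: number; // VRAM currently available on the node
  latencyMs: number; // measured round-trip latency to the node
}

// Hypothetical weights; the real ranking formula is not documented on this page.
const WEIGHTS = { reputation: 0.5, vram: 0.3, latency: 0.2 };

function scoreNode(node: NodeInfo, requiredVramGb: number): number {
  // A node that cannot fit the model is never eligible.
  if (node.freeVramGb < requiredVramGb) return -Infinity;
  // Spare VRAM relative to the requirement, capped at 1.
  const vramHeadroom = Math.min((node.freeVramGb - requiredVramGb) / requiredVramGb, 1);
  // Maps latency to (0, 1]; lower latency scores higher.
  const latencyScore = 1 / (1 + node.latencyMs / 100);
  return (
    WEIGHTS.reputation * node.reputation +
    WEIGHTS.vram * vramHeadroom +
    WEIGHTS.latency * latencyScore
  );
}

// Returns the highest-scoring eligible node, or undefined if none can run the job.
function pickBestNode(nodes: NodeInfo[], requiredVramGb: number): NodeInfo | undefined {
  let best: NodeInfo | undefined;
  let bestScore = -Infinity;
  for (const node of nodes) {
    const score = scoreNode(node, requiredVramGb);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}

const candidateNodes: NodeInfo[] = [
  { id: "a", reputation: 0.9, freeVramGb: 24, latencyMs: 80 },
  { id: "b", reputation: 0.95, freeVramGb: 6, latencyMs: 20 }, // too little VRAM for 8 GB
  { id: "c", reputation: 0.6, freeVramGb: 12, latencyMs: 30 },
];
console.log(pickBestNode(candidateNodes, 8)?.id); // prints "a"
```

Treating insufficient VRAM as a hard filter rather than a low score keeps a highly reputed but undersized node (node "b" here) from winning a job it cannot run.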
50% Cost Reduction
Consumer GPUs have zero infrastructure overhead. Savings pass directly to you, visible in the live telemetry widget.
JWT-Authenticated Nodes
Worker nodes authenticate with hardware-fingerprint-bound JWTs. Reputation slashing for misbehaving providers.
Fault-Tolerant Dispatch
Node goes offline mid-generation? The orchestrator detects it in 30s, cancels the SSE stream, and re-dispatches automatically.
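The detection step could be sketched as a heartbeat timeout check. This is an assumption about the mechanism: the page only states the 30s window, so the heartbeat map, the `ActiveJob` shape, and the `findStalledJobs` and `redispatch` helpers are hypothetical.

```typescript
const HEARTBEAT_TIMEOUT_MS = 30_000; // the 30s detection window described above

interface ActiveJob {
  jobId: string;
  nodeId: string;
}

// Returns jobs whose node has sent no heartbeat within the timeout window.
// A node with no recorded heartbeat at all is treated as offline.
function findStalledJobs(
  jobs: ActiveJob[],
  lastHeartbeatMs: Map<string, number>,
  nowMs: number,
): ActiveJob[] {
  return jobs.filter((job) => {
    const lastSeen = lastHeartbeatMs.get(job.nodeId);
    return lastSeen === undefined || nowMs - lastSeen > HEARTBEAT_TIMEOUT_MS;
  });
}

// Reassigns every stalled job to a fallback node chosen by the caller.
function redispatch(stalled: ActiveJob[], fallbackNodeId: string): ActiveJob[] {
  return stalled.map((job) => ({ jobId: job.jobId, nodeId: fallbackNodeId }));
}

const heartbeats = new Map<string, number>([
  ["node-a", 95_000], // seen 5s ago
  ["node-b", 60_000], // seen 40s ago, past the timeout
]);
const activeJobs: ActiveJob[] = [
  { jobId: "j1", nodeId: "node-a" },
  { jobId: "j2", nodeId: "node-b" },
];

const stalledJobs = findStalledJobs(activeJobs, heartbeats, 100_000);
const reassigned = redispatch(stalledJobs, "node-a");
console.log(reassigned); // j2 is now assigned to node-a
```

Passing the current time in as an argument keeps the check deterministic and easy to test; a real orchestrator would also need to cancel the open SSE stream before re-dispatching.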
Earn from Idle VRAM
Run the Python CLI worker on any CUDA GPU. Get paid per token. No cloud setup, no committed capacity.
Every token traced
to its source GPU.
The network is radically transparent. When you use the sandbox below, you see the exact node ID, GPU model, tokens-per-second, and cost savings for your specific inference request, all streamed live.
- Node ID piped via SSE headers
- GPU model on active worker
- Live TPS from stream timing
- Cost delta vs. OpenAI GPT-4
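The last two telemetry values can be derived with simple arithmetic. The sketch below assumes TPS is computed from the first and last token timestamps and that the cost delta is a percentage saving against a reference price; the helper names are hypothetical, and the prices in the example are placeholders rather than real rates.

```typescript
interface StreamTiming {
  firstTokenAtMs: number; // timestamp of the first streamed token
  lastTokenAtMs: number; // timestamp of the last streamed token
  tokenCount: number; // total tokens received
}

// Approximate throughput: tokens received divided by the streaming window.
function tokensPerSecond(t: StreamTiming): number {
  const elapsedSec = (t.lastTokenAtMs - t.firstTokenAtMs) / 1000;
  return elapsedSec > 0 ? t.tokenCount / elapsedSec : 0;
}

// Percentage saved relative to a reference price per million tokens.
function costDeltaPercent(networkUsdPerMTok: number, referenceUsdPerMTok: number): number {
  return ((referenceUsdPerMTok - networkUsdPerMTok) / referenceUsdPerMTok) * 100;
}

const sampleTiming: StreamTiming = { firstTokenAtMs: 1_000, lastTokenAtMs: 3_000, tokenCount: 120 };
console.log(tokensPerSecond(sampleTiming)); // prints 60

// Placeholder prices for illustration only.
console.log(costDeltaPercent(0.5, 1.0)); // prints 50
```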
Try it. Watch the node do the work.
This prompt runs against the real orchestrator. The telemetry panel shows which node answered and how fast it generated.
Live inference on decentralized GPUs
Tokens stream directly from a worker node
Inference routed to the best available GPU node in the Lexora network.
vs. OpenAI GPT-4
Live telemetry piped from the selected worker node via SSE headers & stream events.
Change two lines.
Cut your bill in half.
The Lexora network speaks the OpenAI protocol natively. Point your existing SDK at our endpoint, swap the key, and start saving. Zero refactoring.
Get API Key

    import OpenAI from "openai";

    // Point the official OpenAI SDK at the Lexora endpoint and swap the key.
    const client = new OpenAI({
      baseURL: "https://api.lexora.network/v1",
      apiKey: process.env.DEPIN_API_KEY,
    });

    // Request a streamed chat completion (tokens arrive over SSE).
    const stream = await client.chat.completions.create({
      model: "mistralai/Mistral-7B-Instruct-v0.2",
      messages: [{ role: "user", content: "Hello" }],
      stream: true,
    });

    // Print each token as it arrives.
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
    }