Live on Mainnet

The Inference Layer
Owned by the Network

Route LLM inference to distributed consumer GPUs. 50% cheaper than centralized APIs. Every token streams directly from the worker node that generated it.

~50%
Cost reduction
-
Active nodes
12ms
Median TTFT
99.4%
Uptime

AI Developers

Drop-in OpenAI replacement. Half the cost.

  • OpenAI-compatible /v1/chat/completions endpoint
  • SSE token streaming
  • Pay per token, no minimums
  • 🎁$1.00 free credit on signup
Get API Key

GPU Providers

Monetize idle VRAM while you sleep.

  • Python CLI worker: starts with one command
  • Auto-routes jobs to your GPU
  • Paid per token generated on-chain

Provider onboarding is invite-only right now. Email us your GPU specs to join the waitlist.

chirantan@lexoratechnologies.com
Model Catalogue

Open-weight models,
live on the network.

Drop-in OpenAI-compatible IDs. Live models are available now at beta pricing; pipeline models are deploying soon.

Live: Beta Pricing
Beta

Llama 3.2 3B Instruct

meta-llama/Llama-3.2-3B-Instruct

Fastest model on the network. Ideal for high-throughput chat, classification, and low-latency integrations.

Chat · Instruct · Fast · $0.04 / 1M tokens
Beta

FLUX.1 schnell

black-forest-labs/FLUX.1-schnell

Guidance-free 4-step image generation. Best speed-to-quality ratio for real-time image apps.

Image · 4-step · Fast · $0.002 / image
In the Pipeline
In Pipeline

Llama 3.1 8B Instruct

meta-llama/Llama-3.1-8B-Instruct

Balanced performance and cost. Strong on reasoning, code, and long-context tasks.

Chat · Instruct · 128K ctx
In Pipeline

Llama 3.1 70B Instruct

meta-llama/Llama-3.1-70B-Instruct

Flagship open-weight LLM. Near-GPT-4 quality at a fraction of the cost; routes to cluster-grade nodes.

Chat · Reasoning · Large · 128K ctx
In Pipeline

FLUX.1 dev

black-forest-labs/FLUX.1-dev

Guidance-distilled FLUX for higher prompt adherence. 1024 px output, tunable CFG scale.

Image · Guided · HD
In Pipeline

Stable Diffusion XL

stabilityai/stable-diffusion-xl-base-1.0

Industry-standard SDXL base model. Wide ecosystem of LoRAs, ControlNets, and refiners.

Image · SDXL · FP16

Up to 95% cheaper than centralized providers

Beta pricing: $0.04 / 1M tokens for Llama 3.2 3B · $0.002 / image for FLUX.1 schnell.
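
As a rough illustration of the beta rates quoted above, the cost of a workload can be estimated with a few lines of arithmetic. This is a sketch only: the helper names are hypothetical, and it assumes prompt and completion tokens are billed at the same flat rate, which this page does not confirm.

```typescript
// Beta rates quoted on this page; everything else is a hypothetical helper.
const USD_PER_MILLION_TOKENS_3B = 0.04;
const USD_PER_IMAGE_FLUX_SCHNELL = 0.002;

// Assumption: one flat per-token rate for both prompt and completion tokens.
function estimateTokenCostUsd(totalTokens: number): number {
  return (totalTokens / 1_000_000) * USD_PER_MILLION_TOKENS_3B;
}

function estimateImageCostUsd(imageCount: number): number {
  return imageCount * USD_PER_IMAGE_FLUX_SCHNELL;
}

// Example workloads: 25 million tokens and 500 images.
console.log(estimateTokenCostUsd(25_000_000).toFixed(2));
console.log(estimateImageCostUsd(500).toFixed(2));
```

Under these assumptions, both example workloads come to about one dollar, roughly the size of the signup credit.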

Full pricing details and model specs:

View pricing page
Architecture

Production infrastructure,
decentralized edge.

NestJS orchestrator, Redis node registry, PostgreSQL job ledger, and a Python vLLM worker: open source and self-hostable.

OpenAI-Compatible API

Drop-in replacement for the OpenAI SDK. Change two lines, the base URL and the API key, and cut your inference bill in half.

Global Node Network

Jobs are routed in real time to the best-ranked node, scored by reputation, available VRAM, and network latency.
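
One way to picture this ranking is a weighted score over the three signals. The sketch below is illustrative only: the `NodeInfo` fields, the weights, and the normalization are assumptions, since the actual scoring formula is not published on this page.

```typescript
// Hypothetical node record; field names are illustrative, not the Lexora schema.
interface NodeInfo {
  id: string;
  reputation: number; // assumed range 0..1, higher is better
  freeVramGb: number; // VRAM currently available on the node
  latencyMs: number;  // measured latency to the orchestrator
}

// Assumed weights; the real formula may differ.
const WEIGHTS = { reputation: 0.5, vram: 0.3, latency: 0.2 };

function scoreNode(node: NodeInfo, requiredVramGb: number): number {
  // A node that cannot fit the model is never eligible.
  if (node.freeVramGb < requiredVramGb) return -Infinity;
  // VRAM headroom, capped so that 2x the requirement scores 1.
  const vramScore = Math.min(node.freeVramGb / requiredVramGb, 2) / 2;
  // Lower latency scores higher; 0 ms maps to 1.
  const latencyScore = 1 / (1 + node.latencyMs / 100);
  return (
    WEIGHTS.reputation * node.reputation +
    WEIGHTS.vram * vramScore +
    WEIGHTS.latency * latencyScore
  );
}

// Returns the highest-scoring eligible node, or undefined if none can run the job.
function pickBestNode(nodes: NodeInfo[], requiredVramGb: number): NodeInfo | undefined {
  let best: NodeInfo | undefined;
  let bestScore = -Infinity;
  for (const node of nodes) {
    const score = scoreNode(node, requiredVramGb);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}
```

Treating insufficient VRAM as a hard filter rather than a low score keeps a high-reputation node from winning a job it cannot load.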

50% Cost Reduction

Consumer GPUs have zero infrastructure overhead. Savings pass directly to you, visible in the live telemetry widget.

JWT-Authenticated Nodes

Worker nodes authenticate with hardware-fingerprint-bound JWTs. Reputation slashing for misbehaving providers.
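
A fingerprint-bound token implies a check beyond signature verification: the fingerprint claimed in the token must match the one the node presents. The sketch below shows only that claim check, assuming the signature has already been verified by a JWT library. The `hwfp` claim name and the `WorkerClaims` shape are hypothetical, not the network's actual token format.

```typescript
// Hypothetical claim set; "hwfp" is an illustrative name, not a registered JWT claim.
interface WorkerClaims {
  sub: string;  // node ID
  exp: number;  // expiry in seconds since epoch (standard JWT claim)
  hwfp: string; // hardware fingerprint the token was issued for
}

type ClaimCheck = { ok: true } | { ok: false; reason: string };

// Run only after the token's signature has been verified elsewhere.
function checkWorkerClaims(
  claims: WorkerClaims,
  presentedFingerprint: string,
  nowSec: number,
): ClaimCheck {
  if (claims.exp <= nowSec) return { ok: false, reason: "token expired" };
  if (claims.hwfp !== presentedFingerprint) {
    return { ok: false, reason: "fingerprint mismatch" };
  }
  return { ok: true };
}
```

A token copied to different hardware would pass signature verification but fail the fingerprint comparison.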

Fault-Tolerant Dispatch

Node goes offline mid-generation? The orchestrator detects it within 30 s, cancels the SSE stream, and re-dispatches the job automatically.
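
Detection of this kind is commonly built on heartbeats: a node that has been silent longer than the timeout is treated as offline, and its in-flight jobs are queued for re-dispatch. The sketch below assumes that mechanism. The types and the heartbeat map are hypothetical, and only the 30-second window comes from this page.

```typescript
const HEARTBEAT_TIMEOUT_MS = 30_000; // the 30 s detection window mentioned above

// Hypothetical in-flight job record.
interface ActiveJob {
  jobId: string;
  nodeId: string;
}

// Returns jobs whose node has never reported or has been silent past the timeout.
function findOrphanedJobs(
  jobs: ActiveJob[],
  lastHeartbeatMs: Map<string, number>,
  nowMs: number,
): ActiveJob[] {
  return jobs.filter((job) => {
    const lastSeen = lastHeartbeatMs.get(job.nodeId);
    return lastSeen === undefined || nowMs - lastSeen > HEARTBEAT_TIMEOUT_MS;
  });
}
```

An orchestrator loop could call `findOrphanedJobs` on a timer, close each orphaned job's stream, and hand the job back to the dispatcher.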

Earn from Idle VRAM

Run the Python CLI worker on any CUDA GPU. Get paid per token. No cloud setup, no committed capacity.

Live network

Every token traced
to its source GPU.

The network is radically transparent. When you use the sandbox below, you see the exact node ID, GPU model, tokens per second, and cost savings for your specific inference request, all streamed live.

  • Node ID piped via SSE headers
  • GPU model on active worker
  • Live TPS from stream timing
  • Cost delta vs. OpenAI GPT-4
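
Deriving TPS from stream timing can be as simple as timestamping each streamed chunk. The sketch below is one possible approach, not the sandbox's actual implementation. It assumes one token per chunk, and `TpsMeter` is a hypothetical name.

```typescript
// Measures throughput between the first and the most recent token.
class TpsMeter {
  private firstTokenAtMs: number | null = null;
  private lastTokenAtMs: number | null = null;
  private tokenCount = 0;

  // Call once per streamed chunk (assumed to carry one token).
  record(nowMs: number): void {
    if (this.firstTokenAtMs === null) this.firstTokenAtMs = nowMs;
    this.lastTokenAtMs = nowMs;
    this.tokenCount += 1;
  }

  tokensPerSecond(): number {
    if (this.firstTokenAtMs === null || this.lastTokenAtMs === null) return 0;
    const elapsedMs = this.lastTokenAtMs - this.firstTokenAtMs;
    if (elapsedMs <= 0) return 0;
    // The first token marks t = 0, so only the tokens after it count toward the rate.
    return ((this.tokenCount - 1) / elapsedMs) * 1000;
  }
}
```

In a client, `record(Date.now())` would be called inside the loop that consumes the stream, and `tokensPerSecond()` read whenever the telemetry panel refreshes.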
Live Demo

Try it. Watch the node do the work.

This prompt runs against the real orchestrator. The telemetry panel shows which node answered and how fast it generated.

lexora://inference-sandbox
Mistral-7B-Instruct-v0.2

Live inference on decentralized GPUs

Tokens stream directly from a worker node

Inference routed to the best available GPU node in the Lexora network.

2-line migration

Change two lines.
Cut your bill in half.

The Lexora network speaks the OpenAI protocol natively. Point your existing SDK at our endpoint, swap the key, and start saving. Zero refactoring.

inference.ts
import OpenAI from "openai";

// Point the standard OpenAI SDK at the Lexora endpoint and swap the key.
const client = new OpenAI({
  baseURL: "https://api.lexora.network/v1",
  apiKey: process.env.DEPIN_API_KEY,
});

// With stream: true, create() resolves to an async iterable of SSE chunks.
const stream = await client.chat.completions.create({
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});

// Print each token as it arrives.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}