Live · Distributed AI Infrastructure

AI Infrastructure
Without Idle GPU Costs

Run inference, build knowledge bases from your documents, and access frontier models — all through one API. Pay only when you use it. No GPUs to rent, no idle hours to cover.

Idle GPU cost

Knowledge bases: Free

Distributed nodes

Median TTFT: 112ms

AI Developers

One API for hosted & frontier models.

Hosted Qwen3, BGE-M3, FLUX + partner models
Build chatbots from your PDFs — free RAG
Pay per token, no idle GPU costs
🎁$1.00 free credit on signup. No credit card required.

Get API Key

Run a Node

Power the network. Earn per token served.

Distributed inference, no central GPU farm
Python CLI worker, started with one command
Earnings paid per token your GPU generates

Provider onboarding is invite-only right now. Email us your GPU specs to join the waitlist.

chirantan@lexoratechnologies.com →

Registered users

API requests served: 13.7K

Tokens generated: 12.5M

API uptime: 99.1%

Model Catalogue

Open-weight models,
live on the network.

Use these model IDs as-is with any OpenAI-compatible client. Live models are available now at beta pricing; pipeline models are deploying soon.

Live — Beta Pricing

Beta

Qwen3 8B

Qwen/Qwen3-8B

Fast reasoning model on the network. Ideal for chat, code, classification, and high-throughput integrations.

Chat · Reasoning · Fast · $0.10 / 1M tokens

Beta

FLUX.1 schnell

black-forest-labs/FLUX.1-schnell

Guidance-free 4-step image generation. Best speed-to-quality ratio for real-time image apps.

Image · 4-step · Fast · $0.002 / image

In the Pipeline

In Pipeline

Llama 3.1 8B Instruct

meta-llama/Llama-3.1-8B-Instruct

Balanced performance and cost. Strong on reasoning, code, and long-context tasks.

Chat · Instruct · 128K ctx

In Pipeline

Llama 3.1 70B Instruct

meta-llama/Llama-3.1-70B-Instruct

Flagship open-weight LLM. Near-GPT-4 quality at a fraction of the cost — routes to cluster-grade nodes.

Chat · Reasoning · Large · 128K ctx

In Pipeline

FLUX.1 dev

black-forest-labs/FLUX.1-dev

Guidance-distilled FLUX for higher prompt adherence. 1024 px output with a tunable guidance scale.

Image · Guided · HD

In Pipeline

Stable Diffusion XL

stabilityai/stable-diffusion-xl-base-1.0

Industry-standard SDXL base model. Wide ecosystem of LoRAs, ControlNets, and refiners.

Image · SDXL · FP16

Up to 50% cheaper than centralized providers

Beta pricing: $0.10 / 1M tokens for Qwen3 8B · $0.002 / image for FLUX.1 schnell.

Get API Key

Full pricing details and model specs →

View pricing page

Architecture

Production infrastructure,
decentralized edge.

NestJS orchestrator, Redis node registry, PostgreSQL job ledger, and a Python vLLM worker — open source, self-hostable.

OpenAI-Compatible API

Drop-in replacement for the OpenAI API. Keep the OpenAI SDK, change the base URL and API key, and cut your inference bill by up to half.

Global Node Network

Jobs are routed to the best-ranked node by reputation score, available VRAM, and network latency — in real time.
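A minimal sketch of such a ranking, assuming each node reports a reputation score in [0, 1], its free VRAM, and a recent latency measurement. The `NodeInfo` shape, the weights, and the normalization are hypothetical; the page names the three signals but not how they are combined.

```typescript
// Hypothetical shape of a node entry in the registry (not Lexora's real schema).
interface NodeInfo {
  id: string;
  reputation: number; // assumed to lie in [0, 1]
  freeVramGb: number; // VRAM currently available on the node
  latencyMs: number; // recent round-trip time to the orchestrator
}

// Hypothetical weights; a real orchestrator would tune these.
const WEIGHTS = { reputation: 0.5, vram: 0.3, latency: 0.2 };

// Score one node relative to the candidate pool: higher is better.
function scoreNode(node: NodeInfo, maxVramGb: number, maxLatencyMs: number): number {
  const vramTerm = maxVramGb > 0 ? node.freeVramGb / maxVramGb : 0;
  // Lower latency should score higher, so invert the normalized value.
  const latencyTerm = maxLatencyMs > 0 ? 1 - node.latencyMs / maxLatencyMs : 1;
  return (
    WEIGHTS.reputation * node.reputation +
    WEIGHTS.vram * vramTerm +
    WEIGHTS.latency * latencyTerm
  );
}

// Pick the best node that has enough free VRAM for the requested model.
function pickNode(nodes: NodeInfo[], requiredVramGb: number): NodeInfo | undefined {
  const eligible = nodes.filter((n) => n.freeVramGb >= requiredVramGb);
  if (eligible.length === 0) return undefined;
  const maxVram = Math.max(...eligible.map((n) => n.freeVramGb));
  const maxLatency = Math.max(...eligible.map((n) => n.latencyMs));
  let best = eligible[0];
  let bestScore = scoreNode(best, maxVram, maxLatency);
  for (const n of eligible.slice(1)) {
    const s = scoreNode(n, maxVram, maxLatency);
    if (s > bestScore) {
      best = n;
      bestScore = s;
    }
  }
  return best;
}
```

Filtering on VRAM first treats memory as a hard constraint, while reputation and latency trade off against each other through the weights.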

50% Cost Reduction

Nodes run on consumer GPUs with no data-center overhead, so the savings pass directly to you. The live telemetry widget shows them for each request.

JWT-Authenticated Nodes

Worker nodes authenticate with hardware-fingerprint-bound JWTs. Providers that misbehave have their reputation score slashed.
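One way to bind a JWT to hardware is to put a hash of the machine fingerprint in a claim and reject the token when the presenting node's fingerprint hashes differently. This sketch hand-rolls HS256 with Node's built-in `crypto` module to stay self-contained. The `hwfp` claim name, the shared-secret setup, and the fingerprint string are hypothetical; a production orchestrator would more likely use a maintained JWT library and asymmetric keys.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// JWTs use unpadded base64url encoding.
const b64url = (buf: Buffer): string => buf.toString("base64url");
const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex");

function hmacSha256(secret: string, data: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Issue an HS256 JWT whose hypothetical `hwfp` claim is the SHA-256 of the node's hardware fingerprint.
function issueNodeToken(nodeId: string, fingerprint: string, secret: string, ttlSec: number, nowSec: number): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const payload = b64url(
    Buffer.from(JSON.stringify({ sub: nodeId, hwfp: sha256Hex(fingerprint), exp: nowSec + ttlSec })),
  );
  const signature = b64url(hmacSha256(secret, `${header}.${payload}`));
  return `${header}.${payload}.${signature}`;
}

// Check signature, expiry, and that the presenting machine matches the bound fingerprint.
// Returns the node ID on success, or null if any check fails.
function verifyNodeToken(token: string, fingerprint: string, secret: string, nowSec: number): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = hmacSha256(secret, `${header}.${payload}`);
  const given = Buffer.from(signature, "base64url");
  // Constant-time comparison; timingSafeEqual throws on length mismatch, so check length first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  let claims: { sub?: string; hwfp?: string; exp?: number };
  try {
    claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  } catch {
    return null;
  }
  if (typeof claims.exp !== "number" || claims.exp <= nowSec) return null;
  // A token copied to different hardware fails here even though its signature is valid.
  if (claims.hwfp !== sha256Hex(fingerprint)) return null;
  return claims.sub ?? null;
}
```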

Fault-Tolerant Dispatch

Node goes offline mid-generation? The orchestrator detects it in 30s, cancels the SSE stream, and re-dispatches automatically.
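The detection step could be implemented as a heartbeat sweep: nodes report in periodically, and any running job whose node has been silent for more than 30 seconds has its stream cancelled and goes back on the queue for another node. The heartbeat mechanism, the `Job` and `Dispatcher` names, and the injected clock are assumptions for illustration; the page states only the 30-second detection window.

```typescript
// Detection window from the description above; heartbeat-based detection itself is an assumption.
const HEARTBEAT_TIMEOUT_MS = 30_000;

interface Job {
  id: string;
  nodeId: string | null; // null while the job waits in the queue
  attempts: number;
}

class Dispatcher {
  private lastSeen = new Map<string, number>(); // nodeId -> last heartbeat time (ms)
  private running = new Map<string, Job>(); // jobId -> job currently assigned to a node
  readonly queue: Job[] = [];
  private cancelStream: (job: Job) => void;

  constructor(cancelStream: (job: Job) => void) {
    this.cancelStream = cancelStream;
  }

  heartbeat(nodeId: string, nowMs: number): void {
    this.lastSeen.set(nodeId, nowMs);
  }

  assign(job: Job, nodeId: string): void {
    job.nodeId = nodeId;
    job.attempts += 1;
    this.running.set(job.id, job);
  }

  // Called periodically: cancel jobs on silent nodes and requeue them for another node.
  sweep(nowMs: number): Job[] {
    const requeued: Job[] = [];
    for (const job of this.running.values()) {
      const seen = job.nodeId === null ? undefined : this.lastSeen.get(job.nodeId);
      if (seen === undefined || nowMs - seen > HEARTBEAT_TIMEOUT_MS) {
        this.cancelStream(job); // tear down the SSE stream tied to the failed node
        this.running.delete(job.id); // deleting during Map iteration is safe in JS
        job.nodeId = null;
        this.queue.push(job);
        requeued.push(job);
      }
    }
    return requeued;
  }
}
```

Passing `nowMs` in explicitly keeps the sweep deterministic and easy to test; in a running orchestrator it would be called on a timer with `Date.now()`.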

Earn from Idle VRAM

Run the Python CLI worker on any CUDA GPU. Get paid per token. No cloud setup, no committed capacity.

RAG · Knowledge Base

Train your chatbot
today. For free.

Upload a PDF or text file. BGE-M3 embeds it, and Qwen3 reads it and answers your questions with citations. No fine-tuning, no GPU bill, no waiting. It just works.

Upload your docs

PDF or plain text, up to 50 MB. Your contracts, manuals, research papers — anything with text.

BGE-M3 indexes it

Your content is split into chunks and encoded with state-of-the-art multilingual BGE-M3 embeddings into a private vector store. Indexing runs on the network at no cost to you.

Qwen3 answers from it

Each question first retrieves the most relevant passages. Qwen3 8B reads them and replies with cited page numbers, not guesses from training data.

No GPU required on your end
PDF · TXT supported
Citations with page numbers
Powered by Qwen3 8B + BGE-M3
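The three steps can be sketched end to end: split each page into overlapping word windows that remember their page number, rank chunks by cosine similarity to the question, and build a prompt that asks the model to cite pages. The `embed` parameter stands in for a BGE-M3 embedding call and is hypothetical, as are the chunk size, overlap, and prompt wording; this illustrates the general retrieval-augmented pattern, not Lexora's exact pipeline.

```typescript
interface Chunk {
  page: number;
  text: string;
  vector: number[];
}

// Hypothetical stand-in for a BGE-M3 embedding call.
type Embed = (text: string) => number[];

// Split each page into windows of `size` words that overlap by `overlap` words.
function chunkPages(pages: string[], embed: Embed, size = 200, overlap = 40): Chunk[] {
  const step = Math.max(1, size - overlap);
  const chunks: Chunk[] = [];
  pages.forEach((pageText, i) => {
    const words = pageText.split(/\s+/).filter(Boolean);
    for (let start = 0; start < words.length; start += step) {
      const text = words.slice(start, start + size).join(" ");
      chunks.push({ page: i + 1, text, vector: embed(text) });
      if (start + size >= words.length) break; // this window already reaches the end of the page
    }
  });
  return chunks;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Return the k chunks most similar to the question (brute force; a vector store would index this).
function retrieve(question: string, chunks: Chunk[], embed: Embed, k = 3): Chunk[] {
  const q = embed(question);
  return [...chunks]
    .sort((x, y) => cosine(q, y.vector) - cosine(q, x.vector))
    .slice(0, k);
}

// Ground the model in the retrieved passages and ask it to cite page numbers.
function buildPrompt(question: string, hits: Chunk[]): string {
  const context = hits.map((c) => `[p. ${c.page}] ${c.text}`).join("\n\n");
  return (
    "Answer using only the passages below and cite page numbers like [p. 3].\n\n" +
    `${context}\n\nQuestion: ${question}`
  );
}
```

The prompt returned by `buildPrompt` would then be sent to Qwen3 8B as the user message.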

Start training free

Free tier included · No credit card needed to start

Live network

Every token traced
to its source GPU.

The network is radically transparent. When you use the sandbox below, you see the exact node ID, GPU model, tokens-per-second, and cost savings for your specific inference request — all streamed live.

Node ID piped via SSE headers
GPU model on active worker
Live TPS from stream timing
Cost delta vs. OpenAI GPT-4
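The TPS and cost-saved figures can be derived on the client from stream timing and token counts. This sketch assumes the node ID and GPU model arrive as response headers named `x-lexora-node-id` and `x-lexora-gpu`; those names are hypothetical, and both price tables are parameters you supply rather than published rates.

```typescript
// Structural type satisfied by fetch's Headers (and by a Map in tests).
interface HeaderLookup {
  get(name: string): string | null | undefined;
}

// Header names are hypothetical; the page only says node details travel in SSE response headers.
function readNodeHeaders(headers: HeaderLookup): { nodeId: string; gpu: string } {
  return {
    nodeId: headers.get("x-lexora-node-id") ?? "unknown",
    gpu: headers.get("x-lexora-gpu") ?? "unknown",
  };
}

interface StreamTiming {
  firstTokenAtMs: number; // when the first content chunk arrived
  lastTokenAtMs: number; // when the final content chunk arrived
  generatedTokens: number;
}

// Generation speed: tokens produced divided by the time between first and last token.
function tokensPerSecond(t: StreamTiming): number {
  const seconds = (t.lastTokenAtMs - t.firstTokenAtMs) / 1000;
  return seconds > 0 ? t.generatedTokens / seconds : 0;
}

interface PricePerMillion {
  input: number; // USD per 1M prompt tokens
  output: number; // USD per 1M generated tokens
}

function requestCost(promptTokens: number, generatedTokens: number, price: PricePerMillion): number {
  return (promptTokens * price.input + generatedTokens * price.output) / 1_000_000;
}

// Cost delta: what the baseline provider would charge minus what this request cost.
function costSaved(
  promptTokens: number,
  generatedTokens: number,
  ours: PricePerMillion,
  baseline: PricePerMillion,
): number {
  return requestCost(promptTokens, generatedTokens, baseline) - requestCost(promptTokens, generatedTokens, ours);
}
```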

Live Demo

Try it. Watch the node do the work.

This prompt runs against the real orchestrator. The telemetry panel shows which node answered and how fast it generated.

lexora://inference-sandbox

Mistral-7B-Instruct-v0.2

Live inference on decentralized GPUs

Tokens stream directly from a worker node

Inference routed to the best available GPU node in the Lexora network.

Node Telemetry

Active node: Node ID · GPU · TPS

Tokens: Prompt · Generated

Latency

Cost saved vs. OpenAI GPT-4

Live telemetry piped from the selected worker node via SSE headers & stream events.

2-line migration

Change two lines.
Cut your bill in half.

The Lexora network speaks the OpenAI protocol natively. Point your existing SDK at our endpoint, swap the key, and start saving. Zero refactoring.

Get API Key

inference.ts

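import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.lexora.network/v1", // line 1 of 2: point the SDK at Lexora
  apiKey: process.env.LEXORA_API_KEY, // line 2 of 2: use your Lexora key
});

async function main() {
  // stream: true returns an async iterable of chat-completion chunks
  const stream = await client.chat.completions.create({
    model: "mistralai/Mistral-7B-Instruct-v0.2",
    messages: [{ role: "user", content: "Hello" }],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});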
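Because the endpoint speaks the OpenAI protocol, a streamed response should arrive as server-sent events: each `data:` line carries one JSON chunk, and the stream ends with `data: [DONE]`. A minimal parser for that framing, for clients that skip the SDK, might look like this; it assumes Lexora follows OpenAI's streaming format exactly.

```typescript
// Minimal shape of an OpenAI-style streaming chunk (only the fields used here).
interface ChatChunk {
  choices: { delta?: { content?: string } }[];
}

// Parse buffered SSE text (assumes OpenAI-style framing: one `data:` line per event).
// Returns complete chunks, whether [DONE] was seen, and any incomplete trailing event.
function parseSSE(buffer: string): { chunks: ChatChunk[]; done: boolean; rest: string } {
  const chunks: ChatChunk[] = [];
  let done = false;
  // Events are separated by a blank line; normalize CRLF first.
  const events = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = events.pop() ?? ""; // last piece may be cut off mid-event
  for (const event of events) {
    for (const line of event.split("\n")) {
      if (!line.startsWith("data:")) continue; // skip comments and event:/id: fields
      const data = line.slice(5).trim();
      if (data === "[DONE]") {
        done = true;
      } else if (data) {
        chunks.push(JSON.parse(data) as ChatChunk);
      }
    }
  }
  return { chunks, done, rest };
}
```

Feed it decoded text from the response body as it arrives, prepending the previous call's `rest` so events split across network reads are parsed once complete.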
