The Runtime for
Edge AI in React
Run privacy-first small AI without the infrastructure overhead.
Zero Latency. Zero Cost. Easy To Use.
React Brai Features
A complete runtime for distributed, privacy-first AI in your React app.
Slash Cloud Costs
Don't pay per-token for every simple task. Offload routine jobs like summarization and JSON extraction to the client, saving your API budget for complex reasoning.
Crash-Proof Swarm
Browsers limit WebGPU to one active context. Our Leader Election system synchronizes state across tabs to prevent crashes.
Total Data Privacy
Inputs never leave the client's device. Perfect for HIPAA-compliant healthcare apps, legal tech, and PII redaction.
Zero-Latency Inference
Remove network round-trips entirely. Once the model loads, text generation is instantaneous and works fully offline.
Production-Grade DX
Built for serious development. Full TypeScript support, Zod schema validation, and strict state management out of the box.
Native Metal & Vulkan
We bypass standard WebGL to access raw compute shaders via WebGPU, unlocking near-native performance on Apple M-Series and NVIDIA cards.
Documentation
Introduction
react-brai is a fault-tolerant WebGPU runtime for React. It creates a distributed "Swarm" across your user's open tabs, allowing them to share a single GPU context (VRAM) while executing requests in a persistent, recoverable queue.
The Swarm (v2)
Browsers limit WebGPU to a single active context. To prevent crashes when users open multiple tabs, react-brai uses a Leader Election system.
Leader Tab
The first tab to open. It holds the GPU Lock and the loaded Model in VRAM. It processes the Job Queue sequentially.
Follower Tabs
Subsequent tabs. They do not load the model. Instead, they send requests to the Leader and listen for token streams.
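The Follower-to-Leader delegation can be sketched as a small message protocol. The shapes and names below are hypothetical; react-brai's internal wire format may differ:

```typescript
// Hypothetical message shapes for Leader/Follower traffic.
type SwarmMessage =
  | { kind: "job"; jobId: string; messages: { role: string; content: string }[] }
  | { kind: "token"; jobId: string; token: string }
  | { kind: "done"; jobId: string };

// A Follower accumulates streamed tokens for the job it submitted
// and ignores traffic addressed to other tabs' jobs.
function applyMessage(buffer: string, ownJobId: string, msg: SwarmMessage): string {
  if (msg.kind === "token" && msg.jobId === ownJobId) {
    return buffer + msg.token;
  }
  return buffer;
}
```

In the real library this traffic would travel over a cross-tab channel (e.g. BroadcastChannel or storage events); the routing logic stays the same.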
Persistent Recovery
If the Leader is closed or refreshed, the lock is released. The next Follower in line automatically promotes itself to Leader, reads the Persistent Queue from LocalStorage, reloads the model, and resumes processing pending jobs.
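The recovery step can be illustrated with a small helper. The QueueItem fields here are invented for illustration; the library's real persisted shape may differ:

```typescript
// Hypothetical persisted job shape.
interface QueueItem {
  id: string;
  status: "pending" | "active" | "done";
  prompt: string;
}

// On promotion, the new Leader reads the persisted queue and resumes
// anything unfinished. The previously "active" job was interrupted
// when the old Leader closed, so it is retried as well.
function jobsToResume(serialized: string | null): QueueItem[] {
  if (!serialized) return [];
  const queue = JSON.parse(serialized) as QueueItem[];
  return queue.filter((job) => job.status !== "done");
}
```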
Quick Start
Initialize the engine. The hook automatically handles Leader/Follower negotiation.
import { useEffect } from 'react';
import { useLocalAI } from 'react-brai';

export default function Chat() {
  const { loadModel, chat, isReady, tps } = useLocalAI();

  useEffect(() => {
    loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
  }, []);

  return <div>Speed: {tps} T/s</div>;
}

Core Patterns
Structured Extraction
const response = await chat([
  { role: "system", content: "Output JSON: { sentiment: 'pos' | 'neg' }" },
  { role: "user", content: "I love this library!" }
]);

const data = JSON.parse(response);

API Reference
useLocalAI()
Swarm State
role (Synced): 'LEADER' | 'FOLLOWER' | 'PENDING'
Current tab status. Leaders perform inference; Followers delegate to the Leader.

queue (Synced): QueueItem[]
Real-time list of all pending jobs across all tabs. Synced via LocalStorage.
Engine State
isReady (Synced): boolean
True when the Swarm Leader has a model loaded in VRAM.

progress (Synced): { progress: number, text: string }
Download progress (0-1). Visible to all tabs during initialization.

isLoading (Local): boolean
True only if *this specific tab* initiated the request currently being processed.

response (Local): string
Streaming output buffer. Only populates for the tab that requested the chat.
Telemetry
tps (Real-time): number
Tokens per second. Instantaneous inference speed metric.

generatedTokens (Real-time): number
Total number of tokens generated for the current active response.
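As a mental model, tps can be derived from the timestamps of recently generated tokens. This is a simplified illustration, not the library's exact algorithm:

```typescript
// Compute instantaneous tokens/second from a window of token
// arrival timestamps (in milliseconds). Illustrative only.
function tokensPerSecond(timestampsMs: number[]): number {
  if (timestampsMs.length < 2) return 0;
  const elapsedMs = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  if (elapsedMs <= 0) return 0;
  // n timestamps span n-1 inter-token intervals.
  return ((timestampsMs.length - 1) / elapsedMs) * 1000;
}
```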
Methods
loadModel(modelId, config?) (Required): void
Requests the Leader to load a model. If the Leader is busy or already has a model loaded, the request is safely ignored.

unloadModel(): void
Forces the Leader to purge the model from VRAM and terminate the worker to free memory.

chat(messages) (Required): Promise<string>
Pushes a job onto the Swarm Queue. The returned promise resolves with the generated text.

reset(): void
Aborts the current job (if it belongs to this tab) and kills the Leader worker if necessary.
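The state fields and methods above combine naturally into UI logic. A hedged sketch using the field names from the tables (the label strings are invented):

```typescript
type Role = "LEADER" | "FOLLOWER" | "PENDING";

// Derive a status label for a toolbar from the hook's state fields.
// Illustrative only; your UI will differ.
function statusLabel(role: Role, isReady: boolean, queueLength: number): string {
  if (role === "PENDING") return "Joining swarm...";
  if (!isReady) {
    // The Leader is the tab actually downloading/loading the model;
    // Followers just wait for it to report readiness.
    return role === "LEADER" ? "Loading model..." : "Waiting for Leader...";
  }
  return queueLength > 0 ? `${queueLength} job(s) queued` : "Idle";
}
```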
Type Definitions
export interface LocalAIHook {
  role: 'LEADER' | 'FOLLOWER' | 'PENDING';
  isReady: boolean;
  isLoading: boolean;
  response: string;
  tps: number; // 🆕 Telemetry
  generatedTokens: number; // 🆕 Telemetry
  loadModel: (modelId: string, config?: ModelConfig) => void;
  unloadModel: () => void; // 🆕 Method
  chat: (messages: Message[]) => Promise<string>;
  reset: () => void;
}

React Brai Playground
Serverless AI, powered by your GPU.
Looking for more? Find compatible models on HuggingFace with names ending in q4f16_1-MLC.
Fastest. Best for JSON & Chat.
Ready to Initialize
We need to download 1.2 GB of weights to your cache. This happens once.