react-brai v1.0.0 Stable

The Runtime for
Edge AI in React

Run privacy-first small AI without the infrastructure overhead. Zero Latency. Zero Cost. Easy To Use.

Llama-3.2-1B
Gemma-2-2B
Phi-3.5-Mini
Qwen-2.5-1.5B
Mistral-7B-v0.3
Llama-3.1-8B
JSON Extraction
PII Redaction
Semantic Search
Mod & Safety
Smart Form Fill


React Brai Features

A complete runtime for distributed, privacy-first AI in a React app.

Slash Cloud Costs

Don't pay per-token for every simple task. Offload routine jobs like summarization and JSON extraction to the client, saving your API budget for complex reasoning.

Crash-Proof Swarm

Browsers limit WebGPU to one active context. Our Leader Election system synchronizes state across tabs to prevent crashes.

Total Data Privacy

Inputs never leave the client's device. Perfect for HIPAA-compliant healthcare apps, legal tech, and PII redaction.

Zero-Latency Inference

Remove network round-trips entirely. Once the model loads, text generation is instantaneous and works fully offline.

Production-Grade DX

Built for serious development. Full TypeScript support, Zod schema validation, and strict state management out of the box.

Native Metal & Vulkan

We bypass standard WebGL to access raw compute shaders via WebGPU, unlocking near-native performance on Apple M-Series and NVIDIA cards.

v2.1.0 Stable

Introduction

react-brai is a fault-tolerant WebGPU runtime for React. It creates a distributed "Swarm" across your user's open tabs, allowing them to share a single GPU context (VRAM) while executing requests in a persistent, recoverable queue.

The Swarm (v2)

Browsers limit WebGPU to a single active context. To prevent crashes when users open multiple tabs, react-brai uses a Leader Election system.

Leader Tab

The first tab to open. It holds the GPU Lock and the loaded Model in VRAM. It processes the Job Queue sequentially.

Follower Tabs

Subsequent tabs. They do not load the model. Instead, they send requests to the Leader and listen for token streams.

Persistent Recovery

If the Leader is closed or refreshed, the lock is released. The next Follower in line automatically promotes itself to Leader, reads the Persistent Queue from LocalStorage, reloads the model, and resumes processing pending jobs.

Quick Start

Initialize the engine. The hook automatically handles Leader/Follower negotiation.

ChatComponent.tsx
tsx
import { useEffect } from 'react';
import { useLocalAI } from 'react-brai';

export default function Chat() {
  const { loadModel, chat, isReady, tps } = useLocalAI();

  useEffect(() => {
    loadModel('Llama-3.2-1B-Instruct-q4f16_1-MLC');
  }, []);

  return <div>Speed: {tps} T/s</div>;
}

Core Patterns

Structured Extraction

extract.ts
typescript
const response = await chat([
  { role: "system", content: "Output JSON: { sentiment: 'pos' | 'neg' }" },
  { role: "user", content: "I love this library!" }
]);
// JSON.parse throws if the model emits anything other than strict JSON,
// so validate the output before trusting it.
const data = JSON.parse(response);
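Since the model's output is plain text, it is worth guarding the parsed shape before use. The library advertises Zod schema validation; the sketch below shows the same idea with a hand-rolled type guard so it stays dependency-free. `parseSentiment` is a hypothetical helper, not part of the react-brai API.

```typescript
// Validate that the model's raw text output matches the expected JSON shape.
type Sentiment = { sentiment: 'pos' | 'neg' };

function parseSentiment(raw: string): Sentiment {
  const data: unknown = JSON.parse(raw); // throws on malformed JSON
  if (
    typeof data === 'object' && data !== null &&
    'sentiment' in data &&
    ((data as { sentiment: unknown }).sentiment === 'pos' ||
     (data as { sentiment: unknown }).sentiment === 'neg')
  ) {
    return data as Sentiment;
  }
  throw new Error(`Model returned unexpected shape: ${raw}`);
}

console.log(parseSentiment('{"sentiment":"pos"}')); // { sentiment: 'pos' }
```

With Zod, the guard collapses to a schema plus `schema.parse(JSON.parse(raw))`, which is the pattern the library's Zod support is aimed at.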

API Reference

useLocalAI()

Swarm State

role (Synced): 'LEADER' | 'FOLLOWER' | 'PENDING'

Current tab status. Leaders perform inference; Followers delegate requests to the Leader.

queue (Synced): QueueItem[]

Real-time list of all pending jobs across all tabs. Synced via LocalStorage.

Engine State

isReady (Synced): boolean

True when the Swarm Leader has a model loaded in VRAM.

progress (Synced): { progress: number, text: string }

Download progress (0-1). Visible to all tabs during initialization.

isLoading (Local): boolean

True ONLY if *this specific tab* initiated the request currently being processed.

response (Local): string

Streaming output buffer. Only populates for the tab that requested the chat.

Telemetry

tps (Real-time): number

Tokens Per Second. Instantaneous inference speed metric.

generatedTokens (Real-time): number

Total number of tokens generated for the current active response.
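The two telemetry fields relate in the obvious way: tps is roughly generatedTokens divided by elapsed generation time. The helper below is purely illustrative (react-brai computes this internally); the function name and signature are assumptions, not library API.

```typescript
// Hypothetical helper showing how a tokens-per-second metric is derived
// from a token count and an elapsed wall-clock time in milliseconds.
function tokensPerSecond(generatedTokens: number, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0; // avoid division by zero before timing starts
  return (generatedTokens / elapsedMs) * 1000;
}

console.log(tokensPerSecond(120, 4000)); // 30 tokens/s over a 4s generation
```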

Methods

loadModel(modelId, config?) (Required): void

Requests that the Leader load a model. If the Leader is busy or already has a model loaded, the request is safely ignored.

unloadModel(): void

Forces the Leader to purge the model from VRAM and terminate the worker to free memory.

chat(messages) (Required): void

Pushes a job to the Swarm Queue; tokens begin streaming into the response buffer once the job starts.

reset(): void

Aborts the current job (if it belongs to this tab) and kills the Leader worker if necessary.

Type Definitions

types.ts
typescript
export interface LocalAIHook {
  role: 'LEADER' | 'FOLLOWER' | 'PENDING';
  isReady: boolean;
  isLoading: boolean;
  response: string;
  tps: number;              // 🆕 Telemetry
  generatedTokens: number;  // 🆕 Telemetry
  
  loadModel: (modelId: string, config?: ModelConfig) => void;
  unloadModel: () => void;  // 🆕 Method
  chat: (messages: Message[]) => void;
  reset: () => void;
}

BROWSER NATIVE SWARM

React Brai Playground

Serverless AI, powered by your GPU.

Model Configuration

Looking for more? Find compatible models on Hugging Face whose IDs end in q4f16_1-MLC.

VRAM Required: 2 GB

Fastest. Best for JSON & Chat.


Ready to Initialize

We need to download 1.2 GB of weights to your cache. This happens once.