browserlab — in-browser LLM engine comparison

Pick a model, then hit ⚡ Auto-cycle & compare to load → run → unload each engine in turn and get a stats table, or test one engine at a time with its own Load → Run. Greyed lanes can't run the selected model.

How it works. Each lane loads the model independently into the browser's WebGPU device and runs a greedy decode of your prompt, reporting tokens/sec. Weights are fetched from Hugging Face on first use, roughly 0.2–5 GB depending on the model, so the first Load can take about a minute. The browser caches them afterward, so subsequent loads take seconds. The webml lane runs best in a fresh tab.
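For readers unfamiliar with the terms, here is a minimal sketch of what "greedy decode" and "tokens/sec" mean. It is not the code any lane actually runs: `StepFn`, `greedyDecode`, and the `eosId` parameter are hypothetical names for illustration, and the step function is synchronous here only to keep the example short (a real WebGPU engine would be asynchronous).

```typescript
// Hypothetical model interface: given all tokens so far, return next-token logits.
type StepFn = (tokens: number[]) => ArrayLike<number>;

interface DecodeStats {
  tokens: number[];      // prompt plus generated tokens
  generated: number;     // number of newly generated tokens
  seconds: number;       // wall-clock decode time
  tokensPerSec: number;  // generated / seconds
}

// Index of the largest logit; ties resolve to the lowest index.
function argmax(logits: ArrayLike<number>): number {
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

// Greedy decoding: always append the single most likely next token,
// stopping at maxTokens or at the (assumed) end-of-sequence id.
function greedyDecode(
  step: StepFn,
  prompt: number[],
  maxTokens: number,
  eosId: number,
  now: () => number = () => performance.now(), // milliseconds; injectable for testing
): DecodeStats {
  const tokens = [...prompt];
  const start = now();
  let generated = 0;
  while (generated < maxTokens) {
    const next = argmax(step(tokens));
    tokens.push(next);
    generated++;
    if (next === eosId) break;
  }
  const seconds = (now() - start) / 1000;
  const tokensPerSec = seconds > 0 ? generated / seconds : 0;
  return { tokens, generated, seconds, tokensPerSec };
}
```

Because greedy decoding has no sampling randomness, every engine given the same weights and prompt should in principle pick the same tokens, which is what makes a tokens/sec comparison across lanes meaningful.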

Credits. The custom WGSL engine lane is our fork of tylerstraub/gemma4-webgpu (© Tyler Straub, Apache-2.0), to which we added the Qwen3 port, in-shader Q4_K/Q8, multi-row matmuls, and a byte-level BPE tokenizer. The transformers.js and raw onnxruntime-web lanes use 🤗 transformers.js / onnxruntime-web. The webml-community kernels lane points to the hand-written WebGPU kernels by Xenova / the HF WebML community (gemma-4, lfm2 Spaces), which we adapted in LocalMind. Part of browser-big-fast-lab.

* The webml-community Spaces carry no license, so rather than re-host that code here, this lane links to Xenova's canonical demos. Open one to run it side-by-side in another tab.