Streaming
Token-by-token
You see what the orc is thinking. SSE-based streaming with no polling, no buffering, no hand-waving.
chat · v1 · open-weight

Streaming token-by-token. Code execution in a sandbox. Live web browsing. Self-hosted on orcs.to — your spoils, your warband.
in the warband
gemma-4-31b-it · kimi-k2.6 · glm-5.1 · gemma-4-31b-it-free
what it does
Streaming
You see what the orc is thinking. SSE-based streaming with no polling, no buffering, no hand-waving.
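Server-sent events are just text frames separated by blank lines, each carrying `event:` and `data:` fields. A minimal sketch of the parsing idea (illustrative only, not the actual orcs.to client; real clients also handle retries, comments, and multi-line data per the SSE spec):

```python
def parse_sse(raw: str):
    """Parse a raw SSE payload into (event, data) pairs."""
    events = []
    for frame in raw.split("\n\n"):
        event, data_lines = "message", []
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((event, "\n".join(data_lines)))
    return events

# Two token frames, as a streaming chat endpoint might emit them.
tokens = parse_sse("event: token\ndata: The\n\nevent: token\ndata: orc\n\n")
# tokens == [("token", "The"), ("token", "orc")]
```

No polling means the client holds one open HTTP response and appends tokens as frames arrive, rather than re-fetching the thread.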
code_run
E2B-powered. Real Python, real outputs, real plots — isolated per thread, killed when you leave.
web_browse
Watch the orc raid the web in real time. Browserless-backed Chromium, JPEG screencast, extracted text.
Quickstrike
Type /code to summon Quickstrike — one-shot scripts and edits inside the chat. Repo-shaped work? Head to the Forge.
Open weights
No proprietary lock-in. Four open-weight aliases routed through LiteLLM. Swap them as the field improves.
Self-hosted
Your data stays on orcs.to infra. CNPG Postgres, Loki logs, Langfuse traces — all on a single Hetzner node.
the warband
Four LiteLLM aliases. Pick at the start of every thread, switch mid-thread if the orc gets stuck. The roster moves as the field does.
gemma-4-31b-it · Flagship · production
Google's open-weight Gemma 4 at 31B Instruct. Full-precision, the default for most threads.
kimi-k2.6 · Reasoning · multilingual
Moonshot's K2 lineage. Strong at long-form reasoning, code, and non-English prompts.
glm-5.1 · Balanced · structured
Zhipu's GLM 5.1. Punches above its weight on structured tasks and JSON output.
gemma-4-31b-it-free · Free tier · rate-limited
Same Gemma 4, routed through the free-tier upstream. Best for casual threads.
how it raids
Write in plain text or markdown. Pick a model up top, or stick with the default Gemma 4 31B.
If a tool helps — code_run for math/plots, web_browse for fresh facts — the model picks it.
Tokens stream live; tool runs render as inline cards (code blocks, browser frames, structured results).
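The tool step above amounts to routing a model-emitted tool call to a handler whose result renders as an inline card. A sketch of that dispatch shape (handler names and return fields are illustrative assumptions, not the actual orcs.to internals):

```python
def run_code(args: dict) -> dict:
    # In production this would hand the snippet to an isolated
    # sandbox (E2B); here it just echoes what it would run.
    return {"card": "code", "stdout": f"ran: {args['source']}"}

def browse_web(args: dict) -> dict:
    # In production this would drive headless Chromium and return
    # extracted text plus screencast frames.
    return {"card": "browser", "url": args["url"]}

TOOLS = {"code_run": run_code, "web_browse": browse_web}

def dispatch(tool_call: dict) -> dict:
    """Route one tool call to its handler; the returned dict is
    what the chat surface would render as an inline card."""
    handler = TOOLS[tool_call["name"]]
    return handler(tool_call["arguments"])

card = dispatch({"name": "code_run", "arguments": {"source": "1 + 1"}})
# card == {"card": "code", "stdout": "ran: 1 + 1"}
```

The model chooses the tool; the surface only decides how each card type is drawn.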
pricing
One sign-up, both surfaces — chat and the Forge. Post-alpha pricing will be metered per token + per runner-minute. We’ll write you before anything changes.
alpha · today
Soft cap of $5 of model spend per 30 days. We’ll write you when it changes.
post-alpha · v0.3+
Pay only for tokens consumed and runner-minutes used. No seats. No annual contracts. No surprises. Numbers land when we’ve seen real usage — not before.
We do not promise free forever. We do not promise unlimited anything. We will tell you, in writing, before pricing changes.
questions
Which models are in the warband?
Four open-weight aliases routed through LiteLLM: Gemma 4 31B Instruct (the default), Gemma 4 31B Free, Kimi K2.6, and GLM 5.1. The roster shifts as the open-weight field moves — we don't add proprietary fallbacks.
Where do my prompts go?
Prompts and tool outputs are sent to whichever upstream serves the chosen model. They're not used for training. orcs.to itself stores threads in our own Postgres on Hetzner; nothing leaves the box except the prompt and response, and only to the upstream you've selected.
Why open weights only?
Two reasons. One: open weights mean we can self-host a fallback if any single upstream goes dark. Two: the open-weight gap to frontier proprietary models is now small enough that we'd rather optimize the surface — streaming, tools, citations — than pay rent to a closed lab.
Is there an API?
The platform surface (platform.orcs.to) exposes API keys, usage dashboards, and an OpenAI-compatible endpoint. v1 is single-user; pay-as-you-go billing via Stripe comes in v2.
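"OpenAI-compatible" means a request body with the familiar chat-completions shape. A sketch of building one (the base URL and exact auth scheme are assumptions; check platform.orcs.to for the real values — only the alias names come from the roster above):

```python
import json

BASE_URL = "https://platform.orcs.to/v1"  # assumed, for illustration

def chat_request(model: str, prompt: str, stream: bool = True) -> dict:
    """Build the JSON body for an OpenAI-style chat completion."""
    return {
        "model": model,  # any warband alias, e.g. "gemma-4-31b-it"
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # True -> SSE token stream, like the chat surface
    }

body = chat_request("gemma-4-31b-it", "What did the raid bring back?")
payload = json.dumps(body)  # POST this to f"{BASE_URL}/chat/completions"
```

Because the shape matches, existing OpenAI-compatible SDKs should work by pointing their base URL at the platform and supplying an orcs.to API key.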
Is the code open?
Yes — the application repo lives at git.orcs.to/orcs/chat and the infrastructure repo at git.orcs.to/orcs/infra. Self-host from there if you want your own warband.
How do I self-host?
Clone the infra repo, run the bootstrap scripts on a fresh Debian box, point DNS at the host, and let Argo CD reconcile. You'll need a Clerk project, an E2B key for code_run, and a Browserless deployment for web_browse — all documented in the infra runbooks.

Free, signup-gated, no credit card. Bring a question; bring back spoils.