orcs.to

alpha · live

chat · v1 · open-weight

orcs.to

A clean AI chat, raided on open-weight models.

Streaming token-by-token. Code execution in a sandbox. Live web browsing. Self-hosted on orcs.to — your spoils, your warband.

in the warband

gemma-4-31b-it · kimi-k2.6 · glm-5.1 · gemma-4-31b-it-free

what it does

Six things, done with care.

Streaming

Token-by-token

You see what the orc is thinking. SSE-based streaming with no polling, no buffering, no hand-waving.
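Consuming an SSE token stream is simple on the client side. The exact wire format orcs.to emits isn't documented here, so this sketch assumes the common OpenAI-style convention: `data: {json}` chunks carrying a `delta.content` field, terminated by `data: [DONE]`.

```python
import json

def parse_sse_tokens(raw_stream: str):
    """Yield token strings from an SSE body, assuming OpenAI-style
    'data: {...}' chunks terminated by 'data: [DONE]'."""
    for line in raw_stream.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, event names
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Three chunks as they might arrive over the wire.
raw = "\n".join([
    'data: {"choices":[{"delta":{"content":"Lok"}}]}',
    'data: {"choices":[{"delta":{"content":"tar"}}]}',
    'data: [DONE]',
])
print("".join(parse_sse_tokens(raw)))  # → Loktar
```

Because each token arrives as its own event, the UI can paint it immediately: no polling loop, no response buffering.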

code_run

Sandboxed Python

E2B-powered. Real Python, real outputs, real plots — isolated per thread, killed when you leave.
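The real tool executes inside a remote E2B microVM, which needs an API key and network access. As a local stand-in, a one-shot subprocess with a hard timeout illustrates the same lifecycle: fresh interpreter per run, output captured, process killed if it overstays.

```python
import subprocess
import sys

def run_snippet(code: str, timeout_s: float = 5.0) -> str:
    """Local stand-in for a code_run-style tool: execute a one-shot
    Python snippet in a separate interpreter process, capture stdout,
    and kill it past the timeout. (The real tool runs in an E2B
    sandbox, not a local subprocess.)"""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return result.stdout.strip()

print(run_snippet("print(sum(range(10)))"))  # → 45
```

The per-thread isolation described above adds one more property this sketch omits: the sandbox persists across tool calls within a thread and is destroyed when you leave.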

web_browse

Live web browsing

Watch the orc raid the web in real time. Browserless-backed Chromium, JPEG screencast, extracted text.

Quickstrike

/code in any thread

Type /code to summon Quickstrike — one-shot scripts and edits inside the chat. Repo-shaped work? Head to the Forge.

Open weights

Kimi · GLM · Gemma · Qwen

No proprietary lock-in. Four open-weight aliases routed through LiteLLM. Swap them as the field improves.

Self-hosted

Your spoils

Your data stays on orcs.to infra. CNPG Postgres, Loki logs, Langfuse traces — all on a single Hetzner node.

the warband

Open-weight, without exception.

Four LiteLLM aliases. Pick at the start of every thread, switch mid-thread if the orc gets stuck. The roster moves as the field does.
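LiteLLM's proxy routes public aliases to upstream models via a `model_list` in its config file. The actual orcs.to config isn't published, so in this sketch the four aliases are real but every upstream identifier is an illustrative placeholder.

```yaml
# Hypothetical LiteLLM proxy config. The aliases match the roster;
# the upstream model identifiers are placeholders, not the real routing.
model_list:
  - model_name: gemma-4-31b-it
    litellm_params:
      model: openrouter/google/gemma-4-31b-it
  - model_name: gemma-4-31b-it-free
    litellm_params:
      model: openrouter/google/gemma-4-31b-it:free
  - model_name: kimi-k2.6
    litellm_params:
      model: openrouter/moonshotai/kimi-k2.6
  - model_name: glm-5.1
    litellm_params:
      model: openrouter/z-ai/glm-5.1
```

Swapping a model as the field improves is then a one-line config change; the alias the chat surface sees stays stable.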

gemma-4-31b-it

Gemma 4 31B

Flagship · production

Google's open-weight Gemma 4 at 31B Instruct. Full-precision, the default for most threads.

ctx · 128k · Open in chat

kimi-k2.6

Kimi K2.6

Reasoning · multilingual

Moonshot's K2 lineage. Strong at long-form reasoning, code, and non-English prompts.

ctx · 128k · Open in chat

glm-5.1

GLM 5.1

Balanced · structured

Zhipu's GLM 5.1. Punches above its weight on structured tasks and JSON output.

ctx · 128k · Open in chat

gemma-4-31b-it-free

Gemma 4 31B (free)

Free tier · rate-limited

Same Gemma 4, routed through the free-tier upstream. Best for casual threads.

ctx · 128k · Open in chat

how it raids

Prompt in. Streamed answer + tool widgets out.

  1. You prompt.

     Plain text or markdown. Pick a model up top, or stick with the default Gemma 4 31B.

  2. The orc decides.

     If a tool helps — code_run for math/plots, web_browse for fresh facts — the model picks it.

  3. Stream + widgets.

     Tokens stream live; tool runs render as inline cards (code blocks, browser frames, structured results).
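The loop above can be sketched in a few lines. Everything here is illustrative rather than the real orcs.to internals: the model is stubbed, the tool registry is a plain dict, and the "widget" is just a dict the UI would render as an inline card.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: str

class StubModel:
    """Toy stand-in for the real model: it 'decides' to call code_run
    whenever the prompt asks it to compute something."""
    def choose_tool(self, prompt):
        return ToolCall("code_run", "6*7") if "compute" in prompt else None

    def complete(self, prompt):
        # Pretend completion: echo the last token of the tool-augmented prompt.
        return "answer: " + prompt.rsplit(" ", 1)[-1]

def answer(prompt, model, tools):
    """Sketch of the raid loop: the model either answers directly or
    requests one tool; the tool result is fed back into the prompt and
    also surfaced as an inline widget."""
    widgets = []
    call = model.choose_tool(prompt)            # step 2: the orc decides
    if call is not None:
        result = tools[call.name](call.args)    # e.g. code_run, web_browse
        widgets.append({"tool": call.name, "result": result})
        prompt = f"{prompt}\n[tool:{call.name}] {result}"
    return model.complete(prompt), widgets      # step 3: stream + widgets

tools = {"code_run": lambda args: str(eval(args))}  # sandboxed in production
text, widgets = answer("compute 6*7", StubModel(), tools)
print(text)  # → answer: 42
```

In the real surface the completion streams token-by-token (see Streaming above) and the widget renders while the tool is still running.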

pricing

Free during alpha. No credit card.

One sign-up, both surfaces — chat and the Forge. Post-alpha pricing will be metered per token + per runner-minute. We’ll write you before anything changes.

alpha · today

Free during alpha

$0 · no credit card
  • Open-weight models only
  • code_run + web_browse tools
  • Quickstrike (/code in any thread)
  • Forge runs + live preview URLs
  • One sign-up, both surfaces

Soft cap of $5 of model spend / 30d. We’ll write you when it changes.

post-alpha · v0.3+

Metered

Pay only for tokens consumed and runner-minutes used. No seats. No annual contracts. No surprises. Numbers land when we’ve seen real usage — not before.
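The metered model fits in one line of arithmetic. The rates below are invented placeholders for illustration, not announced pricing; per the text above, real numbers land only after observed usage.

```python
def bill(tokens: int, runner_minutes: float,
         per_1k_tokens: float = 0.002, per_minute: float = 0.01) -> float:
    """Metered bill sketch: tokens consumed plus runner-minutes used.
    No seats, no flat fee. Rates are made-up placeholders, not
    orcs.to pricing."""
    return round(tokens / 1000 * per_1k_tokens + runner_minutes * per_minute, 4)

# A month of 150k tokens and 12 runner-minutes at the placeholder rates:
print(bill(150_000, 12))  # → 0.42
```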

We do not promise free forever. We do not promise unlimited anything. We will tell you, in writing, before pricing changes.

questions

What you'll want to know.

What models does orcs.to support?

Four open-weight aliases routed through LiteLLM: Gemma 4 31B Instruct (the default), Gemma 4 31B Free, Kimi K2.6, and GLM 5.1. The roster shifts as the open-weight field moves — we don't add proprietary fallbacks.

Is my data shared with the model providers?

Prompts and tool outputs are sent to whichever upstream serves the chosen model. They're not used to train. orcs.to itself stores threads in our own Postgres on Hetzner; nothing leaves the box except the prompt + response, and only to the upstream you've selected.

Why open-weight only?

Two reasons. One: open weights mean we can self-host a fallback if any single upstream goes dark. Two: the open-weight gap to frontier proprietary models is now small enough that we'd rather optimize the surface — streaming, tools, citations — than pay rent to a closed lab.

Can I use orcs.to via an API?

The platform surface (platform.orcs.to) exposes API keys, usage dashboards, and an OpenAI-compatible endpoint. v1 is single-user; pay-as-you-go billing via Stripe is v2.
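OpenAI-compatible means any stock client works by overriding its base URL; the request body is the standard chat.completions shape. This sketch just builds that body locally. The endpoint path and auth details are the usual convention, not confirmed for platform.orcs.to.

```python
import json

def chat_request(model: str, prompt: str, stream: bool = True) -> str:
    """Build a standard OpenAI-style chat.completions request body.
    Any OpenAI-compatible client can send this; for orcs.to you would
    point the client's base URL at the platform surface."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return json.dumps(body)

print(chat_request("gemma-4-31b-it", "zug zug?"))
```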

Is the source code open?

Yes — the application repo lives at git.orcs.to/orcs/chat and the infrastructure repo at git.orcs.to/orcs/infra. Self-host from there if you want your own warband.

How do I host this myself?

Clone the infra repo, run the bootstrap scripts on a fresh Debian box, point DNS at the host, and let Argo CD reconcile. You'll need a Clerk project, an E2B key for code_run, and a Browserless deployment for web_browse — all documented in the infra runbooks.

orcs.to

Summon a thread.

Free, signup-gated, no credit card. Bring a question; bring back spoils.