> ## Documentation Index
> Fetch the complete documentation index at: https://upstash.com/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Use Cases

The idea behind Upstash Box is simple: **give AI its own computer**. Your agent gets a full, isolated cloud environment it can control. Run commands, write files, or execute code independent of any user device. Freeze a box anytime, and continue days or even weeks later with perfect resumability.

***

## 1. Agent Servers

A very powerful pattern is the **Agent Server**: a long-running, per-tenant agent that persists its state across sessions. Unlike ephemeral sandboxes that lose everything on shutdown, an Agent Server keeps its history, context, and learned preferences intact forever.

<Frame>
  <img src="https://mintcdn.com/upstash/zj2F924dqQhl2uwq/img/box/agent-server.png?fit=max&auto=format&n=zj2F924dqQhl2uwq&q=85&s=262c64fdc96db5b34670ffdea086fb42" width="2751" height="2108" data-path="img/box/agent-server.png" />
</Frame>

Each user gets a dedicated Box running its own agent. The agent observes every request and response in a non-blocking way. It builds up a personalized understanding of what that user needs. Over time, it contributes back to a shared **Knowledge Base**, so insights from one tenant can improve results for everyone.

Because boxes are serverless, idle tenants only cost a very low storage rate. When a user returns, their box wakes instantly with all prior state intact (installed packages, file history, learned preferences) and picks up exactly where it left off.

## 2. Multi-Agent Orchestration

Box's async SDK lets you spin up multiple boxes in parallel, each running a specialized agent with a distinct role. Once every agent finishes, a final box can synthesize their outputs into a single result.

<Frame>
  <img src="https://mintcdn.com/upstash/zj2F924dqQhl2uwq/img/box/review-infrastructure.png?fit=max&auto=format&n=zj2F924dqQhl2uwq&q=85&s=9a794cbbc1332a4cc8fc89cf2ffcebc2" width="3464" height="1451" data-path="img/box/review-infrastructure.png" />
</Frame>

A practical example is an automated **PR review pipeline**. When a pull request is opened, you fan out to three boxes. One for security review, one for code quality, and one for architecture. Then you collect their findings in a fourth box that summarizes everything and posts a comment on GitHub.

<CodeGroup>
  ```typescript box.ts theme={"system"}
  const pr = "https://github.com/acme/app/pull/42"

  const [security, quality, architecture] = await Promise.all([
    Box.create({ runtime: "node", agent: { harness: Agent.ClaudeCode, model: "anthropic/claude-sonnet-4-6" } }),
    Box.create({ runtime: "node", agent: { harness: Agent.ClaudeCode, model: "anthropic/claude-sonnet-4-6" } }),
    Box.create({ runtime: "node", agent: { harness: Agent.ClaudeCode, model: "anthropic/claude-sonnet-4-6" } }),
  ])

  const reviews = await Promise.all([
    security.agent.run({ prompt: `Security review for ${pr}` }),
    quality.agent.run({ prompt: `Code quality review for ${pr}` }),
    architecture.agent.run({ prompt: `Architecture review for ${pr}` }),
  ])

  const jury = await Box.create({
    runtime: "node",
    agent: { harness: Agent.ClaudeCode, model: "anthropic/claude-sonnet-4-6" },
    git: { token: process.env.GIT_TOKEN },
  })

  await jury.agent.run({
    prompt: `Summarize these reviews and post a comment on ${pr}:\n${reviews.map((r) => r.result).join("\n\n")}`,
  })
  ```

  ```python box.py theme={"system"}
  import asyncio
  import os
  from upstash_box import AsyncBox, Agent

  async def main():
      pr = "https://github.com/acme/app/pull/42"

      security, quality, architecture = await asyncio.gather(
          AsyncBox.create(runtime="node", agent={"harness": Agent.CLAUDE_CODE, "model": "anthropic/claude-sonnet-4-6"}),
          AsyncBox.create(runtime="node", agent={"harness": Agent.CLAUDE_CODE, "model": "anthropic/claude-sonnet-4-6"}),
          AsyncBox.create(runtime="node", agent={"harness": Agent.CLAUDE_CODE, "model": "anthropic/claude-sonnet-4-6"}),
      )

      reviews = await asyncio.gather(
          security.agent.run(prompt=f"Security review for {pr}"),
          quality.agent.run(prompt=f"Code quality review for {pr}"),
          architecture.agent.run(prompt=f"Architecture review for {pr}"),
      )

      jury = await AsyncBox.create(
          runtime="node",
          agent={"harness": Agent.CLAUDE_CODE, "model": "anthropic/claude-sonnet-4-6"},
          git={"token": os.environ["GITHUB_TOKEN"]},
      )

      summary = "\n\n".join(r.result for r in reviews)
      await jury.agent.run(
          prompt=f"Summarize these reviews and post a comment on {pr}:\n{summary}",
      )

  asyncio.run(main())
  ```
</CodeGroup>

Because each box is isolated, the agents cannot interfere with each other. You get true parallelism with independent filesystems, and the orchestration logic stays in your own code.

## 3. Parallel Testing & Comparison

Box makes it easy to run parallel test scenarios at scale. Spin up N boxes, each running a different model against the same inputs, and compare the results side by side.

<Frame>
  <img src="https://mintcdn.com/upstash/zj2F924dqQhl2uwq/img/box/model-comparison.png?fit=max&auto=format&n=zj2F924dqQhl2uwq&q=85&s=34091bd222c648fdc37f70a4c0233f74" width="3560" height="1451" data-path="img/box/model-comparison.png" />
</Frame>

For example, at Context7 we use Box to benchmark LLMs for context extraction over documentation. We spin up boxes in parallel, each running a different model against the same documentation files and prompts. We then evaluate hallucination percentage, accuracy score, and context quality to find the best model:

<CodeGroup>
  ```typescript box.ts theme={"system"}
  const models = ["anthropic/claude-opus-4-6", "openai/gpt-5.3-codex", "openrouter/google/gemini-2.5-pro"]

  const docs = await fs.readFile("./documentation.md", "utf-8")
  const prompt = `Extract all API endpoints from this documentation:\n${docs}`

  const boxes = await Promise.all(
    models.map((model) => Box.create({ runtime: "node", agent: { model } })),
  )

  const results = await Promise.all(boxes.map((box) => box.agent.run({ prompt })))

  const evaluation = results.map((r, i) => ({
    model: models[i],
    result: r.result,
    hallucinationPct: evaluateHallucination(r.result, docs),
    accuracyScore: evaluateAccuracy(r.result, docs),
  }))

  const bestModel = evaluation.sort((a, b) => a.hallucinationPct - b.hallucinationPct)[0]
  ```

  ```python box.py theme={"system"}
  import asyncio
  from upstash_box import AsyncBox, Agent

  async def main():
      # each model is paired with the harness that runs it
      agents = [
          {"harness": Agent.CLAUDE_CODE, "model": "anthropic/claude-opus-4-6"},
          {"harness": Agent.CODEX, "model": "openai/gpt-5.3-codex"},
          {"harness": Agent.OPEN_CODE, "model": "openrouter/google/gemini-2.5-pro"},
      ]

      with open("./documentation.md", encoding="utf-8") as f:
          docs = f.read()
      prompt = f"Extract all API endpoints from this documentation:\n{docs}"

      boxes = await asyncio.gather(
          *(AsyncBox.create(runtime="node", agent=agent) for agent in agents)
      )

      results = await asyncio.gather(*(box.agent.run(prompt=prompt) for box in boxes))

      evaluation = [
          {
              "model": agents[i]["model"],
              "result": r.result,
              "hallucination_pct": evaluate_hallucination(r.result, docs),
              "accuracy_score": evaluate_accuracy(r.result, docs),
          }
          for i, r in enumerate(results)
      ]

      best_model = min(evaluation, key=lambda e: e["hallucination_pct"])

  asyncio.run(main())
  ```
</CodeGroup>

Each box is fully isolated, so one model's behavior never leaks into another's results.
