Documentation

How to connect agents across providers without burning tokens on data transit.

How it works

Every agent in your pipeline has a token budget. When Agent A produces a large output — a code review, a research report, a database dump — and Agent B needs it, you have two options: paste the full text into B's prompt (expensive) or truncate it (lossy).

ContextRelay gives you a third option. Agent A pushes the payload to the edge and gets back an 80-character URL. Agent B receives the URL and pulls the full payload directly — the payload itself never transits through a model prompt again. For sensitive payloads, add encrypted=True and the data is encrypted in your process before it leaves your machine — Cloudflare only ever sees ciphertext.

Agent A → push(50 KB result) → "https://.../pull/uuid"
↓ pass the URL (20 tokens)
Agent B → pull(url) → 50 KB result (~75 ms, 0 tokens in A)
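To make the saving concrete, here is a back-of-the-envelope sketch. The 4-characters-per-token ratio is a common rough approximation for English text, not a measured value:

```python
def approx_tokens(text: str) -> int:
    # rough heuristic: ~4 characters per token for English text
    return max(1, len(text) // 4)

payload = "x" * 50_000   # a 50 KB result from Agent A
url = "https://contextrelay.example/pull/123e4567-e89b-12d3-a456-426614174000"

print(approx_tokens(payload))   # 12,500 tokens if pasted into a prompt
print(approx_tokens(url))       # under 20 tokens for the URL handoff
```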

Quick start

1. Install

pip install contextrelay
2. Get your API key

Sign up and create an API key — copy the cr_live_... value and store it in an env var.

export CONTEXTRELAY_API_KEY="cr_live_..."
3. Connect and relay

import os
from contextrelay import ContextRelay

# No base_url needed — defaults to the managed cloud worker
relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

url = relay.push("any large text, JSON, or Markdown — up to 25 MB")
data = relay.pull(url)   # retrieve from any agent, any machine

Use case 1 — Cross-provider handoff (Claude → Mistral)

Claude does a thorough code review. The review is too large to fit alongside the follow-up instructions in Mistral's context. ContextRelay bridges them.

import os, anthropic
from mistralai import Mistral
from contextrelay import ContextRelay

relay   = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
claude  = anthropic.Anthropic()
mistral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# ── Agent A: Claude reviews the PR ──────────────────────────────
diff = open("pr_diff.txt").read()   # ~20 KB of git diff

review = claude.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,
    messages=[{"role": "user", "content": f"Code review:\n\n{diff}"}],
).content[0].text

# Store the review — hand off only the URL
review_url = relay.push(review, metadata={"pr": "PR-441", "type": "review"})

# ── Agent B: Mistral turns the review into Jira tickets ─────────
full_review = relay.pull(review_url)   # fetched directly by Mistral

tickets = mistral.chat.complete(
    model="mistral-large-latest",
    messages=[{
        "role": "user",
        "content": f"Convert this review into Jira tickets:\n\n{full_review}"
    }],
).choices[0].message.content

The review travels from Claude to Mistral as a URL (~20 tokens). Without ContextRelay, pasting a 3,000-token review into Mistral's prompt would cost ~$0.02 per handoff — at 500 reviews/day that is $10/day in pure overhead.
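The $10/day figure is simple arithmetic. A sketch, assuming an illustrative input price of roughly $6.67 per million tokens (not a quoted rate; check your provider's current pricing):

```python
PRICE_PER_MTOKEN = 6.67   # illustrative $/1M input tokens, not a quoted price

def handoff_cost(tokens: int) -> float:
    # cost of feeding this many input tokens to the model
    return tokens * PRICE_PER_MTOKEN / 1_000_000

per_paste = handoff_cost(3_000)   # full 3,000-token review pasted into the prompt
per_url = handoff_cost(20)        # URL-only handoff

print(f"per paste: ${per_paste:.3f}, per URL: ${per_url:.5f}")
print(f"daily overhead at 500 reviews: ${per_paste * 500:.2f}")
```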

Use case 2 — Parallel agents, single channel

Three researcher agents write different sections simultaneously. A synthesis agent subscribes to a channel and assembles the report the moment all three push their findings — no polling.

import os, threading
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
collected = {}

def on_section_ready(url):
    meta    = relay.peek(url)           # read metadata without downloading the payload
    section = relay.pull(url)
    collected[meta["section"]] = section
    if len(collected) == 3:
        report = "\n\n".join(
            collected[k] for k in ["intro", "analysis", "conclusion"]
        )
        print("Report ready:", len(report), "chars")

# Synthesis agent subscribes before any pushes arrive
threading.Thread(
    target=relay.subscribe,
    args=("report-ch", on_section_ready),
    daemon=True,
).start()

# Three researcher agents push independently, in any order
def researcher(name, prompt):
    result = call_llm(prompt)           # your LLM call here
    relay.push(result, channel="report-ch", metadata={"section": name})

for name, prompt in [
    ("intro",      "Write an intro to quantum computing"),
    ("analysis",   "Analyse current quantum hardware"),
    ("conclusion", "5-year outlook for quantum computing"),
]:
    threading.Thread(target=researcher, args=(name, prompt)).start()

Each push triggers the subscriber callback within ~20 ms over a Cloudflare Durable Objects WebSocket. The synthesis agent never polls.
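The assemble-when-complete logic itself is plain Python and worth isolating. Here is a provider-free sketch of the same pattern using a threading.Event; the names are illustrative, not SDK API:

```python
import threading

EXPECTED = ["intro", "analysis", "conclusion"]
collected = {}
lock = threading.Lock()
done = threading.Event()

def on_section(name: str, text: str):
    # record the section; signal once every expected key has arrived
    with lock:
        collected[name] = text
        if all(k in collected for k in EXPECTED):
            done.set()

# simulate three agents finishing in arbitrary order
threads = [
    threading.Thread(target=on_section, args=(n, f"{n} text"))
    for n in ["conclusion", "intro", "analysis"]
]
for t in threads:
    t.start()

done.wait(timeout=5)
report = "\n\n".join(collected[k] for k in EXPECTED)
```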

Use case 3 — Delegate a task to Claude Code (AgentBridge)

An orchestrator script sends a coding task to a Claude Code instance running in a tmux window. push_and_wait blocks until Claude finishes and the result is relayed back — no polling, no SSH, no copy-paste.

Start the coordinator once (in your tmux session)

# terminal in your tmux session named "vibe", window 0
pip install contextrelay
contextrelay-bridge start --tmux vibe --task-channel vibe-tasks --done-channel vibe-done

Send tasks from any script

import os
from contextrelay import ContextRelay, AgentBridge

relay  = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
bridge = AgentBridge(relay, task_channel="vibe-tasks", done_channel="vibe-done")

result = bridge.push_and_wait(
    "Refactor the auth module to use Firebase. "
    "Run the type checker. Return a summary of all changed files."
)

print(result)   # full Claude Code output, stripped of UI chrome

The bridge pushes the task through ContextRelay, pastes it into the Claude Code terminal via tmux, waits for the agent to finish, then relays the output back. Your script gets the full response as a string.

Use case 4 — Context checkpoint inside a chain

A pipeline stage produces output too large for the next step's context window. Checkpoint it to ContextRelay and reload only when you actually need it.

import os, json
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

# Step 1 — SQL extraction produces 80 KB JSON
raw_data = run_sql_query()
url = relay.push(
    json.dumps(raw_data),
    metadata={"step": "sql_extraction", "rows": len(raw_data)},
)

# Step 2 — peek first: is this worth the analysis cost?
meta = relay.peek(url)
if meta["rows"] < 100:
    print("Too few rows — skip analysis")
else:
    # Step 3 — pull only when needed, in the agent that needs it
    data = json.loads(relay.pull(url))
    run_analysis_agent(data)

Use case 5 — Claude plans, Mistral builds (live demo)

The idea: Use Claude Opus as your architect (high reasoning, worth the cost) and Mistral as your engineer (fast, accurate, cheaper per token). They never share a conversation — ContextRelay passes the architecture as a URL so Mistral never burns tokens on Claude's planning context.

Step 1 — paste this in Claude Code

Claude has the push_context MCP tool available. It will design the API and push the architecture to ContextRelay automatically.

You are a senior software architect. Design a production-ready FastAPI task management API.

Your design must cover:
- Data models: User, Task (with status, priority, due_date)
- All REST endpoints: auth (register/login/me), tasks (CRUD + filter by status)
- JWT authentication flow
- SQLite + SQLAlchemy ORM setup
- Pydantic schemas for request/response validation
- File structure and key implementation decisions

Write the complete architecture document with precise details so an engineer
can implement without asking questions.

When done, use the push_context tool to save the full document to ContextRelay.
Print the returned URL clearly — your engineer (Mistral) will build the entire codebase from it.

Step 2 — copy the URL Claude prints, paste this in Mistral

Replace PASTE_URL_FROM_CLAUDE_HERE with the URL Claude printed and YOUR_API_KEY with your key. Mistral fetches the architecture and implements the full codebase.

You are a senior Python engineer. Your architect (Claude Opus) has designed a FastAPI API.

Step 1 — fetch the architecture from ContextRelay:

import requests
plan = requests.get(
    "PASTE_URL_FROM_CLAUDE_HERE",
    headers={"Authorization": "Bearer YOUR_API_KEY"}
).text
print(plan)

Step 2 — read the architecture and implement the complete codebase:
- Every Python file described in the design
- requirements.txt
- README.md with setup and run instructions

Rules:
- Match the architect's design exactly — do not improvise
- Output complete, runnable code only
- No placeholders, no TODOs

What just happened: Mistral never saw Claude's conversation — it received only the architecture URL (~80 chars). Claude never saw Mistral's output. Two specialist models collaborated on a real codebase through a single URL pointer. The architecture stays valid for 24 hours — share it with a reviewer, a CI agent, or a third model.

Automate the full loop in Python

import os, anthropic
from mistralai import Mistral
from contextrelay import ContextRelay

relay   = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])
claude  = anthropic.Anthropic()
mistral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# ── Claude Opus: architect ───────────────────────────────────────
arch_response = claude.messages.create(
    model="claude-opus-4-5",
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": (
            "Design a production FastAPI task management API. Include data models, "
            "all endpoints, JWT auth, SQLAlchemy setup, and file structure. "
            "Be complete — an engineer will implement directly from this document."
        )
    }],
).content[0].text

# Push architecture — hand off just the URL
arch_url = relay.push(arch_response, metadata={"role": "architecture", "project": "task-api"})
print(f"Architecture saved: {arch_url}")

# ── Mistral Large: engineer ──────────────────────────────────────
architecture = relay.pull(arch_url)   # Mistral fetches directly — 0 tokens in Claude

code_response = mistral.chat.complete(
    model="mistral-large-latest",
    messages=[{
        "role": "user",
        "content": (
            f"You are a senior Python engineer. Implement this architecture as a complete, "
            f"runnable codebase. Every file. No placeholders.\n\nArchitecture:\n{architecture}"
        )
    }],
).choices[0].message.content

# Push implementation — share the URL with your team or CI
impl_url = relay.push(code_response, metadata={"role": "implementation", "project": "task-api"})
print(f"Implementation saved: {impl_url}")
# word count is only a rough proxy for token count
print("\nToken cost of the handoff: ~20 tokens (the URL)")
print(f"Token cost without ContextRelay: ~{len(arch_response.split()):,} tokens")

End-to-end encryption

Your data, your keys. When you push with encrypted=True, the payload is encrypted in your Python process before it leaves your machine. Cloudflare — and ContextRelay — only ever store ciphertext. The decryption key travels in the URL fragment (#key=…), which per RFC 3986 is handled client-side and never sent to any server.
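You can see the fragment behaviour with the standard library — urlsplit separates the parts of a URL that form the HTTP request target from the fragment, which stays on the client (the URL below is a made-up example):

```python
from urllib.parse import urlsplit

url = "https://contextrelay.example/pull/abc123#key=SGVsbG8"
parts = urlsplit(url)

# an HTTP request line carries only the path (and query), never the fragment
print(parts.path)       # /pull/abc123
print(parts.fragment)   # key=SGVsbG8, parsed locally and never transmitted
```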

Encryption is opt-in per push. Use it for any payload containing PII, credentials, or proprietary data.

import os
from contextrelay import ContextRelay

relay = ContextRelay(api_key=os.environ["CONTEXTRELAY_API_KEY"])

# Encrypt on push — a fresh AES key is generated locally
url = relay.push(sensitive_payload, encrypted=True)
# url → "https://.../pull/<uuid>#key=<base64-fernet-key>"

# Anyone with the full URL can decrypt; anyone without #key= cannot
result = relay.pull(url)   # decrypted locally, never on the server

What is encrypted vs what is not

Field      Encrypted?            Notes
data       Yes                   Fernet (AES-128-CBC + HMAC-SHA256)
metadata   No                    Always plaintext — don't put secrets here
key        Never leaves client   URL fragment, never transmitted to the server

pip install contextrelay[crypto]   # or: pip install cryptography

SDK reference

Method             What it does
push(data, ...)    Upload payload (str, up to 25 MB); returns a URL. Options: channel, encrypted, metadata.
pull(url)          Download payload. Auto-decrypts if the URL contains #key=.
peek(url)          Fetch metadata only — no payload download.
subscribe(ch, fn)  Subscribe to a channel; calls fn(url) on each push. Blocking — run in a thread.
publish(ch, msg)   Publish a message to a channel without a payload push.

REST API

Every SDK method maps to a Worker endpoint. Authenticate with Authorization: Bearer cr_live_....

Method   Endpoint       Description
POST     /push          Upload payload → { url, id }
GET      /pull/:id      Download payload by ID
GET      /peek/:id      Metadata only, no payload
GET      /ws/:channel   WebSocket upgrade — pub/sub

curl -X POST https://contextrelay.hashim-cmd.workers.dev/push \
  -H "Authorization: Bearer $CONTEXTRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"data": "hello from curl"}'

MCP (Claude Desktop / Claude Code)

Register ContextRelay as a native MCP server so Claude can push and pull context without leaving the conversation.

// ~/.claude/mcp.json  or  .mcp.json in project root
{
  "mcpServers": {
    "contextrelay": {
      "command": "contextrelay-mcp",
      "env": {
        "CONTEXTRELAY_URL": "https://contextrelay.hashim-cmd.workers.dev",
        "CONTEXTRELAY_API_KEY": "cr_live_..."
      }
    }
  }
}

Available tools: push_context, peek_context, pull_context.

Limits

Plan   Pushes / mo   Pulls / mo
Free   1 000         10 000
Pro    100 000       1 000 000
Team   1 000 000     10 000 000

Max payload: 25 MB · TTL: 24 hours.
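A cheap client-side guard against the 25 MB cap, assuming the limit applies to the UTF-8 byte length of the payload (plain Python, no SDK required):

```python
MAX_PAYLOAD_BYTES = 25 * 1024 * 1024   # 25 MB cap per push

def fits(payload: str) -> bool:
    # assumption: the cap is measured in bytes, so encode before checking
    return len(payload.encode("utf-8")) <= MAX_PAYLOAD_BYTES

print(fits("x" * 1_000))                     # True
print(fits("x" * (MAX_PAYLOAD_BYTES + 1)))   # False
```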

Ready to stop paying token tax?

Get your free API key →