
If your team uses Claude, you already know the bill can creep up fast. Opus 4 costs $15 per million input tokens, and that adds up when prompts run long or usage spikes without warning. The only way to catch problems early is to monitor Anthropic Claude API costs request by request, not from a daily summary email.
You can do that without touching your existing code. Here’s how.
The problem with Anthropic cost visibility
Anthropic’s dashboard shows aggregate daily usage. It lags by hours, has no per-model breakdown, and sends no alerts. You usually find out about runaway costs on the invoice.
Three scenarios where that causes real damage:
- A retry loop fires 400 requests in two minutes. You hit your rate limit before you notice.
- A code review swaps claude-haiku for claude-opus-4 in a background job. Costs jump 18x but the daily total looks fine until end of month.
- A new feature ships on Friday. Usage is up 300%. Is that expected growth or a bug? You cannot tell.
What you actually need to monitor your Anthropic Claude API costs properly:
- Per-request cost tracking (which call cost what)
- Breakdown by model (Opus vs Sonnet vs Haiku)
- Latency per request
- Anomaly alerts before costs spiral
- Token usage over time
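To make per-request, per-model tracking concrete, here is a minimal sketch of how a single request's cost follows from its token counts. It assumes the list prices quoted in the pricing table later in this article; `PRICES_PER_MTOK` and `request_cost` are illustrative names, not part of Spanlens or the Anthropic SDK, and the dictionary keys are shorthand labels rather than exact API model IDs.

```python
from decimal import Decimal

# USD per 1M tokens as (input, output), copied from this article's pricing table.
# Keys are shorthand labels (an assumption), not exact API model IDs.
PRICES_PER_MTOK = {
    "opus": (Decimal("15.00"), Decimal("75.00")),
    "sonnet": (Decimal("3.00"), Decimal("15.00")),
    "haiku": (Decimal("0.80"), Decimal("4.00")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request from its token counts."""
    input_price, output_price = PRICES_PER_MTOK[model]
    million = Decimal(1_000_000)
    return (input_tokens * input_price + output_tokens * output_price) / million
```

At these prices, a request with 2,000 input tokens and 500 output tokens costs $0.0675 on Opus and $0.0036 on Haiku, a gap of about 18.75x. In a real integration the token counts would come from the `usage.input_tokens` and `usage.output_tokens` fields of each Messages API response.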
Step 1: Set up Spanlens as your Anthropic proxy
Spanlens runs as a proxy between your app and Anthropic’s API. Point your client’s base URL at Spanlens and add one auth header, and every request gets logged with cost and latency. No SDK wrapper, no new function calls.

Two lines to add:
    import Anthropic from "@anthropic-ai/sdk";

    const client = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY,
      // Route requests through the Spanlens proxy
      baseURL: "https://api.spanlens.io/proxy/anthropic/v1",
      defaultHeaders: {
        // Authenticate to Spanlens
        "Authorization": `Bearer ${process.env.SPANLENS_API_KEY}`,
      },
    });

Your existing client.messages.create() calls stay the same. Every request logs automatically.
Sign up at spanlens.io. The free plan covers 50,000 requests a month.
Try Spanlens free
Point one baseURL, see every LLM call. 50,000 requests free, no card required.
Step 2: See costs appear in real time
Send your first request and it shows up in the dashboard immediately:

Each row shows the cost, model, and latency. You can see which calls cost the most and whether anything looks off.
The Stats page breaks it down further: spend per hour, requests per minute, and a latency histogram. If you are running multiple Claude models in the same app, you will see exactly which one is driving your bill.
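The kind of breakdown described above can be sketched as a simple aggregation over logged requests. This is an illustration of the idea only, not how Spanlens computes its Stats page; the `requests_log` records, their timestamps, and the `spend_by` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical logged requests: (ISO timestamp, model label, cost in USD).
requests_log = [
    ("2025-01-06T09:05:00", "opus", 0.0675),
    ("2025-01-06T09:40:00", "haiku", 0.0036),
    ("2025-01-06T10:15:00", "opus", 0.0675),
    ("2025-01-06T10:20:00", "sonnet", 0.0135),
]


def spend_by(log, key_fn):
    """Sum request costs into buckets chosen by key_fn(timestamp, model)."""
    totals = defaultdict(float)
    for timestamp, model, cost in log:
        totals[key_fn(timestamp, model)] += cost
    return dict(totals)


def hour_bucket(timestamp, model):
    """Truncate an ISO timestamp to the hour it falls in."""
    return datetime.fromisoformat(timestamp).strftime("%Y-%m-%d %H:00")


by_model = spend_by(requests_log, lambda timestamp, model: model)
by_hour = spend_by(requests_log, hour_bucket)

# The model with the largest total is the one driving the bill.
top_model = max(by_model, key=by_model.get)
```

In this sample, two Opus calls account for most of the spend even though they are only half of the requests, which is the pattern a per-model view is meant to surface.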
Step 3: Understand your Claude model cost breakdown
Claude pricing varies a lot across the model family:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | Complex reasoning, agents |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Most production tasks |
| Claude Haiku 4.5 | $0.80 | $4.00 | Classification, routing, short tasks |
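Once you can see which calls use Opus, you can check whether they actually need to. Sonnet costs one-fifth of Opus on both input and output, so every call you move from Opus to Sonnet costs 80% less. Moving 30% of your Opus spend to Sonnet cuts the total Opus bill by about 24%.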
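The arithmetic behind that estimate, as a small sketch. It assumes the switched calls make up 30% of Opus spend and have the same token mix as the rest; the function names and the $1,000 example bill are illustrative, not measured figures.

```python
OPUS_INPUT, SONNET_INPUT = 15.00, 3.00      # USD per 1M input tokens
OPUS_OUTPUT, SONNET_OUTPUT = 75.00, 15.00   # USD per 1M output tokens


def saving_per_switched_call(opus_price: float, sonnet_price: float) -> float:
    """Fraction saved on a call moved from Opus to Sonnet."""
    return 1 - sonnet_price / opus_price


def opus_bill_after_switch(opus_bill: float, switched_share: float, saving: float) -> float:
    """The switched share of the bill gets cheaper by `saving`; the rest is unchanged."""
    return opus_bill * (1 - switched_share * saving)


# The ratio is the same for input and output tokens: 80% either way.
saving = saving_per_switched_call(OPUS_INPUT, SONNET_INPUT)

# Hypothetical $1,000 monthly Opus bill with 30% of it moved to Sonnet.
new_bill = opus_bill_after_switch(1000.0, 0.30, saving)
```

Under these assumptions the $1,000 bill drops to about $760: an 80% saving on the switched calls, and 24% overall.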
A typical example: an hourly background classification job was running on claude-opus-4 because that was the default in the original prompt. Haiku handled it just as well, and the switch cut the monthly cost of that job from ~$180 to ~$12.
Step 4: Set a cost anomaly alert
In Settings, enable cost anomaly detection. Spanlens tracks your rolling baseline and emails you when a window goes above the expected range. A runaway loop shows up in minutes instead of on your invoice.
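For intuition, a rolling-baseline check can be sketched in a few lines. This is one common approach (mean plus a multiple of the standard deviation) and an assumption on my part; the article does not describe the actual method Spanlens uses. `is_anomalous` and the threshold `k` are hypothetical names.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, k: float = 3.0) -> bool:
    """Flag `current` if it exceeds the baseline mean by more than k standard deviations.

    `history` holds the spend of recent windows (for example, the last few hours).
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    return current > baseline + k * spread


# Hypothetical hourly spend in USD for the last five windows.
recent_hours = [1.0, 1.2, 0.9, 1.1, 1.0]
```

With this history, an hour costing $1.30 stays inside the expected range, while an hour costing $6.00 is flagged. A production detector would likely also handle seasonality and perfectly flat baselines, which this sketch ignores.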
You can also set hard limits per API key from the Projects page. If a key hits N requests per minute, Spanlens blocks further calls and logs the event. Useful for rate-limiting specific services or end-users.
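A per-key request limit of this kind is typically a sliding-window counter. The sketch below illustrates the idea under that assumption; it is not Spanlens's implementation, and `PerKeyRateLimiter` is a hypothetical name. The current time is passed in explicitly so the behavior is easy to test; a real service would use a clock such as `time.monotonic()`.

```python
from collections import defaultdict, deque


class PerKeyRateLimiter:
    """Sliding-window limiter: at most `limit` requests per key in any `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # blocked; a real proxy would also log the event
        hits.append(now)
        return True


limiter = PerKeyRateLimiter(limit=2)
first = limiter.allow("service-a", now=0.0)    # accepted
second = limiter.allow("service-a", now=1.0)   # accepted
third = limiter.allow("service-a", now=2.0)    # over the limit, blocked
other = limiter.allow("service-b", now=2.0)    # separate key, accepted
```

Because each key has its own window, one noisy service or end-user is blocked without affecting the others.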
What to look for in the first week
Once you have monitoring in place, three things are worth checking immediately.
Cost per request by model. Sort descending. Look at your top 10 most expensive calls. Are they using the right model? Most teams find at least one call using Opus where Sonnet would work.
Spend trend vs request trend. Open the Stats page, set the range to 7 days. If cost per day grows faster than requests per day, the average cost per request is rising. That usually means model drift (a new deploy switched models without anyone noticing) or prompts that have grown longer.
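The comparison above boils down to tracking average cost per request over time. A minimal sketch, assuming you can export daily request counts and daily spend; the `cost_per_request_drift` name and the sample numbers are hypothetical.

```python
def cost_per_request_drift(daily: list[tuple[int, float]]) -> float:
    """Ratio of the last day's average cost per request to the first day's.

    `daily` is a list of (request_count, total_cost_usd) tuples in date order.
    A ratio near 1.0 means cost is growing in step with traffic.
    """
    first_requests, first_cost = daily[0]
    last_requests, last_cost = daily[-1]
    return (last_cost / last_requests) / (first_cost / first_requests)


# Hypothetical week: requests grow 20%, spend triples.
week = [(1000, 10.0), (1100, 11.0), (1200, 30.0)]
drift = cost_per_request_drift(week)
```

Here the ratio is 2.5: traffic grew 20% but each request became 2.5 times more expensive on average, which points at the model or the prompts rather than at usage growth.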
Anomaly history. Check the Anomalies tab for the past week. Each flagged event links to the exact requests that triggered it. Even if there are no current problems, this gives you a baseline for what “normal” looks like.
Python setup
Python SDK setup looks the same:
    import os

    import anthropic

    client = anthropic.Anthropic(
        api_key=os.environ.get("ANTHROPIC_API_KEY"),
        base_url="https://api.spanlens.io/proxy/anthropic/v1",
        default_headers={
            "Authorization": f"Bearer {os.environ.get('SPANLENS_API_KEY')}",
        },
    )

All existing client.messages.create() calls work unchanged.
Also using OpenAI?
The same proxy setup works for OpenAI. If your app calls both providers, you can monitor OpenAI and Anthropic costs side by side in a single dashboard. See how to monitor OpenAI API costs for the OpenAI-specific setup.
If you are comparing observability tools rather than setting up monitoring from scratch, LangSmith alternatives and Langfuse alternatives cover the tradeoffs between the main options.
What you can see
- Cost per request, by model
- Token usage (input vs output) for every call
- Latency at P50 and P95
- Email alerts when cost anomalies fire
- 7-day, 30-day, and custom time ranges
- Works with Opus, Sonnet, and Haiku across all versions
Spanlens is open source (MIT). If this was useful, star it on GitHub ⭐