How to Monitor Anthropic Claude API Costs in Real-Time

If your team uses Claude, you already know the bill can creep up fast. Opus 4 costs $15 per million input tokens, and that adds up when prompts run long or usage spikes without warning. The only way to catch problems early is to monitor Anthropic Claude API costs request by request, not from a daily summary email.

You can do that without touching your existing code. Here’s how.

The problem with Anthropic cost visibility

Anthropic’s dashboard shows aggregate daily usage. It lags by hours, has no per-model breakdown, and sends no alerts. You usually find out about runaway costs on the invoice.

Three scenarios where that causes real damage:

A retry loop fires 400 requests in two minutes. You hit your rate limit before you notice.
A change that swaps claude-haiku for claude-opus-4 in a background job slips through code review. Costs for that job jump nearly 19x, but the daily total looks fine until the end of the month.
A new feature ships on Friday. Usage is up 300%. Is that expected growth or a bug? You cannot tell.

What you actually need to monitor your Anthropic Claude API costs properly:

Per-request cost tracking (which call cost what)
Breakdown by model (Opus vs Sonnet vs Haiku)
Latency per request
Anomaly alerts before costs spiral
Token usage over time

Step 1: Set up Spanlens as your Anthropic proxy

Spanlens runs as a proxy between your app and Anthropic’s API. Point the SDK’s baseURL at the proxy and add one header, and every request gets logged with cost and latency. No SDK wrapper, no new function calls.

[Figure: Anthropic SDK code before and after adding the Spanlens proxy baseURL]

The two additions are baseURL and the Authorization header:

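import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  // Your Anthropic key, unchanged.
  apiKey: process.env.ANTHROPIC_API_KEY,
  // Send requests to the Spanlens proxy instead of directly to Anthropic.
  baseURL: "https://api.spanlens.io/proxy/anthropic/v1",
  defaultHeaders: {
    // Spanlens key, sent with every request.
    "Authorization": `Bearer ${process.env.SPANLENS_API_KEY}`,
  },
});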

Your existing client.messages.create() calls stay the same. Every request logs automatically.

Try Spanlens free

Point one baseURL, see every LLM call. 50,000 requests free, no card required.

Start free →

Step 2: See costs appear in real-time

Send your first request and it shows up in the dashboard immediately:

[Screenshot: Spanlens dashboard showing real-time Anthropic Claude API cost breakdown by model]

Each row shows the cost, model, and latency. You can see which calls cost the most and whether anything looks off.

The Stats page breaks it down further: spend per hour, requests per minute, and a latency histogram. If you are running multiple Claude models in the same app, you will see exactly which one is driving your bill.
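To make that kind of breakdown concrete, here is a minimal sketch of how spend per hour and the most expensive model could be derived from a request log. The row layout (timestamp, model, cost) and the sample values are assumptions for illustration, not the Spanlens export format.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical request log rows: (ISO timestamp, model, cost in USD).
requests = [
    ("2025-01-10T09:05:00", "claude-opus-4", 0.42),
    ("2025-01-10T09:40:00", "claude-haiku", 0.01),
    ("2025-01-10T09:55:00", "claude-opus-4", 0.38),
    ("2025-01-10T10:10:00", "claude-sonnet", 0.09),
]


def spend_per_hour(rows):
    """Sum cost into (hour, model) buckets."""
    buckets = defaultdict(float)
    for ts, model, cost in rows:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[(hour, model)] += cost
    return dict(buckets)


def top_model(rows):
    """Return the model with the highest total spend."""
    totals = defaultdict(float)
    for _, model, cost in rows:
        totals[model] += cost
    return max(totals, key=totals.get)


hourly = spend_per_hour(requests)
for (hour, model), cost in sorted(hourly.items()):
    print(f"{hour:%Y-%m-%d %H:00}  {model:<15} ${cost:.2f}")
print("Driving the bill:", top_model(requests))
```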

Step 3: Understand your Claude model cost breakdown

Claude pricing varies a lot across the model family:

Model	Input (per 1M tokens)	Output (per 1M tokens)	Best for
Claude Opus 4	$15.00	$75.00	Complex reasoning, agents
Claude Sonnet 4.6	$3.00	$15.00	Most production tasks
Claude Haiku 3.5	$0.80	$4.00	Classification, routing, short tasks

Once you can see which calls use Opus, you can check whether they actually need to. Switching 30% of Opus calls to Sonnet cuts the cost of those calls by 80%, because Sonnet is one fifth the price of Opus on both input and output.
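As a rough worked example of that arithmetic, the sketch below prices a single request using the rates from the table above. The token counts are hypothetical, and the calculation assumes plain list prices with no caching or batch discounts.

```python
# USD per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.80, 4.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical request: 2,000 input tokens and 500 output tokens.
opus_cost = request_cost("opus", 2_000, 500)
sonnet_cost = request_cost("sonnet", 2_000, 500)

# Saving on the calls that move from Opus to Sonnet.
saving_on_moved_calls = 1 - sonnet_cost / opus_cost

# If 30% of equally sized Opus calls move, the Opus bill as a whole
# drops by 30% of that saving.
saving_on_opus_bill = 0.30 * saving_on_moved_calls

print(f"Opus: ${opus_cost:.4f}, Sonnet: ${sonnet_cost:.4f}")
print(f"Saving on moved calls: {saving_on_moved_calls:.0%}")
print(f"Saving on total Opus bill: {saving_on_opus_bill:.0%}")
```

The 80% figure applies only to the calls that move. Under the equal-size assumption, the overall Opus spend drops by about a quarter.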

A common finding: a background classification job running hourly was using claude-opus-4 because that was the default in the original prompt. Haiku handles it just as well. The switch brought the monthly cost for that job from ~$180 to ~$12.

Step 4: Set a cost anomaly alert

In Settings, enable cost anomaly detection. Spanlens tracks your rolling baseline and emails you when a window goes above the expected range. A runaway loop shows up in minutes instead of on your invoice.
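The article does not describe how Spanlens computes its baseline, so the following is only a sketch of one common approach: compare each window's spend with the mean and standard deviation of the preceding windows. The class name, the window count, and the three-sigma threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class CostAnomalyDetector:
    """Flags a window whose spend is far above a rolling baseline (illustrative only)."""

    def __init__(self, history=12, threshold_sigmas=3.0, min_spread=0.01):
        self.window_costs = deque(maxlen=history)
        self.threshold_sigmas = threshold_sigmas
        # Floor on the spread so a perfectly flat baseline does not flag tiny noise.
        self.min_spread = min_spread

    def observe(self, cost):
        """Record one window's spend in USD; return True if it looks anomalous."""
        anomalous = False
        # Only judge once the baseline has a full history.
        if len(self.window_costs) == self.window_costs.maxlen:
            baseline = mean(self.window_costs)
            spread = max(pstdev(self.window_costs), self.min_spread)
            anomalous = cost > baseline + self.threshold_sigmas * spread
        if not anomalous:
            # Keep spikes out of the baseline so they do not raise the bar.
            self.window_costs.append(cost)
        return anomalous


detector = CostAnomalyDetector(history=5)
# Five normal windows, one more normal window, then a runaway loop.
spend = [1.00, 1.10, 0.90, 1.00, 1.05, 1.02, 9.00]
flags = [detector.observe(c) for c in spend]
print(flags)
```

Only the final window is flagged here. The first five fill the baseline, and the sixth sits well inside the expected range.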

You can also set hard limits per API key from the Projects page. If a key hits N requests per minute, Spanlens blocks further calls and logs the event. Useful for rate-limiting specific services or end-users.
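A per-key limit of N requests per minute can be sketched as a sliding-window counter. This illustrates the general technique and is not the Spanlens implementation. The class name and parameters are hypothetical, and the caller passes the current time in explicitly so the behavior is easy to check.

```python
from collections import defaultdict, deque


class PerKeyRateLimiter:
    """Sliding-window limiter: at most max_requests per window for each key."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)

    def allow(self, api_key, now):
        """Return True if the request may proceed; `now` is a time in seconds."""
        stamps = self._timestamps[api_key]
        # Drop requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False  # blocked; a real proxy would also log the event
        stamps.append(now)
        return True


limiter = PerKeyRateLimiter(max_requests=3)
# Three requests fit, the fourth is blocked, and a request a minute later passes.
results = [limiter.allow("key-a", t) for t in (0, 1, 2, 3, 61)]
other_key = limiter.allow("key-b", 3)
print(results, other_key)
```

In a running service, `now` would typically come from `time.monotonic()`.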

What to look for in the first week

Once you have monitoring in place, three things are worth checking immediately.

Cost per request by model. Sort descending. Look at your top 10 most expensive calls. Are they using the right model? Most teams find at least one call using Opus where Sonnet would work.
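If you prefer to run this check outside the dashboard, the same sort is a few lines over exported rows. The field names below (`id`, `model`, `cost_usd`) and the sample values are assumptions, not a documented Spanlens schema.

```python
from collections import Counter

# Hypothetical exported request rows.
calls = [
    {"id": "req_1", "model": "claude-opus-4", "cost_usd": 0.61},
    {"id": "req_2", "model": "claude-haiku", "cost_usd": 0.002},
    {"id": "req_3", "model": "claude-opus-4", "cost_usd": 0.74},
    {"id": "req_4", "model": "claude-sonnet", "cost_usd": 0.11},
]


def most_expensive(rows, n=10):
    """Return the n costliest requests, most expensive first."""
    return sorted(rows, key=lambda r: r["cost_usd"], reverse=True)[:n]


def models_in(rows):
    """Count how often each model appears in the given rows."""
    return Counter(r["model"] for r in rows)


top = most_expensive(calls, n=2)
for row in top:
    print(row["id"], row["model"], f"${row['cost_usd']:.2f}")
print(models_in(top))
```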

Spend trend vs request trend. Open the Stats page and set the range to 7 days. If cost per day grows faster than requests per day, your average cost per request is rising. The usual causes are longer prompts or outputs, or model drift: a deploy that switched to a more expensive model without anyone noticing.
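The comparison comes down to two growth rates. Here is a minimal sketch using made-up daily totals, where requests grow 20% over the week while spend grows 80%:

```python
# Hypothetical daily totals over a 7-day range: (requests, cost in USD).
daily = [
    (10_000, 50.0),
    (10_200, 51.0),
    (10_500, 52.5),
    (11_000, 70.0),
    (11_400, 80.0),
    (11_800, 86.0),
    (12_000, 90.0),
]


def growth(first, last):
    """Fractional change from the first value to the last."""
    return (last - first) / first


def trend(days):
    """Compare growth in spend with growth in request count over the range."""
    (req_start, cost_start), (req_end, cost_end) = days[0], days[-1]
    return {
        "request_growth": growth(req_start, req_end),
        "cost_growth": growth(cost_start, cost_end),
        "cost_per_request_start": cost_start / req_start,
        "cost_per_request_end": cost_end / req_end,
    }


summary = trend(daily)
# Spend outpacing traffic means the average request got more expensive.
cost_outpaces_traffic = summary["cost_growth"] > summary["request_growth"]
print(summary, cost_outpaces_traffic)
```

Comparing only the first and last day is the simplest possible version. Noisy traffic would call for averaging a few days at each end.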

Anomaly history. Check the Anomalies tab for the past week. Each flagged event links to the exact requests that triggered it. Even if there are no current problems, this gives you a baseline for what “normal” looks like.

Python setup

Python SDK setup looks the same:

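import anthropic
import os

client = anthropic.Anthropic(
    # Your Anthropic key, unchanged.
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    # Send requests to the Spanlens proxy instead of directly to Anthropic.
    base_url="https://api.spanlens.io/proxy/anthropic/v1",
    default_headers={
        # Spanlens key, sent with every request.
        "Authorization": f"Bearer {os.environ.get('SPANLENS_API_KEY')}",
    },
)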

All existing client.messages.create() calls work unchanged.

Also using OpenAI?

The same proxy setup works for OpenAI. If your app calls both providers, you can monitor OpenAI and Anthropic costs side by side in a single dashboard. See how to monitor OpenAI API costs for the OpenAI-specific setup.

If you are comparing observability tools rather than setting up monitoring from scratch, LangSmith alternatives and Langfuse alternatives cover the tradeoffs between the main options.

What you can see

Cost per request, by model
Token usage (input vs output) for every call
Latency at P50 and P95
Email alerts when cost anomalies fire
7-day, 30-day, and custom time ranges
Works with Opus, Sonnet, and Haiku across all versions
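For readers unfamiliar with the latency figures in that list, P50 and P95 are percentiles over per-request latency. The sketch below uses the nearest-rank definition, which is an assumption: the article does not say which method the dashboard uses. The sample latencies are invented.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [420, 380, 510, 450, 2900, 400, 430, 390, 470, 440]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
print(f"P50: {p50} ms, P95: {p95} ms")
```

The outlier barely affects P50 but dominates P95, which is why it helps to watch both.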

Spanlens is open source (MIT). If this was useful, star it on GitHub ⭐
