How to Monitor OpenAI API Costs in Real-Time

You deployed your OpenAI integration last week. The daily summary email says $4.20 spent yesterday. Sounds fine. But by the time that email arrives, a runaway loop that fired 3,000 requests at 3 AM has already finished. You lost $12, your rate limit got hammered, and you found out 18 hours later.

The only way to monitor OpenAI API costs properly is request by request, not from a morning digest. This guide shows you how to get per-minute spend visibility in under ten minutes using Spanlens, a free open-source observability proxy.

Why a daily summary is not enough

OpenAI’s usage dashboard shows aggregate daily totals. That is useful for accounting but too slow for catching problems. Three scenarios where daily totals leave you blind:

A retry loop bug sends 500 requests in 2 minutes. You hit your rate limit before you notice.
You accidentally left gpt-4o in a code path that should use gpt-4o-mini. Per-token costs on that path are more than 10x higher at list prices, but the daily total looks normal until end of month.
A new feature shipped on Friday afternoon. Usage is up 300%. Is that expected growth or a bug? You cannot tell until Monday.

Four numbers that actually matter

Spend per hour tells you about spikes. Cost by model shows if you are accidentally hitting an expensive model. Requests per minute tracks traffic volume independently of cost, so you can tell the difference between a traffic surge and a cost regression. Error rate is the sneaky one: when it rises alongside spend, retries are usually multiplying your costs.

Those four metrics together give you a complete picture. A daily total gives you only one number.
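To make the four numbers concrete, here is a minimal sketch of how they fall out of per-request logs. The RequestLog shape and the summarize function are hypothetical and are not part of Spanlens or the OpenAI SDK; the sketch assumes each logged request carries a timestamp, a model name, a dollar cost, and an error flag.

```typescript
// Hypothetical per-request log record; the field names are assumptions for illustration.
interface RequestLog {
  timestampMs: number; // Unix epoch milliseconds
  model: string;
  costUsd: number;
  isError: boolean;
}

interface WindowSummary {
  spendPerHourUsd: number;
  costByModelUsd: Record<string, number>;
  requestsPerMinute: number;
  errorRate: number; // fraction between 0 and 1
}

// Summarize the requests that fall inside [windowStartMs, windowEndMs).
function summarize(logs: RequestLog[], windowStartMs: number, windowEndMs: number): WindowSummary {
  const minutes = (windowEndMs - windowStartMs) / 60000;
  if (minutes <= 0) throw new Error("window must have a positive duration");

  const inWindow = logs.filter(
    (l) => l.timestampMs >= windowStartMs && l.timestampMs < windowEndMs,
  );

  let totalCostUsd = 0;
  let errorCount = 0;
  const costByModelUsd: Record<string, number> = {};
  for (const l of inWindow) {
    totalCostUsd += l.costUsd;
    if (l.isError) errorCount += 1;
    costByModelUsd[l.model] = (costByModelUsd[l.model] ?? 0) + l.costUsd;
  }

  return {
    spendPerHourUsd: totalCostUsd / (minutes / 60),
    costByModelUsd,
    requestsPerMinute: inWindow.length / minutes,
    errorRate: inWindow.length === 0 ? 0 : errorCount / inWindow.length,
  };
}
```

Running this over a short sliding window (say, the last 5 minutes) rather than a whole day is what turns the same data into an early warning.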

Set up Spanlens in 5 minutes

Spanlens acts as a proxy in front of the OpenAI API. Change your client configuration once and every request gets logged, costed, and traced automatically.

Step 1: Sign up at spanlens.io and create a project. You get an API key in the format sl_live_...

Step 2: Change your baseURL to point to the Spanlens proxy:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.spanlens.io/proxy/openai/v1",
  defaultHeaders: {
    "Authorization": `Bearer ${process.env.SPANLENS_API_KEY}`,
  },
});

Your existing OpenAI calls work unchanged. No SDK wrapper, no new function calls, no restructuring.
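If you would rather fail at startup than discover a missing key on the first request, one option is to build the client options in a small helper. This is a sketch only: buildSpanlensOptions is a hypothetical function, not part of Spanlens or the OpenAI SDK, and it simply reproduces the options from the snippet above with validation added.

```typescript
// The three client options this guide sets.
interface ProxyClientOptions {
  apiKey: string;
  baseURL: string;
  defaultHeaders: Record<string, string>;
}

// Hypothetical helper: builds the options from an environment-like object
// and throws early if either key is missing.
function buildSpanlensOptions(env: Record<string, string | undefined>): ProxyClientOptions {
  const openaiKey = env["OPENAI_API_KEY"];
  const spanlensKey = env["SPANLENS_API_KEY"];
  if (!openaiKey) throw new Error("OPENAI_API_KEY is not set");
  if (!spanlensKey) throw new Error("SPANLENS_API_KEY is not set");

  return {
    apiKey: openaiKey,
    // Proxy URL taken from the setup step above.
    baseURL: "https://api.spanlens.io/proxy/openai/v1",
    defaultHeaders: { Authorization: `Bearer ${spanlensKey}` },
  };
}
```

In a Node app you would pass the result straight to the constructor, for example new OpenAI(buildSpanlensOptions(process.env)).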

Try Spanlens free

Point one baseURL, see every LLM call. 50,000 requests free, no card required.

Start free →

Reading the dashboard

Once your first request flows through, the Spanlens dashboard shows spend in real time. The “Traffic and spend” chart updates every minute. The solid line tracks request volume; the dashed line tracks dollar cost.

When a runaway loop fires, you see both lines spike at the same time. The anomaly detector flags it automatically with the exact timestamp, cost delta, and percentage increase.

Figure: Spanlens dashboard showing an hourly cost spike from a runaway loop at 3 AM. In this example, the loop causes a +3100% cost spike, which Spanlens detects and flags in real time.

You can configure Slack or email alerts to fire the moment a spike is detected. That means you know within minutes instead of the next morning.
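For intuition about what a spike flag involves, here is a minimal sketch that compares the latest spend bucket against the average of the buckets before it and reports the cost delta and percentage increase. This is not how Spanlens detects anomalies internally (the article does not describe that); detectSpike and its threshold parameter are hypothetical names used for illustration.

```typescript
interface SpikeReport {
  isSpike: boolean;
  baselineUsd: number;     // mean spend of the preceding buckets
  deltaUsd: number;        // latest spend minus baseline
  percentIncrease: number; // Infinity when the baseline is zero and the latest spend is positive
}

// spendPerBucketUsd: spend for consecutive equal-sized time buckets, oldest first.
// thresholdPercent: how far above baseline the latest bucket must be to count as a spike.
function detectSpike(spendPerBucketUsd: number[], thresholdPercent: number): SpikeReport {
  if (spendPerBucketUsd.length < 2) throw new Error("need at least two buckets");

  const history = spendPerBucketUsd.slice();
  const latest = history.pop() as number; // safe: length was checked above

  const baselineUsd = history.reduce((sum, v) => sum + v, 0) / history.length;
  const deltaUsd = latest - baselineUsd;

  let percentIncrease: number;
  if (baselineUsd === 0) {
    percentIncrease = latest > 0 ? Infinity : 0;
  } else {
    percentIncrease = (deltaUsd / baselineUsd) * 100;
  }

  return { isSpike: percentIncrease >= thresholdPercent, baselineUsd, deltaUsd, percentIncrease };
}
```

A plain trailing mean is the simplest possible baseline. It will misfire on traffic with strong daily cycles, so a production detector would likely compare against the same hour on previous days instead.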

Breaking down costs by model

The dashboard also breaks down spend by model, so you can see exactly where your budget is going.

Figure: Spanlens dashboard showing cost breakdown by model. The breakdown makes it easy to spot accidental use of expensive models.

A common finding: a background job that runs hourly was using gpt-4o for a task that gpt-4o-mini handles just as well. Switching that one call can cut the monthly bill by 30 to 50 percent, depending on how much of your total spend the job accounts for.
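To estimate what a switch like that is worth before you make it, you can price the job from its token counts. The sketch below is illustrative only: the per-million-token prices are assumptions that may be out of date (check OpenAI's pricing page), and estimateCostUsd and monthlyJobCostUsd are hypothetical helpers.

```typescript
// USD per one million tokens. These figures are ASSUMPTIONS for illustration;
// replace them with the current values from OpenAI's pricing page.
const PRICES_PER_MILLION: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Cost of a single request given its token counts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES_PER_MILLION[model];
  if (!price) throw new Error(`no price configured for model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1000000;
}

// Cost of a recurring job with a fixed token footprint per run.
function monthlyJobCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
  runsPerDay: number,
  days: number = 30,
): number {
  return estimateCostUsd(model, inputTokens, outputTokens) * runsPerDay * days;
}

// Example: an hourly job using 2,000 input and 500 output tokens per run.
const onLargeModel = monthlyJobCostUsd("gpt-4o", 2000, 500, 24);
const onSmallModel = monthlyJobCostUsd("gpt-4o-mini", 2000, 500, 24);
console.log(`estimated monthly saving: $${(onLargeModel - onSmallModel).toFixed(2)}`);
```

The saving on the job itself can be large while the effect on the total bill stays modest, which is why the per-model breakdown matters: it tells you what share of spend the job represents.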

A 10-minute cost audit

Figure: Spanlens Requests page sorted by cost descending. Sorting this way immediately shows your most expensive calls.

Step 1: Open the Requests page and sort by cost descending. Look at the top 10 calls. Are they using the right model for the task?

Step 2: Switch to the Stats page and set the range to 7 days. Check if spend per day is growing faster than requests per day. If cost grows faster than volume, you likely have model drift somewhere.

Step 3: Go to the Anomalies tab. Look at any flagged events from the past week. Each one links to the exact requests that triggered it.

The whole thing takes about 10 minutes and almost always surfaces at least one change worth making.
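If you export request data and want to run the first two audit checks yourself, they reduce to a sort and a ratio. The following is a sketch under assumptions: CostedRequest and DailyTotal are hypothetical shapes for exported data, not a documented Spanlens export format.

```typescript
// Hypothetical shapes for exported data; the field names are assumptions.
interface CostedRequest {
  id: string;
  model: string;
  costUsd: number;
}

interface DailyTotal {
  day: string; // e.g. "2025-01-31"
  requests: number;
  spendUsd: number;
}

// Return the n most expensive requests, highest cost first, without mutating the input.
function topByCost(requests: CostedRequest[], n: number): CostedRequest[] {
  return [...requests].sort((a, b) => b.costUsd - a.costUsd).slice(0, n);
}

// Compare spend growth with request growth between the first and last day of a range.
// costPerRequestRatio above 1 means cost grew faster than volume.
function costVsVolumeGrowth(days: DailyTotal[]): {
  spendGrowth: number;
  requestGrowth: number;
  costPerRequestRatio: number;
} {
  const first = days[0];
  const last = days[days.length - 1];
  if (!first || !last || days.length < 2) throw new Error("need at least two days");
  if (first.requests === 0 || first.spendUsd === 0) {
    throw new Error("first day must have non-zero requests and spend");
  }

  const spendGrowth = last.spendUsd / first.spendUsd;
  const requestGrowth = last.requests / first.requests;
  if (requestGrowth === 0) throw new Error("last day has no requests");

  return { spendGrowth, requestGrowth, costPerRequestRatio: spendGrowth / requestGrowth };
}
```

Comparing only the first and last day keeps the sketch short but is sensitive to one unusual day; averaging the first and last few days of the range would be more robust.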

Two patterns to watch for

Runaway retry loops. An error condition triggers retries. Each retry costs money. Without rate limiting on the retry logic, a single bad request can multiply into hundreds. Spanlens shows this as a sudden vertical spike on the hourly chart.
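One way to keep a retry loop from running away is to cap the number of attempts and back off between them. The sketch below shows that pattern in general form; withBoundedRetry and backoffDelayMs are hypothetical helpers, and the specific limits are assumptions you would tune for your own workload.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // delay before the first retry; doubles on each further retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay to wait after the given failed attempt (1-based), with exponential growth and a cap.
function backoffDelayMs(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(baseDelayMs * Math.pow(2, attempt - 1), maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Call fn until it succeeds or maxAttempts is reached, then rethrow the last error.
async function withBoundedRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  if (opts.maxAttempts < 1) throw new Error("maxAttempts must be at least 1");

  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === opts.maxAttempts) break; // out of attempts: stop spending
      await sleep(backoffDelayMs(attempt, opts.baseDelayMs, opts.maxDelayMs));
    }
  }
  throw lastError;
}
```

The hard cap is the part that protects your bill: however the error behaves, one logical call can never turn into more than maxAttempts billed requests. The OpenAI Node SDK also has its own maxRetries client option, so check that you are not stacking a custom retry loop on top of the SDK's built-in one.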

Model drift. A code review bumps a model from gpt-4o-mini to gpt-4o in a hot path. The change looks small in a diff but can double your daily bill. Watching cost-per-request over time catches this within hours of the deploy.
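A simple way to check a suspected drift is to compare average cost per request before and after a deploy. The sketch below assumes you have per-call costs with timestamps; LoggedCall and costPerRequestChange are hypothetical names, and the deploy timestamp is something you supply yourself.

```typescript
// Hypothetical per-call record; the field names are assumptions.
interface LoggedCall {
  timestampMs: number; // Unix epoch milliseconds
  costUsd: number;
}

// Mean cost of a set of calls; throws on an empty set so a missing side is not read as zero.
function meanCostUsd(calls: LoggedCall[]): number {
  if (calls.length === 0) throw new Error("no calls to average");
  return calls.reduce((sum, c) => sum + c.costUsd, 0) / calls.length;
}

// Split calls at deployMs and compare the average cost per request on each side.
// A ratio of 2 means each request costs twice as much after the deploy.
function costPerRequestChange(
  calls: LoggedCall[],
  deployMs: number,
): { beforeUsd: number; afterUsd: number; ratio: number } {
  const beforeUsd = meanCostUsd(calls.filter((c) => c.timestampMs < deployMs));
  const afterUsd = meanCostUsd(calls.filter((c) => c.timestampMs >= deployMs));
  if (beforeUsd === 0) throw new Error("average cost before deploy is zero");
  return { beforeUsd, afterUsd, ratio: afterUsd / beforeUsd };
}
```

Because this looks at cost per request rather than total spend, it is not fooled by a traffic surge: more requests at the same unit cost leave the ratio near 1.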

Also using Claude or other providers?

The same proxy setup works for Anthropic. If your app calls both providers, you can monitor OpenAI and Claude costs side by side in a single dashboard. See how to monitor Anthropic Claude API costs for the Claude-specific setup.

If you want to go further and cut the bill rather than just track it, how to reduce OpenAI API costs covers the model-switching and prompt optimization strategies that move the needle most. If you are comparing observability tools rather than setting up from scratch, see LangSmith alternatives and Langfuse alternatives for a side-by-side breakdown.

Spanlens is open source (MIT). If this was useful, star it on GitHub ⭐
