May 18, 2026

One Payload Format For OpenAI, Claude, Codex, And Other AI Jobs

A practical metadata shape for tracking model, token, cost, latency, and job results across OpenAI, Claude, Codex, and other AI tools.

The Shape Matters More Than The Provider

A provider-neutral AI job payload is a small JSON object that records the same run metadata across different tools: provider, model, project, status, token usage, latency, cost, and job results.

The provider matters. The model matters. But the monitoring fields are usually the same.

For most AI jobs, I would start with this common set:

provider
model
project
status
input_tokens
output_tokens
cached_input_tokens
reasoning_tokens
total_tokens
latency_ms or duration_ms
cost_usd
items_processed
items_failed
eval_score

You will not have every field for every job. That is fine.

For cost_usd, use your own pricing calculation, a provider usage API, or a billing export. Model prices and aliases change, so I treat this field as an estimate that should be explainable later.
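Here is a minimal sketch of what an explainable estimate could look like in Python. The price table, the model name "example-model", and the function name estimate_cost_usd are hypothetical placeholders, not real provider pricing. The sketch also assumes cached tokens are counted inside input_tokens, which may not hold for every provider.

```python
from decimal import Decimal

# Hypothetical USD prices per 1M tokens. Placeholders for illustration only;
# replace them with numbers from your own pricing source.
PRICES_PER_MTOK = {
    "example-model": {
        "input": Decimal("1.00"),
        "cached_input": Decimal("0.25"),
        "output": Decimal("4.00"),
    },
}


def estimate_cost_usd(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate run cost, assuming cached tokens are a subset of input_tokens."""
    price = PRICES_PER_MTOK[model]
    uncached_tokens = input_tokens - cached_input_tokens
    total = (
        uncached_tokens * price["input"]
        + cached_input_tokens * price["cached_input"]
        + output_tokens * price["output"]
    ) / Decimal(1_000_000)
    # Round to micro-dollars so the stored value is stable and easy to re-derive.
    return float(round(total, 6))
```

Keeping the price table in the same repository as the job means an old cost_usd value can be traced back to the prices that produced it.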

An eval run might send eval_score, test_cases, and failed_cases. A RAG sync might send documents_indexed, chunks_created, and embedding_model. A coding agent might send git_repo, git_branch, issue_id, and token counts.

The point is to keep the shared fields boring and consistent. If every job calls the model field model, the project field project, and the cost field cost_usd, the history becomes easier to search, group, chart, and explain later.

Field group	Example fields	What it helps answer
Identity	`provider`, `model`, `project`	What ran, and where does it belong?
Usage	`input_tokens`, `output_tokens`, `total_tokens`	How much model work happened?
Operations	`status`, `latency_ms`, `items_failed`	Did the run finish cleanly and on time?
Cost and quality	`cost_usd`, `eval_score`	Was the run worth trusting or investigating?
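One way to keep the shared names boring in code is a small builder that every job calls. This is only a sketch: the names SHARED_FIELDS and build_payload are my own for illustration, and the behavior (drop unset fields, reject misspelled shared names) is one possible design, not a requirement of any tracker.

```python
# The shared field names from the list above. Job-specific fields go in `extra`.
SHARED_FIELDS = (
    "provider", "model", "project", "status",
    "input_tokens", "output_tokens", "cached_input_tokens",
    "reasoning_tokens", "total_tokens", "latency_ms", "duration_ms",
    "cost_usd", "items_processed", "items_failed", "eval_score",
)


def build_payload(shared, extra=None):
    """Merge shared and job-specific fields, dropping anything left unset."""
    unknown = sorted(set(shared) - set(SHARED_FIELDS))
    if unknown:
        # Catches typos such as "modle" before they fragment the history.
        raise ValueError(f"not a shared field: {unknown}")
    # Shared names win if a job-specific field uses the same key.
    merged = {**(extra or {}), **shared}
    return {key: value for key, value in merged.items() if value is not None}
```

A job that has no eval_score can pass None for it, and the field simply does not appear in the payload.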

The Same Payload Shape For OpenAI And Claude

Here is an example OpenAI job:

{
  "provider": "openai",
  "model": "gpt-5.2",
  "project": "support-summary",
  "status": "success",
  "input_tokens": 4200,
  "output_tokens": 640,
  "cached_input_tokens": 900,
  "reasoning_tokens": 0,
  "total_tokens": 4840,
  "latency_ms": 2300,
  "cost_usd": 0.02,
  "items_processed": 128,
  "items_failed": 0
}

And here is a Claude job using the same shape:

{
  "provider": "anthropic",
  "model": "claude-3-5-haiku-20241022",
  "project": "support-summary",
  "status": "success",
  "input_tokens": 3900,
  "output_tokens": 710,
  "cached_input_tokens": 0,
  "reasoning_tokens": 0,
  "total_tokens": 4610,
  "latency_ms": 2600,
  "cost_usd": 0.01,
  "items_processed": 128,
  "items_failed": 0
}

I would use stable model IDs in production when the provider recommends them. Aliases are handy while testing, but stable IDs make old runs easier to explain later.

Those examples are not trying to hide provider differences. OpenAI, Anthropic, local models, coding tools, batch APIs, and hosted agents will all expose usage in their own way.

But once your job has the numbers, the monitoring payload can normalize the parts you care about:

who ran the job
what project it belongs to
which provider and model ran
how many tokens it used
how long it took
what it cost
how much work it completed
whether it failed in a way the process exit code did not catch

That common shape is what makes mixed-model usage readable.
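As a rough sketch of that normalization step, the function below renames token counts from a provider usage dictionary into the shared fields. The source key names (prompt_tokens, completion_tokens, prompt_tokens_details, completion_tokens_details, cache_read_input_tokens) are my assumptions about what each provider returns, so check them against current provider documentation. Providers also differ on whether cached tokens are counted inside input tokens; this sketch only renames fields and does not reconcile that.

```python
def normalize_usage(provider, usage):
    """Map a provider usage dict onto the shared token fields.

    The source key names are assumptions about each provider's usage
    object, not a verified contract.
    """
    if provider == "openai":
        prompt_details = usage.get("prompt_tokens_details") or {}
        completion_details = usage.get("completion_tokens_details") or {}
        input_tokens = usage.get("prompt_tokens") or 0
        output_tokens = usage.get("completion_tokens") or 0
        cached = prompt_details.get("cached_tokens") or 0
        reasoning = completion_details.get("reasoning_tokens") or 0
    elif provider == "anthropic":
        input_tokens = usage.get("input_tokens") or 0
        output_tokens = usage.get("output_tokens") or 0
        cached = usage.get("cache_read_input_tokens") or 0
        reasoning = 0  # this sketch assumes no separate reasoning count
    else:
        raise ValueError(f"no usage mapping for provider: {provider}")

    return {
        "provider": provider,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cached_input_tokens": cached,
        "reasoning_tokens": reasoning,
        "total_tokens": input_tokens + output_tokens,
    }
```

Each new provider or tool then needs one more branch, and everything downstream keeps reading the same field names.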

The One-POST Pattern

The integration can stay small.

At the end of a job, send a POST to the tracker URL:

curl -X POST "https://telemhq.com/ping/YOUR_TRACKING_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-5.2",
    "project": "support-summary",
    "status": "success",
    "input_tokens": 4200,
    "output_tokens": 640,
    "total_tokens": 4840,
    "latency_ms": 2300,
    "cost_usd": 0.02,
    "items_processed": 128,
    "items_failed": 0
  }'

That request gives the run a timestamp and a payload.

If the tracker has a schedule, TelemHQ can also check whether the job checked in on time. If the tracker is ad hoc, it can behave more like run history for workers, agents, and manually triggered tasks.

Either way, the payload is the important part. It turns "the job ran" into "the job used this model, spent this much, processed this many items, and finished with this status."
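If the job is already a Python script, the same POST can be sent without shelling out to curl. This is a sketch using only the standard library. The function names build_ping_request and send_ping are hypothetical, and swallowing network errors is my own design choice so that a monitoring outage never fails the job itself.

```python
import json
import urllib.request


def build_ping_request(url, payload):
    """Build the JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_ping(url, payload, timeout=10):
    """Send the payload and report success; never raise on network failure."""
    request = build_ping_request(url, payload)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```

At the end of a job, that becomes a single call such as send_ping("https://telemhq.com/ping/YOUR_TRACKING_TOKEN", payload).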

Numeric Fields Become The Start Of A Dashboard

The nice thing about a JSON payload is that the numbers already have names.

If a run sends:

total_tokens
latency_ms
cost_usd
items_processed
items_failed
eval_score

those fields become useful over time. You can chart them, compare them between jobs, and notice when a value drifts away from its usual range.

Here is what that looks like in TelemHQ when an AI usage tracker sends the same numeric fields over time:

[Figure: TelemHQ payload metric charts showing total tokens, cost, latency, and cached input tokens for an AI usage tracker]

That is why I like sending small, boring numeric fields instead of storing a blob of logs. Logs are still useful when you need detail. But for monitoring, I want the values that can answer questions quickly:

Did token usage climb after a prompt change?
Did latency jump when we switched providers?
Did the job process fewer items than normal?
Did failed items start showing up after a deploy?
Did cost increase on one project while the others stayed flat?
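Questions like these can also be checked in code once the numbers have names. The sketch below compares the latest run against the median of earlier runs. The function name changed_fields and the 1.5x threshold are arbitrary choices for illustration, not a recommendation and not how any particular tracker computes alerts.

```python
from statistics import median


def changed_fields(history, latest, fields, ratio=1.5):
    """Flag fields whose latest value is `ratio` times above or below the median."""
    flagged = {}
    for field in fields:
        past = [
            run[field] for run in history
            if isinstance(run.get(field), (int, float))
        ]
        value = latest.get(field)
        if not past or not isinstance(value, (int, float)):
            continue  # nothing to compare against, or the field is missing
        baseline = median(past)
        if baseline > 0 and (value > baseline * ratio or value < baseline / ratio):
            flagged[field] = {"baseline": baseline, "latest": value}
    return flagged
```

A median baseline is less sensitive to one unusual run than a mean, which matters when a single retry storm would otherwise shift the comparison.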

This Works Beyond Provider APIs

Provider-neutral payloads are not only for direct API calls.

A Codex usage tracker might send:

{
  "tool": "codex",
  "provider": "openai",
  "model": "gpt-5.2-codex",
  "project": "billing-api",
  "git_repo": "company/billing-api",
  "git_branch": "issue-482-refactor-invoices",
  "issue_id": "482",
  "status": "completed",
  "input_tokens": 118000,
  "output_tokens": 9200,
  "cached_input_tokens": 51000,
  "reasoning_tokens": 14000,
  "total_tokens": 127200,
  "cost_usd": 0.45
}

A RAG indexing job might send:

{
  "provider": "openai",
  "model": "text-embedding-3-large",
  "project": "docs-search",
  "status": "success",
  "documents_scanned": 940,
  "documents_indexed": 936,
  "chunks_created": 12840,
  "items_failed": 4,
  "duration_ms": 184000,
  "total_tokens": 221000,
  "cost_usd": 0.03
}

An eval run might send:

{
  "provider": "anthropic",
  "model": "claude-3-5-haiku-20241022",
  "project": "support-summary",
  "status": "success",
  "test_cases": 200,
  "failed_cases": 14,
  "pass_rate": 0.93,
  "eval_score": 0.88,
  "duration_ms": 64000,
  "cost_usd": 0.42
}

The extra fields can vary by job. The shared fields keep the history connected.

That means a team can look across direct model calls, AI coding tools, RAG syncs, evals, queue workers, and scheduled scripts without forcing every workflow into a different monitoring setup.
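Because the shared fields line up, one small function can summarize very different jobs. This sketch totals cost_usd by any combination of shared fields. The name cost_by is hypothetical, and it assumes the run history is available as a list of payload dictionaries.

```python
from collections import defaultdict


def cost_by(runs, *keys):
    """Sum cost_usd across runs, grouped by the given payload fields."""
    totals = defaultdict(float)
    for run in runs:
        group = tuple(run.get(key, "unknown") for key in keys)
        totals[group] += run.get("cost_usd") or 0.0
    # Round away float noise so totals read like the per-run values.
    return {group: round(total, 6) for group, total in totals.items()}
```

With the three example payloads above, cost_by(runs, "provider") groups the Codex and RAG runs under openai and the eval run under anthropic, even though the runs share almost no job-specific fields.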

Do Not Send The Whole Job

There is a privacy habit I want to be explicit about:

For monitoring, metadata is usually enough.

I would not send prompts, completions, generated code, retrieved documents, customer data, secrets, API keys, full file contents, or raw private paths as payload data. Most monitoring questions can be answered with counts, IDs, status fields, model names, token usage, cost, latency, and project metadata.

For example, this is useful:

{
  "project": "support-summary",
  "provider": "openai",
  "model": "gpt-5.2",
  "status": "success",
  "tickets_processed": 128,
  "tickets_failed": 0,
  "total_tokens": 4840,
  "cost_usd": 0.02
}

This is the kind of payload I would avoid sending by default:

{
  "prompt": "Full private prompt text goes here",
  "completion": "Full generated response goes here",
  "customer_email": "person@example.com",
  "local_path": "/private/company/customer-export.csv"
}

The first payload helps you understand the run. The second payload creates a data handling problem you probably did not need.
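One way to make that habit hard to break is an allowlist applied just before sending. This is a sketch: the ALLOWED_FIELDS set and the scrub_payload name are my own, and you would extend the set with whatever job-specific counters you actually use.

```python
# Only these keys may leave the job. Anything else is dropped silently.
ALLOWED_FIELDS = {
    "provider", "model", "project", "status",
    "input_tokens", "output_tokens", "cached_input_tokens",
    "reasoning_tokens", "total_tokens", "latency_ms", "duration_ms",
    "cost_usd", "items_processed", "items_failed", "eval_score",
    "tickets_processed", "tickets_failed",
}


def scrub_payload(raw):
    """Keep allowlisted keys with scalar values; drop everything else."""
    return {
        key: value
        for key, value in raw.items()
        if key in ALLOWED_FIELDS and isinstance(value, (str, int, float, bool))
    }
```

An allowlist fails safe: a new field such as prompt is dropped until someone decides it belongs in monitoring, whereas a blocklist would let it through until someone notices.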

A Small Starting Point

If you are tracking AI work across more than one provider or tool, I would start with one tracker and two jobs.

Send the same core fields from both:

provider
model
project
status
input_tokens
output_tokens
total_tokens
latency_ms or duration_ms
cost_usd

Then add the fields that prove the job did useful work:

items_processed
items_failed
eval_score
documents_indexed
chunks_created
git_branch
issue_id

That is enough to see the shape of your AI usage in one place.
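To confirm that both jobs really send the same core fields, a small check can run in tests or just before the POST. This is a sketch with hypothetical names (CORE_FIELDS, TIMING_FIELDS, missing_core_fields). It treats latency_ms and duration_ms as interchangeable, matching the list above.

```python
CORE_FIELDS = (
    "provider", "model", "project", "status",
    "input_tokens", "output_tokens", "total_tokens", "cost_usd",
)
TIMING_FIELDS = ("latency_ms", "duration_ms")


def missing_core_fields(payload):
    """Return the core fields a payload lacks, in a stable order."""
    missing = [field for field in CORE_FIELDS if payload.get(field) is None]
    # Either timing field satisfies the timing requirement.
    if all(payload.get(field) is None for field in TIMING_FIELDS):
        missing.append("latency_ms or duration_ms")
    return missing
```

An empty list means the payload can be grouped and charted alongside the other job without special cases.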

If you want a provider-specific starting point, I keep setup guides for OpenAI pipeline tracking, Claude pipeline tracking, and Codex usage tracking.

I am building TelemHQ for this exact kind of work: scheduled jobs; ad hoc AI workers; AI coding tools like Codex and Claude Code; model usage across OpenAI, Anthropic, Gemini, Llama, Qwen, DeepSeek, Kimi, and GLM; RAG syncs, evals, and agents; and traditional cron jobs that need more than "it ran."

The useful pattern is simple: send one POST after the run, include the payload fields that matter, and keep the history somewhere your future self or team can actually inspect.