May 25, 2026

How Teams Can Track AI Usage In One Place

The useful part is not surveillance. It is agreeing on a small set of fields so scattered AI work can be understood later.

Agree On The Fields First

Team AI usage tracking works best when the payload is boring.

Not every job needs the same provider. Not every person needs the same tool. But the shared fields should mean the same thing across the team.

For example, a useful team payload might include:

provider
model
project
status
total_tokens
input_tokens
output_tokens
cached_input_tokens
reasoning_tokens
cost_usd
latency_ms or duration_ms
items_processed
items_failed
git_repo
git_branch
issue_id

The key field is often project.

If three teammates all send project: "billing-api", their separate jobs can still roll up into one view of billing-api activity. The project becomes the shared noun that connects local scripts, coding agents, scheduled jobs, and CI tasks.

That is more useful than forcing every workflow into the same integration.

If your team has not agreed on a shared payload shape yet, start with one payload format for OpenAI, Claude, Codex, and other AI jobs.
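As a minimal sketch, a shared contract like this could be checked in Python before a job reports its payload. The field names come from the list above. The split into required and optional fields, and the helper names `missing_fields` and `unknown_fields`, are assumptions for illustration, not a TelemHQ schema.

```python
from typing import Any

# Assumption: the article lists these fields but does not say which are
# mandatory; this required/optional split is illustrative only.
REQUIRED_FIELDS = ("provider", "model", "project", "status", "total_tokens")
OPTIONAL_FIELDS = (
    "input_tokens", "output_tokens", "cached_input_tokens", "reasoning_tokens",
    "cost_usd", "latency_ms", "duration_ms", "items_processed", "items_failed",
    "git_repo", "git_branch", "issue_id",
)


def missing_fields(payload: dict[str, Any]) -> list[str]:
    """Return required fields that are absent, None, or an empty string."""
    return [name for name in REQUIRED_FIELDS if payload.get(name) in (None, "")]


def unknown_fields(payload: dict[str, Any]) -> list[str]:
    """Return keys that are not part of the agreed contract, sorted by name."""
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    return sorted(key for key in payload if key not in known)


# Hypothetical job payload that forgot the project and carries an extra key.
job = {
    "provider": "anthropic",
    "model": "claude-3-5-haiku-20241022",
    "prompt_text": "summarize this invoice",
}
print(missing_fields(job))  # ['project', 'status', 'total_tokens']
print(unknown_fields(job))  # ['prompt_text']
```

Reporting unknown keys separately from missing ones makes it easy to spot a job that is about to send something the team never agreed to share.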

A Small Payload Example

Here is the kind of payload I would want from a team member's AI job:

{
  "provider": "openai",
  "model": "gpt-5.2-chat-latest",
  "project": "billing-api",
  "git_repo": "company/billing-api",
  "git_branch": "issue-482-refactor-invoices",
  "issue_id": "482",
  "status": "success",
  "total_tokens": 9060,
  "cached_input_tokens": 3100,
  "reasoning_tokens": 800,
  "cost_usd": 0.42,
  "items_processed": 128,
  "items_failed": 0
}

That payload does not tell you what the prompt said. It does not include generated code. It does not expose customer data.

It says enough to answer the operational questions:

what ran
where it belongs
how much model work happened
whether it finished cleanly
whether it did useful work
which branch or issue it came from

That is usually the level of detail a team needs first.

The Team View Should Answer Practical Questions

The reason to collect this metadata is not to create a leaderboard of who used the most AI.

That kind of view quickly shifts attention from the work to the people doing it.

The better question is: where does the team need visibility?

For a manager or tech lead, a shared view should answer:

Which projects are using the most tokens?
Which projects are failing most often?
Which jobs are getting slower?
Which model/provider mix is common across the team?
Which branch or issue created an unexpected cost spike?
Which old automation still runs after the person who built it moved on?

For a developer, the same data should answer a different set of questions:

Did my job send the fields the team expects?
Did this branch cost more than the previous branch?
Did cached tokens reduce the cost?
Did the eval or quality score drop after a change?
Did the job process fewer items than normal?

That is the line I would keep: make the work easier to explain, not the person easier to judge.

One Project Name Beats Three Dashboards

Imagine three people are working around the same service.

One person runs an AI coding tool while refactoring invoice logic:

{
  "tool": "codex",
  "provider": "openai",
  "model": "gpt-5.3-codex",
  "project": "billing-api",
  "git_branch": "issue-482-refactor-invoices",
  "total_tokens": 127200,
  "cost_usd": 0.45,
  "status": "success"
}

Another person runs a Claude enrichment worker:

{
  "provider": "anthropic",
  "model": "claude-3-5-haiku-20241022",
  "project": "billing-api",
  "status": "success",
  "items_processed": 340,
  "items_failed": 2,
  "total_tokens": 18800,
  "duration_ms": 42000,
  "cost_usd": 0.08
}

A scheduled OpenAI report runs every morning:

{
  "provider": "openai",
  "model": "gpt-5.2-chat-latest",
  "project": "billing-api",
  "status": "success",
  "items_processed": 64,
  "items_failed": 0,
  "total_tokens": 12100,
  "latency_ms": 3100,
  "cost_usd": 0.04
}

Those jobs are different. The shared project value is what makes them comparable.

You can still drill into the details later, but the first pass is simple: billing-api used this many tokens, spent this much, ran these jobs, and had these failures.
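Here is a rough sketch of that first pass, using only the fields the three payloads share. The function name `rollup_by_project` and the shape of the summary row are assumptions for illustration; a real dashboard would do this in whatever store the payloads land in.

```python
from collections import defaultdict

# Trimmed copies of the three payloads above (only the fields used here).
payloads = [
    {"project": "billing-api", "total_tokens": 127200, "cost_usd": 0.45},
    {"project": "billing-api", "total_tokens": 18800, "cost_usd": 0.08,
     "items_failed": 2},
    {"project": "billing-api", "total_tokens": 12100, "cost_usd": 0.04,
     "items_failed": 0},
]


def rollup_by_project(payloads):
    """Sum jobs, tokens, cost, and failed items per project value."""
    totals = defaultdict(
        lambda: {"jobs": 0, "total_tokens": 0, "cost_usd": 0.0, "items_failed": 0}
    )
    for payload in payloads:
        row = totals[payload["project"]]
        row["jobs"] += 1
        # Missing optional fields count as zero rather than raising.
        row["total_tokens"] += payload.get("total_tokens", 0)
        row["cost_usd"] += payload.get("cost_usd", 0.0)
        row["items_failed"] += payload.get("items_failed", 0)
    # Round cost to cents to hide floating-point noise from the summation.
    return {
        project: {**row, "cost_usd": round(row["cost_usd"], 2)}
        for project, row in totals.items()
    }


summary = rollup_by_project(payloads)
print(summary["billing-api"])
```

For the three jobs above, this gives 3 jobs, 158,100 tokens, $0.57, and 2 failed items under billing-api, without any of the jobs needing to share a provider or tool.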

[Figure: Project usage dashboard showing ai-platform token totals, spend, contributors, and one failed run]

Do Not Send The Whole Job

Team monitoring has a privacy problem if it collects too much.

For most AI jobs, metadata is enough.

I would not send:

prompts
completions
generated code
source files
customer records
secrets
API keys
raw private paths

I would send:

project
provider
model
status
token counts
estimated cost
latency or duration
item counts
failure counts
Git metadata
eval or quality scores

This matters more on a team because the audience is wider. A payload that feels harmless in a private script may become risky when it appears in a shared dashboard, an exported report, a webhook, or a summary email.

Keep the shared history useful. Keep the sensitive work out of it.
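One way to enforce this, sketched below, is an allowlist applied right before a payload leaves the job. The exact field set and the name `to_shared_payload` are assumptions for illustration. The point is that anything not explicitly approved is dropped by default.

```python
# Assumption: this allowlist mirrors the "I would send" list above, using
# the field names from earlier payload examples plus eval_score.
ALLOWED_FIELDS = frozenset({
    "project", "provider", "model", "status",
    "total_tokens", "input_tokens", "output_tokens",
    "cached_input_tokens", "reasoning_tokens",
    "cost_usd", "latency_ms", "duration_ms",
    "items_processed", "items_failed",
    "git_repo", "git_branch", "issue_id",
    "eval_score",
})


def to_shared_payload(raw: dict) -> dict:
    """Keep only allowlisted metadata fields; every other key is dropped."""
    return {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}


# Hypothetical local record that contains more than the team should see.
local_record = {
    "project": "billing-api",
    "provider": "openai",
    "status": "success",
    "total_tokens": 9060,
    "prompt": "Refactor the invoice module...",
    "api_key": "sk-not-a-real-key",
    "source_path": "/home/alex/private/billing-api/invoices.py",
}

shared = to_shared_payload(local_record)
print(sorted(shared))  # ['project', 'provider', 'status', 'total_tokens']
```

An allowlist is a deliberate choice over a blocklist here: a new sensitive field added to a script next month stays private unless someone adds it to the shared contract.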

Use Assertions For Bad Successful Runs

A successful process is not always a successful job.

An AI job can exit cleanly and still process zero items, exceed a cost budget, fail half its inputs, or return an eval score below the team's threshold.

That is where payload assertions help.

For a team AI job, I would start with rules like:

status = success
items_processed > 0
items_failed = 0
cost_usd <= 5
eval_score >= 0.8

The exact rules depend on the job. The habit is the point.

If the team agrees on fields, the team can also agree on what a healthy run means.
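The rules above can be sketched as a small table of checks run against each payload. The names `RULES` and `failed_assertions` are hypothetical, and treating a missing field as a failure is an assumption; a team might prefer to skip rules for fields a job does not report.

```python
import operator

# Each rule is (field, comparison, expected value), mirroring the list above.
RULES = [
    ("status", operator.eq, "success"),
    ("items_processed", operator.gt, 0),
    ("items_failed", operator.eq, 0),
    ("cost_usd", operator.le, 5),
    ("eval_score", operator.ge, 0.8),
]


def failed_assertions(payload, rules=RULES):
    """Return a human-readable message for every rule the payload breaks."""
    failures = []
    for field, compare, expected in rules:
        if field not in payload:
            # Assumption: a missing field counts as a failed assertion.
            failures.append(f"{field}: missing")
        elif not compare(payload[field], expected):
            failures.append(
                f"{field}: got {payload[field]!r}, "
                f"expected {compare.__name__} {expected!r}"
            )
    return failures


# A run that exited cleanly but processed nothing.
run = {
    "status": "success",
    "items_processed": 0,
    "items_failed": 0,
    "cost_usd": 0.12,
    "eval_score": 0.91,
}
print(failed_assertions(run))  # ['items_processed: got 0, expected gt 0']
```

The process in this example reported success, yet the check still flags it, which is exactly the "bad successful run" case.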

Start Small

You do not need a full AI usage policy to start.

Start with one project and a small payload contract:

Pick a shared project value.
Add provider, model, status, and token fields.
Add cost_usd when you can estimate it.
Add job-result fields such as items_processed, items_failed, or eval_score.
Add Git fields when the work happens inside a codebase.
Avoid prompts, completions, generated code, secrets, customer data, and raw private paths.
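As a minimal sketch of that contract, a single helper can make the required fields impossible to forget and leave the rest optional. The function name `build_payload` and its signature are assumptions for illustration, not a TelemHQ API.

```python
import json


def build_payload(project, provider, model, status, total_tokens, *,
                  cost_usd=None, items_processed=None, items_failed=None,
                  eval_score=None, git_repo=None, git_branch=None,
                  issue_id=None):
    """Build a metadata-only payload; optional fields left as None are omitted."""
    payload = {
        "project": project,
        "provider": provider,
        "model": model,
        "status": status,
        "total_tokens": total_tokens,
    }
    optional = {
        "cost_usd": cost_usd,
        "items_processed": items_processed,
        "items_failed": items_failed,
        "eval_score": eval_score,
        "git_repo": git_repo,
        "git_branch": git_branch,
        "issue_id": issue_id,
    }
    # There is no parameter for prompts, completions, or file contents,
    # so the helper cannot be used to send them by accident.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_payload(
    "billing-api", "openai", "gpt-5.2-chat-latest", "success", 9060,
    cost_usd=0.42, items_processed=128, items_failed=0,
    git_branch="issue-482-refactor-invoices", issue_id="482",
)
print(json.dumps(payload, indent=2))
```

Note that `items_failed=0` is kept because only `None` is treated as "not provided"; a zero is a real measurement.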

This is the part of TelemHQ I care about most for teams: not proving that someone used AI, but making the work legible enough that a team can debug it, budget it, and improve it.

If a team can agree on a few boring fields, the scattered usage starts to become a shared operational history.