June 8, 2026 · Engineering · 6 min read

Human Activity Is Run Telemetry

[Image: A sunlit desk with an open notebook, coffee cup, and faint workflow trace overlays.]

Human activity is not just app noise. It is the run trace for how work actually happened.

Software has traces. Work should too.

Software teams already know how to improve systems from logs, traces, evals, errors, and replay.

We do not usually treat human work the same way.

A person opens a calendar invite, reads a Slack thread, checks GitHub, searches a document, drafts a reply, asks a model for help, edits the output, sends a message, and moves on.

To the user, that was work. To most software systems, it was scattered app activity. To an agent, it is often missing context.

Apprentice is built around a different idea: human activity is run telemetry.

A run trace can start with a person

A run trace does not need to come only from an agent. A person's workflow can also become a trace: what happened, what repeated, what changed, and what was accepted or corrected.

That does not mean every moment is perfectly understood. It does not mean the system can always infer intent.

It means the raw material of work can be captured with permission, redacted where needed, structured carefully, and used as evidence for future assistance.

[Image: Index cards arranged as a work graph on a sunlit desk.]

Recording is the beginning

The hard part is not recording. Recording is only the beginning.

The hard part is deciding what matters, what is sensitive, what is stale, what is a one-off action, what is a repeated pattern, and what should become an agent-executable workflow.

That is why a useful trace needs source boundaries, redaction posture, review state, and correction history. Without those, activity becomes surveillance or noise instead of product evidence.

The useful unit is not a screen recording. It is reviewed evidence about how work moved from context to outcome.
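
As a rough illustration, the four properties named above (source boundaries, redaction posture, review state, and correction history) could sit directly on each captured event. This is a minimal sketch, not Apprentice's actual schema: `ActivityEvent`, `ReviewState`, `usable_as_evidence`, and the source labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReviewState(Enum):
    UNREVIEWED = "unreviewed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class ActivityEvent:
    # All field names are assumptions made for this sketch.
    source: str                # e.g. "calendar" or "github" (hypothetical labels)
    summary: str               # short description of what happened
    sensitive: bool = False    # whether the raw content needs redaction
    redacted: bool = False     # whether redaction has been applied
    review_state: ReviewState = ReviewState.UNREVIEWED
    corrections: list[str] = field(default_factory=list)


def usable_as_evidence(event: ActivityEvent, permitted_sources: set[str]) -> bool:
    """Return True only for events that are inside the permitted source
    boundary, redacted if sensitive, and accepted in review."""
    inside_boundary = event.source in permitted_sources
    safely_redacted = event.redacted or not event.sensitive
    reviewed = event.review_state is ReviewState.ACCEPTED
    return inside_boundary and safely_redacted and reviewed
```

Under these assumptions, an event from a source the user never permitted, or one still waiting for review, stays out of the evidence set no matter how informative it looks.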

Why traces matter for agents

Agents improve when they can compare what happened with what should have happened.

That comparison is easy for agent runs: the model tried a tool call, got an error, retried, then succeeded. The trace is obvious.

Agent loops are useful, but they mostly optimize the agent's own behavior. They do not automatically explain the human workflow the agent is supposed to help with.

The missing layer is not another loop. It is a work trace: evidence of what happened, what repeated, what changed, and what should be reviewed before becoming automation.

Human work is messier than an agent run. People multitask. They check messages that may not be relevant. They switch tasks midstream. They use context they never write down.

That ambiguity is not a reason to ignore human activity. It is a reason to model uncertainty explicitly.
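
One way to picture explicit uncertainty is to attach a confidence value to every guess about which task an event belongs to, and to route low-confidence guesses to the user as questions. The sketch below is hypothetical: `TaskInference`, `split_by_confidence`, and the 0.7 threshold are assumptions for illustration, and how confidence would be estimated is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskInference:
    event_summary: str   # what was observed
    task: str            # the task the system guesses it belongs to
    confidence: float    # 0.0 to 1.0; the estimation method is out of scope


def split_by_confidence(inferences, threshold=0.7):
    """Separate guesses the system may build on from guesses it should
    surface to the user as open questions. The threshold is arbitrary."""
    confident = [i for i in inferences if i.confidence >= threshold]
    uncertain = [i for i in inferences if i.confidence < threshold]
    return confident, uncertain
```

The point of the split is that an uncertain inference is kept and labeled, not silently dropped or silently trusted.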

The Apprentice trace model

The direction we are testing is to convert permissioned activity into a durable work graph and traceable work objects.

Those objects can then support proposals, drafts, workflow candidates, reviews, handoffs, value measurement, and debugging.

A useful trace should preserve at least four things: what evidence existed, what the system inferred, what action or artifact resulted, and how the user corrected or accepted it.

That correction loop is central. If a user changes a draft, rejects a proposal, marks evidence as stale, or says the system misunderstood the task, that is not a failure to hide. It is training data, product feedback, and a better map of the user's work.
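
A minimal sketch of a record that preserves those four things might look like the following. The `WorkTrace` class and its fields are hypothetical, not the product's real data model. The one design choice worth noting is that a correction is appended next to the original outcome, so the history of what the system first produced is never overwritten.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkTrace:
    # Field names are assumptions made for this sketch.
    evidence: list[str]                    # what evidence existed
    inference: str                         # what the system inferred
    outcome: str                           # the action or artifact that resulted
    accepted: Optional[bool] = None        # None until the user responds
    revised_outcome: Optional[str] = None  # the user's version, if any
    corrections: list[str] = field(default_factory=list)

    def accept(self) -> None:
        self.accepted = True

    def correct(self, note: str, revised_outcome: Optional[str] = None) -> None:
        """Record a correction alongside the original outcome."""
        self.corrections.append(note)
        self.accepted = False
        if revised_outcome is not None:
            self.revised_outcome = revised_outcome
```

A short usage example under the same assumptions:

```python
trace = WorkTrace(
    evidence=["calendar invite", "Slack thread"],
    inference="User is preparing a status update",
    outcome="Draft: project is on track",
)
trace.correct("Misread the thread; the launch slipped", "Draft: launch moved one week")
```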

Not every action becomes automation

Classic workflow automation assumes the process is known. It asks someone to define steps, triggers, conditions, and outputs.

Many real work patterns are not that clean.

Sometimes the right move is a one-off proposal: this changed, so Apprentice should prepare a brief or ask whether to follow up. Sometimes the right move is a reusable workflow: this pattern happened several times, so Apprentice should propose a repeatable path. Sometimes the right move is to watch longer.

Human telemetry lets the product choose among those paths with evidence instead of guessing from a single prompt.
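
The three paths can be sketched as a small decision function. Everything here is an assumption for illustration: the `NextStep` names, the idea of labeling observed work with pattern strings, and the threshold of three repeats. A real system would presumably weigh much more evidence than a count.

```python
from collections import Counter
from enum import Enum


class NextStep(Enum):
    ONE_OFF_PROPOSAL = "one_off_proposal"
    REUSABLE_WORKFLOW = "reusable_workflow"
    WATCH_LONGER = "watch_longer"


def choose_next_step(pattern, observed_patterns, changed, min_repeats=3):
    """Pick a path from evidence.

    pattern: label of the work pattern under consideration (hypothetical).
    observed_patterns: labels of patterns seen so far, one per occurrence.
    changed: whether something relevant just changed for the user.
    min_repeats: arbitrary threshold for calling a pattern repeatable.
    """
    occurrences = Counter(observed_patterns)[pattern]
    if occurrences >= min_repeats:
        return NextStep.REUSABLE_WORKFLOW
    if changed:
        return NextStep.ONE_OFF_PROPOSAL
    return NextStep.WATCH_LONGER
```

With this rule, a pattern seen once with no recent change produces no proposal at all, which is the "watch longer" case.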

If agent activity can be optimized from traces, human activity should be usable as the baseline trace.

The benchmark question

The first benchmark is not whether Apprentice can produce a flashy automation demo.

The benchmark is whether the system can turn observed work into a useful, reviewable, evidence-backed next step with less user effort over time.

That means measuring acceptance, corrections, edit distance, review time, source quality, task completion, handoff rate, and repeatability.
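
A few of these measures are simple to compute once proposals and their reviewed versions are stored side by side. The sketch below is illustrative only: `ReviewedDraft` and `summarize` are hypothetical names, and the edit measure is a normalized dissimilarity from Python's `difflib.SequenceMatcher`, used here as a stand-in for a true edit distance.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ReviewedDraft:
    proposed: str          # what the system drafted
    final: str             # what the user kept or sent
    accepted: bool         # whether the user accepted the proposal
    review_seconds: float  # time the user spent reviewing


def edit_fraction(proposed: str, final: str) -> float:
    """0.0 means the texts are identical; 1.0 means nothing matched.
    This is 1 minus SequenceMatcher's similarity ratio, not Levenshtein."""
    return 1.0 - SequenceMatcher(None, proposed, final).ratio()


def summarize(drafts):
    """Aggregate acceptance, edit effort, and review time over drafts."""
    if not drafts:
        raise ValueError("need at least one reviewed draft")
    n = len(drafts)
    return {
        "acceptance_rate": sum(d.accepted for d in drafts) / n,
        "mean_edit_fraction": sum(edit_fraction(d.proposed, d.final) for d in drafts) / n,
        "mean_review_seconds": sum(d.review_seconds for d in drafts) / n,
    }
```

Tracked over successive time windows, a rising acceptance rate with falling edit fraction and review time would be one signal of "less user effort over time."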

The thesis is practical: human activity can become the baseline trace for safer, more useful automation, provided the loop stays permissioned, reviewable, and honest about uncertainty.