Mendel Traffic
The Boeing problem, restated for AI-augmented developer tooling: when an autonomous agent is managing a production experiment, how do you design the moment it hands control back to the engineer?
Mendel Traffic is Google's experiment management platform — used by hundreds of thousands of engineers to deploy, monitor, and ramp feature experiments across Google's production infrastructure. The platform is over 15 years old. I joined as UX lead to modernize it, with a specific focus on reducing stage ramp delays by surfacing the right data at the right decision point.
The work quickly became about something larger: how do you introduce progressive AI autonomy into a system that engineers have trusted with production traffic for 15 years?
The Problem
Experiment onboarding in Mendel was slow. Engineers had to manually configure GCL flags — a technical debt artifact from the platform's early architecture — before they could begin ramping an experiment. For greenfield users, this was a significant barrier. For experienced users, it was a ritual that added no value.
Beyond onboarding, the experiment monitoring workflow was fragmented. Traffic data lived in Mendel. Insights data lived in a separate product. Engineers had to context-switch between surfaces to answer basic questions about their experiment's health — questions they often didn't know to ask until something had already gone wrong.
The Redesign
I drove the end-to-end redesign of the experiment onboarding flow, eliminating mandatory GCL flag configuration through automation. What had been a manual prerequisite became a background process: the system detected the experiment context and configured the flags automatically.
I also bridged the historically siloed Traffic and Insights products by integrating live insights data directly into the traffic workflow. Engineers could now see the data that mattered — metric movements, anomaly signals, peer experiment comparisons — without leaving the context where they were making decisions.
The insight driving this integration came from an unusual source: working alongside quant UX researchers, I synthesized thousands of existing UXR studies using NotebookLM. The finding wasn't that engineers lacked data. It was that they encountered data too late, after a decision point had already passed, and in a separate context from where the decision was being made.
Progressive AI Autonomy
The most significant design work in Mendel is the conceptual framework for progressive AI autonomy in experiment authorship. This is where the Boeing question becomes the central design problem.
The framework defines four stages of AI involvement in experiment management, each with explicit UI mechanisms for the engineer to control the handoff:
- Stage 1 — IDE-level recognition: The AI detects a code change that suggests a new experiment opportunity and surfaces a proactive rollout prompt. The engineer is in full control. The AI is advisory only.
- Stage 2 — Guided authorship: The AI assists with experiment configuration, flag setup, and hypothesis documentation. Co-creation mode. The engineer approves every material decision.
- Stage 3 — Monitored autonomy: The AI monitors the running experiment, surfaces anomalies, and recommends ramp decisions. The engineer reviews and approves recommendations before they execute.
- Stage 4 — Delegated operation: The AI manages routine ramp stages autonomously, surfaces only the decisions where its confidence falls below a configurable threshold, and maintains a complete audit trail of every autonomous action.
Each stage is opt-in. Engineers can set their autonomy level per experiment, per metric type, or globally. The system never advances to a higher autonomy stage without explicit permission.
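The opt-in rule can be made concrete with a small sketch. This is an illustration, not Mendel's implementation: the class names, the `approved_by` parameter, and the choice to resolve overlapping settings by taking the most conservative one are all assumptions, since the precedence between per-experiment, per-metric-type, and global settings is not specified here.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class Stage(IntEnum):
    """The four autonomy stages, ordered from least to most autonomous."""
    IDE_RECOGNITION = 1
    GUIDED_AUTHORSHIP = 2
    MONITORED_AUTONOMY = 3
    DELEGATED_OPERATION = 4


@dataclass
class AutonomySettings:
    """Hypothetical store for an engineer's opt-in choices."""
    global_stage: Optional[Stage] = None
    by_experiment: Dict[str, Stage] = field(default_factory=dict)
    by_metric_type: Dict[str, Stage] = field(default_factory=dict)

    def effective_stage(self, experiment_id: str, metric_type: str) -> Optional[Stage]:
        """Return the stage that applies, or None if nothing was opted into.

        Assumption: when several settings apply, the lowest stage wins.
        """
        candidates = [
            self.global_stage,
            self.by_experiment.get(experiment_id),
            self.by_metric_type.get(metric_type),
        ]
        opted_in = [stage for stage in candidates if stage is not None]
        return min(opted_in) if opted_in else None

    def set_experiment_stage(
        self, experiment_id: str, stage: Stage, approved_by: Optional[str] = None
    ) -> None:
        """Change one experiment's stage; raising it needs explicit approval."""
        current = self.by_experiment.get(experiment_id)
        is_raise = current is None or stage > current
        if is_raise and not approved_by:
            raise PermissionError("raising autonomy requires explicit permission")
        # Assumption: lowering autonomy is always allowed without approval.
        self.by_experiment[experiment_id] = stage
```

Taking the minimum means a cautious setting for one metric type cannot be bypassed by a more permissive experiment-level setting. A design that lets the most specific setting win would be equally defensible.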
The question I've been designing around since a Boeing flight deck in 2009: when the system is operating autonomously and something unexpected happens, can the engineer take back control — and do they have the information they need to do it correctly?
The Handoff Mechanism
The handoff from AI to human is the hardest design problem in the system. When an autonomous agent surfaces a decision for human review, several things have to be true simultaneously:
- The engineer must understand what the agent was doing and why it stopped
- The agent must surface the conflicting signals that triggered the escalation — not just its recommendation, but the uncertainty behind it
- The engineer must be able to make a decision quickly without needing to reconstruct the full experiment context from scratch
- If the engineer overrides the agent, the system must incorporate that correction into future autonomous decisions
This is the conflicting signal hierarchies pattern from the PAIR Guidebook, implemented at the level of a production engineering workflow. The agent doesn't silently resolve uncertainty and present a clean recommendation. It surfaces the disagreement between signals — the metric that says ramp, the anomaly that says wait — and gives the engineer the information they need to form their own judgment.
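One way to picture the handoff is as a data structure the agent has to fill in before it can escalate. The sketch below is hypothetical: the field names, the "ramp"/"wait" vocabulary, and the `OverrideLog` class are assumptions made for illustration, and it only records overrides. How those corrections would feed back into future autonomous decisions is left open.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Signal:
    source: str    # e.g. "primary metric" or "anomaly detector"
    suggests: str  # "ramp" or "wait"
    detail: str    # short human-readable evidence


@dataclass(frozen=True)
class Escalation:
    """Everything the engineer needs to decide without rebuilding context."""
    experiment_id: str
    agent_activity: str          # what the agent was doing
    stop_reason: str             # why it stopped
    signals: Tuple[Signal, ...]  # all signals, including the ones that disagree
    recommendation: str          # the agent's own suggestion
    confidence: float            # the uncertainty behind that suggestion

    def disagreeing(self) -> List[Signal]:
        """Signals that point away from the agent's recommendation."""
        return [s for s in self.signals if s.suggests != self.recommendation]


def signals_conflict(signals: Tuple[Signal, ...]) -> bool:
    """True when the signals do not all point the same way."""
    return len({s.suggests for s in signals}) > 1


@dataclass
class OverrideLog:
    """Records engineer decisions that contradict the agent."""
    entries: List[Tuple[Escalation, str]] = field(default_factory=list)

    def resolve(self, escalation: Escalation, engineer_decision: str) -> bool:
        """Store the decision if it is an override; return whether it was."""
        overridden = engineer_decision != escalation.recommendation
        if overridden:
            self.entries.append((escalation, engineer_decision))
        return overridden
```

Keeping the disagreeing signals as first-class data, instead of collapsing them into a single score, is what lets the interface show the metric that says ramp next to the anomaly that says wait.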
Status
Active project. Feb 2026–Present. Details available on request.