Autonomous Optimization
Continuous improvement loop that detects metric changes, generates hypotheses, runs experiments, and auto-implements winners
npx gtm-skills add drill/autonomous-optimization
What this drill teaches
This is the drill that makes the Durable level fundamentally different from Scalable. Instead of running a play and measuring results, this drill creates an always-on agent loop that:
- Monitors — detects when metrics plateau, drop, or spike
- Diagnoses — generates hypotheses for what to change
- Experiments — designs and runs A/B tests on the top hypothesis
- Decides — evaluates results and auto-implements winners
- Reports — generates weekly executive summaries of what changed and why
The goal is to find the local maximum of each play — the best possible performance given the current market, audience, and competitive landscape — and maintain it as conditions change.
Input
- A play that has been running at Scalable level for at least 4 weeks (baseline data required)
- PostHog tracking configured with the play's core events
- n8n instance for scheduling the optimization loop
- Anthropic API key for Claude (hypothesis generation + evaluation)
The Optimization Loop
Phase 1: Monitor (runs daily via n8n cron)
Build an n8n workflow triggered by a daily cron schedule:
- Use `posthog-anomaly-detection` to check the play's primary KPIs
- Compare last 2 weeks against 4-week rolling average
- Classify: normal (within ±10%), plateau (±2% for 3+ weeks), drop (>20% decline), spike (>50% increase)
- If normal → log to Attio, no action needed
- If anomaly detected → trigger Phase 2
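The classification rule above can be sketched as a small function. This is a minimal illustration of the stated thresholds, not the `posthog-anomaly-detection` skill itself; changes between ±10% and the drop/spike triggers are treated as normal (no action), which the drill leaves implicit.

```python
def classify_kpi(recent_avg: float, baseline_avg: float, flat_weeks: int) -> str:
    """Classify a KPI per the drill's thresholds.

    recent_avg   -- mean of the last 2 weeks
    baseline_avg -- 4-week rolling average
    flat_weeks   -- consecutive weeks the metric stayed within +/-2%
    """
    change = (recent_avg - baseline_avg) / baseline_avg
    if change > 0.50:
        return "spike"      # >50% increase
    if change < -0.20:
        return "drop"       # >20% decline
    if abs(change) <= 0.02 and flat_weeks >= 3:
        return "plateau"    # +/-2% for 3+ weeks
    return "normal"         # within +/-10%, or in the untriggered gap
```

A 35% week-over-week decline classifies as a drop and triggers Phase 2; a metric flat for four straight weeks classifies as a plateau even though it is also "normal" by the ±10% test.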
Phase 2: Diagnose (triggered by anomaly detection)
- Gather context: pull the play's current configuration from Attio (targeting, messaging, cadence, channel mix)
- Pull 8-week metric history from PostHog using `posthog-dashboards`
- Run `hypothesis-generation` with the anomaly data + context
- Receive 3 ranked hypotheses with expected impact and risk levels
- Store hypotheses in Attio as notes on the play's campaign record
- If the top hypothesis has risk = "high" → send Slack alert for human review and STOP
- If risk = "low" or "medium" → proceed to Phase 3
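The risk gate at the end of Phase 2 reduces to a simple routing decision. A minimal sketch, assuming each hypothesis is stored as a dict with a `risk` field ("low" | "medium" | "high"):

```python
def route_top_hypothesis(hypotheses: list[dict]) -> str:
    """Gate the optimization loop on the top-ranked hypothesis's risk level."""
    top = hypotheses[0]  # hypotheses arrive ranked, best first
    if top["risk"] == "high":
        return "alert_and_stop"   # Slack alert for human review; loop halts
    return "experiment"           # low/medium risk proceeds to Phase 3
```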
Phase 3: Experiment (triggered by hypothesis acceptance)
- Take the top-ranked hypothesis
- Design the experiment: use `posthog-experiments` to create a feature flag that splits traffic between control (current) and variant (hypothesis change)
- Implement the variant using the appropriate fundamental (e.g., if the hypothesis is "change email subject line," use `loops-sequences` or `instantly-campaign` to create the B variant)
- Set the experiment duration: minimum 7 days or until 100+ samples per variant, whichever is longer
- Log the experiment start in Attio with: hypothesis, start date, expected duration, success criteria
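The "whichever is longer" duration rule means both floors must be met before the experiment can end. A minimal completion check (sample counts would come from PostHog in practice):

```python
def experiment_done(days_running: int, samples_per_variant: dict) -> bool:
    """End only when BOTH floors are met: >=7 days AND 100+ samples
    in every variant."""
    return days_running >= 7 and all(n >= 100 for n in samples_per_variant.values())
```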
Phase 4: Evaluate (triggered by experiment completion)
- Pull experiment results from PostHog
- Run `experiment-evaluation` with control vs variant data
- Decision:
- Adopt: Update the live configuration to use the winning variant. Log the change. Move to Phase 5.
- Iterate: Generate a new hypothesis building on this result. Return to Phase 2.
- Revert: Disable the variant, restore control. Log the failure. Return to Phase 1 monitoring.
- Extend: Keep the experiment running for another period. Set a reminder.
- Store the full evaluation (decision, confidence, reasoning) in Attio
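The four-way decision can be sketched as a function. The drill does not prescribe statistical criteria, so the significance cutoff (p < 0.05) and the mapping below are illustrative assumptions, not the `experiment-evaluation` skill's actual logic:

```python
def decide(p_value: float, lift: float, reached_min_samples: bool) -> str:
    """Map experiment results to one of the four Phase 4 decisions.
    The 0.05 cutoff is an assumption for illustration only."""
    if not reached_min_samples:
        return "extend"    # underpowered: keep the experiment running
    if p_value < 0.05:
        # Significant result: adopt a winning variant, revert a losing one
        return "adopt" if lift > 0 else "revert"
    return "iterate"       # inconclusive at full duration: new hypothesis
```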
Phase 5: Report (runs weekly via n8n cron)
- Aggregate all optimization activity for the week: anomalies detected, hypotheses generated, experiments run, decisions made
- Calculate: net metric change from all adopted changes this week
- Generate a weekly optimization brief using Claude:
- What changed and why
- Net impact on primary KPIs
- Current distance from estimated local maximum
- Recommended focus for next week
- Post the brief to Slack and store in Attio
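The weekly aggregation can be sketched as follows. The event-log field names are illustrative (your Attio schema will differ), and the net change compounds adopted lifts multiplicatively on the assumption that changes stack:

```python
def weekly_summary(events: list[dict]) -> dict:
    """Aggregate one week of optimization activity for the brief.

    events -- log entries such as {"type": "anomaly"} or
              {"type": "decision", "decision": "adopt", "lift": 0.08}
    """
    adopted = [e for e in events if e.get("decision") == "adopt"]
    net = 1.0
    for e in adopted:
        net *= 1 + e["lift"]  # compound lifts of adopted changes
    return {
        "anomalies": sum(e["type"] == "anomaly" for e in events),
        "experiments": sum(e["type"] == "experiment" for e in events),
        "adopted": len(adopted),
        "net_change": round(net - 1, 4),
    }
```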
Guardrails (CRITICAL)
- Rate limit: Maximum 1 active experiment per play at a time. Never stack experiments.
- Revert threshold: If primary metric drops >30% at any point during an experiment, auto-revert immediately.
- Human approval required for:
- Budget changes >20%
- Audience/targeting changes that affect >50% of traffic
- Any change the hypothesis generator flags as "high risk"
- Cooldown: After a failed experiment (revert), wait 7 days before testing a new hypothesis on the same variable.
- Maximum experiments per month: 4 per play. If all 4 fail, pause optimization and flag for human strategic review.
- Never optimize what isn't measured: If a KPI doesn't have PostHog tracking, fix tracking first (use the `posthog-gtm-events` drill) before running experiments on it.
Output
- Continuous metric monitoring with anomaly alerts
- Automated hypothesis → experiment → evaluation → implementation cycle
- Weekly optimization briefs
- Audit trail of every change, why it was made, and what happened
When to Stop
The optimization loop runs indefinitely at Durable level. However, it should detect convergence — when successive experiments produce diminishing returns (<2% improvement for 3 consecutive experiments). At convergence:
- The play has reached its local maximum
- Reduce monitoring frequency from daily to weekly
- Report to the team: "This play is optimized. Current performance is [metrics]. Further gains require strategic changes (new channels, new audience, product changes) rather than tactical optimization."
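The convergence rule stated above is mechanical enough to encode directly. A minimal sketch, assuming each completed experiment's adopted lift is logged as a fraction (reverted or inconclusive experiments would log 0):

```python
def converged(recent_lifts: list[float]) -> bool:
    """Convergence: the last 3 experiments each improved the metric by <2%."""
    return len(recent_lifts) >= 3 and all(l < 0.02 for l in recent_lifts[-3:])
```

For example, lifts of 1%, 1.5%, and 0.5% in a row signal convergence and trigger the weekly-monitoring handoff, while a single 5% win in that window keeps the loop at daily cadence.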