You've rolled out AI tools. People say they love them. But leadership asks: "Is this actually saving time? What's the ROI?"
"Vibes" aren't enough. You need data.
At OpenClaw Labs, we've run 30+ AI pilots. Here's how we measure success—and how you can too.
The Problem: Most Teams Don't Track Anything
The typical pattern:
- Buy AI tool licenses
- Run training
- Check usage dashboard once
- Hope for the best
Six months later, leadership asks for ROI. You have... anecdotes.
That's not enough to justify renewal, expansion, or investment in more automation.
What to Measure: The 3-Layer Framework
We track three layers:
Layer 1: Adoption (Are people using it?)
Layer 2: Impact (Is it saving time?)
Layer 3: Outcomes (Is the business better?)
Let's break down each.
Layer 1: Adoption Metrics
Goal: Understand who's using the tool, how often, and for what.
Metrics to track:
1. Daily Active Users (DAU)
- How many people used the tool today?
- Track this weekly. If DAU drops after Week 2, something's wrong.
2. Weekly Active Users (WAU)
- How many people used it at least once this week?
- This should reach 40-60% of trained users by Week 4.
3. Breadth of use cases
- Are people using it for 1 thing (e.g., only autocomplete) or 5 things (autocomplete, refactors, tests, docs, debugging)?
- More use cases = deeper adoption.
4. Questions in Slack
- Are people asking "how do I...?" questions?
- More questions = experimentation (which is good).
How to collect this:
- Cursor, Copilot: Most tools have usage dashboards. Export weekly.
- Custom tools (OpenClaw, etc.): Log every action and aggregate in a dashboard (a minimal aggregation sketch follows this list).
- Survey: Monthly 3-question survey: "How often did you use AI this month? For what? What worked/didn't?"
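If you're logging raw events from a custom tool, a few lines of Python turn the log into DAU/WAU numbers. A minimal sketch, assuming the export is a CSV named usage_log.csv with user_id and timestamp columns (adjust the names to whatever your tool actually emits):

```python
# Minimal sketch: compute DAU and WAU from an exported action log.
# Assumes a CSV named usage_log.csv with columns: user_id, timestamp (ISO 8601).
# Adapt the column names to whatever your tool's export actually uses.
import csv
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
day_ago = now - timedelta(days=1)
week_ago = now - timedelta(days=7)

daily_users, weekly_users = set(), set()

with open("usage_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        ts = datetime.fromisoformat(row["timestamp"])
        if ts.tzinfo is None:  # treat naive timestamps as UTC
            ts = ts.replace(tzinfo=timezone.utc)
        if ts >= week_ago:
            weekly_users.add(row["user_id"])
        if ts >= day_ago:
            daily_users.add(row["user_id"])

print(f"DAU (last 24h): {len(daily_users)}")
print(f"WAU (last 7d):  {len(weekly_users)}")
```

Drop the two numbers into your weekly dashboard row and Layer 1 is covered.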
What success looks like:
- Week 1: 60-80% of trained users try it
- Week 4: 40-60% are weekly active users
- Month 3: 30-50% are daily active users
If you're below these numbers, dig in: Is training the issue? Is the tool not useful? Are there blockers?
Layer 2: Impact Metrics
Goal: Quantify time saved and productivity gains.
Metrics to track:
1. Time saved per person per week
- Survey or time-tracking: "How much time did AI save you this week?"
- Typical range after 4 weeks: 2-5 hours/week per person.
- After 3 months: 5-10 hours/week.
2. Tasks automated
- How many repetitive tasks are now fully or partially automated?
- Examples: PR summaries, ticket creation, status updates, email drafts.
3. PRs with AI assistance
- For dev teams: what % of PRs used AI?
- Target: 40-60% by Month 3.
4. Cycle time (for dev teams)
- Measure time from commit to merge.
- If AI is working, cycle time should drop 10-20%.
5. Lines of code changed per PR
- If AI is helping with refactors, PRs should be larger (more scope) but not take longer.
How to collect this:
- Survey: Weekly 2-minute survey: "How much time did AI save you? What tasks?"
- GitHub: Pull PR data (labels, size, time-to-merge). Compare before/after (a sketch using the GitHub API follows this list).
- Time-tracking: If your team uses Harvest, Toggl, or similar, add an "AI-assisted" tag.
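Here's a minimal sketch of the GitHub pull, assuming you tag AI-assisted PRs with an "ai-assisted" label. The owner, repo, and label name are placeholders; a production version should also paginate and restrict the date range so you can compare before/after windows.

```python
# Minimal sketch: pull merged-PR data from the GitHub REST API and compute
# median time-to-merge plus the share of PRs carrying an "ai-assisted" label.
# OWNER, REPO, LABEL, and the GITHUB_TOKEN env var are assumptions; use your own.
import os
import statistics
from datetime import datetime

import requests

OWNER, REPO = "your-org", "your-repo"  # hypothetical
LABEL = "ai-assisted"                  # hypothetical labeling convention
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

resp = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    params={"state": "closed", "per_page": 100},
    headers=headers,
    timeout=30,
)
resp.raise_for_status()

merged = [pr for pr in resp.json() if pr["merged_at"]]
merge_hours, ai_assisted = [], 0
for pr in merged:
    created = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
    merged_at = datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00"))
    merge_hours.append((merged_at - created).total_seconds() / 3600)
    if any(lbl["name"] == LABEL for lbl in pr["labels"]):
        ai_assisted += 1

if merged:
    print(f"Median time-to-merge: {statistics.median(merge_hours):.1f} hours")
    print(f"PRs with AI assistance: {ai_assisted}/{len(merged)} "
          f"({100 * ai_assisted / len(merged):.0f}%)")
```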
What success looks like:
- Week 4: 3-5 hours/week saved per person
- Month 3: 5-10 hours/week saved
- Month 6: 20-40% faster cycle time on repetitive tasks
If time saved is low, ask: Are people using AI for the right tasks? Are they prompting effectively? Are there blockers?
Layer 3: Outcome Metrics
Goal: Tie AI adoption to business outcomes.
Metrics to track:
1. Velocity (for dev teams)
- Story points completed per sprint
- Should increase 10-20% over 6 months
2. Lead time (for sales/support)
- Time from lead → first contact, or ticket → resolved
- Should drop 20-40% if AI is handling drafts and triage
3. Error rate
- Are bugs increasing? (If AI is writing bad code, this will show up)
- Target: no increase (or slight decrease)
4. Employee satisfaction
- Quarterly survey: "Do you feel more productive? Less burned out?"
- AI should make work better, not just faster
5. Cost savings
- Calculate: (Time saved per person per week) × (hourly rate) × (number of people)
- Compare to tool cost over the same period
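The math fits in a spreadsheet, but here's the same calculation as a sketch. All inputs are placeholders; substitute your own survey and billing numbers.

```python
# Minimal sketch of the cost-savings formula above.
# All inputs are placeholders; substitute your own survey and billing numbers.
hours_saved_per_person_per_week = 4   # from the weekly survey
hourly_rate = 80                      # fully loaded $/hour
people = 20
tool_cost_per_month = 600             # licenses + hosting

weekly_value = hours_saved_per_person_per_week * hourly_rate * people
monthly_value = weekly_value * 52 / 12

print(f"Weekly value:   ${weekly_value:,.0f}")
print(f"Monthly value:  ${monthly_value:,.0f}")
print(f"Monthly return: {monthly_value / tool_cost_per_month:.1f}x tool cost")
```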
How to collect this:
- Jira/Linear: Pull velocity data
- Support tools: Pull lead time and resolution time (a minimal Jira pull is sketched after this list)
- Bug trackers: Pull error rate trends
- Survey: Quarterly engagement survey (1 question: "Do you feel AI is making your work better?")
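Velocity usually comes straight out of your sprint reports, but lead and resolution time you can pull directly from Jira's search API. A minimal sketch, assuming Jira Cloud, a hypothetical project key, and credentials in environment variables:

```python
# Minimal sketch: pull resolved issues from Jira Cloud and compute median
# resolution time over the last 30 days. JIRA_SITE, PROJECT, and the auth
# env vars are assumptions; adjust the JQL to match your workflow.
import os
import statistics
from datetime import datetime

import requests

JIRA_SITE = "https://your-domain.atlassian.net"  # hypothetical
PROJECT = "SUP"                                   # hypothetical project key
JQL = f"project = {PROJECT} AND resolved >= -30d"

resp = requests.get(
    f"{JIRA_SITE}/rest/api/2/search",
    params={"jql": JQL, "fields": "created,resolutiondate", "maxResults": 100},
    auth=(os.environ["JIRA_EMAIL"], os.environ["JIRA_API_TOKEN"]),
    timeout=30,
)
resp.raise_for_status()

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"  # Jira's default timestamp format
hours = []
for issue in resp.json()["issues"]:
    fields = issue["fields"]
    if not fields.get("resolutiondate"):
        continue
    created = datetime.strptime(fields["created"], FMT)
    resolved = datetime.strptime(fields["resolutiondate"], FMT)
    hours.append((resolved - created).total_seconds() / 3600)

if hours:
    print(f"Issues resolved (30d): {len(hours)}")
    print(f"Median resolution time: {statistics.median(hours):.1f} hours")
```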
What success looks like:
- Month 6: 10-20% velocity increase
- Month 6: 20-40% faster lead time (sales/support)
- Month 6: No increase in bug rate
- Month 6: 70%+ say "AI makes my work better"
If outcomes aren't moving, the problem is usually:
- Adoption is shallow (people use it, but not for high-impact tasks)
- Training is weak (people don't know how to prompt effectively)
- Automations aren't shipped (they have tools, but not workflow integration)
Example: How We Measured ROI for a 20-Person Dev Team
Client: Series B SaaS company, 20 engineers, 5 ops people.
Pilot: 4 weeks. Cursor workshop + 3 OpenClaw automations.
Week 4 Results:
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Weekly active users | 0 | 18/20 (90%) | +90% |
| Time saved per person | 0 | 4 hrs/week | +4 hrs |
| PRs with AI assistance | 0% | 45% | +45% |
| Cycle time (median) | 3.2 days | 2.7 days | -16% |
| Bug rate | 2.1/sprint | 2.0/sprint | -5% |
Cost:
- Cursor licenses: $400/month ($20 × 20 devs)
- OpenClaw hosting: $200/month
- Our pilot fee: $8,000 (one-time)
Time saved:
- 4 hrs/week × 20 people = 80 hrs/week
- 80 hrs/week × $80/hr (avg eng rate) = $6,400/week
- $6,400/week × 4 weeks = $25,600 saved
ROI after 4 weeks: $25,600 saved - $8,600 invested ($8,000 pilot fee + $600 in tool costs) = $17,000 net gain
Projected annual ROI: ~$300k saved vs. ~$15k in tool and pilot costs = ~20x ROI
Month 6 Results:
| Metric | Week 4 | Month 6 | Change |
|--------|--------|---------|--------|
| Weekly active users | 18/20 | 19/20 (95%) | +5% |
| Time saved per person | 4 hrs/week | 7 hrs/week | +75% |
| Cycle time (median) | 2.7 days | 2.3 days | -15% |
| Velocity (story points) | 42/sprint | 51/sprint | +21% |
Leadership renewed. We expanded to the ops and product teams.
Tools We Use to Track This
- Google Sheets: Simple dashboard. Columns: Week, DAU, WAU, Time Saved, PRs with AI, Cycle Time.
- Notion: For qualitative feedback. "What worked this week? What didn't?"
- Slack bot: Weekly automated survey: "How much time did AI save you?" (3-question form; a minimal version is sketched below).
- GitHub API: Pull PR data (labels, size, time-to-merge).
- Jira API: Pull velocity and cycle time.
Total setup time: 2-3 hours. Update weekly: 15 minutes.
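The Slack survey doesn't need a full bot to get started; posting the weekly prompt through an incoming webhook is enough to begin collecting answers in a thread. A minimal sketch, assuming the webhook URL lives in an environment variable:

```python
# Minimal sketch: post the weekly check-in prompt to a Slack channel via an
# incoming webhook. SLACK_WEBHOOK_URL is an assumption; the 3-question form we
# run is fancier, but a plain prompt is enough to start collecting data.
import os

import requests

message = (
    ":stopwatch: *Weekly AI check-in*\n"
    "1. Roughly how many hours did AI save you this week?\n"
    "2. Which tasks did it help with?\n"
    "3. Anything that didn't work?"
)

resp = requests.post(
    os.environ["SLACK_WEBHOOK_URL"],
    json={"text": message},
    timeout=10,
)
resp.raise_for_status()
```

Schedule it with cron or a CI job on Friday mornings and log the replies in your dashboard.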
What to Do With This Data
If adoption is low (< 50% WAU by Week 4):
- Problem: Training, tool friction, or no clear use case
- Fix: Run office hours, share wins in Slack, improve onboarding
If time saved is low (< 3 hrs/week by Week 4):
- Problem: People are using it for low-value tasks
- Fix: Show high-impact prompts, ship automations
If outcomes aren't moving (velocity, cycle time flat):
- Problem: AI is being used for tasks that don't bottleneck work
- Fix: Identify the actual bottlenecks and target those
Want Help Measuring ROI for Your Team?
We've done this 30+ times. We'll set up tracking, run the pilot, and give you a report that leadership can act on.
Book a 30-min discovery call →
Bonus: Our Post-Pilot ROI Template
Copy this into a Google Sheet or Notion:
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Weekly active users | | | |
| Time saved per person | | | |
| PRs with AI assistance | | | |
| Cycle time (median) | | | |
| Velocity (story points) | | | |
| Bug rate | | | |
| Cost (tools + training) | | | |
| Value (time saved × hourly rate) | | | |
| ROI | | | |
Fill it out after Week 4 and Month 6. Share with leadership.