[Image: Morris Misel leading a strategic AI and foresight session with a team of professionals around a table]

The KPIs I Give to Leaders Who Ask: ‘How Do We Know If AI’s Working?’

A CEO leaned across the boardroom table and asked me

“Morris, how do we actually know if our AI is working?”

Not performing.

Not saving money.

Working — as in making us smarter, better, more human.

It’s the right question

And one I get asked more often than you’d think

Because while most leaders are tracking ROI, speed, and automation output, what they’re really trying to work out is

Are we becoming more effective, or just more efficient?

That’s where my measurement tools come in

They’re not just KPIs

They’re signals

Signals that show you whether your organisation is evolving with AI or just installing it

These are the same tools I use in keynotes, workshops, offsites and closed-door strategy sessions across sectors

They offer clarity where standard dashboards do not

And they help leaders deal with PTFA — Past Trauma Future Anxiety — the invisible tension that grows when change moves faster than understanding

At the centre is HUMAND – Human + Machine + AI

This list shows you how to measure whether that blend is actually working


Why You Need KPIs Beyond Dollars and Outputs

Money is easy to count

But it’s a lagging indicator

It shows you what happened

Not what’s happening

If you only measure what’s easy, you’ll miss what’s essential

Things like

  • Trust erosion between human and machine

  • Mindless automation adoption

  • Decision fatigue hiding under compliance

  • Innovation drop-offs in highly optimised teams

  • People behaving like systems while systems get more human

These are not tech problems

They’re early warning signs

The organisations that thrive next aren’t the ones with the best AI

They’re the ones reading these signals first and responding fast


The Real-World KPIs I Use to Measure Human + Machine + AI Collaboration


1. Decision Override Rate

What it is: How often a human overrules an AI suggestion and improves the outcome

Why it matters: It shows whether human judgment is still adding value or being sidelined

How to measure: Audit decision logs. Track override frequency and whether it led to a better result

Benchmark: High-quality overrides on 10 to 15 percent of decisions suggest healthy collaborative tension

In practice: In a workshop with a national insurer, their top teams overrode AI claim decisions 12 percent of the time. Not recklessly. Thoughtfully. And it improved customer retention. They weren’t ignoring the system. They were completing it
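
If you want to make this concrete, here is a minimal sketch of the audit, assuming a decision log where each record notes whether a human overrode the AI and whether that override improved the result (the field names are hypothetical):

```python
# Hypothetical decision-log records: each one marks whether a human
# overrode the AI suggestion and whether that override improved the outcome.
decision_log = [
    {"overridden": True,  "override_improved_outcome": True},
    {"overridden": False, "override_improved_outcome": None},
    {"overridden": True,  "override_improved_outcome": False},
    # ... one record per AI-assisted decision
]

def override_rates(log):
    """Return (override rate, high-quality override rate) as fractions of all decisions."""
    total = len(log)
    overrides = [d for d in log if d["overridden"]]
    improved = [d for d in overrides if d["override_improved_outcome"]]
    return len(overrides) / total, len(improved) / total

rate, quality_rate = override_rates(decision_log)
print(f"Override rate: {rate:.0%}, high-quality overrides: {quality_rate:.0%}")
# Benchmark from this list: high-quality overrides around 10-15% suggest healthy tension.
```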


2. Adaptability Lag

What it is: The time it takes for humans to adjust after the AI learns something new

Why it matters: If the AI evolves but the people don’t, the gap widens

How to measure: Track how long it takes to integrate new model outputs or recommendations into actual behaviour or workflow

Benchmark: Two weeks or less is agile. Longer than four weeks means the machine is moving ahead without its people

In practice: A logistics firm dropped lag time from 42 days to 9 by embedding AI update alerts into weekly ops meetings. Routing got faster. Mistakes dropped. People felt more in sync
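
A minimal sketch of the lag calculation, assuming you timestamp when a model update ships and when the team's behaviour actually changes in response (both fields and dates are hypothetical):

```python
from datetime import date
from statistics import mean

# Hypothetical adoption log: when a model update shipped vs. when the
# team's workflow visibly changed in response.
adoption_log = [
    {"update_shipped": date(2025, 3, 3), "behaviour_changed": date(2025, 3, 12)},
    {"update_shipped": date(2025, 4, 1), "behaviour_changed": date(2025, 5, 20)},
]

lags = [(r["behaviour_changed"] - r["update_shipped"]).days for r in adoption_log]
print(f"Average adaptability lag: {mean(lags):.0f} days")
# Benchmark from this list: ~14 days or less is agile; beyond ~28 days
# the machine is moving ahead without its people.
```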


3. Human Confidence Score

What it is: How confident your people feel using and challenging AI recommendations

Why it matters: Low confidence usually means blind obedience or total avoidance. Both are dangerous

How to measure: Ask for a 1 to 10 rating in pulse checks and project reviews. Track improvement over time

Benchmark: You want to see confidence rising above 7 and trending up

In practice: A financial firm’s junior staff scored 4 out of 10 using an AI portfolio builder. When we added weekly “gut-check” sessions to discuss overrides and rationale, scores lifted to 8.2 in three months
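
A minimal sketch of the pulse-check arithmetic, assuming 1-to-10 confidence ratings collected each round (the numbers are illustrative). The same pattern works for the other survey-based scores in this list, just with a different question and threshold:

```python
from statistics import mean

# Hypothetical pulse-check results: 1-10 confidence ratings per survey round.
pulse_rounds = {
    "2025-Q1": [4, 5, 3, 6, 4],
    "2025-Q2": [6, 7, 5, 7, 6],
    "2025-Q3": [8, 8, 7, 9, 8],
}

for period, scores in pulse_rounds.items():
    avg = mean(scores)
    flag = "on track" if avg > 7 else "needs attention"
    print(f"{period}: average confidence {avg:.1f}/10 ({flag})")
# Target from this list: above 7 and trending upward.
```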


4. Curiosity Quotient (CQ)

What it is: The frequency and depth of questions your people ask after seeing AI-generated insights

Why it matters: If they’re not asking why or what else, they’re not engaged

How to measure: Count follow-up questions per AI insight. Track how often people dig deeper

Benchmark: Two or more thoughtful questions per recommendation is healthy. Zero means disengagement

In practice: A retail client ran monthly “curiosity audits.” Teams that asked more follow-up questions made better stocking decisions and built stronger vendor relationships. Curiosity flowed in both directions
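
A minimal sketch of a curiosity audit, assuming you count the thoughtful follow-up questions raised against each AI insight (the data is illustrative):

```python
from statistics import mean

# Hypothetical audit: follow-up questions asked per AI-generated insight.
followups_per_insight = {
    "stocking_recommendation_14": 3,
    "stocking_recommendation_15": 0,
    "vendor_alert_2": 2,
}

cq = mean(followups_per_insight.values())
disengaged = [name for name, n in followups_per_insight.items() if n == 0]
print(f"Curiosity Quotient: {cq:.1f} follow-up questions per insight")
print(f"Insights with zero follow-up (disengagement risk): {disengaged}")
# Benchmark from this list: two or more thoughtful questions per recommendation is healthy.
```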


5. Context Integrity Ratio

What it is: How often AI-generated decisions align with local knowledge or cultural nuance

Why it matters: AI lacks instinct for nuance. Humans don’t

How to measure: Survey frontline feedback and compare with AI-generated actions. Look for disconnects

Benchmark: Above 90 percent alignment is strong. Below 75 percent is cause for review

In practice: A Southeast Asian retailer’s AI pricing model ignored local habits. After adding context feedback, loyalty rose and returns fell — without changing the pricing algorithm itself
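
A minimal sketch of the ratio, assuming frontline reviewers flag whether each AI-generated action fit local knowledge and nuance (field names hypothetical):

```python
# Hypothetical frontline review log: did the AI action fit local context?
reviews = [
    {"action_id": "price_change_101", "fits_local_context": True},
    {"action_id": "price_change_102", "fits_local_context": False},
    {"action_id": "promo_plan_17",    "fits_local_context": True},
]

aligned = sum(1 for r in reviews if r["fits_local_context"])
ratio = aligned / len(reviews)
print(f"Context Integrity Ratio: {ratio:.0%}")
# Benchmark from this list: above 90% is strong; below 75% is cause for review.
```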


6. Collaboration Health Score

What it is: A measure of trust and clarity between humans and AI in the workflow

Why it matters: AI that performs well but isn’t trusted won’t be used properly

How to measure: Ask people whether they know the AI’s role, whether they trust its output, and how easy it is to work with

Benchmark: Over 80 percent positive response means good collaboration. Under 65 means friction

In practice: A tech firm scored 52 percent. People didn’t trust the tool. After resetting its role and adding weekly AI-human check-ins, scores jumped and so did output quality


7. Human Downtime Drift

What it is: The time people lose waiting for AI to finish or figuring out what it’s doing

Why it matters: Automation should reduce lag, not create it

How to measure: Log periods of inactivity caused by unclear AI roles or delays. Compare across teams

Benchmark: Anything above 5 percent of time lost is a red flag

In practice: A hospital system found junior staff waited 30 minutes per shift for AI diagnostics. By staggering tasks, they regained that time and reduced the stress of patient processing
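
A minimal sketch of the drift calculation, assuming teams log the minutes lost per shift waiting on the AI or working out what it is doing (the numbers are illustrative):

```python
# Hypothetical per-shift log: minutes lost to AI waiting or confusion,
# against total shift minutes.
shifts = [
    {"team": "ED juniors", "minutes_lost": 30, "shift_minutes": 480},
    {"team": "Radiology",  "minutes_lost": 12, "shift_minutes": 480},
]

for s in shifts:
    drift = s["minutes_lost"] / s["shift_minutes"]
    flag = "RED FLAG" if drift > 0.05 else "ok"
    print(f"{s['team']}: {drift:.1%} of shift lost ({flag})")
# Threshold from this list: anything above 5% of time lost is a red flag.
```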


8. Meaning-Making Index

What it is: Your team’s ability to explain not just what the AI decided, but why it matters

Why it matters: Insight needs human framing to become impact

How to measure: Ask team leads to explain AI outputs in plain language and connect them to strategic relevance

Benchmark: 80 percent alignment with purpose shows understanding. Anything less needs work

In practice: A consulting firm found that many leaders could describe data, but not connect it to client goals. We rebuilt that muscle through storytelling sessions, not dashboards


9. Signal Response Velocity

What it is: How fast your organisation responds to weak signals, whether they come from AI or from people

Why it matters: Future-readiness is not about seeing first. It’s about acting fast enough

How to measure: Log time between signal and first action taken

Benchmark: Less than two weeks for internal signals. Under 30 days for external trends

In practice: A global supplier’s AI flagged logistics risks 23 days before anyone acted. The system worked. The people didn’t trust it — yet
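
A minimal sketch of the velocity log, assuming each weak signal gets a timestamp when it surfaces and another when the first concrete action is taken (fields and dates are hypothetical):

```python
from datetime import date

# Hypothetical signal log: when a weak signal surfaced vs. when someone first acted on it.
signals = [
    {"signal": "port congestion risk", "kind": "external",
     "raised": date(2025, 2, 1),  "first_action": date(2025, 3, 10)},
    {"signal": "staff override spike", "kind": "internal",
     "raised": date(2025, 3, 10), "first_action": date(2025, 3, 18)},
]

LIMITS = {"internal": 14, "external": 30}  # days, per the benchmarks above

for s in signals:
    days = (s["first_action"] - s["raised"]).days
    status = "within benchmark" if days <= LIMITS[s["kind"]] else "too slow"
    print(f"{s['signal']}: {days} days to first action ({status})")
```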


10. Purpose Alignment Pulse

What it is: Whether AI-enabled decisions match your values and long-term goals

Why it matters: Optimisation can drift from intention without regular recalibration

How to measure: Post-decision reviews scored against stated purpose

Benchmark: Above 90 percent shows clarity. Below 70 signals misalignment

In practice: A values-driven wealth manager realised their AI had prioritised returns over social impact. A small tweak to weighting brought both back into harmony


11. Signal Fatigue Score

What it is: How overwhelmed people feel by too many AI signals or alerts

Why it matters: If everything is urgent, nothing is

How to measure: Monthly surveys on decision clarity and overload

Benchmark: Less than 20 percent reporting overload is ideal. Above 40 is a serious warning

In practice: One team had five AI systems sending alerts. Nobody acted on any of them. We consolidated them into one weekly insight brief. Output improved
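
A minimal sketch of the fatigue tally, assuming a monthly survey asks whether alert volume is hurting decision clarity (the responses are illustrative):

```python
# Hypothetical monthly survey: does the volume of AI alerts leave you overloaded?
responses = [
    {"person": "A", "feels_overloaded": True},
    {"person": "B", "feels_overloaded": False},
    {"person": "C", "feels_overloaded": True},
    {"person": "D", "feels_overloaded": False},
    {"person": "E", "feels_overloaded": False},
]

overloaded = sum(1 for r in responses if r["feels_overloaded"]) / len(responses)
if overloaded < 0.20:
    verdict = "ideal"
elif overloaded > 0.40:
    verdict = "serious warning"
else:
    verdict = "watch closely"
print(f"Signal fatigue: {overloaded:.0%} report overload ({verdict})")
```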


12. Feedback Loop Strength

What it is: How often and how clearly humans feed back into the AI system

Why it matters: No feedback means no learning

How to measure: Track how many human inputs shape system updates. Reward good feedback

Benchmark: At least one live feedback loop per major workflow. More is better

In practice: A publisher added a “was this useful” button to AI content briefs. Writers clicked it 80 percent of the time. That feedback made the next model 27 percent more relevant
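
A minimal sketch of tracking loop strength, assuming you log human feedback events per workflow and whether each one actually shaped a system update (field names hypothetical):

```python
# Hypothetical feedback log: human inputs per workflow, and whether each shaped an update.
feedback_events = [
    {"workflow": "content_briefs", "shaped_update": True},
    {"workflow": "content_briefs", "shaped_update": False},
    {"workflow": "claims_triage",  "shaped_update": True},
]
major_workflows = ["content_briefs", "claims_triage", "pricing"]

for wf in major_workflows:
    events = [e for e in feedback_events if e["workflow"] == wf]
    shaping = sum(1 for e in events if e["shaped_update"])
    status = "loop present" if shaping >= 1 else "NO LOOP - no learning"
    print(f"{wf}: {len(events)} feedback inputs, {shaping} shaped an update ({status})")
# Benchmark from this list: at least one live feedback loop per major workflow.
```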


13. Emotional Friction Index

What it is: The stress, resistance or discomfort people feel when using AI

Why it matters: This is where PTFA shows up first

How to measure: Sentiment surveys. Focus groups. Anonymous reflections

Benchmark: Less than 15 percent experiencing emotional friction is ideal

In practice: A senior exec admitted in a strategy session, “I don’t trust it because I don’t understand it.” That unlocked a wave of quiet resistance across the team. Naming it changed everything


14. Human Stretch Zone

What it is: Time spent doing truly human work — like creativity, empathy and vision

Why it matters: This is where HUMAND comes alive

How to measure: Journals, surveys or interviews asking people where they feel most stretched

Benchmark: More than 40 percent of the work week in stretch mode is excellent

In practice: A manufacturing client redesigned daily roles after a HUMAND audit. Machines did repeatables. AI flagged exceptions. Humans solved creatively. Everyone got better at what only they could do
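
A minimal sketch of the stretch-zone share, assuming people tag their week's hours as stretch work (creativity, empathy, vision) or everything else (categories and hours are illustrative):

```python
# Hypothetical weekly journal: hours tagged as "stretch" work versus routine
# or machine-adjacent work.
week_log = {
    "creative problem solving": 8,
    "client empathy and relationships": 6,
    "vision and strategy": 3,
    "reviewing AI exceptions": 10,
    "routine reporting": 11,
}
STRETCH = {"creative problem solving", "client empathy and relationships", "vision and strategy"}

stretch_hours = sum(h for task, h in week_log.items() if task in STRETCH)
share = stretch_hours / sum(week_log.values())
print(f"Stretch-zone share of the week: {share:.0%}")
# Benchmark from this list: more than 40% of the work week in stretch mode is excellent.
```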


Final Thought: What You Measure Shapes What You Become

We’re not just redesigning dashboards
We’re redesigning meaning

If your metrics only track speed, scale or ROI, you’ll miss the signals that matter most

But if you start measuring curiosity, context, stretch and override
You’ll start seeing your future before it arrives

That’s the kind of clarity I build with leadership teams and strategy groups every day

Because strategy doesn’t live in spreadsheets
It lives in the dance between human and machine and AI

So. What are you measuring?

#AI #FutureOfWork #Leadership #KPIs #BusinessStrategy #HumanCentredAI #MorrisMisel #StrategicForesight #ExecutiveTools #FuturistThinking #HUMAND #NonFinancialMetrics #CEOLeadership #EventSpeaker #AIIntegration #PostAutomation
