Interview Prep · March 16, 2026 · 8 min read

How to Nail the Metric Investigation Case Study

The metric investigation is the most common case study in data science interviews. Here's how to systematically narrow the problem, avoid common traps, and deliver a strong conclusion.

Someone Slacks you: "Weekly active users are down 15%. The CEO is asking questions. Can you figure out what's going on?"

That's the metric investigation. It's the most common case study format in data science interviews, and it's the one that most closely mirrors what you'll actually do on the job. A metric moved. Nobody knows why. You have data. Go figure it out.

If you've read our general framework for case study interviews, you already know the high-level approach: clarify, plan, define metrics, explore, conclude, recommend. This post is about how to apply that framework specifically to investigations — the moves that matter, the traps to avoid, and the things that separate a strong performance from a mediocre one.

What makes investigations different

Every case study type requires structured thinking. But investigations have a specific quality that makes them uniquely challenging: the problem is defined by an outcome, not a question.

In a metric design case, someone asks "how should we measure success for this feature?" That's a question. You can start thinking about the answer immediately. In an investigation, someone says "this number went down." That's not a question — it's a symptom. Your job is to turn a symptom into a diagnosis, and that requires a different kind of thinking.

Investigations reward a particular skill: the ability to systematically narrow a problem space. You start with something broad — "WAUs are down" — and through a series of structured moves, you get to something specific — "WAUs are down because Android users in the US who've had accounts for 6+ months stopped opening the app after the v4.2 release." That narrowing process is what the interviewer is evaluating.

Start by scoping the metric

Don't take the prompt at face value. "WAUs are down 15%" is underspecified. Before you do anything else, make sure you understand exactly what you're looking at.

How is the metric defined? Weekly active users could mean anyone who opened the app, anyone who performed a meaningful action, or anyone who logged in. The definition matters because it changes what a decline means.

What's the baseline? Is 15% relative to last week? Last month? Same week last year? A sudden week-over-week drop suggests an event. A gradual month-over-month decline suggests a trend. These are fundamentally different problems.

Is this unusual? Some metrics are naturally volatile. A 15% drop might be within normal variance, especially for a smaller product or a metric with weekly seasonality. Ask whether the change is outside the expected range.
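One way to make the "is this unusual?" check concrete is to compare the latest week-over-week change against the distribution of past weekly changes. A minimal sketch with hypothetical numbers (the `weekly_wau` series is invented for illustration):

```python
import statistics

# Hypothetical weekly active-user counts, oldest first; the final
# week shows the 15% drop from the prompt.
weekly_wau = [100_000, 103_000, 98_000, 101_000, 99_000,
              102_000, 100_000, 85_000]

# Week-over-week percentage changes
changes = [(b - a) / a for a, b in zip(weekly_wau, weekly_wau[1:])]
history, latest = changes[:-1], changes[-1]

mean = statistics.mean(history)
std = statistics.pstdev(history)
z = (latest - mean) / std

print(f"latest WoW change: {latest:+.1%}, z-score vs. history: {z:.1f}")
# A |z| well beyond 2 suggests the drop is outside normal weekly noise.
```

In an interview you would describe this check verbally; the point is to anchor "unusual" to the metric's own historical variance rather than a gut feeling.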

What's the business context? Has anything changed recently — a product launch, a price change, a marketing campaign ending, a competitor move? This won't always give you the answer, but it gives you hypotheses before you've written a single query.

Two minutes of scoping can save you fifteen minutes of aimless exploration. Do it every time.

Form hypotheses before you query

This is the single highest-leverage habit you can build for investigations. Before you touch the data, list out 3-5 plausible explanations for what you're seeing.

For a WAU decline, your hypothesis list might look like:

  • Channel-specific drop: A high-performing acquisition channel broke or underperformed, reducing the inflow of new active users.
  • Product change: A recent release altered the user experience in a way that reduced engagement.
  • Segment concentration: The decline is concentrated in a specific user group — a platform, a region, a tenure cohort — not broad-based.
  • Seasonal or external: A holiday, a competitor launch, or a news cycle pulled attention away.
  • Measurement issue: A logging change, a tracking bug, or a definitional shift is making the metric look worse than reality.

Say these out loud. The interviewer wants to see this. It demonstrates that you can think about a problem from multiple angles before committing to a direction.

Now — and this is the important part — use your hypotheses to decide what to query first. Don't just segment the metric by every available dimension and hope something jumps out. Pick the hypothesis you think is most likely, design a query to test it, and go.

Segment, isolate, drill down

The core mechanic of an investigation is decomposition. You're taking a single number — "WAUs are down 15%" — and breaking it apart until you find where the problem lives.

The first cut matters. Pick the dimension most likely to reveal concentration. If you suspect a channel issue, segment by acquisition source. If you suspect a product change, segment by platform or app version. If you don't have a strong prior, start with a coarse, high-coverage dimension such as platform, region, or user tenure.

You're looking for asymmetry. If WAUs dropped 15% overall but 40% on Android and 0% on iOS, you just eliminated half the problem space. If the drop is perfectly even across every segment, that's a signal too — it suggests something that affects all users equally, which narrows your hypothesis set differently.
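The asymmetry check takes only a few lines once you have a summary of weekly actives per segment. A sketch with hypothetical platform names and numbers (chosen so the blend matches the 15% overall drop in the prompt):

```python
# Hypothetical summary: platform -> (last_week_wau, this_week_wau)
wau_by_platform = {
    "android": (350_000, 210_000),
    "ios":     (500_000, 500_000),
    "web":     (150_000, 140_000),
}

# Week-over-week change per platform
drops = {
    platform: (this - last) / last
    for platform, (last, this) in wau_by_platform.items()
}

# Worst-hit segments first
for platform, change in sorted(drops.items(), key=lambda kv: kv[1]):
    print(f"{platform:8s} {change:+.1%}")
# android  -40.0%   <- the overall 15% decline is concentrated here
```

The output makes the asymmetry obvious at a glance: one segment carries nearly the whole decline, which is exactly the kind of concentration you drill into next.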

Follow the concentration. Once you find where the drop is concentrated, go one level deeper. If it's Android, is it all Android users or a specific OS version? A specific region? Users who updated to the latest release vs. those who didn't? Each query should cut the remaining problem space in half.

Know when to pivot. If your first two queries don't show any clear concentration, don't keep segmenting the same metric by increasingly obscure dimensions. Step back, revisit your hypotheses, and try a different angle. Maybe the issue isn't where the users are — it's when they dropped off, or what they stopped doing. Switch from segmenting the metric to looking at the timeline or the user journey.

A common mistake here is running too many queries without synthesizing. After every 2-3 queries, pause and say to the interviewer: "Here's what I've learned so far, and here's what I'm going to look at next." This keeps your investigation visibly structured and prevents the drift that kills most performances.

Check for the boring explanations

Before you build an elaborate theory about why users are churning, rule out the mundane stuff. Experienced interviewers often plant one of these in the data specifically to see if you check:

Logging or tracking changes. Did something break in the event pipeline? A new SDK version that drops events? A change in how "active" is defined? If the drop coincides perfectly with a deploy date, check whether it's a real behavioral change or an instrumentation artifact.

Seasonality. Is this week historically low? Comparing to the same week last year (if you have the data) can save you from chasing a seasonal pattern that's completely normal.

Composition shifts. Sometimes the metric didn't actually change — the population did. If you acquired a large cohort of low-engagement users last month and they're now entering their natural churn window, overall WAUs can decline even though per-cohort engagement is flat. This is Simpson's Paradox territory, and it trips people up regularly.
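A toy version of the composition-shift effect, shown on the blended weekly-active rate (cohort sizes and rates are hypothetical): each cohort's rate is identical in both weeks, yet the overall rate falls because the mix shifts toward the low-engagement cohort.

```python
# Hypothetical cohorts: name -> (users, weekly-active rate).
# Per-cohort rates are the same in both weeks; only the mix changes.
last_week = {"established": (800_000, 0.50), "new_low_engage": (200_000, 0.10)}
this_week = {"established": (800_000, 0.50), "new_low_engage": (600_000, 0.10)}

def overall_active_rate(cohorts):
    active = sum(n * rate for n, rate in cohorts.values())
    total = sum(n for n, _ in cohorts.values())
    return active / total

print(f"last week: {overall_active_rate(last_week):.1%}")  # 42.0%
print(f"this week: {overall_active_rate(this_week):.1%}")  # 32.9%
# The blended rate fell ~9 points with zero change in any cohort's
# behavior: low-engagement users simply became a larger share of the base.
```

If you only look at the top-line number, this reads as an engagement problem; segmenting by cohort reveals it is a mix problem, which calls for a completely different response.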

You don't need to check all of these in every investigation. But having them in the back of your mind — and knowing when to pull them out — is what makes an investigation feel thorough rather than superficial.

Build the narrative

Around the 25-minute mark of a 45-minute case, you should shift from exploration to synthesis. If you're still writing queries at minute 35, you're going to run out of time before you can deliver a coherent answer.

Your narrative should follow a clear structure:

What happened. State the finding plainly. "The 15% WAU decline is almost entirely concentrated in Android users in the US. iOS and international markets are flat."

Why it happened. Connect your finding to a cause — or your best hypothesis for a cause. "The drop began the week of the v4.2 release. Android users on the new version show a 35% lower session rate than those still on v4.1. This suggests a regression in the Android update."

How confident you are. Be honest. "I haven't been able to confirm a specific UX issue from the data alone, but the timing and platform concentration strongly suggest the release is the culprit. I'd want to cross-reference with crash logs and session recordings to confirm."

What to do about it. See the next section.

Notice the structure: observation → explanation → confidence → action. This is how senior analysts communicate. If you can deliver this cleanly in 3-4 minutes, you'll leave a strong impression regardless of whether your root cause is exactly right.

Close with a recommendation

Don't just stop at "the Android release probably caused it." Close the loop.

Immediate action: "I'd recommend the engineering team review the v4.2 release for regressions in the core app experience on Android — specifically around session initialization and feature loading times."

Further investigation: "With more time, I'd want to look at crash rates by app version, compare specific feature engagement pre- and post-release, and check if there's a UX change in v4.2 that altered navigation patterns."

Monitoring: "Going forward, we should set up automated alerting on WAUs segmented by platform and app version so we catch this kind of divergence within days, not weeks."
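The monitoring idea can be sketched as a simple divergence check: flag any segment whose week-over-week change strays too far from the overall movement. Segment names, numbers, and the threshold below are all hypothetical:

```python
# Hypothetical segmented WAU snapshot: platform -> (last_week, this_week)
wau = {
    "android": (350_000, 280_000),
    "ios":     (500_000, 495_000),
}

def divergence_alerts(wau, threshold=0.10):
    """Return the overall WoW change and any segments that diverge
    from it by more than `threshold` (absolute percentage points)."""
    total_last = sum(last for last, _ in wau.values())
    total_this = sum(this for _, this in wau.values())
    overall = (total_this - total_last) / total_last
    alerts = []
    for segment, (last, this) in wau.items():
        change = (this - last) / last
        if abs(change - overall) > threshold:
            alerts.append((segment, change))
    return overall, alerts

overall, alerts = divergence_alerts(wau)
print(f"overall: {overall:+.1%}, alerts: {alerts}")
# Only android is flagged: its -20% drop diverges from the blended change.
```

A production version would run on a schedule against the warehouse and page the owning team, but the core logic is just this comparison per segment.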

This is the part that makes the interviewer think "I want this person on my team." Findings are expected. Recommendations show ownership.

Common mistakes to avoid

Starting with SQL instead of thinking. Your first move should be asking questions and forming hypotheses. Not SELECT * FROM users LIMIT 10.

Boiling the ocean. You don't need to check every possible dimension. Be targeted. Follow the signal.

Forgetting to narrate. Silence during an investigation is deadly. The interviewer can't evaluate thinking they can't hear. Talk through your reasoning as you go — what you're looking for, what you found, what it means.

Getting attached to a hypothesis. If the data doesn't support your theory, drop it and move on. Don't spend 15 minutes trying to make a dead-end work. Pivoting is a strength, not a failure.

Running out of time without a conclusion. Budget your time. If you have 45 minutes, aim to start synthesizing by minute 25-30. An incomplete investigation with a clear narrative beats a thorough investigation with no conclusion.

Practice this until it's automatic

The metric investigation is a skill. The good news is that the pattern — scope, hypothesize, segment, isolate, narrate, recommend — is the same every time. What changes is the data and the specific scenario. Once the pattern is second nature, you can spend your mental energy on the actual problem instead of figuring out what to do next.

That only happens with reps. Read a scenario, open a dataset, set a timer, and go. Do it five times and you'll feel the difference. Do it ten times and the interview will feel familiar instead of terrifying.

(Rabbit Hole has investigation cases modeled on real interviews at Meta, DoorDash, and Netflix — a good place to start if you want realistic practice.)

Ready to practice?

Apply these concepts on realistic case studies with real datasets.

Browse Case Studies