Root Cause Analysis — a Story

Photo by Fabrice Villard on Unsplash

Let’s say I recently joined a fictitious online fashion store as an analyst. One morning I receive a message from Bill, our VP of Analytics.

Bill: Jack from reverse logistics team called. said he sees lots of return requests being placed yesterday. Most of these requests are for Nike t-shirts. @Sachin can you please investigate?

Me: Sure @Bill. Looking into it right away.

So I go ahead and start investigating.

I write a query to pull total return requests data for last 30 days. Total return requests indeed went up by ~10% yesterday.

I then modify my query to pull return return requests data where brand is Nike and category is t-shirts. Return requests more than doubled yesterday. That’s a huge spike.

Next, I start digging deeper. I am now looking at data from different dimensions. Is this spike specific to any product or order attributes? I start pulling data by customer geography (state, city), product subcategory, SKU, color, size, price etc.

Having spent a few hours into this investigation, I still haven’t found anything. Seems like the root cause is far away. I message Bill to keep him updated.

Me: @Bill I’m still investigating. I have looked at returns data from all angles but haven’t found anything yet.

Bill: have you looked at order deliveries data? note that we have a 30-day return policy but our customers typically place a return request 3–5 days after receiving the item.

With this new found insight, I again start investigating. But before I start, I am trying to make sense of what Bill said. What did he mean when he said our customers place return requests 3–5 days after receiving the item? How do I use this information in my analysis?

So I speak to my teammate about this. She said it simply means that I should look at delivery data for 3–5 days ago from yesterday, and not yesterday. Since it takes 3–5 days for customers to return, I won’t find any evidence in yesterday’s data for deliveries.

I now repeat the process all over again, this time with order delivery data, and that too focusing on 4–6 days ago. I find that order deliveries for Nike t-shirts were almost double the usual numbers. This probably explains high return requests. To confirm this indeed is the case, I take the cue from Bill’s message and now also look at order data. I know we usually deliver orders in 3–4 days. So I look at orders data from 7–10 days ago. Voila, orders for Nike t-shirts were almost double as well.

After having spent an entire day investigating, I wonder if there’s an easier and faster way to do this. Fortunately, it didn’t turn out to be an issue this time. But next time if it’s a real issue, we need a way to get to the root cause as soon as possible. We must minimize Mean Time to Detection (MTTD) for business issues as well.

Founder, Cuebook (cuebook.ai). Augmented Analytics

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store