Let’s say I recently joined a fictitious online fashion store as an analyst. One morning I receive a message from Bill, our VP of Analytics.
Bill: Jack from reverse logistics team called. said he sees lots of return requests being placed yesterday. Most of these requests are for Nike t-shirts. @Sachin can you please investigate?
Me: Sure @Bill. Looking into it right away.
So I go ahead and start investigating.
I write a query to pull total return requests data for last 30 days. Total return requests indeed went up by ~10% yesterday.
I then modify my query to pull return return requests data where brand is Nike and category is t-shirts. Return requests more than doubled yesterday. That’s a huge spike.
Next, I start digging deeper. I am now looking at data from different dimensions. Is this spike specific to any product or order attributes? I start pulling data by customer geography (state, city), product subcategory, SKU, color, size, price etc.
Having spent a few hours into this investigation, I still haven’t found anything. Seems like the root cause is far away. I message Bill to keep him updated.
Me: @Bill I’m still investigating. I have looked at returns data from all angles but haven’t found anything yet.
Bill: have you looked at order deliveries data? note that we have a 30-day return policy but our customers typically place a return request 3–5 days after receiving the item.
With this new found insight, I again start investigating. But before I start, I am trying to make sense of what Bill said. What did he mean when he said our customers place return requests 3–5 days after receiving the item? How do I use this information in my analysis?
So I speak to my teammate about this. She said it simply means that I should look at delivery data for 3–5 days ago from yesterday, and not yesterday. Since it takes 3–5 days for customers to return, I won’t find any evidence in yesterday’s data for deliveries.
I now repeat the process all over again, this time with order delivery data, and that too focusing on 4–6 days ago. I find that order deliveries for Nike t-shirts were almost double the usual numbers. This probably explains high return requests. To confirm this indeed is the case, I take the cue from Bill’s message and now also look at order data. I know we usually deliver orders in 3–4 days. So I look at orders data from 7–10 days ago. Voila, orders for Nike t-shirts were almost double as well.
After having spent an entire day investigating, I wonder if there’s an easier and faster way to do this. Fortunately, it didn’t turn out to be an issue this time. But next time if it’s a real issue, we need a way to get to the root cause as soon as possible. We must minimize Mean Time to Detection (MTTD) for business issues as well.