The title of this blog is a facetious play on words. Although there is significant value in the SSAS Time Series data mining algorithm, I do consider it the weakest of the bunch, and I don't often achieve results so good that I'm shocked. That isn't due to any deficiency in the implementation of the algorithm, but to the nature of forecasting a value based on a series of past values.
To illustrate how the Time Series algorithm can sometimes be unreliable, think of a comet orbiting through space. We're able to forecast its location very well because we know its location at various points in the recent past and we know its periodicity. We're able to forecast when we'll see it again to a fairly accurate degree, centuries out.
If it weren't for the added complexity of other objects affecting the comet's path to a small degree, we could forecast its location indefinitely. That is, until the comet smashes into something; then all bets are off. At best, the comet is off on another trajectory, rendering its positions of the recent past irrelevant. At worst, the comet no longer exists as a single entity, making a forecast of its next position somewhat foolish.
Unlike space, where collisions are relatively rare, life on Earth involves countless objects (people and things) interacting, perhaps in periodic rhythms, but there are so many moving parts that the complexities of the rhythms are often no longer really rhythms. Our actions are constantly crashing into each other, disrupting the rhythms of our lives. Eventually, forces such as our will and the laws of the land restore the rhythms to a good extent, but the future will play itself out very differently nonetheless.
Again, the Time Series algorithm does present significant value, but it is prone to unacceptable inaccuracy when attempting to predict events in our high-collision world. It knows nothing of high-impact, unpredictable events such as 9/11 (if it were predictable, we would have stopped it). But here is a novel way that we can employ the Time Series algorithm within our Predictive Analytics framework in a highly valuable fashion.
To begin, I need to explain that my use of the word “shocked” in the title of this blog isn't in the “Will you look at those wonderfully clear patterns!” context. I mean shocked in the context of how our brains decide what to focus our attention on.
The idea is to compare what the Time Series algorithm would have forecast versus what actually happened. One may wonder why we would care about a predicted value once we have the actual value. This is a large component of how we humans escape mathematical fates.
Our brains are constantly predicting, consciously and mostly subconsciously, what is going to happen next. These predictions are based on our individual experiences. When what actually does happen is not what our brains predicted, we are surprised. That SURPRISE forces our attention on what is different. Most of the time that was a good clue of a predator lurking in the bushes. This notion is very well described in the highly readable book, On Intelligence, by Jeff Hawkins.
In today's world, that inconsistency between what statistics-based algorithms predict and what actually happens can be a good hint that a customer's circumstances have changed or that we're about to sell more cars than we have in inventory. For example, say we're a pharma company and our Time Series model predicts a segment of customers will purchase a million units of aspirin this coming March, and they have done so with little variance over the past three years. If, sitting in April, we see they purchased only half a million units, it's a good clue that something has happened.
We may not be able to use that information to accurately forecast the inventory needs. However, that information is still valuable in pointing out that we may need to divert resources towards protecting our brand of aspirin. This is an example of concurrence between multiple models.
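The forecast-versus-actual comparison above can be sketched in a few lines of code. This is a minimal, hypothetical illustration, not part of any SSAS API: the function name, the percentage-based tolerance, and the figures (drawn from the aspirin example) are all assumptions chosen for clarity.

```python
def surprise_flags(forecasts, actuals, tolerance=0.25):
    """Flag each period where the actual value deviates from the forecast
    by more than `tolerance`, expressed as a fraction of the forecast.
    The threshold of 0.25 is an illustrative assumption."""
    flags = []
    for predicted, observed in zip(forecasts, actuals):
        deviation = abs(observed - predicted) / predicted
        flags.append(deviation > tolerance)
    return flags

# Three periods forecast at ~1M units apiece; the last actual is the
# surprising half-million-unit month from the aspirin example.
forecast = [1_000_000, 1_000_000, 1_000_000]
actual = [980_000, 1_020_000, 500_000]
print(surprise_flags(forecast, actual))  # → [False, False, True]
```

The point of the flag isn't to correct the forecast; it's to direct human attention, the same way surprise directs our brains, toward the period where the model's view of the world stopped matching reality.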
Our thoughts are reflections of models built in our heads, which are more sophisticated than those built with data mining tools, but models nonetheless. The big difference is that the models in our heads are highly interrelated, leveraging concurrence between various points of view (similar experiences we've had).
I used to think that the Internet would make people's behavior more homogenous. It probably does in many respects, but in other respects it seems to make life more chaotic. I used to “collide” with fewer people than I do now, thanks to LinkedIn and Facebook. Through their postings, I learn of many more interesting things that impact a wide range of minor decisions I make.
To conclude, it's always worth reminding people that Predictive Analytics doesn't promise the correct answer. It promises the best guess based on what it knows about what has happened in the past. The main idea is that if something has happened before under certain conditions, it's likely to happen again under similar circumstances.
Although that is a very valid point, in this complex world of countless collisions and intelligent predators undermining your predictions, Predictive Analytics is an iterative process. The template is the process by which a doctor diagnoses your condition. She has goals of identifying your condition before it’s too late to address it while utilizing the resources of time and money as efficiently as possible. During that iterative process, she utilizes statistics in her head and hopefully from empirical sources guiding her towards the correct diagnosis.
In the end, a human probably still provides the “best guess” more often than not, at least today. But the quality of that human's guess AND the magnitude of the number of guesses the human can make are significantly enhanced through the partnership of the statistics-based best guess and the human's common sense.