Undermined Predictive Analytics

Predictive Analytics is a combination of common sense and statistical prowess. At this time, most of the common sense is with humans and most of the statistical prowess is with machines. As technologies such as self-service data mining applications and semantic web features improve over the coming years, machines will be more and more able to shoulder the common sense load.

Common sense is the most important skill for a Predictive Analytics consultant to have in her arsenal. Without it, mastery of the other important skills, such as statistical algorithms and database technologies, is pretty useless. If you think about it, everything we think about is Predictive Analytics. Even for cavemen.

I’m mentioning this because it’s important to understand that it’s easy to misinterpret the results of Predictive Analytics. It would be merely bad if the worst result was that it only told us things we already knew. It would be badder if it gave us wrong information from which we made bad decisions. But what is the baddest is falling prey to a predator.

Think of that baddest scenario as another kind of security issue, one that is unique to Predictive Analytics. It’s something we need to think about as we design the system. Let’s apply a common sense approach to this scenario with a little thought experiment.

If I carefully drop a drop of black ink into a glass of water at room temperature, the ink will evenly disperse throughout the glass resulting in a grayish color. Is this absolutely true? No, mostly true, but not absolutely true. Although the ink and water molecules bouncing around each other will most likely result in the even distribution, there is a small possibility (an incredibly small one – wow, even smaller than me winning the lottery) that the molecules will bounce against each other such that the dispersal happens so slowly, we’re long gone before it happens.

I first heard this ink and water question long ago reading a book on what was then the new study of chaos theory. It struck home for me that nothing is absolutely certain. There is just a probability of it happening. The actions of a few molecules in a glass of water may be discernible from another set of molecules, but at the macro level the ink predictably disperses through the water. In our human frame of reference, for our purposes, it will do that every time.

This was one of the moments that helped guide me to eventually working in Predictive Analytics. Left to the laws of physics and the law of averages, predictable patterns emerge at the macro level from what seems to be senseless at the micro level. Predictive Analytics uncovers the fuzzy rhythms of the interaction of objects at the various levels of the hierarchy of existence.

For example, at the level of an individual ant, we see the ant’s arsenal of sensors, its organs, the few simple behaviors programmed into its little brain, etc. But the ant colony is not simply a large number of ants. It is in itself a different organism, neither looking like nor behaving like the individual ant. Each ant performs its duty without question (sometimes sacrificing itself), leading to highly predictable behavior emerging from the ant colony.

However, there is another reason (with a much higher probability) that the ink will not disperse through the water. I could toss the whole thing down the drain before the ink disperses. Most humans are far from the dutiful ant that results in predictable ant colony behavior. Armed with our self-aware brains, each of us is a force of the universe with capabilities beyond the laws of physics and the law of averages. Our “intelligent” actions are constantly disrupting the rhythms of a complex but mindless universe.

There is a big difference between mining data captured from a system where all the agents (objects of the system) are dutifully going about their business following a few simple rules, and mining data from a system where one intelligence is actually attempting to undermine another. For example, Predictive Analytics is very effective at determining correct inventory levels. A history of purchases by thousands or even millions of customers over a long period of time can uncover patterns that forecast future demand levels fairly well. Customers are simply going about their business buying things to keep their lives moving merrily along.
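As a rough illustration of forecasting from purchase history, here is a minimal sketch using a simple moving average. The function name `forecast_next`, the window size, and the weekly sales figures are all made up for this example; a real inventory model would account for seasonality, promotions, and much more.

```python
from statistics import mean


def forecast_next(demand_history, window=3):
    """Forecast the next period's demand as the mean of the last `window` periods.

    This is a plain moving average, chosen only for illustration.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(demand_history) < window:
        raise ValueError("not enough history for the chosen window")
    return mean(demand_history[-window:])


# Hypothetical weekly unit sales for one product (made-up numbers).
weekly_units = [120, 130, 125, 135, 140, 138]

# Average of the three most recent weeks: 135, 140, 138.
next_week = forecast_next(weekly_units, window=3)
print(f"Forecast for next week: {next_week:.1f} units")
```

This works tolerably only because the customers generating the history have no reason to mislead the forecast.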

Contrast that to credit card fraud or embezzlement, where the agents are not simply going about their business. Smart thieves know how to fly under the radar. This means they know what you are looking for and execute their crime where you’re not looking. They excel at being invisible by not triggering any red flags as they go about their mischief under the cover of the false sense of security provided by those limited red flags.

In fact, one technique for recognizing bad guys is to find those whose behavior is too perfect. For example, they avoid raising any of the red flags of credit card fraud. A smart data miner can determine your rules for ruling out credit card fraud, since most algorithms out there use pretty much the same techniques, or at least techniques similar enough that the rules can be reverse engineered.
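One way to picture "too perfect" behavior is a sketch like the following. It assumes a hypothetical fraud rule that flags any single transaction at or above a fixed amount, then looks for accounts that never trip the rule yet cluster just beneath it. The threshold, band, minimum share, and transaction amounts are invented for illustration and are not taken from any real fraud system.

```python
def share_just_under(amounts, threshold, band=0.10):
    """Fraction of amounts that fall within `band` (a fraction of the threshold) below it."""
    if not amounts:
        return 0.0
    lower = threshold * (1 - band)
    near = [a for a in amounts if lower <= a < threshold]
    return len(near) / len(amounts)


def looks_too_perfect(amounts, threshold, band=0.10, min_share=0.5):
    """True if no amount trips the rule, yet at least `min_share` of them hug the threshold."""
    never_trips = all(a < threshold for a in amounts)
    return never_trips and share_just_under(amounts, threshold, band) >= min_share


# Hypothetical rule: flag any single transaction of 500.00 or more.
THRESHOLD = 500.0

# Made-up transaction histories for two accounts.
ordinary = [12.5, 48.0, 230.0, 75.9, 610.0, 33.2]
suspicious = [495.0, 480.0, 499.0, 460.0, 20.0, 490.0]

print(looks_too_perfect(ordinary, THRESHOLD))
print(looks_too_perfect(suspicious, THRESHOLD))
```

Of course, a thief who learns about this second rule can adapt to it as well, which is exactly the arms race described here.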

In such cases, Predictive Analytics is worse than just useless. It is actually a weapon of the thief, as much a weapon as the darkness is to a house burglar. The bad guys use your predictive analytics against you. Think of how we can catch fish on a hook and line because we use the fish’s recognition systems against them. We’re able to set that trap only because we know exactly what the fish is going to do.

This graphic shows a tiger, a fierce predator, using camouflage to hide from prey so it can sneak up on it. The rabbit, the epitome of prey, also uses camouflage to hide from its predators.

The notion of predators lying to prey and prey lying to predators underlies much of how I approach Predictive Analytics. It’s all about Predictive Analytics guessing what the opponent is hiding and then the opponent needing to counter with another tactic.

It’s important to secure the data mining rules if possible. Otherwise, people will know how to game your system. And believe me, they do. I’ve witnessed many incredibly clever ways workers have gamed their KPIs. They are able to because they know the rules upon which they will be judged.

About Eugene

Business Intelligence and Predictive Analytics on the Microsoft BI Stack.
This entry was posted in Data Mining and Predictive Analytics.
