Since the very fun days when I worked in the Analysis Services product team (1998), I have been fascinated by what was then a brand-new project named “Aurum”. Aurum is Latin for gold, and the root of gold’s chemical symbol, Au. The result of project Aurum has been known since 2000 as the Data Mining end of SQL Server Analysis Services. Almost a decade since its release, Analysis Services Data Mining (I’ll just call it “Data Mining” from now on) is still considered fringe in the world of Business Intelligence. However, that seems to be suddenly and rapidly changing.
If you play a little with Data Mining (see Data Mining Tutorials and How-to Topics), you’ll see that you can do some incredibly intriguing and sophisticated things very easily. All in all, I’d say Data Mining is much easier to work with than the OLAP side of Analysis Services, when you consider the difficulty of topics such as MDX and optimization. It’s not hard at all to see how the predictions and insights one can glean are of immense value to a business. And more interesting, one can see things they never would have thought of before.
Thinking of “BI” as a verb, as in “I’m doing BI!”, Data Mining is BI. ETL, DW, and OLAP are just the setup for what we really want. Until data mining is incorporated, we have only what I lovingly call “glorified reporting”. Glorified Reporting is better than plain old reporting, but it isn’t BI. What you do when you’re browsing through your OLAP cubes, slicing and dicing, is data mining. It is the verb, BI. It’s just that it’s manual data mining. Pretty much everything Data Mining offers is what you do in your head anyway. It may not result in an ultimate decision, but it brings the information you need to a whole new level.
The problem, I think, is that there is such a black box mentality that people automatically shut down. In the OLAP world, in my MDX classes, I deal up front and head-on with the two hindrances I encounter with the students: the thought that OLAP technology is some super-exotic thing and that this multi-dimensional thing is beyond any mortal comprehension. Therefore, I spend much time up front showing how OLAP’s data structures are really just a set of tabular structures, indexes, and bitmaps and how we really do understand multi-dimensional space. The OLAP black box becomes at least a gray box (you really don’t need to know the nitty gritty) and multi-dimensional space becomes almost as second nature as latitude, longitude, and altitude.
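To make the “gray box” idea concrete, here is a toy sketch of how a cube cell can be answered from nothing more than a table and a pair of bitmap indexes. It is not how Analysis Services actually stores or queries data; the fact table, the column names, and the helper functions `build_bitmaps` and `slice_total` are all hypothetical and exist only for illustration.

```python
# Hypothetical fact table: one row per sale (illustrative data only).
facts = [
    {"year": 2007, "region": "West", "sales": 100},
    {"year": 2007, "region": "East", "sales": 150},
    {"year": 2008, "region": "West", "sales": 200},
    {"year": 2008, "region": "East", "sales": 250},
    {"year": 2008, "region": "West", "sales": 50},
]


def build_bitmaps(rows, column):
    """Map each distinct value of a column to an int whose bit i is set
    when row i holds that value (a simple bitmap index)."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps


# One bitmap index per dimension.
year_index = build_bitmaps(facts, "year")
region_index = build_bitmaps(facts, "region")


def slice_total(year, region):
    """Sum sales for one (year, region) cell by AND-ing the two bitmaps."""
    matching = year_index.get(year, 0) & region_index.get(region, 0)
    return sum(
        row["sales"] for i, row in enumerate(facts) if (matching >> i) & 1
    )


print(slice_total(2008, "West"))  # rows 2 and 4 match: 200 + 50 = 250
```

Each dimension member is just a set of row positions, and a “slice” is the intersection of those sets. Nothing in it is more exotic than a table and some bit arithmetic.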
I believe data mining suffers from the same mental blocks. That really shouldn’t be the case since we’re all natural data miners. Some of the most talented data miners I’ve seen are like the guy sitting next to you in the bar spouting off surprisingly insightful and sophisticated arguments for who is the greatest home run hitter.
The first mental block is not understanding what Data Mining is used for. Isn’t it something only propeller-heads do? I hope I helped to begin cracking that mental block with the first few paragraphs. Data Mining is used to tell you things about your business that may not be readily evident. For example, the tutorials I mention above are based on a fictitious bicycle manufacturer. In any industry, I’d want to focus my limited time on likely customers and less on customers where it is usually a matter of wasting time. If a trim, 30-ish person and an overweight, middle-aged person walked in, I suppose I would think I should spend my time with the trim, 30-ish person. After all, that person is obviously athletic and would love bicycles. But on the other hand, what are the odds that the younger person already owns a bike, is there to purchase a relatively cheap accessory, and knows exactly what he/she wants anyway? What are the odds the overweight, middle-aged person found out he/she has high blood pressure, is looking to increase exercise in a fun way, and is primed to buy a bike?
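The “what are the odds” question above is something you can compute once visits are recorded. The sketch below is a minimal illustration, not the method used by the Data Mining tutorials: the visit log, the segment labels, and the `purchase_rates` helper are made up, and the numbers are invented purely to show the mechanics.

```python
from collections import defaultdict

# Hypothetical visit log: (customer segment, did the visit end in a bike purchase?)
visits = [
    ("trim_30ish", False),
    ("trim_30ish", False),
    ("trim_30ish", True),
    ("trim_30ish", False),
    ("overweight_middle_aged", True),
    ("overweight_middle_aged", True),
    ("overweight_middle_aged", False),
    ("overweight_middle_aged", True),
]


def purchase_rates(records):
    """Return the observed share of visits ending in a purchase, per segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [purchases, visits]
    for segment, bought in records:
        counts[segment][0] += int(bought)
        counts[segment][1] += 1
    return {
        segment: purchases / total
        for segment, (purchases, total) in counts.items()
    }


rates = purchase_rates(visits)

# List segments from most to least likely to buy.
for segment, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{segment}: {rate:.0%} of visits ended in a bike purchase")
```

With this invented data the “obvious” athletic customer buys far less often than the other segment, which is exactly the kind of counterintuitive pattern a mining model is meant to surface from real data.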
The second mental block is the belief that advanced math must accompany Data Mining. The math is indeed advanced, but for the Data Mining program, not the user. The whole purpose of writing the Data Mining program is to encapsulate the ugliness of the math, shielding the user from it. However, a lack of advanced math skill (up to calculus) will be a hindrance as the level of the data mining advances. I suggest picking up a good book on statistics that presents it in a user-friendly way. A book that I’ve recommended that seems to help is Teach Yourself Statistics, by Alan Graham. Statistics is the basis for the first plateau of data mining. Things like machine learning are at a second level. Focus on the statistics-based aspects first.
Another block, although not a “mental block”, is an inability to gather data. No data, no data mining. The inability to gather and store your own data is as much of an impediment in data mining as the inability to read. Good results from data mining (confirming something you suspect or discovering things you didn’t already know) require integrating data from different sources in creative ways. Even if you have access to a fantastic data warehouse at work, chances are it doesn’t contain everything you need to test out a hypothesis. If you ask the DBA to add that data to the data warehouse, chances are 99.9% you will be told “NO!” Fortunately, Analysis Services 2005-2008 allow you to define data from various sources in the Data Source View. Mastery of the “Data” tab in Excel 2007 and your own SQL Server database into which you can simply import data (and add it to an Analysis Services Data Source View) will take you a long way.
Related blog: Data Mining in the PerformancePoint “MAP” Framework
My advice on learning a tough topic: You have to really want it. If you want to be valuable, you need to think and thinking is hard. A minority of the people I try to teach are glorified robots who go through life expecting to learn by being programmed like a computer (H#, anyone?). For most of the others, OLAP and data mining just aren’t their thing, so they don’t pour their heart and soul into it … which I can certainly understand. After all, how can we pour our heart and soul into everything? Fortunately, there are always one or two who really appreciate and get what I do for them and that makes the entire effort worthwhile.
You will struggle to learn this, and that’s great. I know that if you approach the subject like a predator after its prey, you will eventually learn it, and when you do, you will be much better off than those to whom it comes naturally. It’s not a coincidence that very often the most unlikely people are the tops in their field.
It’s unfortunate that the term “predator” has taken on such a negative connotation. The predators of the “jungle” are the intelligent, noble creatures. The prey just move about from green pasture to green pasture, expecting God (or at least the farmer) to provide the grass, and they run as fast as they can when the predator decides to attack. The prey really owe the health of the species to the work of the predator.
The predator is driven and thinks. If you really want to learn something, when you encounter a plateau or roadblock, you will step back and analyze the situation. You don’t worry about taking wrong paths as what is learned down the wrong path usually comes into play later. A glorified robot will drop the learning material and find another source that better spoon-feeds things for “effortless learning” … the biggest fantasy I could ever think of. There is no such thing as effortless learning. What there is that is effortless is “effortless doing”, which comes after decades of consistent, diligent, and mindful thinking and learning.
A last word: Thinking humans go beyond a jungle predator, though. A jungle predator must think in order to take the resources of others; like the Wall Street sort, it takes away someone else’s piece of the pie. For a jungle predator, life is a zero-sum game. A “thinking human predator” stalks ways to increase the pie.