Is OLAP Terminally Ill?

Someone told me yesterday that “OLAP is dead”. “Everyone is choosing tabular/in-memory.” I know it’s not dead, though it may at least be sick. But did I underestimate the timing of the tipping point, the curve in the hockey stick, where the vast majority of users will “sensibly choose” the tabular/in-memory option over OLAP?

I realize some, including me, think this topic has been beaten to death. Because OLAP is my bread and butter, my top skill (I should have become a cop like my dad wanted), of course I took it to heart, and I take such claims (including “official” word from MSFT) with a grain of salt. But I also realize the person who told me this is very bright and knows the market. So I had to take a time-out today to revisit this issue as a reality check on my own professional strategy; a good thing to do every now and then.

When I first became aware of the “OLAP is dead” controversy a little over two years ago, I wasn’t too worried, since 256 GB of RAM was still really high-end. Today, 2 TB is “really high-end” (a few Moore’s Law iterations later), well beyond the size of all but a few OLAP cubes I’ve dealt with (not even considering in-memory compression!). And there were a couple of issues I still had not fully digested at that time.

One of those issues was not fully appreciating the value and genius of in-memory compression. At first, I was only thinking that RAM with no IO is simply faster. But the compression/decompression cost that occurs in the CPUs, which results in a whole lot more CPU utilization, isn’t really much of a cost since those cores were under-utilized anyway. Another was the volatility of RAM. At the time solid-state memory was still fringe, and my thought was that while volatility wouldn’t be much of an issue in the read-only BI world, it would be an issue in the OLTP world. Well, that doesn’t seem to be the case with Hekaton.

After thinking about it for much of the night, here are the two key questions I came up with that will determine whether OLAP (specifically SQL Server Analysis Services OLAP) will die:

  1. Is the real question whether hard drives (the kind we use today, with the spinning platters and all those moving parts) will become obsolete? RAM and/or flash could neutralize all the advantages of disks (cheaper, bigger, non-volatile) relatively soon.
  2. Will OLAP become minor enough in terms of utilization and product pull-through that Microsoft will no longer support a dev team? I can quickly think of a few Microsoft products with a strong but relatively small following that just didn’t warrant an infrastructure and were dumped.

An important thing to keep in mind is that there are really two separate issues. One is the underlying storage structure, disk-based OLAP versus in-memory; the other is the modeling approach, tabular versus multi-dimensional. The first issue, the underlying storage, is the far more powerful argument for the death of OLAP. The storage layer really will be seamless to the end user, and it won’t require guru-level people messing with all those IO-related options to implement it properly.

However, I still don’t completely buy the “tabular is easier to understand than multi-dimensional” argument. I buy it to the point that, yes, it is true, but I don’t think this is the way it should be. My feeling is that the multi-dimensional concepts encapsulated in MDX and OLAP are closer to how we actually think than what is encapsulated in SQL and relational databases. What comes to mind are the many times I’ve engaged a customer with “thousands of reports” that were really variations of a couple dozen and were mostly replaced with a cube or two.

As a side note, one exercise I use to demonstrate the elegance of MDX is to think about how Excel handles multiple dimensions. Excel is multi-dimensional, but capped at two dimensions. With that cap on dimensionality, it’s easy to use the A1 (column A, row 1) syntax. But what about three dimensions? A sheet (Sheet1!A1). Four dimensions? A different xlsx document. Five? A different directory. That’s not at all elegant. But MDX elegantly “scales” in the number of dimensions; it looks the same from zero through 128 dimensions.
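
To make that contrast concrete, here is a minimal MDX sketch; the cube, dimension, and member names are assumptions borrowed from the standard Adventure Works sample database, not from any real project. The point is that slicing by a third, fourth, or tenth dimension just means adding one more member to the tuple; the shape of the query never changes.

    -- Hypothetical query against the Adventure Works sample cube.
    -- Rows and columns cover two dimensions, much like an Excel sheet...
    SELECT
        { [Measures].[Internet Sales Amount] } ON COLUMNS,
        { [Date].[Calendar Year].Members } ON ROWS
    FROM [Adventure Works]
    -- ...but each additional dimension is just one more member in the slicer tuple,
    -- whether the cube has three dimensions or 128.
    WHERE ( [Customer].[Country].[United States],
            [Product].[Category].[Bikes] )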

The tabular model reminds me of when I started my OLAP career in 1998 as a developer on the “OLAP Services” (the SQL Server 7.0 version of Analysis Services) team at Microsoft. OLAP for SQL Server 7.0 was really just the core of OLAP: no frills, just strong hierarchies and aggregations. It was very easy to understand, but users quickly hit walls with it. That reminds me of how VB was so easy to learn. One could learn to build pretty good applications quickly, but would run into problems venturing beyond the 80/20 point. Eventually .NET (C#/VB.NET) came along, still relatively easy to use (compared to C++) but a quantum leap up in complexity. For OLAP Services, that leap was SQL Server 2005 Analysis Services, with the weak hierarchies, many-to-many relationships, the MDX Script, KPIs, etc.

I guess what I’m saying is that this is a case of taking a step backwards to take two steps forward. The spotlight (tabular) isn’t currently on the high end, where I normally make my living. However, that doesn’t mean there isn’t a high end. The high end as we know it today (OLAP) will eventually die, or at least severely morph, but requirements of yet unknown sorts on the low end will push the complexity back up. How will Big Data affect the kinds of analysis that are done? Will 2 TB of RAM then be sufficient for the “masses”?

At the moment, I do believe that in terms of raw new BI implementations, tabular is giving OLAP a whooping. It should, since the idea is to expand the accessibility of BI to a much broader audience. I’ve lived through the rise of Windows 3.1 and the dot-com crash. This is a minor disruption; it’s not as though I haven’t been moving on for years already. In fact, skill-wise, I learned to always be moving on to some extent.

BTW, he also told me that “no one makes money on Big Data and that Predictive Analytics is limited to one or two people”. Those are in fact the two skills I’ve been shifting towards over the past few years in light of the sickness of OLAP. While I really don’t know about the former claim (and would find it short-sighted to base strategy on it), I do have a couple of opinions on the latter:

Figure 1 – Even my cat wonders whether cubes could be on the way out.

12 Responses to Is OLAP Terminally Ill?

  1. Warren Rutledge says:

    Great assessment. I see PowerPivot and the tabular model as lowering the barrier to entry for OLAP, but they definitely do not replace MDX yet. We expect to see a great deal of growth in the tabular environment.

  2. Chris Webb says:

    Some great thoughts here, but I think you’re missing the most important reason why SSAS cubes might/will die – Microsoft’s overall BI strategy. PowerPivot and the Excel 2013 data model show that MS now think about BI more as a feature to sell Office rather than SQL Server. PowerPivot and especially DAX concepts and syntax had to be consistent with Excel for them to have a chance of being successful, and Tabular is the corporate BI extension of PowerPivot; cubes and MDX don’t fit in nicely with that strategy. For Microsoft overall I think that the strategy makes sense, even if it means that the kind of corporate BI that the likes of you and I specialise in gets relatively less attention from Microsoft.

    • Eugene says:

      Hi Chris,

      Thanks for bringing up that point. I actually agree with the strategy you outline, since it will expand (and already has expanded) the MSFT BI market.

      Here are some other thoughts I have about the fate of OLAP. I mention in the post that they engineered a “take one step backwards to take two or three steps forward” strategy: going back to a level of ease on par with the AS2K days in order to drastically broaden MSFT BI mind-share, knowing that users will progressively demand more features and the product will mature back to a more complex level.

      I think that maturation has to consider the elegance of multi-dimensional concepts as more “scalable” than the Excel-like syntaxes (which I mention in the post). I remember an awful joke Mosha used to tell: In the not too distant future, an older and a younger programmer are talking. The younger one asks, “What is a table?” The older replies, “That’s what we used to call a 2D cube.” As awful as that joke is (and as of today it is completely wrong – hahaha), it does effectively illustrate the elegance of multi-dimensional concepts.

      OLAP has to come back some day because, more than likely, analytics will somehow get to the point where even RAM is too slow (e.g. let’s track the positions of atoms in an unfolding protein molecule at picosecond resolution!) and we’ll need to resort once again to clever algorithms, one being a pre-aggregation mechanism – going back to the notion of OLAP really being a “data warehouse accelerator”. So OLAP will return not as a thing in itself, but as a 2nd-class mechanism, a feature of a tabular DB, a little more complex than indexes. And it may not have the IO complexity it has today, because by then disk drives will have been replaced by flash.

      So, glad I moved on to Predictive Analytics … hahaha.

      Eugene

      • Chris Webb says:

        One last thought on why OLAP might come back one day. Apart from the data volume issue that you mention, the other tension is between flexibility and rigidity in the model. OLAP always gets criticised because its model is too rigid for changing requirements, which is true; what gets forgotten is that a rigid multidimensional model has benefits in that it makes some calculations much easier to define. For example with hierarchies defined and members on those hierarchies having a fixed order, you can easily say “get me the value for sales for the previous day” in a calculation without needing to know anything about how to navigate through the values in a dimension to get to the previous day. It’s just one of those cases where the pendulum will swing from one extreme to the other over the years, though, and you’re right that user demands for more functionality will bring Tabular back to the same level of complexity as Multidimensional in a few years.
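
        To make that concrete, here is a minimal MDX sketch of the “previous day” idea; the cube and member names are again assumptions taken from the Adventure Works sample database. The calculation leans entirely on the fixed ordering of members in the Calendar hierarchy, with no date arithmetic at all:

            WITH MEMBER [Measures].[Sales Previous Day] AS
                -- PrevMember steps back one position along the Calendar hierarchy,
                -- so "previous day" needs no knowledge of the underlying dates.
                ( [Measures].[Internet Sales Amount],
                  [Date].[Calendar].CurrentMember.PrevMember )
            SELECT
                { [Measures].[Internet Sales Amount],
                  [Measures].[Sales Previous Day] } ON COLUMNS,
                [Date].[Calendar].[Date].Members ON ROWS
            FROM [Adventure Works]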

  3. markvsql says:

    This is a great discussion, gents. I am still pretty new to OLAP, and I just today completed my first SSAS Tabular project. So I certainly don’t have the depth of knowledge and experience you do, but I thought I would share my perspective. What I have seen over the past few years, since starting to focus on BI, is that Microsoft certainly does seem to be transferring eggs from the Traditional OLAP basket to the Tabular/in-memory basket. While this is happening, though, it seems to me that OLAP certainly has not contracted NSD (Notification Services Disease) and will not just disappear. I look at the complexity that must be involved in making the changes necessary for Power View to work against traditional OLAP cubes, and the fact that Microsoft is making that investment. The feeling I get is that the future will be a hybrid of what seem to be two competing technologies today. It is possible that the future technology could be looked upon by Traditional OLAP folks as OLAP with the benefits of Tabular, while Tabular folks see it as Tabular with the benefits of Traditional OLAP.

  4. waltika says:

    For me, OLAP is alive and kicking; tabular technologies are not incompatible with it, and Microsoft is not the sole provider of OLAP.

    It’s the usual architectural trade-off at play, between expressive power on the one hand and the associated complexity and learning curve on the other.

    The following is just gut guessing, because I have never attempted a formal proof of it, but for me OLAP (MDX), OWL, and DAX are either semantic equivalents or sub/supersets of each other, and I see them in the future as alternatives for formalizing the meaning of, and interacting with, data that is expressed in its simplest atomic form in the underlying physical model.

  5. BrainE says:

    Welcome to the only comment thread on the entire internet where everyone just agrees with each other completely.

  6. tradiecorp says:

    Hi, great article, with some great points….
    So it’s 18 months later; how have your views changed?
    Have I got this correct when you say OLTP gives an approximation, whereas OLAP constructs the real case but then only uses a minority of the data?
    SQL 2014 and Hekaton seem to have a mix of both OLTP and OLAP.
    I read somewhere about the lack of need to invert the algo (Fisher’s matrix) reducing computation, hence power/cooling, but the biggest improvement was the ability to use machine code instructions, drastically increasing speeds.
    But I am only just getting into this area (about 3 days now 🙂 )
    Tim

    • Eugene says:

      Hi Tim,

      As a matter of fact, my views have changed somewhat. I’ve been meaning to write a follow-up, but with deadlines, massive changes in our field and all … well, you know 🙂

      For now, in a nutshell, I love Hekaton, but I’m still frustrated at the seeming avoidance in our industry of tackling the “hard problems”, as opposed to finding escapes. It reminds me of how, instead of drastically improving renewable energy sources, we found fracking, which lets us punt on the real issue for a few more decades.

      Examples of real innovation are Micron’s Automata Processor, which fundamentally addresses “massively” parallel processing, and Neo4j’s graph database, which does a better job of modeling the real world than the highly sterilized relational schemas we’re clinging to. Azure ML is another, although Predixion Software did that years ago. My primary sentiment is that faster processing is all well and good, but the way we model our data is too rigid for analytical purposes, and that will be the bottleneck for answering the really juicy analytical questions. Of course, elaborating on that last sentence requires a lengthy article, which I’m going to write.

      Anyway, I should be posting a few blogs related to this topic and a couple of new ideas I have through the end of the year.

      As for your question, “Have I got this correct when you say OLTP gives an approximation, whereas OLAP constructs the real case but then only uses a minority of the data?”, please point me to the part of the article where I make that suggestion. In essence, it’s sort of the opposite: OLTP data is the recording of unambiguous facts (such as a customer’s age, gender, income, name, or phone number), whereas OLAP aggregates data across many “dimensions” (such as across customers), whereby patterns emerge (which tell a story open to interpretation). The fundamental difference between OLTP and OLAP (roughly speaking) is that OLTP supports use cases consisting of lots of small reads/writes/updates, whereas OLAP consists of lots of big reads and infrequent big updates.

      Thank you very much for your comment!

      Eugene

  7. Pingback: Planning a 1-Day Symposium in Boise on the Utilization of Graph-Centric Data Technologies in Business Intelligence | Soft Coded Logic

  8. Bill Anton says:

    Bump for updated perspective…
