Planning a 1-Day Symposium in Boise on the Utilization of Graph-Centric Data Technologies in Business Intelligence

Introduction

I’m currently working with the organizers of the Boise BI User Group and a few heavy hitters from various Boise-based technology communities on a 1-day symposium introducing graph-based technologies to the Boise Business Intelligence community. (To clarify, by “graphs” I’m referring to those web-like “networks” of relationships, not visualizations such as the line graphs seen in software like Pyramid Analytics or Tableau.) The overarching goal is to inform BI practitioners about the toolset already out there for beginning to address what I consider BI’s “hard problem”: feasibly formulating, organizing, maintaining, and querying the relationships between data throughout an enterprise.

We’re in the early design and planning stages, shooting for a mid-October (2015) delivery. The nature of this symposium is forward-thinking, meaning not many people would even think to look for it, so it doesn’t come with a ready-made audience (unlike, say, a class on advanced Tableau). I chose to post this blog early in the process as a feeler to gauge interest in the symposium and to gather input on the content. This post is by no means a formal announcement.

As a caveat, it’s important to state upfront that in the overarching Business Intelligence context of this symposium, applying many of the techniques that will be covered still presupposes a well-developed BI infrastructure … for the most part. I realize that for many enterprises, even a somewhat-developed BI infrastructure is still a far-off dream. But hopefully this symposium will reveal a much bigger payoff than was previously imagined for a well-developed BI infrastructure, spurring much more incentive to aggressively strive for that goal. Crucially, this doesn’t mean there aren’t narrower-scoped use cases for graph technologies ready to tackle without a well-developed BI infrastructure, particularly with the Automata Processor.

Abstract

The accelerating maturity of analytics, combined with Boise’s rich Business Intelligence community, innovative spirit, and the headquarters of Micron with its Automata Processor, presents a powerful opportunity for Boise to yield world-class analytics innovation. The “three Vs” of Big Data (massive volume, velocity, and variety) amount to simply more data unless we also get better at the even tougher task of organizing the myriad data relationships, most of which today are not encoded at all. We need to begin solving the problems of a complex world in non-linear, truly massively parallel, massively hierarchical, and non-deterministic ways. Such an effort begins by shifting the tidy simplicity of our current relational databases out of the central role, with the scalable, reflective modeling capabilities of graph (network) structures taking center stage.

Everything is a set of relationships, and that is what graphs are all about. Our human intelligence is based on a model of our world, a big graph of relationships built in our brains over the course of our lives. We humans are able to readily communicate with each other because those unique models of the world held in each of our brains mostly overlap – our cultures. Where our individual models of the world don’t overlap with those of others represents our unique talents. The net effect is that our society is incredibly richer because we can exceed the limitations of our individual brains through the aggregation of our collective knowledge.

Likewise, the machine analytics systems of our enterprises possess skills beyond the limitations of our brains. The problem is that those systems don’t share our human culture. In order for us humans to effectively leverage the “intelligence” captured in those enterprise analytics systems, those systems also need to possess models of the world at least somewhat overlapping with ours. Models in current analytics systems are constrained by the limitations of computers of the past, for example, the limited notion of “relationships” in relational databases. Deeper communication between humans and machine intelligence currently requires grueling programming of the computers and sophisticated training on our part. Today’s technology, particularly graph technology, is our opening to surpass those outdated techniques, building, maintaining, and querying superior models of the world in our analytics systems. That improved machine intelligence fosters smoother, more robust communication between human and machine intelligence.

The key takeaways are:

  • Why breaking away from the predominantly relational database model to graph databases opens the door to quantum leaps in analytic capability.
  • The challenge of navigating the increasing complexity of the real world, and the risk of being left behind by enterprises that do build that capability.
  • An introduction to the technologies and concepts of graphs.
  • A roadmap for the transition to graph data.

My Initial Vision as a Starting Point

As I mentioned earlier, we are in the early design and planning stages, and the purpose of this blog is to gauge interest in such a symposium and to gather input from potential attendees on the content. So nothing is set in stone; the concrete is just starting to be mixed. However, I would like to include my initial vision of the agenda in this post, just as a starting point.

Just this past week we reached a few critical milestones (the participation of a few key parties, and a venue), and we’re now starting to engage other key players to work out an agenda that will provide maximum value to the attendees. So it will certainly morph to a noticeable extent by the time we formally announce the symposium.

Before continuing on to my initial agenda, a note on structure: Sessions 1 and 6 are targeted at mature BI practitioners. Because the symposium is set in a BI context, I thought it best to begin by laying out the current BI landscape and pointing out the big problem. Sessions 2 through 5 are at a rather introductory level on graph technologies, laying out the pieces required to attack that big problem. We would then wrap up with a discussion on how to apply graph technologies to BI. Anyway, here is the initial agenda I tossed out to begin the process:

Session 1: The Current State of Analytics

The enterprise analytics world is currently a complicated zoo of concepts, processes, and technologies, all of which hold legitimate roles. However, they exist in our enterprises as islands of poorly linked pieces, lacking the rich integration of the memories in our brains or the organs in our bodies. A business enterprise is a system of relationships like any natural system. In this session we explore these “tectonic plates” of BI and the gaps that must be bridged for our business enterprises to leap ahead through vastly improved bridging of human and machine intelligence.

  • The Current Landscape of “the Intelligence of Business”: ETL, Data Marts and Warehouses, Data Lakes, Performance Management, Self-Service BI and Analytics, Master Data Management, Metadata Management, Complex Event Processing, Predictive Analytics and Machine Learning, Deep Learning, Knowledge Management.
  • The Missing Links: Why do we still make bad decisions, fail to see things coming, and keep acting on organizational myths and legends?
  • The Secret Sauce: Soften the boundaries between objects and balance bottom-up flexibility and top-down centralization.

Session 2: Graphs and the Theory of Computation

It’s certainly not that graphs are unfamiliar to us. We are well acquainted with org charts, food chains, flow charts, family trees, and even decision trees. While the simple “maps” we’re used to seeing in applications such as Visio, PowerPoint, or SQL Server Integration Services are very helpful in our everyday lives, they quickly grow like kudzu into incomprehensible messes from which we readily shy away. This session will introduce basic concepts of graph theory and the Theory of Computation, and begin exploring the unwieldy reality of relationships we’ve so far punted down the road.

  • Introduction to Graphs: Terminology and Basics of Graph Theory, and a bit on the Theory of Computation.
  • The Importance of Graphs, Models, and Rules in the Enterprise: Everything is a graph. Examples of graphs used in commonly used business tools.
  • Robust Graph Processing: Model Integration, Fuzziness, Inference; massively parallel, many-to-many, massively hierarchical.
  • Where Relational Databases Fail in the Enterprise, and why we keep retreating back to that comfort zone (ex: the retreat from OLAP back to relational databases). Note: It may sound odd that I’m talking about focusing on relationships when today’s primary data sources, “relational databases”, are called “relational”. The problem is they’re not relational enough. (A small traversal sketch follows this list.)
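
To make that last point concrete, here is a minimal sketch of graph traversal in Python using the networkx library (my choice purely for illustration, with made-up company names; the session isn’t tied to any particular product). The point: an open-ended chain of relationships is a single native operation on a graph, whereas in SQL every hop is another self-join, and an unbounded number of hops requires a recursive query.

    # pip install networkx
    import networkx as nx

    # A tiny enterprise "model": who supplies whom, who owns whom.
    G = nx.DiGraph()
    G.add_edge("AcmeParts", "BoiseFab", rel="supplies")
    G.add_edge("BoiseFab", "NorthwestAssembly", rel="supplies")
    G.add_edge("NorthwestAssembly", "RetailCo", rel="supplies")
    G.add_edge("HoldingCorp", "AcmeParts", rel="owns")

    # "Who ultimately depends on AcmeParts?" Any number of hops.
    print(nx.descendants(G, "AcmeParts"))
    # {'BoiseFab', 'NorthwestAssembly', 'RetailCo'}

    # And the path explaining *why* RetailCo depends on AcmeParts:
    print(nx.shortest_path(G, "AcmeParts", "RetailCo"))
    # ['AcmeParts', 'BoiseFab', 'NorthwestAssembly', 'RetailCo']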

Session 3: Embracing Complexity

It doesn’t take a network of seven billion independent minds and billions more Web-enabled devices forming the so-called Internet of Things to produce a complex system where nothing is reliably predictable. For example, a distributor of goods lives in an environment of vendors, stores, customers, their customers’ customers, regulations from many governments (in the “Global Economy”), and world events, where reliable predictability is limited to low-hanging-fruit problems. Each of those relationships is rife with imperfect information of many sorts and with competing goals. Consequently, the problems faced by such enterprises are of a “bigger” nature than the limited-scope problems we’ve so far typically addressed with our analytics systems. The root issue is that we are attempting to resolve complex problems using techniques meant for merely complicated problems.

  • Overview of Complex Adaptive Systems: the many-to-many, heterogeneously parallel, massively hierarchical, non-linear nature of our world.
  • The Things We Know We Don’t Know and the Things We Don’t Know We Don’t Know: Predator vs Prey, Predator vs Predator.
  • Rare Event Processing: statistics-based prediction models fall short for high-impact rare events, where novel solutions must instead be engineered from a comprehensive map of relationships.
  • The world is a complex system: Situational Awareness
  • Healthcare: Perfect Storms of Many Little Things
  • Lots of Independent and Intelligent Moving Parts: Supply Chain Management, Manufacturing, Agriculture

Session 4: Beyond Visio – Robust Graph Technologies

Graph concepts and technologies have been around for a long time, in fact from the beginning of computing. Many of the concepts are core to the world of application developers, who hide the ugliness from end users by presenting flattened, sterilized, distilled, templated chunks of data. Think of the wiring of your computer hidden from you by the casing. Gradually, though, the complexity has grown to the point where that ugliness demands to be addressed at the higher level of the end user, albeit in a cleaner form.

  • Graph Databases: Neo4j Introduction and Demo.
  • Overview of IBM’s Watson.
  • Object-Oriented Databases and ORM.
  • The Semantic Web: RDF, OWL, SPARQL (see the small triple/SPARQL sketch after this list).
  • Introduction to graph-like co-processors, particularly the Automata Processor.
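
As a small taste of the Semantic Web bullet, here is a sketch of encoding facts as RDF triples and querying them with SPARQL, using Python’s rdflib package (the namespace and facts are made up; I’m assuming rdflib only because it’s a convenient way to demo the standards):

    # pip install rdflib
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/enterprise/")  # hypothetical namespace

    g = Graph()
    # Facts are subject-predicate-object triples; new kinds of
    # relationships can be added at any time, with no schema rebuild.
    g.add((EX.AcmeParts, EX.supplies, EX.BoiseFab))
    g.add((EX.BoiseFab, EX.supplies, EX.NorthwestAssembly))
    g.add((EX.AcmeParts, EX.locatedIn, EX.Boise))

    # SPARQL 1.1 property path: whom does AcmeParts supply,
    # directly or transitively?
    results = g.query("""
        PREFIX ex: <http://example.org/enterprise/>
        SELECT ?customer WHERE { ex:AcmeParts ex:supplies+ ?customer }
    """)
    for row in results:
        print(row.customer)

The contrast with a fixed relational schema is the point: adding a new relationship type is just another triple, not a table redesign.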

Session 5: Micron’s Automata Processor

Micron’s Automata Processor is one of the most important innovations in semiconductors. It represents a shift away from a computer architecture that for decades has been geared towards the simplicity of solving strictly procedural problems. Ironically, in order to effectively tackle the problems of an increasingly complex world, we retreat from today’s architecture to a simpler model based on finite state machines (illustrated in the sketch after the list below). The massively parallel, loosely-coupled nature of the Automata Processor more comfortably reflects the nature of the environments in which we live, whether business, natural, or social. Its truly massively parallel nature represents a leap akin to the move from single-threaded to multi-tasking operating systems decades ago.

  • Micron’s AP demo and examples of current applications.
  • Proposed Automata Processor BI Use Case.
  • Recognizing Opportunities for the Automata Processor.
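
For readers new to finite state machines, below is a toy nondeterministic automaton simulated in plain Python. To be clear, this is not Micron’s SDK or API; it only illustrates the computational model: every active state advances at once on each input symbol, which is what the AP does in hardware across thousands of state elements simultaneously.

    # Recognize any input containing the substring "abc".
    # States: 0 (start), 1 (saw "a"), 2 (saw "ab"), 3 (accept).
    TRANSITIONS = {
        (0, "a"): {0, 1},  # nondeterministic: stay put or advance
        (1, "b"): {2},
        (2, "c"): {3},
    }
    ACCEPT = {3}

    def step(states, symbol):
        """Advance every active state at once (the 'parallel' part)."""
        nxt = set()
        for s in states:
            nxt |= TRANSITIONS.get((s, symbol), set())
            if s == 0:
                nxt.add(0)   # the start state keeps scanning
            if s in ACCEPT:
                nxt.add(s)   # an accepting state stays accepted here
        return nxt

    def matches(text):
        states = {0}
        for ch in text:
            states = step(states, ch)
        return bool(states & ACCEPT)

    print(matches("xxabcxx"))  # True
    print(matches("abxc"))     # False

On a CPU this loop is sequential; the AP evaluates all of its state elements against each input symbol in a single step, which is where the massive parallelism comes from.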

Session 6: The Big Problem of Building Robust Models

So what is the roadmap for building such ambitious systems? This is not about building an Artificial Intelligence, but about softening the communication boundaries between people and our databases by drastically improving the relationships between data. Automating the generation and maintenance of these relationships, the rules, is the key. For example, it’s not much harder to map out the relationships within a static system than it is to write a comprehensive book on a fairly static but complicated subject. The trick is to do the same for a system/subject in constant flux.

  • Where do Rules Come From?
  • Existing Sources of Models and Rules in the Enterprise.
  • A Common Model and Rule Encoding Language.
  • Mechanisms for Handling Change, Massive Parallelism, Massive Hierarchy, and Missing or Low-Confidence Data (a toy rule encoding is sketched after this list).
  • Knitting Together the Pieces of the Current Analytics Landscape mentioned in Session 1.
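
As a purely hypothetical illustration of what encoding rules might look like (my own toy scheme, not a proposed standard), consider rules stored as subject-predicate-object edges carrying confidence and provenance, so that chains of cause and effect can be traversed and their combined confidence estimated, with low-confidence links flagged rather than silently dropped:

    from collections import namedtuple

    # subject -predicate-> obj, with a confidence (0.0 to 1.0) and
    # provenance (person, system, or inference engine).
    Rule = namedtuple("Rule", ["subject", "predicate", "obj", "confidence", "source"])

    rules = [
        Rule("LateShipments", "drives", "CustomerChurn", 0.85, "analyst:maria"),
        Rule("CustomerChurn", "drives", "RevenueDip", 0.95, "model:churn_v2"),
        Rule("RainySeason", "drives", "LateShipments", 0.40, "inference:engine"),
    ]

    def cause_chains(effect, seen=frozenset()):
        """Yield (chain, combined confidence) for every cause chain ending at effect."""
        for r in rules:
            if r.obj == effect and r.subject not in seen:
                yield [r.subject, effect], r.confidence
                for chain, conf in cause_chains(r.subject, seen | {effect}):
                    yield chain + [effect], conf * r.confidence

    for chain, conf in cause_chains("RevenueDip"):
        print(" -> ".join(chain), "(confidence %.2f)" % conf)

Running this surfaces both the strong LateShipments chain and the weaker RainySeason chain, with the machine-inferred, low-confidence link plainly visible instead of buried.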

A Little Background on Where I’m Coming From

It’s not that people, particularly those involved with BI and analytics, aren’t aware of the importance and value of encoding knowledge in graphs. It’s actually rather obvious, and graphs are very much in use. It’s that these graphs are for the most part simple, disparate artifacts (connect-the-dots pictures), disconnected islands of knowledge. That condition is similar to enterprises only a few years ago, with hundreds of OLTP systems scattered throughout (and thousands of Excel documents today – even with SharePoint Excel Services!), with their silos of data and clumsy methods of integration. There have been efforts in the recent past to promote graphs to a more prominent level that gained much attention but fizzled back into relative obscurity; relevant examples include UML and the Semantic Web. Neither is dead, and with the fuller complement of related technologies available today, they may finally find lasting traction.

A couple of years ago I wrote a blog – strictly for entertainment purposes – titled The Magic of the Whole is Greater than the Sum of Its Parts. It’s just a fun exploration of the notion of a business as an organism, particularly that organism’s intelligence, what I call the “intelligence of business”. Although we shouldn’t take that metaphor too far (and maybe I did in that blog … hahaha), I think it’s fair to say that a business has rough counterparts to a human’s organs, desires, pain, ability to physically manipulate its surroundings, and knowledge, all of which are far more harmoniously integrated in us than in the business analog.

However, the problem is that a business’ “intelligence”, the ability to store, analyze, and maintain webs of relationships, lies almost exclusively in the brains of its workers and hardly at all in the fairly hard-coded/wired mechanical things (devices, software, documents). That’s fine as long as the knowledge is fairly transferable to another person (in case the worker leaves) or the skill has been commoditized, and as long as there is some level of overlap of knowledge among the employees (redundancy).

One major outcome of failing to address this, at least in my opinion, is that in the name of optimization (particularly when the elimination of variance and redundancy is, say, overly zealous), workers are forced into deeper and deeper specialization, which draws stronger boxes around these “organic components” of the business. The knowledge in those workers’ brains is hardly ever recorded to an extent that a replacement is able to readily take over. When a knowledge worker leaves, it’s as if the enterprise has had a stroke and must relearn capabilities.

Our poor human brains are filled to capacity, to the point where we whittle away at things in life outside of work just to keep up. We long ago maxed out our ability to work optimally in groups, once our “tribes” began consisting of too many people with too much flux in the membership. It used to be that knowledge could be captured in books, but change and increasing complexity now come too fast for subject matter experts to effectively document and for us readers to assimilate. As we’ve scaled up data through the Big Data mantra of volume, velocity, and variety, we need to similarly scale up our ability to encode and assimilate ever-increasing knowledge.

The answer isn’t AI, at least not the Commander Data or HAL version promised for the last half century. Even with IBM Watson’s success on Jeopardy and its subsequent exponential improvement, I seriously don’t think there will be an AI more innovative than teams of motivated and educated humans for quite a while. The answer is to build a better “pidgin” bridging human intelligence and data, a far less grandiose track for which the pieces are mostly already there, and one that offers a long-term, incremental path toward improvement.

Actually, almost all of my old blogs sample the earlier thoughts that led to the idea for this symposium; my blogs have always been about pushing the boundaries of Business Intelligence. A couple of years ago I attempted to materialize all of those thoughts into a software system I developed, which I named Map Rock. This symposium is not about Map Rock; I’ve “retired” it, and Map Rock represents only my own vision. It makes more sense today to pull together the best-of-breed pieces already out there into something from which we can begin to evolve an “intelligence of business”. However, my 5-part series on Map Rock offers a comprehensive description of what I was after.

Conclusion

This symposium is intended to be an introduction that will hopefully cut down some of those fences we fear to hop, so that we can seriously explore the vast frontier of BI becoming a truly strategic asset rather than staying stuck straddling the tactical and operational realms. BI can then begin to move from “Help me calculate this value to plug into this formula” to “Help me create and maintain this formula”.

To recap the current status:

  • We’re in the early stages of planning. The agenda presented here is just an initial draft.
  • We’re planning to deliver this in Boise in the mid-October (2015) timeframe. We should have a date and a tighter agenda well before the end of August.
  • We’re trying to gauge the interest in the Boise area for such a 1-day symposium.
  • We’re asking for any input on content, or on hard problems in your business that could be better approached as complex problems rather than merely complicated ones.

Please email me at eugene@softcodedlogic.com with any questions or comments.

About Eugene

Business Intelligence and Predictive Analytics on the Microsoft BI Stack.