Glossary: Scalable Semantic Layers, OLAP Pre-Aggregation, BI in AI Era

Abductive Reasoning

A form of reasoning that starts with an observation and seeks the most plausible explanation, even if it cannot be guaranteed to be true. Abduction is about forming hypotheses—“what could explain this?” Sherlock Holmes famously described his method as deduction, but in reality it combined induction (spotting patterns), abduction (hypothesizing the cause), and deduction (testing the implications). Abductive reasoning is central to problem solving under uncertainty: it favors plausible inference when complete proof is not available. See Peircian Triad.

AGI (Artificial General Intelligence)

AGI refers to a machine’s ability to understand, learn, and apply knowledge across a wide range of tasks at human-level competence. Unlike narrow AI, which excels at one domain, AGI can flexibly transfer insights and skills from one context to another.

Aggregation design (OLAP)

The methodical selection of which attribute combinations to pre-aggregate and store so most queries hit fast summaries while staying within build time and storage budgets. It weighs your workload (query log), hierarchies, and partitions to pick a small set of high-value aggregations (e.g., Month × State × Category) and skip redundant ones you can roll up to (so you don’t also pre-aggregate Quarter if Month exists).

AI Agent

An autonomous software entity that can perceive its environment, plan actions, and execute tasks—often by orchestrating multiple AI components. In a RAG pipeline, an AI agent might issue queries to a vector store or graph database, feed the results into an LLM, then post-process and deliver answers or trigger downstream workflows.

ASI (Artificial Super Intelligence)

ASI denotes a hypothetical intelligence that far surpasses the brightest human minds in every field—creativity, scientific reasoning, social skills, and more. It represents the point at which AI not only matches but greatly exceeds human capability across the board.

Balanced Scorecard (BSC)

A management framework for turning strategy into action by choosing a small set of objectives, measures (KPIs), targets, and initiatives across four lenses—typically Financial, Customer, Internal Processes, and Learning & Growth. The BSC gives leaders a recurring cadence (monthly/quarterly reviews) to see what’s working, fix what isn’t, and align teams and budgets to the plan. Think of it as the organization’s instrument panel and governance loop: what we aim to achieve, how we’ll measure it, who owns it, and when we’ll adjust.

In contrast, A strategy map is the storyboard—a visual of objectives linked in cause-and-effect (“improve capacity → better service → more referrals → growth”). The Balanced Scorecard is how you run that story: it assigns KPIs, targets, owners, and initiatives to those objectives and keeps the review cycle honest. In short, the map explains why outcomes should happen; the BSC ensures we measure, manage, and learn our way to them.

Business Intelligence (BI)

A discipline and technology stack focused on collecting, organizing, modeling, and analyzing enterprise data to support reporting, dashboards, and decision-making. BI transforms raw operational data into structured insights through data warehouses, semantic models, queries, and visualizations, enabling organizations to monitor performance, detect trends, and guide strategy.

In the era of AI, Business Intelligence can appear antiquated at first glance—a relic of slow cubes that took days to process, opaque metrics whose lineage few could explain, ETL pipelines brittle as glass, and domain concept mapping exercises that devolved into prolonged debates between subject-matter experts. Data warehouses, meanwhile, were often criticized as perpetually stale snapshots of a business that had already moved on.

Yet in the LLM era, BI remains foundational. Even the human brain does not reason directly over raw sensory input; it transforms it into higher-order constructs that can be compared, correlated, and acted upon. Modern architectural patterns such as Data Vault and Data Mesh have reduced the monolithic brittleness of earlier BI stacks, while large language models now help stitch fragmented domains back into coherent semantic layers. For these reasons, BI continues to serve as the spearhead of enterprise intelligence—the structured substrate upon which more advanced AI reasoning can reliably operate.

Casting a Wide Net

In the context of the Tuple Correlation Web, casting a wide net refers to the process of generating a large matrix of statistical relationships—typically Pearson correlations—between all pairs of monitored metrics drawn from different dimensions or cubes of enterprise data. Rather than computing one correlation at a time, this approach retrieves a broad grid of correlations en masse to expose potentially relevant associations that might not be found through narrow probing alone. It supports exploratory search by making a dense space of pairwise statistical signals available for navigation and further analysis. See correlation grid.

Chain of Strong Correlations (CSC)

A CSC is the data storyboard built inside the Enterprise Knowledge Graph—specifically derived from the Tuple Correlation Web (TCW) and Bayesian conditional probabilities, with context from the KG’s ontology/taxonomy, the Insight Space Graph (ISG), and the Data Catalog. Each edge carries strength, lag, and window, sketching plots like X uptick → Y follows → Z eases. CSCs don’t claim causation. Rather, they prioritize what to test and operationalize (alerts, Markov links, or rules) once the strongest scenes prove stable.

Complex (vs. Complicated and Simple)

Complex systems, have many interacting parts whose relationships change over time. Their behavior emerges from feedback loops and context. A business organization, a living ecosystem, or a production data platform are complex — not just difficult, but dynamic. You can’t solve them once and for all; you can only observe, adapt, and learn as patterns evolve.

in comparison, simple systems have few parts and direct cause-and-effect relationships. Their behavior is predictable — like a light switch or a basic SQL query. If something goes wrong, the cause is easy to trace. Complicated systems have many parts, but those parts interact in fixed, knowable ways. A jet engine or a database optimizer is complicated — hard to understand, but ultimately analyzable with enough expertise and documentation.

Context Window

The context window is the span of text or tokens an AI model can “see” and consider at once when generating a response. It defines how much recent or retrieved information influences each prediction. A small context window limits awareness to short passages or single questions; a large one allows the model to reason across entire documents or multi-step conversations. Expanding the context window increases situational awareness but also demands more computation and careful context engineering to keep inputs relevant and frugal.

Correlation Grid

A correlation grid is a two-dimensional matrix of correlations where the rows and columns correspond to two distinct sets of metrics (e.g., from different OLAP cubes or functional domains). Each cell in the grid holds a statistical association measure (such as the Pearson correlation coefficient) for a pair of metrics. A correlation grid is a way of visually and computationally casting a wide net over exploratory relationships, enabling pattern discovery and evidence generation across a high-dimensional space of potential connections.

Cypher

A declarative graph query language designed for property graph databases, most notably Neo4j. Cypher uses pattern matching to traverse nodes and relationships, allowing users to express complex queries in a readable, SQL-like syntax. Unlike SPARQL, Cypher operates on labeled property graphs (LPGs), which include labels on nodes and key-value pairs on both nodes and relationships. Cypher is intuitive and fast for application developers, but is not part of the Semantic Web standards.

Data Catalog

In the Enterprise Knowledge Graph (EKG) architecture, the Data Catalog is an enterprise-wide registry of all analytical data assets—primarily databases expressed through tables, columns, and their associated metadata. Beyond traditional technical metadata (names, data types, lineage, and descriptions), the catalog is extended with semantic identifiers such as IRIs and vector embeddings, enabling alignment with ontologies and semantic search. Its role is connective: it anchors abstract knowledge objects—concepts, entities, risks, and metrics—to their physical data representations, while also linking BI-derived graph structures such as the Insight Space Graph (ISG) and Tuple Correlation Web (TCW) back to source data. In this way, the Data Catalog functions as the semantic spine of the EKG, binding business meaning, analytical insight, and operational data into a unified navigable framework.

Data Frame

A data frame is a two-dimensional, tabular data structure (rows and columns) where each column can hold a different type (numbers, text, dates). It’s the go-to format in analytics and languages like R or Python (Pandas) for slicing, dicing, and transforming datasets before modeling or visualization.

Data Mesh

An architectural approach to enterprise data in which domain teams own, publish, and govern their data as products rather than relying on a centralized data team. Each domain is responsible for the quality, documentation, and accessibility of its data, while shared standards ensure interoperability across the organization. Data Mesh emphasizes decentralized ownership with federated governance, enabling scalability and domain expertise while reducing bottlenecks in traditional centralized data platforms.

Data Model View (DMV)

A Data Model View is a structured, often visual representation of the tables, columns, relationships, and constraints that make up a database schema. In platforms like SQL Server, the data model view shows how entities are organized and linked—primary keys, foreign keys, hierarchies, and cardinalities—providing both developers and analysts a blueprint of the data’s logical structure. It does not contain the data itself; rather, it describes how the data is shaped and related, serving as a foundation for query design, integration, governance, and downstream semantic modeling (ex. knowledge graphs or ontologies). For most database types, the DMV can be queried through an API or even a SQL-like syntax for an RDBMS such as SQL Server.

Deductive Reasoning

Reasoning from the general to the specific. If the premises are true and the logic is valid, the conclusion must be true. Deduction applies established rules or principles to reach certain outcomes (e.g., All humans are mortal; Socrates is a human; therefore, Socrates is mortal). In computing, Prolog exemplifies deductive reasoning—rules and facts yield guaranteed logical conclusions. See Peircian Triad.

DIKW / DIKUW

The DIKW hierarchy is a well-known model of cognition and learning: Data → Information → Knowledge → Wisdom. Data are raw signals, information organizes them into patterns, knowledge encodes models and structures, and wisdom applies judgment in context. An extended version, DIKUW, adds Understanding between knowledge and wisdom. Understanding interprets why patterns and rules hold, providing the bridge from structured knowledge to wise action. This additional layer highlights that intelligence requires not just storing and applying rules, but grasping their underlying meaning.

Drill-Down

In OLAP, drill-down means navigating from a higher-level summary to more detailed data along a hierarchy. For example, from Year → Quarter → Month → Day. Drill-down refines the view by expanding members into their child members.

Drill-Through

Drill-through jumps out of the cube entirely to view the underlying fact records that contributed to an aggregated cell. For example, clicking a sales total in an OLAP report to see the individual transaction rows from the source system. Drill-through bridges summarized OLAP data and raw case-level detail.

Drill-Up

The reverse of drill-down. Drill-up rolls detailed members back into their parent level in the hierarchy. For example, moving from Day → Month → Quarter → Year. Drill-up provides broader context and reduces detail.

Edge Computing

A computing architecture where data processing occurs close to the source of data—such as IoT sensors, mobile devices, or industrial equipment—rather than being sent to a centralized cloud or data center. The “edge” refers to the edge of the network, where latency is lower and response times are faster. This approach reduces bandwidth usage, improves real-time responsiveness, and enables autonomous behavior in disconnected or bandwidth-constrained environments. Edge computing is especially important for scenarios like industrial automation, autonomous vehicles, and real-time analytics where waiting for round-trips to the cloud isn’t practical.

Enterprise Knowledge Graph (EKG)

A connected, machine-readable map of what the business is—its people, products, customers, processes, rules, metrics, and the relationships among them—expressed with web identifiers (IRIs) so everything can be linked, queried, and reused across systems. Unlike a single database or a static glossary, an EKG unifies data + meaning: operational records, definitions, KPIs, policies, lineage, and external knowledge all live as nodes and edges that applications and analysts can traverse.

An EKG goes beyond ontologies and taxonomies. It also docks the organization’s stories—strategy maps, SBAR frames, causal claims, scenarios—as first-class, versioned artifacts, alongside pattern layers like ISG (what analysts notice), TCW (chains of strong correlations), and Time Molecules (event sequences). That makes the graph not just descriptive but executable: you can diff proposed strategy changes, attach evidence, trace impacts to KPIs, and learn from outcomes.

The EKG becomes the enterprise’s shared semantic layer and memory: a place where data, definitions, and decisions connect, so people and machines can ask better questions, make faster decisions, and continuously improve the story the business runs on.

Suggested Reading: BI-Extended Enterprise Knowledge Graphs

Event Ensemble

The Event Ensemble is the dimensional data architecture at the core of the TimeSolution system. It serves as a unified repository that integrates events from many sources into a single analytical framework. At the center of the Event Ensemble is the EventsFact table, which records individual events along with their timestamps, sources, and case identifiers.

The purpose of the Event Ensemble is to organize event data so that processes unfolding through time can be analyzed. Each recorded event captures the fundamental elements of a process: what happened, when it happened, and where it occurred. Supporting dimension tables—such as DimEvents, DimDate, and DimTime—provide context that allows events to be categorized, filtered, and analyzed across many perspectives.

In this architecture, the Event Ensemble functions as a data warehouse of integrated events. By consolidating event streams into a structured model, it enables a wide range of analytical approaches, including process mining, statistical analysis, OLAP-style slice-and-dice queries, and the construction of probabilistic models such as Markov chains. It forms the foundation upon which higher-level models and insights are built.

See Markov Model Ensemble.

Event Sourcing

Event Sourcing is a system design pattern in which the state of an application is not stored as a single current snapshot but is instead derived from a chronological sequence of events that describe everything that has happened. Each event records a change in the system—such as OrderPlaced, PaymentReceived, or InventoryAdjusted. The current state of the system can be reconstructed at any time by replaying these events in order.

This approach provides a complete audit trail, since every change is preserved rather than overwritten. It also enables powerful capabilities such as time travel (reconstructing the system state at any point in the past), debugging complex behaviors, and analyzing processes over time.

Event sourcing is closely related to concepts such as event streams, event logs, and event-driven architectures, which treat events as the fundamental units of system activity.

In the context of The Assemblage of Artificial Intelligence and Time Molecules, event sourcing provides a natural foundation for capturing the sequence of actions performed by AI agents or business processes. When these events are recorded and analyzed collectively, they allow systems to learn patterns of behavior, identify anomalies, and model the processes that unfold through time.

Event Storming

Event Storming is a collaborative workshop technique used to explore and model complex business processes by identifying the events that occur within a system. Participants from different roles—such as domain experts, developers, analysts, and architects—work together to map out the sequence of events that describe how a process unfolds over time.

During an event storming session, events are typically written in past tense (for example, Order Placed, Payment Approved, or Shipment Delivered) and arranged chronologically. By focusing on events first, teams can uncover how a system actually behaves rather than how it is assumed to behave. The process often reveals hidden rules, exceptions, bottlenecks, and missing information that may not appear in traditional documentation.

Event Storming is widely used in Domain-Driven Design (DDD) to help teams develop a shared understanding of a domain before designing software systems.

In the context of The Assemblage of Artificial Intelligence and Time Molecules, event storming provides a practical way to identify the event types that make up real-world processes. Once these events are defined and captured in data systems, they can be analyzed as event streams and used to build process models—such as the Markov-based structures described in Time Molecules—that reveal how complex systems evolve through time.

Executive Function

In neurology, executive function refers to a cluster of higher-order cognitive processes orchestrated by the prefrontal cortex that govern goal-directed behavior, self-regulation, and adaptive problem-solving. These “command center” skills enable planning, impulse control, and flexible thinking in complex, changing environments.

Key Components:

Working memory: Temporarily holding and manipulating information.
Inhibitory control: Suppressing distractions or impulsive actions.
Cognitive flexibility: Switching strategies or perspectives as needed.

Primarily involves the prefrontal cortex, with support from the basal ganglia, thalamus, and anterior cingulate; impairments (ex. from TBI or ADHD) disrupt daily functioning.

Exploration Subgraph (Explorer Graph)

A graph structure used to support navigation and hypothesis generation when statistical relationships alone are insufficient to determine next steps. The Exploration Subgraph records provisional, role-based relationships—such as materials, components, processes, and conditions—that describe how things participate in other things, even when those relationships are indirect, asymmetric, or economically small. It functions as a hypothesis overlay connected to, but separate from, the core ontology, allowing the system to propose plausible adjacencies and re-enter correlation-based reasoning at a different level of granularity.

Inductive Reasoning

Reasoning from specific observations to broader generalizations. Induction looks at repeated patterns and infers rules or probabilities (e.g., the sun has risen every day of my life, therefore it will rise tomorrow). Inductive reasoning is the basis of machine learning, where models are trained on historical data to predict future outcomes. Unlike deduction, induction does not guarantee certainty, but it provides evidence-based likelihoods. See Peircian Triad.

Insight Function Array (IFA)

A collection of analytical functions that operate on BI dataframes (QueryDefs) to automatically detect and extract commonly recognized patterns, metrics, and anomalies from visualizable data. Each function encodes a specific class of observational insight—such as trend direction, volatility, changepoints, distribution skew, clustering shapes, or outlier members—mirroring the kinds of things human analysts routinely notice in charts but rarely persist. The outputs of these functions are stored as structured, searchable insight objects rather than remaining ephemeral observations.

Within the Enterprise Knowledge Graph, the IFA industrializes the act of “noticing.” Instead of insights vaporizing when a dashboard is closed, they are computed systematically across many queries, domains, and analysts, forming a reusable layer of behavioral metadata about enterprise data. In this sense, the IFA transforms BI from a visualization medium into an insight-generating substrate—capturing both what analysts look for and what they might otherwise overlook.

See my blog, Charting the Insight Space of Enterprise Data, and the topic in my book, Enterprise Intelligence, page 268, Business Intelligence Insights.

IRI (Internationalized Resource Identifier)

A globally unique identifier for a thing in a knowledge graph. An IRI is not a name or label—it is the identity everything else refers to.

Key Performance Indicator (KPI)

A performance indicator is any metric that tells us how some part of a system is doing: page-load time, call-center wait time, daily ad spend, fuel consumption, error rate, and so on.

A Key Performance Indicator (KPI) is a performance indicator that’s been “promoted“. It is explicitly tied to a strategic goal, is important enough to watch continuously, and will actually trigger decisions if it drifts. In other words, all KPIs are performance indicators, but only a small subset of performance indicators are truly “key.”

I choose to generally refer to performance indicators as KPIs because “pi” is overloaded (the more famous pi). More importantly, I use “KPI” as a blanket term for any performance indicator that makes it onto the strategy map. Metrics, key or not, and even goals and objectives are performance indicators. For example, achieving higher net profit is a goal, but it’s also something we measure. If it’s on the map, it’s “key” by definition, because it participates in the planning and trade-off structure rather than just being a background metric.

Knowledge Graph

A structured network of entities (nodes) and their interrelations (edges), often enriched with attributes and semantic context. Knowledge graphs enable machines to understand and traverse complex domains by encoding facts and their relationships in a graph format, powering search, recommendation, and reasoning applications.

Leaf-Level

In OLAP cubes, the leaf-level refers to the lowest level of detail stored in the cube—the point at which no aggregations have been applied. Each row at the leaf corresponds directly to a fact record at the cube’s base grain (e.g., individual transactions, line items, or events). From the leaf-level upward, higher-level summaries are derived through aggregations along hierarchies.

At leaf-level, the cube can be as large as the underlying fact table, since it retains all raw dimensional keys and measures. Querying directly at this level is equivalent to working with the base fact detail, while aggregated levels above provide faster, smaller, and reusable summaries.

Lit Up (Tuple Illumination)

A state within the Tuple Correlation Web (TCW) in which a tuple becomes actively prioritized for correlation probing due to triggering signals such as KPI distress, surprise insights surfaced by the Insight Function Array, or focused analyst investigation. When “lit up”, the tuple’s latent adjacency neighborhood receives exploratory attention — correlations are probed, reinforced, or manifested as edges within the materialized web. Illumination is not permanent; it follows a signal half-life, progressively dimming unless reinforced by continued distress, novelty, or investigative activity. In System ⅈ terms, tuple illumination is analogous to neuronal firing—a temporary activation that draws associative exploration without implying that the underlying relationships were newly created, only newly attended.

LLM System

An LLM-based system wrapped in retrieval, memory, and tool orchestration—so it can search, use external tools, and carry context across steps. The point of the term is to distinguish this whole system (ChatGPT/Gemini/Grok as experienced) from the underlying LLM component.

This is differentiated from the LLM itself, which is the product of a transformer architecture. I like to use the analogy that the LLM is to the LLM system as an aircraft carrier is to the Carrier Group.

Markov Model Ensemble (MME)

The Markov Model Ensemble (MME) is the analytical layer of the TimeSolution architecture that stores and manages Markov models generated from the Event Ensemble. While the Event Ensemble captures the raw sequences of events occurring across systems, the MME transforms those sequences into probabilistic models that describe how processes typically unfold.

Each Markov model in the ensemble represents the likelihood that one event will follow another within a particular context, such as a specific time period, location, or subset of cases. These models capture the dynamic flow of processes by estimating transition probabilities between events and measuring characteristics such as average time between transitions.

The MME can be thought of as analogous to the aggregation layer of an OLAP cube, where computationally intensive calculations are cached for efficient reuse. Once created, these models allow analysts and systems to study patterns of behavior, compare processes across different segments, detect anomalies, and make probabilistic forecasts about future events.

Together with the Event Ensemble, the Markov Model Ensemble forms the backbone of the TimeSolution architecture. The Event Ensemble provides the structured event data, while the MME converts that data into process models that reveal the statistical dynamics of complex, event-driven systems.

See Event Ensemble.

Master Data Management (MDM)

A set of processes and technologies used to create, govern, and maintain consistent definitions of core business entities such as customers, products, suppliers, and locations across systems. MDM resolves identity, removes duplication, and establishes authoritative records so that different applications refer to the same real-world entities consistently. In semantic contexts, MDM provides identity grounding, while knowledge graphs extend this foundation by adding explicit meaning and relationships.

MOLAP (Multidimensional OLAP)

MOLAP stands for Multidimensional Online Analytical Processing. In this approach, data is pre-aggregated and stored in a specialized multidimensional structure (often called a “cube”) rather than a standard relational database. These cubes are optimized for fast query performance, typically enabling split-second responses for complex, multi-level aggregations across multiple dimensions.

Because the aggregations are materialized ahead of time, MOLAP excels at query speed and predictable performance, even for large or complex queries. However, it requires significant processing time up front to build and refresh the cubes, and it introduces storage overhead—you’re effectively creating and maintaining a duplicate, summarized version of your data.

Tools like Microsoft SSAS (Multidimensional mode) and legacy enterprise BI platforms are examples of MOLAP implementations.

Key trade-off: MOLAP delivers blazing-fast queries and rich multidimensional capabilities, but at the cost of data freshness, flexibility, and maintenance complexity.

NoLLM (Not Only LLM)

NoLLM says LLMs aren’t all of AI—they’re one powerful thread in a tapestry woven from earlier “AI summers” that still matter: rules/Expert Systems, Semantic Web/KGs, classical ML, CEP/stream processing, planning/optimization, and more. In The Assemblage of Artificial Intelligence, NoLLM means composing these strands so each does what it’s best at: LLMs translate intent and bridge humans and systems; CEP reacts in real time; rules enforce policy; KGs carry meaning; classical models quantify risk; optimizers choose actions. The point isn’t nostalgia—it’s complementarity: by orchestrating proven parts with LLMs (not replacing them), you get systems that are faster, more governable, and less brittle than LLM-only stacks.

OLAP (OnLine Analytical Processing)

A read-optimized layer in a data warehouse built on dimensional models and data marts, where:

Dimensions (e.g. Time, Product, Region) slice the data cube
Measures (e.g. Sales, Profit) live at each cell
You slice, dice, drill and pivot for fast, ad-hoc analytics

OLAP systems are denormalized, batch-loaded, and tuned for complex queries over large datasets; OLTP systems are normalized, handle row-level transactions, and optimize for high-volume inserts/updates. OLAP is the analysis of transactions by dimensional slicing and dicing, while OLTP is the input and maintenance of transactions and entities of a database.

O(n) (Linear Time Complexity)

Denotes an algorithm whose running time (or space usage) grows proportionally with the size of its input, n. In practical terms, if you double the amount of data, an O(n) process will take roughly twice as long—common examples include single-pass loops and simple scans through a list.

On-the-Fly Aggregation

Ephemeral aggregation that’s computed on the OLAP node from root or source when no persisted agg exists; cached locally only until restart.

OODA

An acronym for Observe, Orient, Decide, Act, first developed by U.S. Air Force Colonel John Boyd to describe the cycle of decision-making in combat. In business and AI contexts, the OODA loop represents a dynamic model of intelligence under pressure: observing signals, orienting by framing them against context, deciding on a course of action, and acting to change the environment. The cycle then repeats, with each action creating new observations. OODA is powerful because it is recursive and adaptive—it captures not just reaction but continuous learning, making it a natural structure for knowledge graphs and reasoning systems that must operate in real time.

OLTP (Online Transaction Processing)

High-volume, fine-grained, write-heavy systems that handle day-to-day transactions (ex. add order, update balance) with strict consistency and low latency. Contrasting with OLAP, OLAP scans/joins large data volumes and returns a relatively small result set (aggregates, summaries), whereas OLTP reads a small, fixed set of rows, updates them, and writes back immediately.

Parameters

In a language model, parameters are the internal numerical values—essentially weights—that determine how the model maps input text to output text. Each parameter represents a learned connection between features of language, adjusted during training through gradient descent. The more parameters a model has, the richer and more nuanced its representation of relationships between words, ideas, and contexts—but also the greater its computational and energy cost.

Pearson Correlation

A statistical measure of the linear relationship between two continuous variables, denoted by r. It ranges from –1 (perfect negative linear association) through 0 (no linear association) to +1 (perfect positive linear association). Computed in O(n) time by comparing paired deviations from each variable’s mean, Pearson’s r tells you how strongly—and in which direction—two series move together.

Peircian Triad

Charles S. Peirce’s three modes of inference that drive inquiry:

Abduction — hypothesis formation (If A were true, B would be expected; observe B ⇒ maybe A).
Deduction — derive testable consequences from a hypothesis (If A then B; A ⇒ B).
Induction — evaluate and update credibility from data (How strongly does evidence support A?).

Practical prompts:

Abduction: “What are the top 3 plausible explanations that would make the observation unsurprising?”
Deduction: “For each, what unique prediction could distinguish it from the others?”
Induction: “Given the data, how much did my belief move? What error bars remain?”

Performance management

An ongoing, organization-wide system for turning strategy into results by setting goals, measuring what matters, learning from outcomes, and adjusting course. It links goals → measures (KPIs) → reviews → decisions → actions across levels (enterprise, team, individual) so work stays aligned and improves over time.

Core elements:

Direction: clarify strategy and translate it into objectives, success criteria, and ownership.
Measurement: define a small set of KPIs (leading and lagging), targets, and data sources.
Cadence: run regular reviews (weekly/monthly/quarterly) to assess progress and remove blockers.
Decisions & actions: choose interventions, allocate resources, and record why changes are made.
Learning loop: compare expected vs. observed results, update assumptions, and refine the plan.

Common frameworks & tools:

Balanced Scorecard (BSC) with strategy maps to visualize cause-and-effect among objectives.
Objectives and Key Results (OKRs) — Objectives = qualitative goals; Key Results = measurable outcomes that signal success.
Management dashboards, reviews (QBRs), retrospectives, and issue/risk logs.

Good performance management creates alignment, accountability, and learning: people know what matters, see progress, and can act on evidence—not just opinion. Poor practice reduces it to score-keeping; good practice makes it a feedback system that continuously improves execution.

Phase Change

A phase change is the point where a system stops behaving the way it used to and begins operating under a different set of dynamics. It’s not just that things are getting “more” or “less” of something — the nature of the response itself shifts. Before the phase change, pressure builds gradually and outcomes scale in a familiar, manageable way. After the phase change, the same pressure produces outsized, often nonlinear effects. In business and operational settings, phase changes often appear as thresholds being crossed — inventory buffers suddenly failing, customer churn accelerating, costs spiking, or systems cascading into instability. Colloquially, it’s the moment where “things stop creeping and start exploding,” marking the transition from tolerable stress to fragile escalation.

PMML (Predictive Model Markup Language).

An older, vendor-neutral XML wrapper for classic ML models: you train in Tool A (SPSS/SAS/KNIME, etc.), export PMML, and score in System B without retraining. It carries the schema, feature transforms, and the model itself—great for audit trails and for the “don’t drift between train and serve” problem. PMML shines with regressions, trees, scorecards, and other traditional algorithms, but it never kept pace with Pythonic pipelines and modern deep learning. Today it’s still alive but niche—you’ll see it in banks/insurers with long-lived stacks; new projects usually pick ONNX, MLflow flavors, or just ship a containerized Python scorer. Net: solid for legacy governance, not the default for greenfield.

Premature Convergence

A condition in evolution, optimization, or technology where systems settle too early on a “good enough” solution. In biology, it describes species that adapt narrowly and lose future flexibility. In AI, it refers to the widespread adoption of early-stage methods—such as Large Language Models—that work well enough to dominate, but risk locking out richer or more balanced approaches. Premature convergence is not failure; it is limitation. It warns us that progress may stall when success arrives too soon.

Process Mining

The analysis of time-ordered event data to discover, reconstruct, and evaluate the processes that produced it. Process mining identifies recurring sequences, variations, bottlenecks, and deviations in real behavior—often revealing processes that were never explicitly modeled or documented.

RAG (Retrieval-Augmented Generation)

A hybrid AI approach that combines a language model with an external knowledge store (documents, databases, or a knowledge graph). When given a prompt, the system first retrieves relevant information and then augments the model’s output with those facts—reducing hallucinations and grounding responses in up-to-date, sourceable data.

Resource Description Framework (RDF)

A foundational Semantic Web model for representing information as subject–predicate–object triples. RDF provides a flexible, graph-based syntax (e.g., Turtle, RDF/XML) to encode facts about resources in a machine-readable way.

ROLAP (Relational OLAP)

ROLAP stands for Relational Online Analytical Processing. Unlike MOLAP (Multidimensional OLAP), which stores pre-aggregated cubes in proprietary formats, ROLAP operates directly on relational databases—calculating aggregations on the fly using SQL at query time.

ROLAP leverages standard SQL engines to serve dimensional queries, often translating cube-like structures (dimensions, hierarchies, measures) into joins and GROUP BY operations. This avoids pre-processing and cube build times but can result in slower initial queries, especially without proper indexing or caching.

In practice, ROLAP queries often hit summary tables or intermediate caches—in memory or on disk—especially for frequently accessed aggregations. However, these cached layers are usually transient, meaning they disappear after a server restart unless explicitly persisted. Many BI engines and semantic layers blend ROLAP behavior behind the scenes, using smart query generation and temporary caching to mimic cube performance.

Key trade-off: ROLAP sacrifices some query speed in exchange for greater flexibility, reduced storage overhead, and real-time reflection of changes in source data.

Root aggregation

The grand total of a measure across all dimensions—i.e., the cube’s All/Total level. In the aggregation lattice it’s the root node (∅ attribute set): a single value like “Total Sales (all products, regions, dates).” Engines often compute/store it implicitly (ROLAP: GROUP BY (); MOLAP: a stored cell). It’s the baseline cell every query can roll up from, useful for caching, quick totals, and sanity checks. Think of it as a flattened cube.

SBAR (Situation-Background-Assessment-Recommendation)

A structured communication framework originally developed in healthcare to standardize handoffs and urgent escalations. Users succinctly describe the Situation, provide relevant Background, share their Assessment of the problem, and offer a clear Recommendation—ensuring concise, focused, and actionable dialogue.

Self-Supervised Learning

A technique where the system creates its own learning signals from unlabeled data by hiding part of the input and training itself to predict the missing part. This is how modern language models, vision models, and audio models scale: predict the next word, fill in the masked patch of an image, or reconstruct missing audio. It’s not supervised by humans, but the model is “supervising itself” by turning raw data into prediction tasks. In human terms, it’s like learning by completing patterns—predicting what comes next in a sentence, or guessing what’s behind an occluded object—long before anyone explains the rules.

Semantic Layer

A curated, governed abstraction layer that transforms raw, fragmented enterprise data (from warehouses, lakes, streams, or mesh domains) into a consistent, business-meaningful view—exposing measures, dimensions, hierarchies, KPIs, and valid grains in a performant, queryable form (often via pre-aggregated OLAP structures like those in Kyvos or legacy SSAS UDM).

In the context of Enterprise Intelligence, the semantic layer serves as the reliable substrate—the “spearhead”—that grounds higher-order reasoning: it enforces consistent definitions across tools and users, caches context to conserve tokens in LLM interactions, enables millisecond analytics without recomputation, and provides the structured foundation upon which knowledge graphs, ontologies, stories, Tuple Correlation Webs (TCW), Insight Space Graphs (ISG), and dynamic relation exploration (ex. Explorer Subgraph) can reliably build. Far from a mere modeling convenience, it bridges operational data to executable meaning, reducing hallucinations in AI assemblages, supporting causal hypothesis chains, and ensuring decisions trace back to trusted, auditable business logic rather than raw noise. In the LLM era, BI’s semantic layer evolves from static cubes to a living, stitchable component of the Enterprise Knowledge Graph (EKG), where curated analytics meet semantic web-scale connectivity for true hybrid intelligence.

A governed semantic layer provides a unified business view (metrics, dimensions, hierarchies, rules) across data sources. In the AI era, it grounds LLMs and agents in trusted context, prevents metric drift, and handles massive concurrency with accelerated performance. Kyvos exemplifies this with its GenAI-powered capabilities and MCP Server integration.

Semantic Web

An extension of the World Wide Web that adds formal semantics to data, allowing information to be shared and reused across application, enterprise, and community boundaries. It relies on standardized data models and vocabularies so machines can interpret and integrate information from heterogeneous sources.

Slice and Dice (Query Pattern)

Slice and dice is a fundamental query pattern in Business Intelligence (BI) and OLAP (Online Analytical Processing) that refers to filtering (slicing) and regrouping (dicing) multidimensional data to explore it from different perspectives. Technically, it is often implemented through SQL GROUP BY queries and filtering conditions, forming the basis of most OLAP cube interactions.

SPARQL (SPARQL Protocol and RDF Query Language)

A W3C-standard query language for retrieving and manipulating data stored in RDF (Resource Description Framework) format. SPARQL operates over triple stores and is designed to extract patterns from semantic graphs, similar to how SQL works with relational databases. It supports filtering, aggregation, subqueries, and federated queries across multiple RDF sources. While powerful for declarative graph querying, it lacks native support for recursion or rule-based reasoning.

SSAS MD (SQL Server Analysis Services Multidimensional)

The original OLAP engine from Microsoft SQL Server Analysis Services, built on the multidimensional (MD) model. SSAS MD organizes data into cubes, dimensions, measures, and hierarchies, and uses the MDX query language. It supports MOLAP, HOLAP, and ROLAP storage modes, and is optimized for drill-down style exploration—sums, counts, averages, and other aggregations across large datasets. While later replaced in many deployments by SSAS Tabular (using DAX and columnar storage), SSAS MD remains a powerful engine for complex cube designs, advanced calculations, and traditional OLAP-style analytics.

Supervised Learning

A form of machine learning where the algorithm is trained using labeled examples: inputs paired with the correct answers. The model’s job is to learn the mapping from input to label. Classic examples include “cat vs dog” image classifiers, sentiment labels on text, and medical diagnosis models. It resembles how humans learn when someone explicitly tells us “This sound means ‘dog,’ this thing is hot, don’t touch that.” Supervision provides the target.

SWRL (Semantic Web Rule Language)

An extension to OWL (Web Ontology Language) that allows users to define Horn-like rules (if-then statements) over RDF and OWL ontologies. SWRL enables basic inference on top of ontologies—such as classifying individuals or inferring new relationships—but is limited in expressivity (no recursion, no negation-as-failure) and computationally expensive at scale. Unlike Prolog, which is a full logic programming language, SWRL is constrained by OWL’s open-world and monotonic reasoning assumptions.

System ⅈ

System ⅈ refers to the background, non-goal-directed processes of intelligence that continuously explore, integrate, and associate information without explicit instruction. It operates in parallel, over long time horizons, tolerates ambiguity, and surfaces patterns, anomalies, analogies, and candidate explanations rather than conclusions. In humans, it aligns with default-mode-network activity; in artificial systems, it includes exploratory processes such as event correlation, process discovery, probabilistic modeling, and context formation that feed higher levels of reasoning.

System ⅈ is a third part of cognition that I created inspired by Daniel Kahneman’s notion of System 1 and System 2 in his book, Thinking Fast and Slow. For AI, my utilization is a bit different:

System 1: System 1 consists of fast, automatic recognitions and responses built from prior learning and reinforcement. It executes without deliberation, producing immediate outputs based on patterns that have already been validated. In artificial systems, System 1 corresponds to deployed rules, trained models, and reflexive responses that act directly on events.
System 2: System 2 is deliberate, conscious reasoning: slow, serial, and effortful. It is invoked when System 1 fails or uncertainty is high, and it works by constructing, testing, and refining explicit hypotheses. In artificial systems, System 2 includes human-guided analysis, planning, explanation, and deep reasoning—often delegated to tool-augmented reasoning agents—whose outputs may later be distilled into System 1 or explored further by System ⅈ.

This concept is introduced in my blog, System ⅈ: The Default Mode Network of AGI.

Time Solution

The implementation of what I lay out in my book, Time Molecules. It consists of:

A high-scale event processing system, which complex event processing features.
An “event warehouse” that houses the events.
A module to transform the events within the event warehouse into Markov models and conditional probabilities.
A set of functions to build, maintain, and query the Markov models.

Trade-off/Semantic Network (TOSN)

A Trade-off/Semantic Network (TOSN) is a graph-based modeling framework for representing and managing complex configurable systems, such as database servers or enterprise software environments. It combines two integrated layers:

The trade-off graph component captures directional cause-and-effect relationships between elemental attributes—atomic, quantifiable system characteristics (e.g., CPU utilization, memory allocation, index usage, query performance metrics). These relationships are explicitly defined as trade-offs, which can be proportional (changes move in the same direction) or inversely proportional (one increases as the other decreases), unidirectional or bidirectional, and configurable to model cascading impacts of attribute changes.
The semantic/taxonomy layer provides a hierarchical classification structure using relations such as “is a,” “part of,” or “a kind of,” enabling attributes and relationships to be grouped, inferred, and reconstituted into higher-level concepts or real-world system objects.

The core purpose of a TOSN is to enable proactive and reactive system administration: diagnosing “pain” states (where tolerances or limits are exceeded), tracing causal paths (including direct/first-class paths respecting direction and strength, and broader second- or third-class paths), and facilitating reconfiguration by manipulating attribute values along identified paths to alleviate issues.

Dynamic behavior is supported through agility extensions, which allow modification of attribute values, enforcement of constraints (e.g., min/max bounds, pain thresholds), external logic calls, and propagation of changes according to the defined trade-off rules. This makes the network executable in nature, supporting simulation, optimization, and automated adjustments while preserving transparency in how changes propagate.

Originally conceived in 2004–2006 for SQL Server performance tuning, the TOSN framework models interconnected subsystems where adjustments involve inherent gains and losses, with the intent to capture contextual nuances (“it depends” scenarios) through configurable logic and relationship dynamics—laying groundwork for later evolutions like Conditional Trade-Off Graphs that emphasize explicit conditional rules (e.g., via Prolog) for strength scaling, exception handling, and enablement guards.

See:

Original take: US Patent No. 7,302,418 B2, “Trade-off/semantic networks,” issued November 27, 2007, for the formal specification.
Updated Take: Conditional Trade-Off Graphs – Prolog in the LLM Era

Tribal Knowledge

Informal, experience-based knowledge held by individuals or small groups rather than documented in systems or processes. It often includes unwritten practices, workarounds, and institutional memory critical to operations but vulnerable to loss when personnel change, making it a key risk factor in governance, scalability, and knowledge transfer. AKA:

Institutional Knowledge: Probably the most formal and widely accepted synonym. Refers to accumulated know-how within an organization, whether documented or not (though often still informally held).
Institutional Memory: Emphasizes historical continuity — how things were done, why decisions were made, what failed before. Common in governance and operations discussions.
Tacit Knowledge: More academic / knowledge-management terminology. Refers to knowledge that is hard to articulate or codify — learned through experience rather than documentation.

Triple Store

A database platform designed specifically to store and query RDF data using the subject–predicate–object triple model. Triple stores treat IRIs, ontologies, and semantic relationships as first-class constructs and provide native support for SPARQL querying, named graphs, and inference. They differ from property graph databases in that meaning is encoded through standardized RDF semantics rather than application-defined node and edge properties.

Term	Precision	When to use it	Two Best Options	Best Product
Triple Store	High	When emphasizing RDF triples storage/query	Triple Store / RDF Store	GraphDB
RDF Store	High	Same as triple store, slightly more formal	RDF Store / Triple Store	Stardog
Graph Database (RDF)	Medium	When contrasting with property graphs	Triple Store / RDF Store	Stardog
Semantic Graph Database	Medium	Marketing / architectural framing	Triple Store / RDF Store	Stardog
RDF Database	Acceptable	Informal, blog-friendly	Triple Store / RDF Database	Apache Jena Fuseki

Transference of Cost

In performance management, this refers to improving one measure by offloading its burden onto another part of the organization. For example, reducing call-center handle time may look like efficiency, but if it leaves more issues unresolved, the cost reappears later in escalations, churn, or diminished customer satisfaction. Transference of cost exposes the hidden trade-offs behind KPI gains and reminds us that true performance improvement comes from systemic balance, not shifting burdens.

Tuple (BI Context)

A qualified data value formed by the intersection of one or more dimensional members, usually paired with a measure. In Business Intelligence, a tuple represents a specific analytic coordinate within a semantic space—for example, (Product = “Beef”, Region = “Brazil”, Month = “Jan 2026”)—combined with a metric such as sales, cost, or volume. That tuple example is the same as the question: “What is the sales amount of beef in Brazil during January 2026?”

Tuples function as the atomic units of analysis and correlation within the BI semantic layer, serving as the “puzzle pieces” whose relationships and co-movements are explored within structures such as the Tuple Correlation Web (TCW). Considering a classic BI multidimensional cube (a number of dimensions and a “measures” dimension), each intersection is a tuple. See my blog, An MDX Primer.

Tuple Correlation Web Probing Protocol

The exploratory process used to navigate the virtual correlation space of the Tuple Correlation Web (TCW). Rather than materializing all tuple relationships in advance, the protocol dynamically assembles correlation neighborhoods on demand, guided by prioritized signals such as KPIs in distress, surprise insights surfaced by the Insight Function Array, and embedding-based affinity cues. Mirroring how a human assembles a puzzle, the TCW Probing Protocol follows coarse relational hints, drills into stronger associations, and expands outward through moderate correlations—escalating to AI-generated “quasi-tuple” hypotheses when native signals are sparse. It operationalizes correlation discovery as a targeted, demand-driven search rather than an exhaustive precomputed map. See the page, Tuple Correlation Web Probing Protocol.

Unified Dimensional Model (UDM)

A conceptual layer introduced in SQL Server Analysis Services Multidimensional (SSAS MD) that presents enterprise data as a single, consistent dimensional model. The UDM integrates disparate relational sources into an OLAP cube structure, exposing measures, hierarchies, and KPIs through dimensions and facts. It allows users to query data using MDX as if all the information were contained in one unified cube, regardless of the underlying sources.

The UDM was more than just a modeling layer; it acted as an early semantic layer, abstracting complex schemas into business-friendly terms while providing the performance benefits of MOLAP. Many modern semantic layer approaches, especially in MOLAP systems like Kyvos, can trace their lineage back to the UDM concept—using pre-aggregated cube structures not only for speed but also to enforce consistent business logic across tools.

Unsupervised Learning

A type of learning where the model receives data without labels and must find structure on its own. It clusters, groups, compresses, or discovers patterns purely from the shape of the data. Examples include grouping customers by behavior, identifying anomalies, or discovering latent topics in documents. It’s similar to how an infant observes the world before understanding language: seeing shapes, hearing sounds, noticing similarities without anyone naming them.

Vector Database

A Vector Database is a system designed to store and retrieve high-dimensional numeric vectors—typically embeddings—using similarity measures rather than exact matches. Queries return items that are most similar to a given vector according to a distance metric (such as cosine similarity or dot product), making vector databases well suited for semantic recall, fuzzy matching, and neighborhood expansion over large collections of unstructured or semi-structured artifacts.

In an intelligence architecture, a vector database does not represent meaning, identity, or truth; it represents proximity in a learned embedding space. Its value lies in rapidly surfacing plausible candidates and associations, especially in System ⅈ–style background processes. Vector similarity can suggest what might be related, but those suggestions must be grounded through structured graphs, correlation, and empirical validation before they are trusted or acted upon.

Vector Embeddings

Vector embeddings are numerical representations of text or data entities encoded as points in a high-dimensional space, where semantic similarity is reflected by geometric proximity. In enterprise contexts, embeddings are often generated from metadata such as table and column descriptions in a data catalog. Once embedded, fields with similar meaning—e.g., CustomerID, ClientKey, or AccountNumber—cluster near one another even if naming conventions differ. This enables semantic search, automated lineage discovery, and ontology alignment by comparing vectors rather than relying solely on exact text matches or predefined taxonomies. Embeddings therefore act as a bridge between unstructured business language and machine-navigable similarity, supporting discovery and reasoning across large analytical estates.

Web of Functions (WoF)

A set of functions that are linked through their inputs and outputs. In other words, the output of a function is the input to other functions. The functions are generally machine learning models. Additionally, they could be relatively simple Prolog or functions written in programming languages such as Python and C#, and deployed onto a scalable function platforms such as Azure Functions and AWS Lambda functions.

In a System 1 tradition, the functions should be very fast. But they should also be invariant (the same input returns the same output).

Web Ontology Language (OWL)

A richer ontology language built on RDF that adds formal logic constructs—classes, properties, restrictions, and axioms—for defining complex vocabularies and enabling automated reasoning. OWL lets you specify hierarchies, cardinalities, and constraints, making it possible to infer new knowledge from existing graph data.

Self-Supervised Learning

Supervised Learning

Share this: