Welcome to the Prolog in the LLM Era Spring Break Special! Starring … decoupled recognition and action … event streaming … correlation doesn’t imply causation … and our special guest star … Deductive Time Travel!!!
Notes before diving in:
- Please see Part 1 in the series for background on Prolog if you’re not familiar with it. These are the takeaways from Part 1 relevant to this blog:
- Prolog is a declarative language designed for logical, deductive reasoning.
- In an era dominated by LLMs and machine learning, Prolog offers clarity, auditability, and control.
- Where LLMs suggest what might be true, Prolog states what must be true under defined assumptions.
- Declarative rules (like those in Prolog) can coexist with probabilistic models by capturing the point where hypotheses become operational logic.
- This blog extends the idea by exploring how Prolog can be used to encode event-triggering logic, bringing us closer to causation in event-driven systems.
- TL;DR: We’re shifting from looking at systems as scattered fragments of knowledge to understanding them as parallel and evolving flows of interacting processes. Event-driven thinking is taking over—from web clicks to IoT to process mining—because it reflects how things actually happen. But we still fall short of causation if we only log what happened. We need to know why and how each event was triggered. If that triggering logic were written in Prolog—a declarative language built for expressing logic—we could attach a pointer to the exact version of the rules, plus any extra facts used but not recorded. That gives us a richer picture of intent and reasoning, helping us move beyond surface patterns and closer to real causation.
We’ve come a long way in capturing events—web page clicks, purchases, complaints, diagnoses, IoT emissions—and enriching them with their associated properties like customer ID, product category, price paid, or symptoms presented. But we’re still missing something critical: the parameters, the rules, the logic that triggered the event. Was it a policy? A gut instinct? An LLM’s hallucination-prone recommendation? Did it involve months of debate in Congress?
The logic behind an event is the most valuable, yet most invisible and overlooked, part of the system. Think about how someone takes an action or responds to your question. You’re aware of some of the circumstances (parameters) that went into the action or response, but you don’t know what else is going on in that person’s head.
In this post, I argue that we need to go beyond just logging the who, what, and where of the events we capture in our databases. We need to better capture how and why the event was triggered. And for that, I propose using Prolog as a common encoding format to declare the rules that shape decisions, especially when those rules have already been battle-tested through machine learning or expert judgment.
This becomes even more important as event-based computing and analysis continue to grow, driven by the billions of devices deployed as the Internet of Things, billions to trillions of AI agents, and many entities that are a combination of the two. These are integrated and processed through event streaming, complex event processing, event sourcing, and event-oriented analytics implementations like those I discuss in my book, Time Molecules (release Spring 2025), the time-oriented counterpart to tuple-oriented OLAP cubes.
What we typically do with events—especially in business intelligence (BI) and machine learning (ML) pipelines—is correlate them. We look at what tends to happen before or after, how events cluster together, how their respective metrics go up and down together, and how properties align. It’s the old principle of “what fires together, wires together”—a phrase often associated with neural learning, but it holds a deeper truth: deductive reasoning begins with repeated associations. When events occur together often enough, we suspect a relationship. That suspicion is the seed of logic. We begin to ask: Is there a rule here? Can we explain it? Can we use it to predict and intervene? This correlation-first mindset is fundamental—it’s how both statistical models and human intuition start the journey toward reasoning.
Yes, correlation doesn’t imply causation. Every time I say “correlation,” someone reminds me of that … hahaha. And although mistaking correlation for causation can lead to disastrous decisions, correlations are the hints that point toward potentially valuable relationships. It’s a double-edged sword. They are like a shiny glint in the distance that could be a diamond but is usually just a shard of quartz or glass. They are the hypotheses from which to begin the process of understanding causation and applying that understanding to resolving problems.
ML as we know it today can only show us patterns that show up in data, how one event seems to follow another. But causation is harder—often nearly impossible. It means more than seeing a pattern—it means testing, isolating, and ideally, disrupting the pattern to see if the outcome changes.
As automated systems continue to proliferate—AI agents, IoT edge devices, robots—they generate an ever-growing stream of events. But if we want to move beyond surface-level correlative, probabilistic patterns and get closer to understanding causation, we need more than just the events and their properties. We need to link each event to the logic that led to it—the reasoning that processed those properties and triggered the outcome.
As a declarative language, Prolog is built specifically for encoding logic (indeed, the “log” in Prolog). With a deep legacy in AI, Prolog offers a universal and highly robust format for encoding the decision-making rules behind events. By attaching not just metadata (properties), but the human and machine-readable logic that triggered each event, we open the door to systems that are not only reactive, but intelligible—systems we can audit, trust, and continually refine.
When Do We Have Causation?
Even if we believe something with all our heart—because we’ve seen it a vigintillion times, or because all the experts told us so—it doesn’t mean we’re looking at actual causation. For example, in the context of its effect on health, since the early 20th century, eggs have been declared good (1900s–1950s), bad (1960s–1990s), good again (2000s), maybe bad (early 2010s), possibly great (late 2010s), and finally, a more nuanced “it depends” (2020s). Sometimes, it’s just a very convincing illusion founded upon the inability of any of our brains to overcome the imperfect information of the world we live in.
Even in math (remember, “numbers don’t lie”) there are beautiful examples of patterns that seem ironclad … until, waaaaay later, they fall apart. Take Mertens’ Conjecture, for instance. Mathematicians noticed a certain function—let’s call it a kind of “yes-no counter” that flips back and forth—never got too far from zero. They tested it through millions, even billions of values, and it always behaved. It looked like a rule. But it turns out that way, way further down the number line—past any practical limit anyone had checked—the rule breaks. It seems that Mertens’ Conjecture fails somewhere around 10^(10^14) (a 1 followed by a hundred trillion zeroes)—rather more than a vigintillion, which as a child I thought was the biggest number.
That’s why in math, you need a proof. Until you have one, you’re just looking at a really long lucky streak and calling it a law. In the everyday world we live in, we have courts, experts, and methodologies for attempting to develop proofs. Putting aside the fallibility of courts and experts, one such methodology is A/B testing—the IT-world cousin of the Randomized Controlled Trials we’ve heard so much about over the past few years. A/B testing is faster, less formal, but built on the same core idea: random assignment and controlled comparison to infer causality.
It’s the frustrating nature of things that, for the most part, we may never be absolutely certain about causation. A profound issue is that we can’t know all the times an event didn’t happen. But we can inch ourselves closer to it with A/B testing. In a typical experiment, we test a hypothesis by randomly splitting the world: one group experiences the change, the other doesn’t. If everything else stays the same, and the outcome shifts, we can start to talk about causation. In web systems (especially e-commerce), this is eminently doable—different versions of a webpage, an ad, or a recommendation algorithm can be run live. Rules are changed in real time, behaviors are tracked, and patterns are observed across control and test cohorts.
But most of the data we capture and work with wasn’t designed to prove causation. Note, before BI was a mainstream thing, we used to say, “most data wasn’t designed for analysis”; hence, all the DW/ETL stuff was needed. Until a fairly well-reasoning AI became plausible (ChatGPT, November 2022), we usually just concerned ourselves with capturing data. We then relied on our human intelligence to interpret what happened—the human intelligence composed of countless models we’ve trained into our heads through our education and experience. We just wanted the facts, and we would do the heavy lifting of reasoning.
Over the past decade or so, ML has become prevalent enough to reverse-engineer rules from this data with varying degrees of accuracy. For example, decision forests derive the combinations of attributes and values that estimate the probability of a particular outcome. But those ML models don’t extend their computation beyond the set of properties presented to them. Our brains can generally associate outside the dataframe, drawing on years of physical experience in the real world. Today’s level of AI (LLMs and the supporting systems that have emerged around them over the past few years) offers reasoning that doesn’t yet match our deepest reasoning capability, nor is it anywhere near flawless, but it opens the door to widespread, non-human reasoning.
To be clear, ML, AI, our human intelligence, and the software we’ve painfully coded over the decades (in a sense, the manual encoding of our human intelligence) all currently contribute significantly in unique ways.
Correlation Example
Consider an event stream from customer support. We might see that a customer complained, an agent issued a 25% coupon, and the customer later left a positive review. But does that mean the coupon caused the satisfaction? Or was it the tone of the agent? Or the fact that the customer had already calmed down? Or that the customer realized he was overreacting? We don’t know. We just have a record of the sequence, not the context that gave rise to it.
Sometimes what looks like a direct cause-and-effect relationship is actually being shaped by a confounding variable—a hidden factor influencing both the trigger and the outcome. Recognizing these variables is another way we move closer to causation. They don’t always show up in our data, but when they do, slicing models by those extra properties—or folding them into our logic—can reveal what was previously obscured. They’re not the focus here, but they’re a big part of the toolkit for getting closer to causation.
To get closer to causation, we need more than events—we need to understand the nature of the event triggers. As mentioned in the introduction, that’s why every event should carry with it a bundle of meaningful metadata. We often attach this as a JSON payload: product details, price paid, customer location, agent ID, even inferred sentiment. This allows us to slice the analytics models computed from events by those properties. We begin to see different paths emerge—different probabilities for different kinds of customers, different kinds of agents. But even that has limits.
Why did the agent issue a coupon? Was it because they liked the customer? Why? Feared a bad escalation? Why? Followed a strict policy? Unless that rationale is captured, we’re still speculating. And even if we later learn the key variable, we likely weren’t capturing it at the time. So we test again. We isolate the factor. If it proves meaningful, we add it to our logging. But now that variable is no longer natural—it’s part of a new system, one that’s already changed by our observation.
Even so, there are always elements that are unknown, often unknowable. No matter how much data we capture, we remain observers of interacting black boxes—interpreters processing the captured properties through the models trained in our brains over our lifetime. Some gaps can be filled with knowledge graphs—structured relationships that imply missing facts. Some can be bridged by machine learning, finding patterns we somehow couldn’t see. But both still float in probability.
Why You Can’t Always Learn the Rules from Data
Suppose we have a database of people who chose to take a particular medication—and those who didn’t. Let’s say the populations solidly represent the general public. For each person, we’ve got dozens of attributes: age, gender, income, education, prior conditions, maybe even some lifestyle factors. The question is: why did some people opt in while others didn’t?
We could throw ML at it. Train models that essentially reverse-engineer plausible “hows and whys” (the rules) from the data. And it might find patterns—people above a certain age with condition X were more likely to take it, or people in zip code Y tended to avoid it. Fine.
But the model is stuck inside a box—a box constrained to the attributes we collected. Even with dozens to hundreds of attributes, it’s a tiny box compared to all of what makes a human decision.
Dozens to hundreds of properties barely scratch the surface of the attributes that could describe billions of unique individuals. The real model—what we carry around in our heads—is shaped by unique characteristics sculpted by childhood, traumas, accomplishments, culture, stories, gut feelings, TV commercials we forgot we saw, and how tired we were when the decision came up. That’s why we might come up with different conclusions based on what is seemingly the same data.
Even if two people arrive at the same decision, they may have gotten there by different paths varying from nuanced differences to completely different tracks of reasoning. One might trust their doctor completely. Another might be acting out of fear. Another might have talked to a friend who had a bad experience. From the outside, their data might look the same. Inside, there are different sets of variables and different wiring.
Even follow-up surveys can only go so far. Yes, those questionnaires have free-text boxes, but most people leave them blank. And even when they do fill them out, you need them to recall what they were thinking at the time—not after the fact, not filtered by how they feel now. And how many people want to write out a significant chunk of text or, conversely, can adequately express themselves in 200 to 1,000 characters?
Deductive Time Travel
Lastly, not only does logic differ among people, but logic evolves in each of us as we learn more—logically for the better, but not always. Each person also plays multiple roles in life, which means different rules probably apply in different contexts. Still, that evolving, contextual logic can be encoded in versions of a Prolog program, or even within a single one. And in automated systems, that logic should be attached to the events it produces—not necessarily embedded entirely, but referenced via a GUID or pointer to the exact version of the Prolog that made the decision.
Rules are often implemented in cloud functions or software layers that silently change over time. Rarely do we have access to what the rules used to be when an event occurred. But if each version of the logic is stored as a stand-alone Prolog file and referenced at the event level, we gain something extremely powerful: Deductive Time Travel. We can reconstruct the reasoning at the time of the event—not just what happened, but why it happened then, according to the rules that existed at that point in time. That’s not just logging. That’s interpretability, auditability, and a step closer to true intelligence.
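To make that concrete, here is a minimal sketch, in SWI-Prolog, of what such a replay might look like; the event record, file name, and outcome predicate are all hypothetical:

% Hypothetical event record: event(EventId, Outcome, RuleFileVersion).
event(evt_20250114_0042, coupon_issued(cust_8841, 25), 'coupon_rules_v2.7.pl').

% Replay the reasoning in force when the event fired: load the exact
% rule file referenced by the event, then re-derive the recorded outcome.
replay(EventId) :-
    event(EventId, Outcome, RuleFile),
    consult(RuleFile),   % point-in-time logic, assumed archived and retrievable
    call(Outcome).       % does the outcome still follow from those rules?

A query like replay(evt_20250114_0042) would then reproduce the deduction exactly as it stood on that date, against that version of the logic.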
Prolog
Prolog is a declarative language designed for logic programming, meaning you describe what you want to conclude rather than how to compute it. It operates as a logic engine, allowing you to express rules and relationships directly, without having to manage control flow or state like in a general-purpose language such as Python. SQL is an example of another declarative language that should be more familiar to readers—used to query relational databases by describing the desired result, not the step-by-step procedure. But while SQL is largely limited to data retrieval and filtering, Prolog’s expressiveness goes far beyond. It can encode complex rules, contextual facts, recursive relationships, constraints, and hypothetical reasoning. You can represent not just straightforward conditions, but layered, composable logic that mirrors how humans reason—especially in domains where decisions emerge from a web of interrelated facts.
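As a minimal (and purely illustrative) taste of that expressiveness, here are two made-up facts and one recursive rule; note that the rule declares the relationship rather than a procedure for computing it:

% Facts: who reports to whom (a made-up org chart)
reports_to(alice, bob).
reports_to(bob, carol).

% Recursive rule: X is in Y's chain of command, directly or indirectly
chain_of_command(X, Y) :- reports_to(X, Y).
chain_of_command(X, Y) :- reports_to(X, Z), chain_of_command(Z, Y).

A query such as ?- chain_of_command(alice, carol). succeeds by deduction alone; expressing that same transitive relationship in SQL typically requires a recursive common table expression or application code.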
If the reason something happened is because a rule fired, then the rule is the reason. And if that rule is written in Prolog, you can trace it, examine it, “debug” it, improve it. You don’t need to guess what logic might have been in play. You can see it. That’s the power of declarative logic.
ML is about reverse-engineering the rules behind behavior, and Prolog is about declaring them. Prolog doesn’t infer what probably happened—it states what must logically be true, based on what we know. Whether that logic is correct is another matter. But at least it’s transparent. A human customer support agent can take or leave a Prolog recommendation. An AI agent might just follow it.
We may never be able to fully eliminate uncertainty. The world is constantly in motion—people behave in unexpected ways, policies shift, interpretations vary. But Prolog gives us a way to say: Given these facts, here’s the conclusion. Right or wrong, it’s declared, not implied.
Every event happens when just the right conditions come together. Our subconscious often recognizes this before we do. And when it does, we act. That’s what intelligence is—not just push-button response, but the ability to recognize context, weigh implications, and act with purpose.
Decoupled Recognition and Action
One of the most overlooked aspects of intelligence is that it’s not a single act—it’s an iterative cycle of recognition and action. We observe what’s going on, orient ourselves to the situation, decide what to do, and execute the action. The key is that these are decoupled processes. Recognition is about pattern matching, context awareness, identifying what’s going on. Action is a separate step—it requires strategy, prioritization, risk assessment. It requires scrutiny because it’s usually physically irreversible. I detail this in Levels of Intelligence.
This decoupling of recognition and action is critical in adversarial environments—the real world—where we’re all making moves with incomplete information, often in direct competition with others. Think Texas Hold’em: the player recognizes the current state of the table, their hand, the betting patterns… but selecting the next move—bet, raise, fold—requires another layer of reasoning. It’s not just about what is, but what could be. Or think of that annoying fish that spots something good to eat but, instead of just chomping at it, hesitates, realizing up close that something isn’t quite right about a worm dangling the way it would from a fishing line.
And that’s what brings us—loop by loop—closer to causation. But recognition and action don’t just happen once. They form a cycle. We recognize a situation, select an action, and observe what happens as a result. That observation becomes the starting point for the next loop—just like the OODA Loop: Observe, Orient, Decide, Act.
That recognition—what’s really going on—isn’t the action itself. It’s the orienting step. It’s where we assemble the context, read the situation, and make sense of where we are. The decision that follows is the selection of an action, based on that orientation.
We don’t just recognize a pattern—we recognize its relevance, its implications, and then choose from possible responses. That’s not push-button intelligence. That’s the cycle of judgment. And every action we take alters the world just a bit. The next time around, the observation is really an assessment of what just happened—how the world responded to our last move. Only then do we re-orient, re-decide, and act again.
This is where Prolog becomes useful, not in recognizing what’s happening, but in selecting the action to take once we know. Once we’ve taken machine-learned correlations far enough—refined them through A/B testing or meticulous elucidation by subject matter experts (SMEs), sliced them by context, stress-tested them in production—we may reach the point where they’re practical enough to encode as rules. When that happens, Prolog is how we embed that decision logic. It’s how we say: In this context, do this.
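Returning to the customer-support example, that encoded decision logic might look something like the following sketch; the predicates, thresholds, and discounts are invented for illustration, not an actual policy:

% Context facts for one interaction (normally supplied by the event pipeline)
complaint_severity(cust_8841, 4).
lifetime_value(cust_8841, 2300).
first_time_complainer(cust_8841).
:- dynamic coupon_issued_recently/1.   % may have no entries yet

% Decision rules: in this context, do this
offer_coupon(Customer, 25) :-          % strong gesture for upset, high-value customers
    complaint_severity(Customer, Severity), Severity >= 4,
    lifetime_value(Customer, LTV), LTV > 1000,
    \+ coupon_issued_recently(Customer).
offer_coupon(Customer, 10) :-          % lighter gesture for milder, first-time complaints
    complaint_severity(Customer, Severity), Severity >= 2,
    first_time_complainer(Customer).

A query like ?- offer_coupon(cust_8841, Discount). returns Discount = 25 here, and every condition that led to that recommendation is visible in the clause that fired.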
We’re still operating under uncertainty, of course—but at least we’re doing so in a way that’s declarative, explainable, and repeatable. That’s how we evolve from raw data to structured intelligence: not by eliminating uncertainty, but by building systems that can re-orient and re-decide—loop by loop.
In the context of event-based systems, this distinction becomes foundational. An event isn’t just “what happened.” It’s the product of a recognition-action loop. The system saw something—either through sensors, data, or user input—and then chose to do something in response. To model this properly, each event must encode both sides: what it recognized (the state, the properties), and how the action was selected (the logic, the rule, the policy). In human systems, that logic is often subconscious. In machine systems, it should be Prolog—or at least a reference to it.
By structuring events this way, we allow ourselves to analyze not just sequences, but decision cycles. And we can start to assess not just what happened, but whether the decision-making process was sound—across agents, across time, and across environments.
It’s also worth noting that recognition doesn’t always require extensive, complex logic—it can be straightforward, even rote and mechanical. A sensor reads a humidity level. A camera detects motion. These recognitions might be automatic, pattern-based, or statistical. But the action that follows often demands more nuanced intelligence and scrutiny.
What About ML Model Versioning?
You might be thinking: “Don’t we already store the version of the machine learning model that triggered an action—like recommending a coupon?” And the answer is yes. In any reasonably mature MLOps pipeline, versioning models is a best practice. When a model makes a decision, we typically log:
- The model version (e.g., “retention_model_v3.1.4”)
- The features or input data used
- Sometimes the prediction confidence
- Occasionally, metadata about the training set
What I’m proposing isn’t just about versioning. It’s about capturing the logic—explicitly, transparently, and in a machine-readable way.
- Prolog is a declarative language built for logic, for encoding how and why a decision is made—not just the fact that a decision occurred.
- Prolog rules are modular—you can import them, extend them, and compose them from smaller snippets (see the sketch just after this list).
- You can even create mapping rules like:
map_segment(high_value, loyalty_score, X) :- X > 800.
- You can version Prolog files just like models, and store a pointer to the exact rule version that triggered an event—along with any additional facts used but not captured in the event log.
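As a rough sketch of that modularity and versioning together (the module names, file names, and predicates below are all hypothetical), a shared segmentation snippet could be packaged once and then composed into a larger rule set:

% File: segment_rules_v1.2.pl -- a small, shared, versioned snippet
:- module(segment_rules, [map_segment/3]).
map_segment(high_value, loyalty_score, X) :- X > 800.
map_segment(standard,   loyalty_score, X) :- X =< 800.

% File: retention_logic_v0.9.pl -- composes the snippet into a bigger rule set
:- module(retention_logic, [eligible_for_upgrade/1]).
:- use_module('segment_rules_v1.2.pl').   % import the shared mapping rules
loyalty_score(cust_8841, 912).            % example fact; normally fed in from the event stream
eligible_for_upgrade(Customer) :-
    loyalty_score(Customer, Score),
    map_segment(high_value, loyalty_score, Score).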
As I mentioned earlier, I discuss Prolog model versioning in Deductive Time Travel.
To illustrate what this might look like in practice, here’s an example of a typical event payload from an IoT edge device—something that might come from a factory floor sensor or a smart HVAC unit. These payloads usually include sensor readings, metadata about the device, timestamps, and sometimes even the results of local machine learning inference. But critically, imagine this payload also includes a reference to the Prolog rule or decision logic that triggered the event. By embedding or linking that logic—just like we do with model versions—we give future systems a chance to understand not just what happened, but why.
{
  "device_id": "pump-xyz-4321",
  "location": "plant-5-zone-a",
  "timestamp": "2025-03-29T13:47:00Z",
  "firmware_version": "3.4.1",
  "sensor_data": {
    "vibration_level": 0.05,
    "temperature": 87.2,
    "humidity": 34
  },
  "event_type": "anomaly_detected",
  "trigger_rule_id": "rule-145-failure-threshold",
  "inference_result": "bearing_wear_likely",
  "confidence_score": 0.91,
  "model_version": "v5.0.2",
  "reasoning": "vibration_level exceeded threshold + temp rising",
  "prolog_reasoning_id": "vib_threshold_v3.1.4"
}
That Prolog file, referenced as “vib_threshold_v3.1.4”, might look something like the following code, with reasoning that goes beyond simply meeting thresholds:
% Version: vib_threshold_v3.1.4
% Thresholds and logic weights (tunable per domain)
threshold(vibration_level, 0.04).
threshold(temperature, 85.0).
minimum_severity(bearing_wear_likely, 0.8).
% Facts for current sensor readings
sensor_reading(pump_xyz_4321, vibration_level, 0.05).
sensor_reading(pump_xyz_4321, temperature, 87.2).
% Optional historical or derived facts
trend(pump_xyz_4321, temperature, rising).
trend(pump_xyz_4321, vibration_level, steady).
device_status(pump_xyz_4321, aging_components).
recent_alert(pump_xyz_4321, overheating_warning).
% Compute severity score based on multiple criteria
severity_score(pump_xyz_4321, bearing_wear_likely, Score) :-
    sensor_reading(pump_xyz_4321, vibration_level, V),
    threshold(vibration_level, VT), V > VT,
    sensor_reading(pump_xyz_4321, temperature, T),
    threshold(temperature, TT), T > TT,
    trend(pump_xyz_4321, temperature, rising),
    ( recent_alert(pump_xyz_4321, overheating_warning) -> HeatRisk = 0.2 ; HeatRisk = 0.0 ),
    ( device_status(pump_xyz_4321, aging_components) -> AgingRisk = 0.1 ; AgingRisk = 0.0 ),
    BaseScore is (V - VT) * 5 + (T - TT) * 0.5 + HeatRisk + AgingRisk,
    Score is min(BaseScore, 1.0).

% Rule to infer anomaly with severity check
anomaly(bearing_wear_likely, pump_xyz_4321) :-
    severity_score(pump_xyz_4321, bearing_wear_likely, S),
    minimum_severity(bearing_wear_likely, Min),
    S >= Min.
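Loaded into an engine such as SWI-Prolog, a couple of queries are then enough to re-derive the decision from that exact version of the logic (the results below follow from the facts and weights above):

?- anomaly(bearing_wear_likely, pump_xyz_4321).
true.

?- severity_score(pump_xyz_4321, bearing_wear_likely, Score).
Score = 1.0.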
ML + Prolog: The Best of Both Worlds
ML is great at automatically (maybe semi-automatically) surfacing correlations, measuring thresholds and generating candidate rules. But once a rule becomes stable—after enough A/B testing, SME validation, or statistical confidence—we can encode it in Prolog as decision logic. I explore this idea in more depth in Part 4 of this series, where I walk through how ML models—like decision trees or rule-based learners—can be transformed into Prolog rules. This allows the system to move from probabilistic inference to explicit, declarative reasoning.
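As a hedged preview of what that transformation can produce (the features, thresholds, and customer here are invented), one path through a learned churn tree might land as a clause like this:

% One leaf of a (hypothetical) decision tree, expressed as a Prolog clause:
% IF tenure < 6 months AND support_tickets > 3 AND plan = basic THEN churn_risk = high
churn_risk(Customer, high) :-
    tenure_months(Customer, T),   T < 6,
    support_tickets(Customer, N), N > 3,
    plan(Customer, basic).

% Example facts for one customer (normally generated from the feature store)
tenure_months(cust_5512, 4).
support_tickets(cust_5512, 5).
plan(cust_5512, basic).

The query ?- churn_risk(cust_5512, Risk). then yields Risk = high, and each branch condition is an inspectable, individually testable goal.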
Shared Library of Prolog Rules
If Prolog is to play a serious role in triggering and explaining events, we’ll need more than isolated rules tucked away in silos. Just as the Semantic Web proposes using globally unique identifiers (URIs) to create an interoperable layer of shared concepts and relationships, we can imagine a similar approach for decision logic—a library of reusable, referenceable Prolog rules.
Each rule, or logic module, could have a globally unique ID—versioned, documented, and shared across systems. That means when an event is triggered, we’re not just storing opaque logic—we’re storing a pointer to a known, inspectable reasoning path.
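One way to picture that registry (every identifier and URL below is made up for illustration) is as a set of facts that resolve a rule ID and version to its inspectable source:

% Hypothetical shared registry of versioned logic modules
rule_module('urn:acme:rules:vib_threshold', 'v3.1.4',
            'https://rules.example.com/vib_threshold_v3.1.4.pl').
rule_module('urn:acme:rules:coupon_policy', 'v2.7.0',
            'https://rules.example.com/coupon_policy_v2.7.0.pl').

% Given the pointer stored on an event, look up the reasoning behind it
logic_for_event(RuleId, Version, SourceUrl) :-
    rule_module(RuleId, Version, SourceUrl).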
This idea aligns with what I explored for neural networks in Machine Learning Models Embedded in Knowledge Graphs, where neural networks can be referenced and reasoned about as nodes in a knowledge graph. The same could apply to Prolog rules: instead of being buried inside services, they can become first-class knowledge assets—discoverable, reusable, and linked to other components in an enterprise knowledge graph.
By treating logic as data—indexable, inspectable, shareable—we open the door to a new level of interoperability, where decisions can be understood not just in local context, but across domains and organizations.
Conclusion: Store the What, But Also the Why
As we move further into an event-driven world—where systems don’t just store snapshots, but flows of behavior—it’s no longer enough to record what happened. If we truly want to understand causation, we need to record why it happened.
That’s why every event should carry four key things:
- The ID of the Prolog rule that triggered the event (a stable reference to the logic in play).
- The version history of that Prolog.
- The metadata about the event context (e.g., environment, system state, user role).
- The properties or facts that were actually used in the decision process—even if they weren’t originally logged.
This gives us more than just a timeline of outcomes—it gives us a traceable, testable narrative of reasoning. One we can audit, analyze, simulate, or improve.
Whether triggered by AI agents, rules engines, or human-in-the-loop systems, every meaningful action is rooted in logic—even if that logic is invisible. Attaching that logic to the event itself is how we move from logging behavior… to understanding it.
And that’s how we get closer to causation.