As organizations integrate AI systems with their analytics platforms, a long-standing issue in Business Intelligence is becoming more serious. The issue used to be limited to client-side calculations and analyst-generated data that lack traceability.
Now, AI agents (perhaps vastly outnumbering human analysts) can generate values without traceability. The methods an AI agent uses to calculate a value are as “personal” and distinct to that agent as a human analyst’s “private” calculation in Tableau or an Excel spreadsheet.
When these human or AI artifacts appear in presentations, spreadsheets, or reports, we cannot determine:
- where the data came from
- how it was derived
- who created it
- when it expires
- whether it can be trusted
This FAQ summarizes the current thinking (written March 5, 2026) and emerging best practices for handling this problem.
Q1: What is the underlying problem?
Modern BI systems allow human analysts and AI agents to create calculations directly. For human analysts, that means in client tools such as dashboards, spreadsheets, or notebooks. For AI agents, the calculations happen inside their chain-of-thought or RAG processes.
Examples include:
- calculated measures in dashboards and visualization tools
- spreadsheet transformations after data export
- derived metrics created in ad-hoc SQL queries
- local joins or filters applied outside official pipelines
These transformations occur outside the governed data pipeline. See Q4 – Shadow Metrics.
As a result, downstream systems, including AI agents, cannot trace how the numbers were produced. This creates a risk that unverified or outdated artifacts become inputs to automated reasoning systems.
Q2: Why does this become more dangerous with AI systems?
Traditional BI reports were interpreted by humans who, drawing on a great deal of nuanced, in-context experience, question unusual numbers. AI agents, at the time of writing, probably don’t possess the nuanced knowledge that exists in the heads of humans who are physically entrenched in the real world.
AI systems, however, may:
- ingest analytics outputs automatically
- incorporate them into reasoning or predictions
- propagate errors into other systems
Without traceability, AI cannot determine whether a dataset is:
- official
- experimental
- outdated
- or fabricated.
In other words, the AI lacks data provenance. See my blog, AI Agents, Context Engineering, and Time Molecules.
Q3: What are “data lineage” and “data provenance”?
These two concepts are essential to modern data governance.
Data lineage: Tracks the path of data through systems and transformations.
Example:
CRM → ETL pipeline → warehouse table → semantic model → dashboard metric
However, a piece of data can go through transformations within each step. For example, ETL pipelines might apply multiple stages of transformations.
Data provenance: Records the origin, authorship, and methodology of a dataset or transformation.
Example metadata:
created_by: analyst_42
tool: Power BI
formula: revenue / active_users
timestamp: 2026-03-04
Lineage explains how data flows, while provenance explains who created it, how, why, and when.
Both are necessary for trustworthy AI.
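The distinction can be sketched in a few lines of Python. This is a minimal illustration, not a real lineage system; the class names (`LineageNode`, `ProvenanceRecord`) and the `lineage_path` helper are hypothetical, built from the example flow and metadata above.

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceRecord:
    """Provenance: who created an artifact, with what tool, how, and when."""
    created_by: str
    tool: str
    formula: str
    timestamp: str

@dataclass
class LineageNode:
    """Lineage: one step in the path data takes through systems."""
    name: str
    upstream: list = field(default_factory=list)

# Lineage: CRM -> ETL pipeline -> warehouse table -> semantic model -> dashboard metric
crm = LineageNode("CRM")
etl = LineageNode("ETL pipeline", [crm])
table = LineageNode("warehouse_table", [etl])
model = LineageNode("semantic_model", [table])
metric = LineageNode("dashboard_metric", [model])

# Provenance for the dashboard metric itself (the Q3 example metadata)
prov = ProvenanceRecord(
    created_by="analyst_42",
    tool="Power BI",
    formula="revenue / active_users",
    timestamp="2026-03-04",
)

def lineage_path(node):
    """Walk upstream links to reconstruct the full flow as a list of names."""
    path = []
    while node:
        path.append(node.name)
        node = node.upstream[0] if node.upstream else None
    return list(reversed(path))

print(lineage_path(metric))
# ['CRM', 'ETL pipeline', 'warehouse_table', 'semantic_model', 'dashboard_metric']
```

Note that the lineage answers “how did this flow here?” while the provenance record answers “who made it, how, and when?” — a system needs both objects to make an artifact trustworthy.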
Q4: What are “shadow metrics” or “shadow transformations”?
A shadow transformation occurs when a calculation is created outside official data pipelines. Examples include (same as Q1):
- dashboard-level calculated fields
- spreadsheet formulas applied to exported data
- manual corrections applied by analysts
- ad-hoc metrics created in notebooks
These calculations may be perfectly valid—but without metadata they become opaque artifacts. If these artifacts enter AI systems, the model cannot verify them and will … ugh … make assumptions.
Q5: Should analysts be forced to centralize all calculations?
In practice, this is neither feasible nor desirable. Analysts need the freedom to explore data and create experimental metrics. The real requirement is not centralization, but discoverability. Any transformation that influences decision-making should be:
- machine-discoverable
- documented through metadata
- traceable through lineage systems.
Exploration artifacts (scratch measures, what-if spreadsheets) are allowed—but must be tagged experimental or non-authoritative. Decision artifacts (anything shown to leadership / used by agents) must have provenance + owner + freshness.
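The exploration/decision split above lends itself to a simple policy check. The sketch below is hypothetical: the field names (`tags`, `provenance`, `owner`, `freshness`) and the `classify_artifact` function are assumptions chosen to match the rule as stated, not an actual governance API.

```python
# Metadata fields an artifact must carry before it may drive decisions.
REQUIRED_FOR_DECISION = {"provenance", "owner", "freshness"}

def classify_artifact(metadata: dict) -> str:
    """Return 'decision' only when the artifact carries the required
    metadata; anything tagged or under-documented stays 'experimental'."""
    tags = set(metadata.get("tags", []))
    if "experimental" in tags or "non-authoritative" in tags:
        return "experimental"
    if REQUIRED_FOR_DECISION.issubset(metadata):
        return "decision"
    return "experimental"  # missing metadata -> not safe for decisions

scratch = {"tags": ["experimental"]}               # a what-if spreadsheet
board_metric = {"provenance": "semantic_layer",    # shown to leadership
                "owner": "analytics_team",
                "freshness": "24h", "tags": []}

print(classify_artifact(scratch))       # experimental
print(classify_artifact(board_metric))  # decision
```

The key design point: the default is `experimental`. An artifact must earn decision status by carrying metadata, rather than losing it by being caught without.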
Excel Hell
In the old days, before centralized BI, there was a term: “Excel Hell”. It meant that information presented in a report, email, presentation, or even verbally came from one of thousands of Excel spreadsheets on any given person’s desktop.
One attempted solution was to automatically sweep all spreadsheets on all desktops, extracting metadata, formulas, and whatever other objects of value could be found. Of course, people were modifying spreadsheets every day, so the inventory was out of date the moment it was built.
Then came Excel Services in SharePoint that at least placed the Excel spreadsheets on a central repository and enabled editing by multiple users.
Q6: What are the current best practices?
1. Treat the semantic layer as part of the data pipeline
Metric definitions should exist in a centralized semantic layer rather than scattered across dashboards. This ensures that:
- metrics have canonical definitions
- transformations are versioned
- lineage can be tracked.
2. Capture column-level lineage
Modern lineage systems track how individual columns are derived. Example:
profit_margin = (revenue - cost) / revenue
The lineage system records this transformation so downstream systems can understand it.
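As a rough sketch of what “recording the transformation” could look like, the snippet below keeps a column-level lineage registry in a plain dictionary. The registry shape, `register_column`, `upstream_columns`, and the `sales.*` column names are all hypothetical illustrations, not any particular lineage tool’s API.

```python
# Column-level lineage: record which upstream columns feed a derived column.
column_lineage = {}

def register_column(name, expression, inputs):
    """Record how a column is derived so downstream systems can inspect it."""
    column_lineage[name] = {"expression": expression, "inputs": inputs}

register_column(
    "profit_margin",
    "(revenue - cost) / revenue",
    inputs=["sales.revenue", "sales.cost"],
)

def upstream_columns(name):
    """Resolve all raw columns a derived column ultimately depends on."""
    entry = column_lineage.get(name)
    if entry is None:
        return [name]  # not registered -> treat as a raw source column
    cols = []
    for col in entry["inputs"]:
        cols.extend(upstream_columns(col))
    return cols

print(upstream_columns("profit_margin"))  # ['sales.revenue', 'sales.cost']
```

With this in place, a downstream consumer can ask not just “what is `profit_margin`?” but “which raw columns would change it?” — the question impact analysis needs.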
3. Automatically ingest BI (Semantic Layer) metadata
Governance platforms increasingly parse metadata from BI tools to capture:
- calculated measures
- dashboard dependencies
- query logic
- dataset usage.
This prevents transformations from disappearing inside dashboards.
4. Maintain an enterprise-wide data catalog
A data catalog provides searchable metadata describing:
- datasets
- transformations
- owners
- freshness guarantees
- dependencies.
This allows both humans and AI systems to discover the context behind data artifacts.
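A toy version of such a catalog can show why the metadata fields matter. Everything here is illustrative: the entries, field names, and `find_by_dependency` helper are assumptions standing in for a real catalog product’s search API.

```python
# A toy in-memory catalog: each entry is searchable metadata about a dataset.
catalog = [
    {"name": "daily_revenue", "owner": "finance", "freshness_sla": "24h",
     "dependencies": ["orders"], "transformations": ["sum by day"]},
    {"name": "churn_forecast", "owner": "data_science", "freshness_sla": "7d",
     "dependencies": ["daily_revenue", "support_tickets"],
     "transformations": ["feature build", "model inference"]},
]

def find_by_dependency(dataset_name):
    """Discover every catalog entry that depends on a given dataset,
    e.g. to assess the blast radius of a change."""
    return [e["name"] for e in catalog if dataset_name in e["dependencies"]]

def find_owner(dataset_name):
    """Look up who is accountable for a dataset."""
    return next(e["owner"] for e in catalog if e["name"] == dataset_name)

print(find_by_dependency("daily_revenue"))  # ['churn_forecast']
print(find_owner("churn_forecast"))         # 'data_science'
```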
5. Assign data ownership
Every dataset and metric should have a designated owner responsible for:
- maintaining definitions
- validating transformations
- updating expiration policies.
Ownership is a core principle of modern data governance.
Consider a data mesh approach to assign this accountability to the data producers.
6. Version analytics artifacts
Metrics and derived datasets should be versioned similarly to software.
Example:
metric: customer_lifetime_value
version: v3
definition: ...
owner: analytics_team
Versioning prevents silent changes from propagating through dashboards and AI systems.
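One minimal way to catch silent changes is to fingerprint each versioned definition. This sketch is hypothetical: the registry layout, the CLV definition string, and `verify_metric` are illustrative assumptions, not a real metric store.

```python
import hashlib

def definition_hash(definition: str) -> str:
    """Fingerprint a metric definition so silent edits are detectable."""
    return hashlib.sha256(definition.encode()).hexdigest()[:12]

# Registry maps (metric, version) -> fingerprint of the approved definition.
registry = {
    ("customer_lifetime_value", "v3"): definition_hash("avg_revenue * avg_tenure"),
}

def verify_metric(name, version, definition):
    """True only if the definition still matches its registered version."""
    return registry.get((name, version)) == definition_hash(definition)

# The approved definition verifies; a silently edited one does not.
print(verify_metric("customer_lifetime_value", "v3", "avg_revenue * avg_tenure"))  # True
print(verify_metric("customer_lifetime_value", "v3", "avg_revenue * 12"))          # False
```

A changed definition would fail verification and force a new version number, which is exactly the behavior software versioning gives code.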
Q7: What metadata should every dataset contain?
A useful minimal metadata schema includes:
- origin
- transformations
- author
- timestamp
- data owner
- freshness SLA
- dependencies
- trust classification
This metadata allows automated systems to reason about whether data is safe to use.
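The “safe to use” decision can be sketched as a schema check. The field names follow the list above; the accepted trust classifications (`official`, `verified`) and the `is_safe_to_use` function are illustrative assumptions.

```python
# Minimal metadata schema every dataset should carry (the Q7 field list).
REQUIRED_FIELDS = {
    "origin", "transformations", "author", "timestamp",
    "data_owner", "freshness_sla", "dependencies", "trust_classification",
}

def is_safe_to_use(metadata: dict) -> bool:
    """An automated system should use a dataset only when every required
    field is present and its trust classification is acceptable."""
    if not REQUIRED_FIELDS.issubset(metadata):
        return False
    return metadata["trust_classification"] in {"official", "verified"}

good = {
    "origin": "crm_export", "transformations": ["dedupe", "join"],
    "author": "analyst_42", "timestamp": "2026-03-04",
    "data_owner": "sales_ops", "freshness_sla": "24h",
    "dependencies": ["crm.accounts"], "trust_classification": "official",
}
bad = {"origin": "unknown_csv", "trust_classification": "experimental"}

print(is_safe_to_use(good))  # True
print(is_safe_to_use(bad))   # False
```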
Q8: How does this relate to “context engineering” for AI?
LLMs and AI agents rely heavily on contextual data. If that context contains undocumented or untraceable metrics, the model may generate confident but incorrect conclusions. A reliable AI system therefore requires trusted context pipelines.
These pipelines ensure that:
- context is derived from traceable sources
- transformations are documented
- datasets meet freshness requirements.
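A trusted context pipeline can be sketched as a gate in front of the agent’s context window. The artifact shape (`lineage`, `documented`, `created_at`) and the 24-hour freshness default are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone

def filter_context(candidates, max_age_hours=24):
    """Admit an artifact into an AI agent's context only if it is
    traceable (has lineage), documented, and within its freshness window."""
    now = datetime.now(timezone.utc)
    usable = []
    for art in candidates:
        fresh = now - art["created_at"] <= timedelta(hours=max_age_hours)
        if art.get("lineage") and art.get("documented") and fresh:
            usable.append(art["name"])
    return usable

now = datetime.now(timezone.utc)
candidates = [
    {"name": "q1_revenue", "lineage": ["warehouse.sales"],
     "documented": True, "created_at": now - timedelta(hours=2)},
    {"name": "mystery_csv", "lineage": [],  # no provenance, long stale
     "documented": False, "created_at": now - timedelta(days=90)},
]
print(filter_context(candidates))  # ['q1_revenue']
```

The untraceable spreadsheet export never reaches the model, which is the whole point: the model cannot second-guess its inputs, so the pipeline must.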
Q9: What is the long-term architectural direction?
Many organizations are moving toward a model where analytics artifacts are treated as first-class knowledge objects.
These objects contain both the data and metadata describing their origin and transformations. In effect, each artifact carries its own data bill of materials.
This approach allows AI systems to inspect and evaluate the reliability of the information they consume.
Q10: What is the key principle to remember?
The issue is not that analysts create new knowledge artifacts. The problem arises when those artifacts are stripped of their provenance. For AI systems to reason safely, the artifacts produced by human analysts must remain traceable, inspectable, and governed.
Without that, automated systems risk building conclusions on foundations that cannot be verified.
Q11: Should we think of analytics pipelines as a “knowledge supply chain”?
One useful way to understand the governance challenge is to think of modern analytics systems as knowledge supply chains. Traditional supply chains track the origin and transformation of physical goods:
raw materials → manufacturing → assembly → distribution → finished product
Modern analytics environments follow a similar pattern, except the outputs are knowledge artifacts rather than physical products.
raw data → transformation pipelines → semantic models → BI metrics → dashboards → AI inference
Each stage produces new artifacts derived from earlier ones. Examples include:
- derived metrics
- calculated measures
- statistical summaries
- forecasts
- narrative reports.
These artifacts are then consumed by humans, applications, or AI systems.
Why supply-chain thinking matters
If a manufacturing company loses track of where a component came from, it creates serious risks:
- quality defects
- counterfeit parts
- regulatory violations
- product recalls.
The same principle applies to analytics systems. If the provenance of a metric or dataset cannot be determined, downstream systems cannot evaluate its reliability. When AI systems consume such artifacts, they may unknowingly propagate:
- outdated calculations
- undocumented assumptions
- experimental metrics
- fabricated data.
In effect, the knowledge supply chain has been broken.
The role of metadata in a knowledge supply chain
To maintain integrity, each artifact in the pipeline must carry metadata describing:
- origin
- transformations
- author
- timestamp
- dependencies
- trust classification
This metadata allows systems to answer questions such as:
- Where did this metric originate?
- Which transformations produced it?
- Who is responsible for maintaining it?
- Is it still valid?
In other industries, this kind of traceability is sometimes called a bill of materials.
Analytics systems increasingly require something similar — a data bill of materials.
Why this matters for AI systems
AI agents often operate on aggregated knowledge artifacts rather than raw data. Examples include:
- curated datasets
- dashboards
- metric summaries
- knowledge graph entries.
If those artifacts lack traceability, the model cannot determine whether they are trustworthy. This means the reliability of AI reasoning increasingly depends on the integrity of the knowledge supply chain.
The long-term implication
As organizations deploy more automated reasoning systems, analytics platforms will gradually evolve from:
data processing systems
into:
knowledge production systems
In this environment, BI tools are not merely generating reports. They are producing artifacts that become part of the organization’s machine-readable knowledge base. Ensuring those artifacts remain traceable and well-documented is therefore essential for building reliable AI systems.
Q12: How should an AI agent cite/attach provenance when it outputs a number?
Output should carry a “data bill of materials” pointer: dataset IDs + metric version + query hash + timestamp + confidence/trust classification.
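As a sketch of what such a pointer could look like, the snippet below assembles the listed fields into a small JSON-serializable record. The function name, field names, and the 16-character truncated hash are illustrative assumptions, not a published schema.

```python
import hashlib
import json

def data_bill_of_materials(dataset_ids, metric_version, query, timestamp, trust):
    """Build the provenance pointer an agent attaches to an emitted number:
    dataset IDs + metric version + query hash + timestamp + trust class."""
    return {
        "dataset_ids": dataset_ids,
        "metric_version": metric_version,
        # Hash the exact query so the number can be re-derived and audited.
        "query_hash": hashlib.sha256(query.encode()).hexdigest()[:16],
        "timestamp": timestamp,
        "trust_classification": trust,
    }

dbom = data_bill_of_materials(
    dataset_ids=["warehouse.sales_v2"],
    metric_version="profit_margin@v3",
    query="SELECT (revenue - cost) / revenue FROM sales",
    timestamp="2026-03-05T00:00:00Z",
    trust="official",
)
print(json.dumps(dbom, indent=2))
```

An agent would emit this record alongside the number itself, so any downstream consumer (human or machine) can check what produced it and whether it is still valid.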
There is not yet one universal standard for data artifacts to carry provenance, but OpenLineage, W3C PROV, Frictionless Data Package, and emerging data contract standards are the main pieces converging toward that future. Here are a few standards worth investigating:
- OpenLineage — probably the closest thing to an OpenTelemetry-style standard for lineage events around datasets, jobs, and runs.
  URL: https://openlineage.io/getting-started/
  Facets/extensibility: https://openlineage.io/docs/spec/facets/
- W3C PROV / PROV-DM — the formal standard for provenance modeling, centered on entities, activities, and agents.
  URL: https://www.w3.org/TR/prov-dm/
  Overview: https://www.w3.org/TR/prov-overview/
- Frictionless Data Package — a practical spec for a dataset or collection of files to carry a portable metadata envelope in JSON.
  URL: https://specs.frictionlessdata.io/data-package/
  Guide: https://specs.frictionlessdata.io/guides/data-package/
- Open Data Contract Standard (ODCS) — an emerging open standard for data contracts, including schema and operational expectations.
  URL: https://bitol-io.github.io/open-data-contract-standard/v3.1.0/
  Repo: https://github.com/bitol-io/open-data-contract-standard
- Data Contract Specification — another open initiative in the same space, more in the style of OpenAPI for data.
  URL: https://github.com/datacontract/datacontract-specification