Trust in a dashboard erodes one wrong number at a time. You never get a single dramatic failure. You get a tile that says €4.1m when finance is sure it is €4.3m, a deal that shows twice, a contact whose company is suddenly "Unknown ltd". After three of those, the team starts double-checking the dashboard against the source, and once that happens the dashboard is dead. We have written about that fate elsewhere.

The fix is not heroic data engineering. It is a small set of checks that run on every refresh, plus a status pill that surfaces their results before anyone else notices a problem. Here are the six we run by default.

The six

1. Row-count drift

For every source table, compare today's row count with the count on the same day last week. A drop of more than ten percent or a spike of more than fifty percent triggers a flag. This catches the most common silent failure: a broken sync that returns zero rows and still looks healthy because the warehouse table is technically up to date.
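A minimal sketch of the comparison, assuming the two counts have already been fetched. The function name, its parameters, and the handling of an empty baseline are our own illustrative choices, not part of any particular tool.

```python
def row_count_drift(today: int, last_week: int,
                    max_drop: float = 0.10, max_spike: float = 0.50) -> bool:
    """Return True when today's row count should be flagged."""
    if last_week == 0:
        # Assumption: with an empty baseline there is no ratio to compute,
        # so any rows appearing today count as a spike.
        return today > 0
    change = (today - last_week) / last_week
    # Flag a drop of more than 10% or a spike of more than 50%.
    return change < -max_drop or change > max_spike


# The broken-sync case: zero rows today against a normal week-ago count.
broken_sync_flagged = row_count_drift(today=0, last_week=1000)
```

Comparing against the same weekday, rather than yesterday, keeps ordinary weekend dips from tripping the flag.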

2. Null spikes on must-have fields

For every must-have column (id, amount, owner, stage), the null rate should be roughly stable across days. A null spike almost always means a schema change upstream that nobody mentioned. We log the rate daily and flag any column whose rate doubles from one day to the next.
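As a rough sketch, the doubling rule might look like the following. The `min_baseline` floor is an assumption we added so that a column going from zero nulls to a single null does not count as an infinite increase; tune or drop it as needed.

```python
def null_rate(values: list) -> float:
    """Share of entries that are None; 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def null_spike(today_rate: float, previous_rate: float,
               factor: float = 2.0, min_baseline: float = 0.001) -> bool:
    """Return True when the null rate has at least doubled.

    min_baseline is a hypothetical floor (0.1%) that keeps a
    previously clean column from flagging on one stray null.
    """
    baseline = max(previous_rate, min_baseline)
    return today_rate >= factor * baseline


# Hypothetical 'owner' column with two of four values missing.
owner_rate = null_rate(["ana", None, "ben", None])
```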

3. Foreign-key orphans

Every join is a possible orphan. A deal whose company_id points at a deleted company. An invoice whose customer no longer exists. We count orphans per join and flag any count that grows by more than ten across a refresh.
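One way to sketch the orphan count for a single join, with rows as plain dictionaries. We assume here that "more than ten" means ten rows rather than ten percent, and that a null foreign key belongs to the null check rather than this one; both are our reading, and the function names are illustrative.

```python
def count_orphans(child_rows: list, parent_ids, fk: str) -> int:
    """Count child rows whose foreign key points at a missing parent."""
    parents = set(parent_ids)
    return sum(
        1 for row in child_rows
        # Assumption: a null key is a null-rate problem, not an orphan.
        if row[fk] is not None and row[fk] not in parents
    )


def orphan_growth_flag(previous: int, current: int, max_growth: int = 10) -> bool:
    """Return True when the orphan count grew by more than max_growth rows."""
    return current - previous > max_growth


# Hypothetical data: deal 2 points at a company that no longer exists.
deals = [
    {"id": 1, "company_id": 10},
    {"id": 2, "company_id": 99},
    {"id": 3, "company_id": None},
]
orphans = count_orphans(deals, parent_ids=[10, 11], fk="company_id")
```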

4. Duplicate detection

We look for three signals together: same email, same company, and created_at values within a minute of each other. When all three match, it is almost always a real duplicate. One signal alone is not enough; people legitimately share emails and create accounts in batches.
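The three-signal rule can be sketched as below. The record fields (`id`, `email`, `company`, `created_at`) and the case-insensitive matching are assumptions for illustration; a real contact table may need different normalisation.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_duplicates(contacts: list, window: timedelta = timedelta(minutes=1)) -> list:
    """Return (id, id) pairs that share email and company and were
    created within `window` of each other."""
    groups = defaultdict(list)
    for c in contacts:
        # Assumption: email and company are compared case-insensitively.
        key = (c["email"].strip().lower(), c["company"].strip().lower())
        groups[key].append(c)

    pairs = []
    for rows in groups.values():
        rows.sort(key=lambda c: c["created_at"])
        # Only neighbours in creation order can fall inside the window.
        for a, b in zip(rows, rows[1:]):
            if b["created_at"] - a["created_at"] <= window:
                pairs.append((a["id"], b["id"]))
    return pairs


# Hypothetical contacts: only 1 and 2 match on all three signals.
contacts = [
    {"id": 1, "email": "a@x.com", "company": "Acme", "created_at": datetime(2024, 1, 1, 10, 0, 0)},
    {"id": 2, "email": "A@x.com", "company": "acme", "created_at": datetime(2024, 1, 1, 10, 0, 30)},
    {"id": 3, "email": "a@x.com", "company": "Other", "created_at": datetime(2024, 1, 1, 10, 0, 10)},
    {"id": 4, "email": "a@x.com", "company": "Acme", "created_at": datetime(2024, 1, 1, 11, 0, 0)},
]
duplicate_pairs = find_duplicates(contacts)
```

Contact 3 shares an email and a timestamp but not a company, and contact 4 shares an email and a company but was created an hour later, so neither is reported.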

5. Reconciliation totals

For every aggregate that appears on a dashboard, compute the same number from the source system's native report and compare. The two must agree to within one percent. This is the check that finance teams care about more than any other, and the one we run last because it depends on every other check passing.
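A minimal sketch of the tolerance comparison. We assume the tolerance is measured relative to the source system's figure; the function name and the zero-total handling are illustrative.

```python
def reconciles(dashboard_total: float, source_total: float,
               tolerance: float = 0.01) -> bool:
    """Return True when the dashboard figure is within `tolerance`
    (as a fraction) of the source system's figure."""
    if source_total == 0:
        # Assumption: with a zero source total, only an exact match passes.
        return dashboard_total == 0
    relative_gap = abs(dashboard_total - source_total) / abs(source_total)
    return relative_gap < tolerance


# The example from the introduction: a tile showing 4.1m against 4.3m.
tile_ok = reconciles(4_100_000, 4_300_000)
```

The gap in that example is roughly 4.7 percent of the source figure, well outside the one percent tolerance, so the check fails.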

6. Staleness

Every source has a service-level promise (every fifteen minutes, every hour, daily). If the most recent record is older than twice the promise, we flag. This is more important than people realise. Stale data does not show as wrong; it shows as boring, and boring data is what gets ignored.
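The staleness rule reduces to one comparison. This sketch assumes timezone-aware timestamps and takes the current time as a parameter so it can be tested; the names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_stale(latest_record_at: datetime, promise: timedelta,
             now: datetime, factor: int = 2) -> bool:
    """Return True when the newest record is older than factor x the promise."""
    return now - latest_record_at > factor * promise


# Hypothetical source promised every fifteen minutes, last record 31 minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stale = is_stale(now - timedelta(minutes=31), timedelta(minutes=15), now)
```

The factor of two leaves room for one late sync before anyone is paged.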

Where the results live

The dashboard gets a small status pill at the top right. Green means all six passed at the last refresh. Yellow means one or two flagged but the dashboard is still trustworthy. Red means do not trust this number until someone has looked at it.
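The mapping from check results to pill colour might be sketched like this. The text above does not say exactly where yellow ends and red begins, so the rule that three or more flagged checks turn the pill red is our assumption; a team could equally make red depend on which check failed.

```python
def pill_colour(flagged: int) -> str:
    """Map the number of flagged checks at the last refresh to a pill colour."""
    if flagged == 0:
        return "green"
    if flagged <= 2:
        return "yellow"
    # Assumption: three or more flagged checks means red.
    return "red"


# Hypothetical refresh where only the staleness check flagged.
results = {"row_count": False, "nulls": False, "orphans": False,
           "duplicates": False, "reconciliation": False, "staleness": True}
colour = pill_colour(sum(results.values()))
```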

Clicking the pill opens the data quality page, which lists every check, its current result, and the timestamp it ran. Engineers read this page first when something looks off.

The checks we choose not to run

Two categories of check feel responsible but quietly waste time.

  • Type validation. The warehouse already enforces types. Running schema checks downstream of the warehouse is theatre.
  • Per-row validation rules. Tempting (an amount cannot be negative, an email must contain an @). In practice these flag a thousand legitimate edge cases for every one real problem. We replace them with aggregate checks (the share of negative amounts must be under one percent), which catch the same problems with a fraction of the noise.
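To make the contrast with per-row rules concrete, here is a sketch of the aggregate version of the negative-amount check. The function names and the treatment of an empty column are assumptions for illustration.

```python
def negative_share(amounts: list) -> float:
    """Share of amounts below zero; 0.0 for an empty column."""
    if not amounts:
        return 0.0
    return sum(a < 0 for a in amounts) / len(amounts)


def negative_share_ok(amounts: list, max_share: float = 0.01) -> bool:
    """Pass when fewer than max_share of the amounts are negative."""
    return negative_share(amounts) < max_share


# One refund among 200 rows passes; a per-row rule would have flagged it.
passes = negative_share_ok([100] * 199 + [-5])
```

A single legitimate refund stays quiet, while a sign flip across a whole batch pushes the share past the threshold and flags.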

How a check graduates

Every quarter, we look at the checks that have not fired in ninety days and ask whether they should retire. A check that has stayed silent that long is either watching a stable corner of the data, or watching nothing. We err on the side of retiring it. The fewer checks, the more weight each one carries.
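The quarterly review starts from a list that can be produced mechanically. This sketch assumes we keep a record of when each check last fired, with None for a check that never has; the structure and names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def retirement_candidates(last_fired: Dict[str, Optional[date]],
                          today: date, quiet_days: int = 90) -> List[str]:
    """Return names of checks that have not fired in the last quiet_days."""
    cutoff = today - timedelta(days=quiet_days)
    return sorted(
        name for name, fired in last_fired.items()
        # A check that has never fired (None) is always a candidate.
        if fired is None or fired < cutoff
    )


# Hypothetical firing history reviewed at the end of June.
history = {
    "row_count": date(2024, 6, 1),
    "staleness": date(2024, 1, 15),
    "orphans": None,
}
candidates = retirement_candidates(history, today=date(2024, 6, 30))
```

The list is only a starting point for the conversation; whether a quiet check actually retires remains a judgment call.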

In return, anything that actually broke trust in a dashboard becomes a check the next week. Real incidents are the only reliable source of which checks earn their keep.