Two dbt incremental anti-patterns (and what to do instead)

Please stop using these

I keep running into the same two anti-patterns with dbt incremental models. They kind of work, until they don’t. Both cases are really about using incremental models for things they weren’t designed to do, when there’s no good reason to. I get why people reach for them, but I’ve had to explain why I think they’re bad ideas so often, now that a precedent has been set in our project, that I wanted to describe the problems with them, and the (much) better alternatives, in a resource I can share next time someone takes this approach.

1. “Build it once and never again” incremental models

The pattern:

You create an incremental model, but your logic ensures it never actually runs incrementally after the initial run. Something like:

{% if is_incremental() %}
where false
{% endif %}

The idea is simple:

This table should only ever be built once.

Why people do this

  • They want to avoid rebuilding a static dataset
  • They don’t want to waste compute
  • They still want the model inside dbt for lineage/docs etc

These are totally reasonable goals; that’s not the issue. The issue is that this misuse of incremental models can be confusing and error prone, and as far as I know, the model will still count as a successful model build for dbt Cloud customers (although maybe not for Fusion users). There are also simpler alternatives that I see no reason not to use instead.

Incremental models are supposed to incrementally update data. This one doesn’t. Anyone reading it later may well be confused. If they do a dbt run, for example, they expect an incremental model to, well, increment. Instead, it will silently do nothing. Sure, if you know a bit about dbt, you’ll probably work it out, and sometimes there might even be a helpful comment, but I wouldn’t bet on it.
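For contrast, this is the shape readers will expect from an incremental model: on incremental runs it filters down to genuinely new rows. This is only a sketch, and the column and model names (event_id, updated_at, raw_events) are placeholders:

```sql
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    payload,
    updated_at
from {{ ref('raw_events') }}

{% if is_incremental() %}
-- {{ this }} is the existing target table; only pull rows newer
-- than what we've already loaded
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

A "where false" in that Jinja block subverts exactly this expectation, which is why it surprises people.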

It's also an accident waiting to happen. I've seen people mess up the call to is_incremental(), resulting in unpredictable behaviour. In one case, a model ran daily for about nine months, appending the same million rows again and again, producing a table that just got bigger and bigger. Worse still, the owner of a downstream model, having noticed their model contained duplicates, used DISTINCT to fix it, resulting in a runtime that also grew and grew.

What to do instead

There are a couple of better options depending a bit on what you're trying to achieve.

Option A: Do the transformation once outside of dbt

  • Run the transformation once (SQL, Python, DuckDB, whatever)
  • Store the result in your warehouse
  • Save the transformation logic in version control
  • Register it as a source in dbt
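Registering the precomputed table as a source might look something like this (the schema, source, and table names here are hypothetical):

```yaml
# models/sources.yml -- names below are placeholders
version: 2

sources:
  - name: static_data
    schema: analytics_static
    tables:
      - name: country_reference
        description: >
          One-off transformation built outside dbt;
          the transformation logic lives in version control.
```

Downstream models then reference it with {{ source('static_data', 'country_reference') }}, and you can attach tests and documentation to the source entry just as you would to a model.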

This means there's no wasted compute, the source is still usable in your DAG, and it can still be tested and documented just as a model would be. I've seen cases where the model doesn't even transform the data at all, in which case all the model achieves is re-inventing the concept of a dbt source, adding an unnecessary node to the lineage.

This approach also allows you to leverage cheaper compute than whatever target your dbt project is using. Storage is almost always cheaper than compute, so it shouldn't be prohibitive to keep a copy of both the before and after datasets. It's also a good idea to keep the transformation logic in version control, especially if it's complicated or likely to be needed again. Even if it isn't, it never hurts to track it in git.

I think this is an example of the old "if the only tool you have is a hammer, everything looks like a nail" proverb. A tool like dbt is often the only one an analyst / engineer has, or thinks they have available, so every transformation looks like it needs a dbt model. In these cases it might be necessary to engage someone with more platform access and/or expertise.

Option B: Just use a view

{{ config(materialized='view') }}

Yes, it rebuilds every time it is run. But:

  • It’s fast
  • It’s explicit
  • It’s predictable
  • It won’t surprise anyone later

This can be a useful alternative, particularly if the data is relatively small and the cost of whatever transformation is done is relatively low. For static data, Option A is almost always the better approach, especially if no other models require the pre-transformed data, but this is a serviceable approach for many of the use cases I've seen if dbt is your only option.

2. Using incremental models to track changes over time

The pattern:

Using an incremental model to append rows whenever data changes, effectively trying to build a history table.

Why people do this

  • They want to track changes over time
  • They want something like a slowly changing dimension
  • Incremental feels like the right tool because “it adds rows”

Why it’s an anti-pattern

Again, this is a common and reasonable goal, but this isn't the way to do it. This one is more subtle, but even more likely to create a mess. Incremental models don't handle updates or deletes well, backfills become painful or impossible, and more to the point, this is basically what a dbt snapshot is for, so why re-invent the wheel?

I've often seen people use incremental logic to re-append every record that exists at source on every run, then add custom logic downstream to grab the correct record in a given context. It seems workable, but it's easy to get wrong, and edge cases (deletes, late-arriving data) pile up fast.

If you ever need to fix historical logic, incremental models make it hard, or impossible, to safely recompute the past. A full refresh will delete the history altogether, so people try to work around this by setting full_refresh = false. Again, this works, for a while, but you're one errant PR away from blitzing the history. Your data platform team might be able to get the data back for you, but they won't be happy about it.
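The workaround in question is a one-line config, shown here for recognition rather than recommendation:

```sql
-- Blocks `dbt run --full-refresh` from rebuilding (and wiping) this table.
-- Protective, but it also means nobody can ever safely recompute it.
{{ config(materialized='incremental', full_refresh=false) }}
```

If you find yourself needing this flag to protect irreplaceable history, that history probably shouldn't live in an incremental model in the first place.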

What to do instead

This one is simple, use dbt snapshots. That’s literally what they’re for:

  • Track changes over time
  • Handle updates cleanly
  • Maintain a reliable history

So if your mental model is:

“I want to know how this row changed over time”

→ you want a snapshot, not an incremental model.
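A minimal snapshot, using the classic SQL-file format (newer dbt versions also support a YAML definition), looks something like this; the source, table, and column names are illustrative:

```sql
-- snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```

dbt maintains dbt_valid_from and dbt_valid_to columns for you, so "how did this row change over time" becomes a plain query rather than custom incremental logic.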

The bigger takeaway

Incremental models are great, but they’re not a general-purpose tool. They’re a performance optimization for a specific class of problems:

“I can safely process only new or changed data, and I don’t need full historical re-computation every time.”

When you step outside that boundary, things get weird.

Final thoughts

Most of these anti-patterns come from good instincts:

  • Avoid wasted compute
  • Keep things in dbt
  • Be efficient

But bending tools into shapes they weren’t meant for usually creates more problems than it solves.

A few simple rules of thumb

  • For static data, precompute it outside dbt, or use a different materialisation type for smaller data
  • To track history over time, use a snapshot
  • If a full refresh is going to mess up your table, you don't want an incremental model