Writing custom generic data tests
dbt ships with Not Null, Unique, Relationships, and Accepted Values generic data tests. (These used to be called "schema tests," and you'll still see that name in some places.) Under the hood, these generic data tests are defined as test
blocks (like macros).
There are tons of generic data tests defined in open source packages, such as dbt-utils and dbt-expectations — the test you're looking for might already be here!
Generic tests with standard arguments
Generic tests are defined in SQL files. Those files can live in two places:
tests/generic/
: that is, a special subfolder namedgeneric
within your test paths (tests/
by default)macros/
: Why? Generic tests work a lot like macros, and historically, this was the only place they could be defined. If your generic test depends on complex macro logic, you may find it more convenient to define the macros and the generic test in the same file.
To define your own generic tests, simply create a test
block called <test_name>
. All generic tests should accept one or both of the standard arguments:
model
: The resource on which the test is defined, templated out to its relation name. (Note that the argument is always namedmodel
, even when the resource is a source, seed, or snapshot.)column_name
: The column on which the test is defined. Not all generic tests operate on the column level, but if they do, they should acceptcolumn_name
as an argument.
Here's an example of an is_even
schema test that uses both arguments:
{% test is_even(model, column_name) %}
with validation as (
select
{{ column_name }} as even_field
from {{ model }}
),
validation_errors as (
select
even_field
from validation
-- if this is true, then even_field is actually odd!
where (even_field % 2) = 1
)
select *
from validation_errors
{% endtest %}
If this select
statement returns zero records, then every record in the supplied model
argument is even! If a nonzero number of records is returned instead, then at least one record in model
is odd, and the test has failed.
To use this generic test, specify it by name in the tests
property of a model, source, snapshot, or seed:
With one line of code, you've just created a test! In this example, users
will be passed to the is_even
test as the model
argument, and favorite_number
will be passed in as the column_name
argument. You could add the same line for other columns, other models—each will add a new test to your project, using the same generic test definition.
Add description to generic data test logic
You can add a description to the Jinja macro that provides the core logic for a data test by including the description
key under the macros:
section. You can add descriptions directly to the macro, including descriptions for macro arguments.
Here's an example:
macros:
- name: test_not_empty_string
description: Complementary test to default `not_null` test as it checks that there is not an empty string. It only accepts columns of type string.
arguments:
- name: model
type: string
description: Model Name
- name: column_name
type: string
description: Column name that should not be an empty string
In this example:
- When documenting custom test macros in a
schema.yml
file, add thetest_
prefix to the macro name. For example, if the test block's name isnot_empty_string
, then the macro's name would betest_not_empty_string
. - We've provided a description at the macro level, explaining what the test does and any relevant notes.
- Each argument (like
model
,column_name
) also includes a description to clarify its purpose.
Generic tests with additional arguments
The is_even
test works without needing to specify any additional arguments. Other tests, like relationships
, require more than just model
and column_name
. If your custom tests requires more than the standard arguments, include those arguments in the test signature, as field
and to
are included below:
{% test relationships(model, column_name, field, to) %}
with parent as (
select
{{ field }} as id
from {{ to }}
),
child as (
select
{{ column_name }} as id
from {{ model }}
)
select *
from child
where id is not null
and id not in (select id from parent)
{% endtest %}
When calling this test from a .yml
file, supply the arguments to the test in a dictionary. Note that the standard arguments (model
and column_name
) are provided by the context, so you do not need to define them again.
Generic tests with default config values
It is possible to include a config()
block in a generic test definition. Values set there will set defaults for all specific instances of that generic test, unless overridden within the specific instance's .yml
properties.
{% test warn_if_odd(model, column_name) %}
{{ config(severity = 'warn') }}
select *
from {{ model }}
where ({{ column_name }} % 2) = 1
{% endtest %}
Any time the warn_if_odd
test is used, it will always have warning-level severity, unless the specific test overrides that value:
Customizing dbt's built-in tests
To change the way a built-in generic test works—whether to add additional parameters, re-write the SQL, or for any other reason—you simply add a test block named <test_name>
to your own project. dbt will favor your version over the global implementation!
{% test unique(model, column_name) %}
-- whatever SQL you'd like!
{% endtest %}
Examples
Here's some additional examples of custom generic ("schema") tests from the community: