Fundamentals of Data Validation with pointblank

R
pointblank
This article provides an overview of the core data validation features in pointblank.
Published

February 26, 2026

This article provides an overview of the core data validation features in pointblank. It introduces the key concepts and shows examples of the main functionality, giving you a foundation for using the package effectively.

Validation Rules

pointblank’s core functionality revolves around validation steps: individual checks that each verify a different aspect of your data. These steps are created by calling validation functions; chained onto create_agent(), they form a comprehensive validation plan for your data.

Here’s an example of a validation plan that incorporates three different validation functions:

agent <- create_agent(tbl = small_table) %>%
  col_vals_gt(columns = a, value = 0) %>%
  rows_distinct() %>%
  col_exists(columns = date) %>%
  interrogate()

agent
Pointblank Validation [2026-02-26|17:55:03]
tibble small_table

  STEP             COLUMNS                            VALUES  UNITS  PASS          FAIL
1 col_vals_gt()    a                                  0          13  13 (1)        0 (0)
2 rows_distinct()  date_time, date, a, b, c, d, e, f             13  11 (0.84615)  2 (0.15385)
3 col_exists()     date                                           1  1 (1)         0 (0)

2026-02-26 17:55:03 GMT < 1 s 2026-02-26 17:55:03 GMT

This example showcases how you can combine different types of validations in a single validation plan:

  • a column value validation with col_vals_gt()
  • a row-based validation with rows_distinct()
  • a table structure validation with col_exists()

Most validation methods share common parameters that enhance their flexibility and power. These shared parameters create a consistent interface across all validation steps while allowing you to customize validation behavior for specific needs.

The next few sections take you through the most important ways in which you can customize your validation plans.

Column Selection Patterns

You can apply the same validation logic to multiple columns at once through use of column selection patterns (used in the columns argument). This reduces repetitive code and makes your validation plans more maintainable.

agent <- create_agent(tbl = small_table) %>%
  col_vals_gte(columns = c(c, d), value = 0) %>%
  col_vals_not_null(columns = starts_with("d")) %>%
  interrogate()

agent
Pointblank Validation [2026-02-26|17:55:04]
tibble small_table

  STEP                 COLUMNS    VALUES  UNITS  PASS          FAIL
1 col_vals_gte()       c          0          13  11 (0.84615)  2 (0.15385)
2 col_vals_gte()       d          0          13  13 (1)        0 (0)
3 col_vals_not_null()  date_time             13  13 (1)        0 (0)
4 col_vals_not_null()  date                  13  13 (1)        0 (0)
5 col_vals_not_null()  d                     13  13 (1)        0 (0)

2026-02-26 17:55:04 GMT < 1 s 2026-02-26 17:55:04 GMT

This technique is particularly valuable when working with wide datasets containing many similarly structured columns, or when applying standard quality checks across an entire table. Details about the column selection helpers can be found in the tidyselect package. Using column selection patterns also ensures that validation rules are applied consistently across related columns.
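
As a sketch of what the tidyselect helpers allow (the helper choices here are ours, not from the example above), matches() selects columns by regular expression and everything() selects every column:

```r
library(pointblank)

agent <- create_agent(tbl = small_table) %>%
  # regex selection: matches date_time, date, and d
  col_vals_not_null(columns = matches("^d")) %>%
  # apply a single check to every column in the table
  col_vals_not_null(columns = everything()) %>%
  interrogate()
```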

To validate row-wise relationships between columns, use the vars() function to reference a column where a literal value would otherwise go. With this you can, for example, validate that values in one column are greater than (or less than) the corresponding values in another column.

agent <- create_agent(tbl = small_table) %>%
  col_vals_gte(columns = c(c, d), value = vars(a)) %>%
  col_vals_between(columns = a, left = 0, right = vars(c)) %>%
  interrogate()

agent
Pointblank Validation [2026-02-26|17:55:04]
tibble small_table

  STEP                COLUMNS  VALUES  UNITS  PASS         FAIL
1 col_vals_gte()      c        ~a         13  7 (0.53846)  6 (0.46154)
2 col_vals_gte()      d        ~a         13  13 (1)       0 (0)
3 col_vals_between()  a        [0, c]     13  7 (0.53846)  6 (0.46154)

2026-02-26 17:55:04 GMT < 1 s 2026-02-26 17:55:04 GMT

Preprocessing

Preprocessing (with the preconditions argument) allows you to transform or modify your data before applying validation checks, enabling you to validate derived or modified data without altering the original dataset. There is no need to create multiple validation plans for different transformations of the original data.

agent <- create_agent(tbl = small_table) %>%
  col_vals_gt(
    columns = a_transformed,
    value = 5,
    preconditions = ~ . %>% dplyr::mutate(a_transformed = a * 2)
  ) %>%
  col_vals_lt(
    columns = d,
    value = 1000,
    preconditions = ~ . %>% dplyr::filter(date > "2016-01-15")
  ) %>%
  interrogate()

agent
Pointblank Validation [2026-02-26|17:55:04]
tibble small_table

  STEP           COLUMNS        VALUES  UNITS  PASS         FAIL
1 col_vals_gt()  a_transformed  5          13  9 (0.69231)  4 (0.30769)
2 col_vals_lt()  d              1000        6  4 (0.66667)  2 (0.33333)

2026-02-26 17:55:04 GMT < 1 s 2026-02-26 17:55:04 GMT

Preprocessing enables validation of transformed data without modifying your original dataset, making it ideal for checking derived metrics or validating normalized values. This approach keeps your validation code clean while allowing for sophisticated data quality checks on calculated results.

More complex preprocessing can be applied through custom functions, rather than inlined via anonymous functions as shown above. You can also use the preconditions argument to subset your data to specific rows before applying validation checks; however, a more concise way of doing this is illustrated in the next section.
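
For instance, the named-function approach can be sketched as follows (the function and derived column names here are illustrative, not from the article): any function that takes a table and returns a table can serve as the preconditions value.

```r
library(pointblank)

# Illustrative helper (our own naming): derive a ratio column
# before validating it, keeping the transformation logic out of
# the validation plan itself.
add_ratio_col <- function(tbl) {
  dplyr::mutate(tbl, d_to_c_ratio = d / c)
}

agent <- create_agent(tbl = small_table) %>%
  col_vals_gt(
    columns = d_to_c_ratio,
    value = 0,
    preconditions = add_ratio_col  # a function instead of a ~ formula
  ) %>%
  interrogate()
```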

Segmentation

Segmentation (through the segments argument) allows you to validate data across different groups, enabling you to identify segment-specific quality issues that might be hidden in aggregate analyses.

You can segment:

  • by all unique values in a column, e.g., segments = vars(f)
  • by only specific values in a column, e.g., segments = f ~ c("low", "high")
  • by multiple columns, e.g., segments = list(vars(f), a ~ c(1, 2))
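
The first of these patterns can be sketched as follows (assuming, as in pointblank’s small_table, that f holds a small set of category labels): a single call with segments = vars(f) expands into one validation step per distinct value of f, so a quality problem confined to one group shows up on its own row of the report.

```r
library(pointblank)

# One validation call, segmented by every unique value of `f`:
agent <- create_agent(tbl = small_table) %>%
  col_vals_gt(columns = d, value = 100, segments = vars(f)) %>%
  interrogate()
```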

You can also segment in conjunction with preprocessing, allowing you to segment based on derived or modified data.

agent <- create_agent(tbl = small_table) %>%
  col_vals_gt(
    columns = d,
    value = 100,
    preconditions = ~ . %>%
      dplyr::mutate(a_category = dplyr::if_else(a > 5, "high", "low")),
    segments = vars(a_category)
  ) %>%
  interrogate()

agent
Pointblank Validation [2026-02-26|17:55:04]
tibble small_table

  STEP           COLUMNS  VALUES  UNITS  PASS    FAIL
1 col_vals_gt()  d        100        10  10 (1)  0 (0)
2 col_vals_gt()  d        100         3  3 (1)   0 (0)

2026-02-26 17:55:04 GMT < 1 s 2026-02-26 17:55:04 GMT

Thresholds

Thresholds (set through the actions argument) provide a nuanced way to monitor data quality, allowing you to set different severity levels based on the importance of each validation and your organization’s tolerance for specific types of data issues.

Thresholds can be set at three severity levels: warn, stop, and notify. Each can be specified as either a relative proportion of failing test units or an absolute number of failing test units.

agent <- create_agent(tbl = small_table) %>%
  col_vals_gt(
    columns = vars(a),
    value = 1,
    actions = action_levels(warn_at = 0.1, stop_at = 0.2, notify_at = 0.3)
  ) %>%
  col_vals_lt(
    columns = vars(c),
    value = 10,
    actions = action_levels(warn_at = 1, stop_at = 2)
  ) %>%
  interrogate()

agent
Pointblank Validation [2026-02-26|17:55:04]
tibble small_table

  STEP           COLUMNS  VALUES  UNITS  PASS          FAIL
1 col_vals_gt()  a        1          13  12 (0.92308)  1 (0.07692)
2 col_vals_lt()  c        10         13  11 (0.84615)  2 (0.15385)

2026-02-26 17:55:04 GMT < 1 s 2026-02-26 17:55:04 GMT

Apart from visually inspecting the agent’s results table, you can programmatically access information about the outcomes of your validation steps through the so-called x-list: for example, to check which validation steps crossed the stop threshold.

x_list <- get_agent_x_list(agent)

x_list$stop
[1] FALSE  TRUE
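
The x-list carries more than the stop results. A few other fields, each with one element per validation step (names follow get_agent_x_list()'s documented components; printed values omitted here):

```r
x_list$warn      # logical: did the step cross the warn threshold?
x_list$notify    # logical: did the step cross the notify threshold?
x_list$f_passed  # fraction of test units that passed
x_list$n_failed  # count of test units that failed
```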

Conclusion

The features covered in this article—column selection patterns, preprocessing, segmentation, and thresholds—form the foundation of pointblank’s flexible validation system. By combining these capabilities, you can create sophisticated validation workflows that adapt to your specific data quality requirements. Whether you’re validating simple column constraints or complex multi-step transformations across different data segments, pointblank provides the tools to build robust, maintainable validation pipelines that scale with your data and organizational needs. These patterns help you catch data quality issues early and apply systematic data validation across your projects.