How do I create and run a data quality check?

In this tutorial, we’ll see how to use the CheckDefinitions API to model a set of rules and run a simple data quality (DQ) check.

Let’s imagine we want to check that instruments ingested or updated in the most recent run of an integration have some essential properties. To do this, we first create a check definition that defines which data to check and which rules to apply. We can then run the check and inspect the results for rule breaches.

Note we’ll only run checks manually in this tutorial. To learn how to automate checks and handle the results, follow this tutorial on setting up a data quality check worker.

Note

To complete this tutorial, you must have suitable access control permissions. This can most easily be achieved by assigning your LUSID user the lusid-administrator role.

Step 1: Creating a check definition to model rulesets

We must first create a check definition using the CreateCheckDefinition API, passing in the following:

  • A scope and code that together uniquely identify the check definition

  • A friendly displayName and description

  • The type of datasetSchema for the check:

    • type: Currently LusidEntity is the only valid value

    • entityType: Currently Instrument is the only valid value

  • A list of ruleSets, with each item containing the following:

    • ruleSetKey: A unique identifier

    • displayName: A friendly name

    • description: A detailed description of the rule set

    • ruleSetFilter: Defines which data to run the rule checks against, using LUSID filtering syntax

  • Optionally, a list of properties to include on the check definition

curl -X POST 'https://<your-domain>.lusid.com/api/api/dataquality/checkdefinitions' \
  -H 'Authorization: Bearer <your-api-access-token>' \
  -H 'Content-Type: application/json-patch+json' \
  -d '{
  "id": {
    "scope": "Finbourne-Examples",
    "code": "DQ-Check-instrument-properties"
  },
  "displayName": "Instruments check",
  "description": "A check definition to validate instruments are populated with the correct properties",
  "datasetSchema": {
    "type": "LusidEntity",
    "entityType": "Instrument"
  },
  "ruleSets": [
    {
      "ruleSetKey": "instrument-properties-checks",
      "displayName": "Instrument properties checks ruleset",
      "description": "A set of rules to apply to instruments assigned for data refreshes to check for appropriate properties.",
      "ruleSetFilter": "Properties[Instrument/Finbourne-Examples/RefreshData] exists"
    }
  ]
}'
# Set up CheckDefinitions API
import lusid
import lusid.api
import lusid.models
import lusid.extensions
from lusidjam import RefreshingToken

lusid_api_factory = lusid.extensions.SyncApiClientFactory(config_loaders=[
    lusid.extensions.ArgsConfigurationLoader(
        api_url="https://<your-domain>.lusid.com/api",
        access_token=RefreshingToken(),
        app_name="LusidJupyterNotebook"
    )
])

checkDefinitions_api = lusid_api_factory.build(lusid.api.CheckDefinitionsApi)

# Create check definition
rule_set_key = "instrument-properties-checks"

try:
    checkDefinitions_api.create_check_definition(
        create_check_definition_request=lusid.models.CreateCheckDefinitionRequest(
            id=lusid.models.ResourceId(
                scope="Finbourne-Examples",
                code="DQ-Check-instrument-properties"
            ),
            display_name="Instruments check",
            description="A check definition to validate instruments are populated with the correct properties",
            dataset_schema=lusid.models.CheckDefinitionDatasetSchema(
                type="LusidEntity",
                entity_type="Instrument"
            ),
            rule_sets=[lusid.models.UpdateCheckDefinitionRuleSet(
                rule_set_key=rule_set_key,
                display_name="Instrument properties checks ruleset",
                description="A set of rules to apply to instruments assigned for data refreshes to check for appropriate properties.",
                rule_set_filter="Properties[Instrument/Finbourne-Examples/RefreshData] exists"
            )]
        )
    )
except lusid.ApiException as e:
    if e.status == 400 and "EntityWithIdAlreadyExists" in str(e.body):
        print("Check definition already exists, skipping creation.")
    else:
        raise

Part of a response is shown below. Note the asAtVersionNumber of the check definition, which increments if you update the check definition:

{
  "id": {
    "scope": "Finbourne-Examples",
    "code": "DQ-Check-instrument-properties"
  },
  "version": {
    "asAtCreated": "2026-01-22T11:29:44.2505000+00:00",
    "asAtModified": "2026-01-22T11:29:44.2505000+00:00",
    "userIdModified": "00ujk6twb4jDcHGjN2p8", 
    "asAtVersionNumber": 1,
    ...
  },
...
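
If you keep the response body, you can read the version number back to confirm whether a later update took effect. The following is a minimal sketch, assuming the response has been saved as a JSON string with the shape shown above. The response_json variable and the get_version_number helper are illustrative names and are not part of the LUSID SDK.

```python
import json

# Illustrative stand-in for the response body shown above, trimmed to the fields used here.
response_json = """
{
  "id": {
    "scope": "Finbourne-Examples",
    "code": "DQ-Check-instrument-properties"
  },
  "version": {
    "asAtCreated": "2026-01-22T11:29:44.2505000+00:00",
    "asAtModified": "2026-01-22T11:29:44.2505000+00:00",
    "asAtVersionNumber": 1
  }
}
"""


def get_version_number(body: str) -> int:
    """Return the asAtVersionNumber from a check definition response body."""
    definition = json.loads(body)
    return definition["version"]["asAtVersionNumber"]


version_number = get_version_number(response_json)
print(f"Check definition is at version {version_number}")
```

Comparing this number before and after an update is a simple way to tell whether the check definition actually changed.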

Step 2: Adding rules to the check definition

Now that we have our check definition, we can upsert rules to the rulesets within the definition. To do so, we call the UpsertRules API, passing in the following:

  • Within the request URL, the scope and code of the check definition we want to add the rules to

  • Within the request body:

    • ruleSetKey: The unique identifier of a ruleset within the check definition

    • rule:

      • ruleKey: A unique identifier for the rule

      • displayName: A friendly name

      • description: A detailed description of the rule

      • ruleFormula: A formula that uses derived property syntax to determine a value of true or false for the data check

      • severity: A number to indicate the importance of the rule

curl -X POST 'https://<your-domain>.lusid.com/api/api/dataquality/checkdefinitions/Finbourne-Examples/DQ-Check-instrument-properties/$upsertRules' \
  -H 'Authorization: Bearer <your-api-access-token>' \
  -H 'Content-Type: application/json-patch+json' \
  -d '[
  {
    "ruleSetKey": "instrument-properties-checks",
    "rule": {
      "ruleKey": "issuer-exists",
      "displayName": "Issuer exists check",
      "description": "Checks whether an instrument is decorated with the Instrument/Finbourne-Examples/Issuer property.",
      "ruleFormula": "properties[Instrument/Finbourne-Examples/Issuer] exists",
      "severity": 1
    }
  }
]'
try:
    checkDefinitions_api.upsert_rules(
        scope="Finbourne-Examples",
        code="DQ-Check-instrument-properties",
        upsert_data_quality_rule=[lusid.models.UpsertDataQualityRule(
            rule_set_key=rule_set_key,
            rule=lusid.models.CheckDefinitionRule(
                rule_key="issuer-exists",
                display_name="Issuer exists check",
                description="Checks whether an instrument is decorated with the Instrument/Finbourne-Examples/Issuer property",
                rule_formula="properties[Instrument/Finbourne-Examples/Issuer] exists",
                severity=1
            )
        )]
    )
except lusid.ApiException as e:
    print(f"Error upserting rules: {e}")

Part of a response is shown below. It contains the check definition, which now shows the ruleset populated with our rule:

{
  "id": {
    "scope": "Finbourne-Examples",
    "code": "DQ-Check-instrument-properties"
  }, 
  "ruleSets": [
    {
      "ruleSetKey": "instrument-properties-checks",
      "displayName": "Instrument properties checks ruleset",
      "description": "A set of rules to apply to instruments assigned for data refreshes to check for appropriate properties.",
      "ruleSetFilter": "Properties[Instrument/Finbourne-Examples/RefreshData] exists",
      "rules": [
        {
          "ruleKey": "issuer-exists",
          "displayName": "Issuer exists check",
          "description": "Checks whether an instrument is decorated with the Instrument/Finbourne-Examples/Issuer property.",
          "ruleFormula": "properties[Instrument/Finbourne-Examples/Issuer] exists",
          "severity": 1
        }
      ]
    }
  ],
...
}
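
To confirm at a glance which rules each ruleset now holds, you could walk the ruleSets list in the response. The sketch below assumes the response has already been parsed into a dictionary with the shape shown above. The check_definition variable and the summarise_rules helper are illustrative names, not SDK functions.

```python
# Illustrative stand-in for the parsed response shown above, trimmed to the fields used here.
check_definition = {
    "id": {
        "scope": "Finbourne-Examples",
        "code": "DQ-Check-instrument-properties",
    },
    "ruleSets": [
        {
            "ruleSetKey": "instrument-properties-checks",
            "ruleSetFilter": "Properties[Instrument/Finbourne-Examples/RefreshData] exists",
            "rules": [
                {
                    "ruleKey": "issuer-exists",
                    "ruleFormula": "properties[Instrument/Finbourne-Examples/Issuer] exists",
                    "severity": 1,
                }
            ],
        }
    ],
}


def summarise_rules(definition: dict) -> list[tuple[str, str, int]]:
    """Return (ruleSetKey, ruleKey, severity) for every rule in the definition."""
    summary = []
    for rule_set in definition.get("ruleSets", []):
        # A ruleset with no rules yet may omit the list, so default to empty.
        for rule in rule_set.get("rules", []):
            summary.append((rule_set["ruleSetKey"], rule["ruleKey"], rule["severity"]))
    return summary


for rule_set_key, rule_key, severity in summarise_rules(check_definition):
    print(f"{rule_set_key}: {rule_key} (severity {severity})")
```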

Step 3: Running the check on some instruments

Our check definition now contains rules in rule sets. To run the check on some data, we can call the RunCheckDefinition API, passing in the following:

  • Within the request URL, the scope and code of the check definition we want to run

  • Within the request body:

    • A lusidEntityDataset containing:

      • asAt: An asAt date for the data to be checked; for our example we’ll let it default to the latest

      • effectiveAt: An effectiveAt date for the data to be checked; for our example we’ll let it default to the latest

      • scope: The scope for the data we want to check; for our example we want to check instruments in the Finbourne-Examples scope

      • asAtModifiedSince: Optionally, an asAtModified date for the data to be checked; LUSID will only run the check against entities that have been modified since the date specified

      • selectorAttribute: A field name, property key, or identifier to define the data to run the check against; for our example we’ll run the check on instruments that have the property Instrument/Finbourne-Examples/RefreshData

      • selectorValue: A value for the selectorAttribute; for our example we’ll run the check on instruments whose selector property has the value True

      • returnIdentifierKey: A preferred identifier type to return for each of the checked entities

    • limitIndividualBreachesPerRule: A limit for the number of breaches per rule (above the limit, further breaches are grouped into a single result)

curl -X POST 'https://<your-domain>.lusid.com/api/api/dataquality/checkdefinitions/Finbourne-Examples/DQ-Check-instrument-properties/$run' \
  -H 'Authorization: Bearer <your-api-access-token>' \
  -H 'Content-Type: application/json-patch+json' \
  -d '{
  "lusidEntityDataset": {
    "scope": "Finbourne-Examples",
    "selectorAttribute": "Properties[Instrument/Finbourne-Examples/RefreshData]",
    "selectorValue": "True",
    "returnIdentifierKey": "Instrument/default/ClientInternal"
  },
  "limitIndividualBreachesPerRule": 20
}'
try:
    run_check_response = checkDefinitions_api.run_check_definition(
        scope="Finbourne-Examples",
        code="DQ-Check-instrument-properties",
        run_check_request=lusid.models.RunCheckRequest(
            lusid_entity_dataset=lusid.models.LusidEntityDataset(
                scope="Finbourne-Examples",
                selector_attribute="Properties[Instrument/Finbourne-Examples/RefreshData]",
                selector_value="True",
                return_identifier_key="Instrument/default/ClientInternal"
            ),
            limit_individual_breaches_per_rule=20
        )
    )

    results = run_check_response.data_quality_check_results
    total = len(results)
    print(f"Check complete: {total} results (breaches)")
except lusid.ApiException as e:
    print(f"Error running check: {e}")
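
The example request leaves out asAtModifiedSince. To restrict a run to instruments modified since a given time, as in the scenario at the start of this tutorial, the request body could be assembled along the following lines. This is a sketch only: the build_run_request helper and the last_run timestamp are illustrative, the field names are those listed above, and the use of an ISO 8601 UTC timestamp for asAtModifiedSince is an assumption rather than a documented requirement.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_run_request(
    scope: str,
    selector_attribute: str,
    selector_value: str,
    return_identifier_key: str,
    limit_per_rule: int,
    modified_since: Optional[datetime] = None,
) -> dict:
    """Build a run request body, leaving asAt and effectiveAt out so they default to latest."""
    dataset = {
        "scope": scope,
        "selectorAttribute": selector_attribute,
        "selectorValue": selector_value,
        "returnIdentifierKey": return_identifier_key,
    }
    if modified_since is not None:
        # Assumption: an ISO 8601 UTC timestamp is accepted for asAtModifiedSince.
        dataset["asAtModifiedSince"] = modified_since.astimezone(timezone.utc).isoformat()
    return {
        "lusidEntityDataset": dataset,
        "limitIndividualBreachesPerRule": limit_per_rule,
    }


# Hypothetical time of the most recent integration run.
last_run = datetime(2026, 1, 22, 9, 0, tzinfo=timezone.utc)

request_body = build_run_request(
    scope="Finbourne-Examples",
    selector_attribute="Properties[Instrument/Finbourne-Examples/RefreshData]",
    selector_value="True",
    return_identifier_key="Instrument/default/ClientInternal",
    limit_per_rule=20,
    modified_since=last_run,
)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary has the same shape as the request body in the curl example, with asAtModifiedSince added.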

Next steps

Follow this tutorial on setting up a DQ check workflow to:

  • Use the check definition in a workflow to automate the DQ check

  • Inspect the results from a run of the DQ check workflow via the LUSID web app