The platform engineer's guide to dynamic cluster policy enforcement in Databricks

Learn how to use the lookup function and mutators

Have you seen the following symptom: developers export jobs from the Databricks graphical user interface (GUI), only to find out later that the exported definitions have to be modified before they can run in production?

I've seen it and experienced it. From a platform engineering perspective, managing a Databricks instance with many clusters can be painful.

But with the launch of Python Databricks Asset Bundles (DABs), now in General Availability (GA) status, you gain more control. Control that has minimal impact on developers' exported job workflows. It also lets you avoid over-privileging either personal or non-personal accounts when leveraging the policy definitions you've created.

Let me show you how.

Prerequisites

Before you start your journey, there are three prerequisites:

  • Databricks CLI v0.275.0 or above.
  • A Databricks instance.
  • uv v0.9.17 or above.

When these are in place, you're good to go.

Understanding the problem

Often, developers create job workflows through the GUI. The YAML they export contains raw resource IDs. That works while they're testing in environments like development or test, but when they deploy the same definition through DABs to production, those resource IDs won't exist.

job_clusters:
  - job_cluster_key: development_cluster
    new_cluster:
      policy_id: "001DD81A49229D7C"
      num_workers: 4

There's another problem. Most environments operate under a "least privilege" model, and DABs are deployed via CI/CD systems. You could give the identity running the deployment broad privileges, but why would you?

That's where cluster policy definitions come in: give the deployment identity the Can use permission on the policy, and job workflows can spawn a job cluster when they need to.

But how do you instruct your developers to use the policy definition? And how can you make their lives easier without constantly modifying all job workflows?

Solution 1: Lookup function

To provide a proper solution, there are two ingredients: the lookup function and mutators. Before you think, "What the hell are mutators?", hold that thought. I'll cover those later. Let's first look at the lookup function.

The lookup function allows you to use the resource name instead of the resource ID. This enables you to reference cluster policies by their human-readable name.

For example, imagine you have a cluster policy named small job cluster. In your databricks.yml file, you can use it as follows:

variables:
  small_policy_id:
    description: The cluster policy ID to apply to all small job clusters
    lookup:
      cluster_policy: "small job cluster"

When you deploy a DAB containing the job workflow, the CLI resolves the policy name to its resource ID. The lookup function supports multiple resource types:

  • alert
  • cluster_policy
  • cluster
  • dashboard
  • instance_pool
  • job
  • metastore
  • notification_destination
  • pipeline
  • query
  • service_principal
  • warehouse

You can now reference it directly in your job cluster definition:

job_clusters:
  - job_cluster_key: main_cluster
    new_cluster:
      policy_id: ${var.small_policy_id}
      num_workers: 2

Solution 2: Python mutators

The lookup function solves the portability problem. But what if developers forget to reference the cluster policy variable altogether? Or what if you want to enforce the policy without modifying every job definition?

That's where mutators come in.

You can see Python mutators as simple functions. Functions that transform resources during deployment. They run after all resources are loaded, but before deployment. This means you can intercept every job that gets deployed and apply cluster policies automatically.

To keep it even simpler, think of mutators as middleware for DABs. Developers export their jobs from the GUI, drop them in the resources folder, and your mutator handles the rest.
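
To make the shape concrete, here's the smallest possible mutator: a do-nothing function that receives each job and returns it unchanged. The passthrough name is just a placeholder; the decorator and signature are the parts that matter, and they're the same ones used in the full example below.

from databricks.bundles.core import Bundle, job_mutator
from databricks.bundles.jobs import Job


@job_mutator
def passthrough(bundle: Bundle, job: Job) -> Job:
    """A do-nothing mutator: it receives each job and returns it unchanged."""
    # A real mutator returns a modified copy, for example via dataclasses.replace().
    return job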

Let's take a look at how we can implement mutators step by step.

Step 1: Initialize a Python DAB

To use mutators, you need a Python-enabled DAB. The easiest way to get started is with the pydabs template:

databricks bundle init pydabs

Follow the prompts provided by the template. This creates a project structure with Python support already configured:

your_bundle/
├── databricks.yml
├── pyproject.toml
├── resources/
│   └── __init__.py
├── src/
│   └── your_bundle/
└── tests/

Step 2: Create the mutator file

Create a file named mutators.py in your bundle root. This file contains functions decorated with @job_mutator that transform jobs during deployment:

from dataclasses import replace

from databricks.bundles.core import Bundle, Variable, job_mutator, variables
from databricks.bundles.jobs import Job


@variables
class BundleVariables:
    """
    Bundle variables resolved at deployment time.
    
    These map to the variables defined in databricks.yml.
    The @variables decorator enables type-safe access to bundle variables.
    """
    cluster_policy_id: Variable[str]
    num_workers: Variable[str]


@job_mutator
def apply_cluster_policy(bundle: Bundle, job: Job) -> Job:
    """
    Applies a cluster policy to all job clusters.
    
    This mutator ensures governance by applying a cluster policy 
    to all job clusters, regardless of how the job was defined.
    Developers can export jobs from the GUI without worrying about
    cluster policies.
    """
    # Skip jobs without job clusters
    if not job.job_clusters:
        return job

    # Resolve the cluster policy ID from bundle variables
    try:
        policy_id = bundle.resolve_variable(BundleVariables.cluster_policy_id)
    except Exception:
        # No policy configured, skip mutation
        return job

    # Skip if policy_id is empty or unresolved
    if not policy_id or policy_id.startswith("${"):
        return job

    # Get num_workers from variables (policies often require fixed workers)
    try:
        num_workers_str = bundle.resolve_variable(BundleVariables.num_workers)
        num_workers = int(num_workers_str) if num_workers_str else 4
    except Exception:
        num_workers = 4

    # Apply the policy to each job cluster
    updated_clusters = []
    for cluster in job.job_clusters:
        if cluster.new_cluster:
            updated_spec = replace(
                cluster.new_cluster,
                policy_id=policy_id,
                num_workers=num_workers,
                autoscale=None,  # Remove autoscale for policy compliance
                apply_policy_default_values=True,
            )
            updated_clusters.append(replace(cluster, new_cluster=updated_spec))
        else:
            updated_clusters.append(cluster)

    return replace(job, job_clusters=updated_clusters)

Key points about this mutator:

  • The @variables decorator creates a type-safe mapping to your databricks.yml variables.
  • The @job_mutator decorator registers the function to run on every job during deployment.
  • The function receives the current Bundle context and the Job being processed.
  • It must return a Job object (either the original or a modified copy).
  • The replace() function from dataclasses creates immutable copies with updated fields.
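
If replace() is new to you, here's a tiny, self-contained illustration using plain dataclasses (ClusterConfig is a made-up stand-in, not a Databricks type): it returns a new object with the named fields swapped out and leaves the original untouched.

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClusterConfig:
    """Made-up stand-in for a cluster spec; not a Databricks type."""
    policy_id: str | None
    num_workers: int


original = ClusterConfig(policy_id=None, num_workers=2)
updated = replace(original, policy_id="001DD81A49229D7C", num_workers=4)

print(original)  # ClusterConfig(policy_id=None, num_workers=2)
print(updated)   # ClusterConfig(policy_id='001DD81A49229D7C', num_workers=4)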

Step 3: Configure variables in databricks.yml

Add the cluster policy variable using the lookup function, and define the num_workers variable:

variables:
  cluster_policy_id:
    description: The cluster policy ID to apply to all job clusters
    lookup:
      cluster_policy: "small job cluster"
  num_workers:
    description: Fixed number of workers (required by most cluster policies)
    default: "4"

Step 4: Register the mutator

In your databricks.yml, add the mutator to the Python configuration section. Mutators run in the order they're listed:

python:
  venv_path: .venv
  mutators:
    - "mutators:apply_cluster_policy"

The format is module_name:function_name. Since mutators.py is in the bundle root, the module name is mutators.
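
Because mutators run in the order they're listed, you can chain them. As a sketch of what a second one might look like (the enforce_tags function, the owner tag, and the use of custom_tags are my own illustration, assuming the cluster spec exposes custom_tags the way the Jobs API does), here's a mutator that stamps an ownership tag on every job cluster:

from dataclasses import replace

from databricks.bundles.core import Bundle, job_mutator
from databricks.bundles.jobs import Job


@job_mutator
def enforce_tags(bundle: Bundle, job: Job) -> Job:
    """Illustrative second mutator: adds an owner tag to every job cluster."""
    if not job.job_clusters:
        return job

    updated_clusters = []
    for cluster in job.job_clusters:
        if cluster.new_cluster:
            # custom_tags is assumed to mirror the Jobs API cluster spec
            tags = dict(cluster.new_cluster.custom_tags or {})
            tags.setdefault("owner", "platform-engineering")
            updated_clusters.append(
                replace(cluster, new_cluster=replace(cluster.new_cluster, custom_tags=tags))
            )
        else:
            updated_clusters.append(cluster)

    return replace(job, job_clusters=updated_clusters)

To register it, add "mutators:enforce_tags" as a second entry under mutators in databricks.yml; it then runs after apply_cluster_policy.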

Step 5: Add developer job definitions

Now your developers can export jobs from their GUI and place them in the resources/ folder. For example, a file named resources/my_job.yaml:

resources:
  jobs:
    my_etl_job:
      name: "My ETL Job"
      job_clusters:
        - job_cluster_key: main_cluster
          new_cluster:
            spark_version: "14.3.x-scala2.12"
            node_type_id: "Standard_D4s_v5"
            num_workers: 2
      tasks:
        - task_key: ingest_data
          job_cluster_key: main_cluster
          notebook_task:
            notebook_path: ../src/notebooks/ingest.py

Notice there's no policy_id in the job definition. The mutator adds it automatically during deployment.

Step 6: Configure per-target overrides (optional)

Different environments often require different cluster policies. You can override the lookup per target:

targets:
  dev:
    mode: development
    # Uses the default lookup from the bundle-level variables

  prod:
    mode: production
    variables:
      # Override with a production-specific policy
      cluster_policy_id:
        lookup:
          cluster_policy: "production job cluster"

Validation and deployment

To verify that the mutator and the lookup function work, use the bundle validate command:

databricks bundle validate --target dev

The expected output:

Name: your_bundle
Target: dev
Workspace:
  Host: https://your-workspace.azuredatabricks.net
  User: your.email@company.com
  Path: /Workspace/Users/your.email@company.com/.bundle/your_bundle/dev

Validation OK!

To deploy the bundle, use the bundle deploy command:

databricks bundle deploy --target dev

During deployment, the CLI loads all resources, runs your mutators, and then deploys the transformed jobs. Every job cluster now includes the cluster policy. Your developers didn't have to know anything about the resource ID.

Summary

Now that Python DABs are GA, platform engineering teams have a practical way to enforce governance rules in Databricks.

With the lookup function, you get portability: the same bundle definition can be reused across development, test, and production environments without developers hardcoding resource IDs.

With mutators, you get automatic enforcement. Developers can export jobs from the GUI, and your mutators apply the policies automatically. No manual edits required, and no forgotten policy references.

For more information, check out the official Databricks documentation.