---
title: Databricks
description: Run SQL queries and manage jobs on Databricks
---

import { BlockInfoCard } from "@/components/ui/block-info-card"

<BlockInfoCard
  type="databricks"
  color="#FF3621"
/>

## Usage Instructions

Connect to Databricks to execute SQL queries against SQL warehouses, trigger and monitor job runs, manage clusters, and retrieve run outputs. Requires a Personal Access Token and workspace host URL.

## Tools

### `databricks_execute_sql`

Execute a SQL statement against a Databricks SQL warehouse and return results inline. Supports parameterized queries and Unity Catalog.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |
| `warehouseId` | string | Yes | The ID of the SQL warehouse to execute against |
| `statement` | string | Yes | The SQL statement to execute \(max 16 MiB\) |
| `catalog` | string | No | Unity Catalog name \(equivalent to USE CATALOG\) |
| `schema` | string | No | Schema name \(equivalent to USE SCHEMA\) |
| `rowLimit` | number | No | Maximum number of rows to return |
| `waitTimeout` | string | No | How long to wait for results. Either "0s" or a value from "5s" to "50s". Default: "50s" |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `statementId` | string | Unique identifier for the executed statement |
| `status` | string | Execution status \(SUCCEEDED, PENDING, RUNNING, FAILED, CANCELED, CLOSED\) |
| `columns` | array | Column schema of the result set |
| ↳ `name` | string | Column name |
| ↳ `position` | number | Column position \(0-based\) |
| ↳ `typeName` | string | Column type \(STRING, INT, LONG, DOUBLE, BOOLEAN, TIMESTAMP, DATE, DECIMAL, etc.\) |
| `data` | array | Result rows as a 2D array of strings, where each inner array is one row of column values |
| `totalRows` | number | Total number of rows in the result |
| `truncated` | boolean | Whether the result set was truncated due to row_limit or byte_limit |
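For callers hitting the REST layer directly rather than going through this block, the request this tool appears to wrap can be sketched as follows. This is a minimal sketch: the endpoint path (`/api/2.0/sql/statements`) and snake_case body fields come from the Databricks SQL Statement Execution API, and the helper name is illustrative.

```python
import json
from urllib import request

def build_execute_sql_request(host, api_key, warehouse_id, statement,
                              catalog=None, schema=None, row_limit=None,
                              wait_timeout="50s"):
    """Build (but do not send) the HTTP request for one SQL statement."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": wait_timeout,
    }
    # Optional inputs map onto optional body fields.
    if catalog:
        body["catalog"] = catalog
    if schema:
        body["schema"] = schema
    if row_limit is not None:
        body["row_limit"] = row_limit
    return request.Request(
        f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_execute_sql_request(
    "dbc-abc123.cloud.databricks.com", "dapi-XXXX", "abcd1234",
    "SELECT * FROM samples.nyctaxi.trips LIMIT 10",
    catalog="samples", schema="nyctaxi", row_limit=10,
)
# urllib.request.urlopen(req) would send it; omitted here.
```

Because `data` comes back as strings, numeric columns need casting on the client side using the `typeName` from `columns`.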
### `databricks_list_jobs`

List all jobs in a Databricks workspace with optional filtering by name.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |
| `limit` | number | No | Maximum number of jobs to return \(range 1-100, default 20\) |
| `offset` | number | No | Offset for pagination |
| `name` | string | No | Filter jobs by exact name \(case-insensitive\) |
| `expandTasks` | boolean | No | Include task and cluster details in the response \(max 100 elements\) |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `jobs` | array | List of jobs in the workspace |
| ↳ `jobId` | number | Unique job identifier |
| ↳ `name` | string | Job name |
| ↳ `createdTime` | number | Job creation timestamp \(epoch ms\) |
| ↳ `creatorUserName` | string | Email of the job creator |
| ↳ `maxConcurrentRuns` | number | Maximum number of concurrent runs |
| ↳ `format` | string | Job format \(SINGLE_TASK or MULTI_TASK\) |
| `hasMore` | boolean | Whether more jobs are available for pagination |
| `nextPageToken` | string | Token for fetching the next page of results |
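A query-string builder for the listing call can be sketched as below. The endpoint path (`/api/2.1/jobs/list`) and snake_case parameter names are assumptions based on the Databricks Jobs 2.1 API; the function name is illustrative.

```python
from urllib.parse import urlencode

def build_list_jobs_url(host, limit=20, offset=0, name=None, expand_tasks=False):
    """Assemble the GET URL for listing jobs, with optional name filter."""
    params = {"limit": limit, "offset": offset}
    if name:
        params["name"] = name          # exact-match, case-insensitive filter
    if expand_tasks:
        params["expand_tasks"] = "true"
    return f"https://{host}/api/2.1/jobs/list?{urlencode(params)}"

url = build_list_jobs_url("dbc-abc123.cloud.databricks.com",
                          limit=50, name="nightly-etl")
```

When `hasMore` is true in the response, repeat the call with a larger `offset` (or the returned `nextPageToken`) to walk the full job list.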
### `databricks_run_job`

Trigger an existing Databricks job to run immediately with optional job-level or notebook parameters.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |
| `jobId` | number | Yes | The ID of the job to trigger |
| `jobParameters` | string | No | Job-level parameter overrides as a JSON object \(e.g., \{"key": "value"\}\) |
| `notebookParams` | string | No | Notebook task parameters as a JSON object \(e.g., \{"param1": "value1"\}\) |
| `idempotencyToken` | string | No | Idempotency token to prevent duplicate runs \(max 64 characters\) |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `runId` | number | The globally unique ID of the triggered run |
| `numberInJob` | number | The sequence number of this run among all runs of the job |
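Since `jobParameters` and `notebookParams` arrive as JSON *strings*, they have to be parsed into objects before being placed in the run-now request body. A sketch of that translation, assuming the standard `run-now` body shape (`job_id`, `job_parameters`, `notebook_params`, `idempotency_token`):

```python
import json

def build_run_now_body(job_id, job_parameters=None, notebook_params=None,
                       idempotency_token=None):
    """Translate the tool's JSON-string inputs into a run-now request body."""
    body = {"job_id": job_id}
    if job_parameters:
        body["job_parameters"] = json.loads(job_parameters)   # str -> dict
    if notebook_params:
        body["notebook_params"] = json.loads(notebook_params)
    if idempotency_token:
        if len(idempotency_token) > 64:
            raise ValueError("idempotency token must be at most 64 characters")
        body["idempotency_token"] = idempotency_token
    return body

body = build_run_now_body(1234, job_parameters='{"env": "prod"}',
                          idempotency_token="deploy-2024-06-01")
```

Reusing the same idempotency token on a retry returns the run ID of the already-triggered run instead of starting a duplicate.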
### `databricks_get_run`

Get the status, timing, and details of a Databricks job run by its run ID.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |
| `runId` | number | Yes | The canonical identifier of the run |
| `includeHistory` | boolean | No | Include repair history in the response |
| `includeResolvedValues` | boolean | No | Include resolved parameter values in the response |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `runId` | number | The run ID |
| `jobId` | number | The job ID this run belongs to |
| `runName` | string | Name of the run |
| `runType` | string | Type of run \(JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN\) |
| `attemptNumber` | number | Retry attempt number \(0 for initial attempt\) |
| `state` | object | Run state information |
| ↳ `lifeCycleState` | string | Lifecycle state \(QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY\) |
| ↳ `resultState` | string | Result state \(SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED\) |
| ↳ `stateMessage` | string | Descriptive message for the current state |
| ↳ `userCancelledOrTimedout` | boolean | Whether the run was cancelled by user or timed out |
| `startTime` | number | Run start timestamp \(epoch ms\) |
| `endTime` | number | Run end timestamp \(epoch ms, 0 if still running\) |
| `setupDuration` | number | Cluster setup duration \(ms\) |
| `executionDuration` | number | Execution duration \(ms\) |
| `cleanupDuration` | number | Cleanup duration \(ms\) |
| `queueDuration` | number | Time spent in queue before execution \(ms\) |
| `runPageUrl` | string | URL to the run detail page in the Databricks UI |
| `creatorUserName` | string | Email of the user who triggered the run |
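A common pitfall when interpreting the output above is checking `resultState` before the run has actually finished: `resultState` is only populated once `lifeCycleState` is terminal. A small sketch of that distinction, using the camelCase field names from the output table (helper names are illustrative):

```python
# Lifecycle states after which the run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def is_finished(run):
    """A run is finished once its lifecycle state is terminal."""
    return run["state"]["lifeCycleState"] in TERMINAL_STATES

def succeeded(run):
    """resultState is only meaningful after the run finishes."""
    return is_finished(run) and run["state"].get("resultState") == "SUCCESS"

sample = {"runId": 42,
          "state": {"lifeCycleState": "TERMINATED",
                    "resultState": "SUCCESS",
                    "stateMessage": ""}}
```

A `RUNNING` run has no `resultState` yet, so poll until `is_finished` returns true before deciding success or failure.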
### `databricks_list_runs`

List job runs in a Databricks workspace with optional filtering by job, status, and time range.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |
| `jobId` | number | No | Filter runs by job ID. Omit to list runs across all jobs |
| `activeOnly` | boolean | No | Only include active runs \(PENDING, RUNNING, or TERMINATING\) |
| `completedOnly` | boolean | No | Only include completed runs |
| `limit` | number | No | Maximum number of runs to return \(range 1-24, default 20\) |
| `offset` | number | No | Offset for pagination |
| `runType` | string | No | Filter by run type \(JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN\) |
| `startTimeFrom` | number | No | Filter runs started at or after this timestamp \(epoch ms\) |
| `startTimeTo` | number | No | Filter runs started at or before this timestamp \(epoch ms\) |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `runs` | array | List of job runs |
| ↳ `runId` | number | Unique run identifier |
| ↳ `jobId` | number | Job this run belongs to |
| ↳ `runName` | string | Run name |
| ↳ `runType` | string | Run type \(JOB_RUN, WORKFLOW_RUN, SUBMIT_RUN\) |
| ↳ `state` | object | Run state information |
| ↳ `lifeCycleState` | string | Lifecycle state \(QUEUED, PENDING, RUNNING, TERMINATING, TERMINATED, SKIPPED, INTERNAL_ERROR, BLOCKED, WAITING_FOR_RETRY\) |
| ↳ `resultState` | string | Result state \(SUCCESS, FAILED, TIMEDOUT, CANCELED, SUCCESS_WITH_FAILURES, UPSTREAM_FAILED, UPSTREAM_CANCELED, EXCLUDED\) |
| ↳ `stateMessage` | string | Descriptive state message |
| ↳ `userCancelledOrTimedout` | boolean | Whether the run was cancelled by user or timed out |
| ↳ `startTime` | number | Run start timestamp \(epoch ms\) |
| ↳ `endTime` | number | Run end timestamp \(epoch ms\) |
| `hasMore` | boolean | Whether more runs are available for pagination |
| `nextPageToken` | string | Token for fetching the next page of results |
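A typical use of the time-range filters is "completed runs of one job in the last 24 hours". The sketch below builds the corresponding query string; the snake_case parameter names are assumed REST forms of the camelCase inputs above, and the helper name is illustrative.

```python
import time
from urllib.parse import urlencode

DAY_MS = 24 * 60 * 60 * 1000

def runs_last_24h_query(job_id, now_ms=None):
    """Query params for completed runs of one job started in the last day."""
    now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    return urlencode({
        "job_id": job_id,
        "completed_only": "true",
        "start_time_from": now_ms - DAY_MS,   # at-or-after, epoch ms
        "start_time_to": now_ms,              # at-or-before, epoch ms
        "limit": 20,
    })

# Fixed "now" so the output is deterministic.
qs = runs_last_24h_query(1234, now_ms=1_700_000_000_000)
```

Because `limit` caps out at 24, walking a longer history means repeating the call with `offset` (or `nextPageToken`) while `hasMore` stays true.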
### `databricks_cancel_run`

Cancel a running or pending Databricks job run. Cancellation is asynchronous; poll the run status to confirm termination.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |
| `runId` | number | Yes | The canonical identifier of the run to cancel |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Whether the cancel request was accepted |
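Because `success: true` only means the cancel request was *accepted*, a caller that needs the run actually stopped has to poll afterwards. The cancel-then-poll pattern can be sketched like this, with `cancel_run` and `get_run` standing in for the two tools (the simulated backends at the bottom are purely for illustration):

```python
import time

def cancel_and_wait(cancel_run, get_run, run_id, interval_s=0, max_polls=30):
    """Cancel a run, then poll until its lifecycle state is terminal."""
    cancel_run(run_id)
    for _ in range(max_polls):
        run = get_run(run_id)
        if run["state"]["lifeCycleState"] in ("TERMINATED", "INTERNAL_ERROR"):
            return run
        time.sleep(interval_s)
    raise TimeoutError(f"run {run_id} did not terminate after {max_polls} polls")

# Simulated backend: the run reports TERMINATING twice, then TERMINATED.
states = iter(["TERMINATING", "TERMINATING", "TERMINATED"])
final = cancel_and_wait(
    lambda rid: None,                                        # fake cancel
    lambda rid: {"state": {"lifeCycleState": next(states)}}, # fake get_run
    run_id=42,
)
```

In production the poll interval should be a few seconds, not zero, to avoid hammering the API.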
### `databricks_get_run_output`

Get the output of a completed Databricks job run, including notebook results, error messages, and logs. For multi-task jobs, use the task run ID (not the parent run ID).

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |
| `runId` | number | Yes | The run ID to get output for. For multi-task jobs, use the task run ID |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `notebookOutput` | object | Notebook task output \(from dbutils.notebook.exit\(\)\) |
| ↳ `result` | string | Value passed to dbutils.notebook.exit\(\) \(max 5 MB\) |
| ↳ `truncated` | boolean | Whether the result was truncated |
| `error` | string | Error message if the run failed or output is unavailable |
| `errorTrace` | string | Error stack trace if available |
| `logs` | string | Log output \(last 5 MB\) from spark_jar, spark_python, or python_wheel tasks |
| `logsTruncated` | boolean | Whether the log output was truncated |
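Only one of `notebookOutput`, `error`, or `logs` is usually the interesting field, depending on the task type and outcome. One way to normalize the response into a single value, using the camelCase field names from the table above (the helper is illustrative, not part of the block):

```python
def summarize_run_output(output):
    """Pick the most useful field from a get-run-output response."""
    if output.get("error"):
        msg = output["error"]
        if output.get("errorTrace"):
            msg += "\n" + output["errorTrace"]
        return ("error", msg)
    nb = output.get("notebookOutput") or {}
    if nb.get("result") is not None:
        suffix = " (truncated)" if nb.get("truncated") else ""
        return ("notebook", nb["result"] + suffix)
    return ("logs", output.get("logs", ""))

kind, value = summarize_run_output(
    {"notebookOutput": {"result": '{"rows": 1200}', "truncated": False}})
```

Note that `notebookOutput.result` is whatever string the notebook passed to `dbutils.notebook.exit()`; if the notebook exits with JSON, the caller still has to parse it.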
### `databricks_list_clusters`

List all clusters in a Databricks workspace including their state, configuration, and resource details.

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `host` | string | Yes | Databricks workspace host \(e.g., dbc-abc123.cloud.databricks.com\) |
| `apiKey` | string | Yes | Databricks Personal Access Token |

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `clusters` | array | List of clusters in the workspace |
| ↳ `clusterId` | string | Unique cluster identifier |
| ↳ `clusterName` | string | Cluster display name |
| ↳ `state` | string | Current state \(PENDING, RUNNING, RESTARTING, RESIZING, TERMINATING, TERMINATED, ERROR, UNKNOWN\) |
| ↳ `stateMessage` | string | Human-readable state description |
| ↳ `creatorUserName` | string | Email of the cluster creator |
| ↳ `sparkVersion` | string | Spark runtime version \(e.g., 13.3.x-scala2.12\) |
| ↳ `nodeTypeId` | string | Worker node type identifier |
| ↳ `driverNodeTypeId` | string | Driver node type identifier |
| ↳ `numWorkers` | number | Number of worker nodes \(for fixed-size clusters\) |
| ↳ `autoscale` | object | Autoscaling configuration \(null for fixed-size clusters\) |
| ↳ `minWorkers` | number | Minimum number of workers |
| ↳ `maxWorkers` | number | Maximum number of workers |
| ↳ `clusterSource` | string | Origin \(API, UI, JOB, MODELS, PIPELINE, PIPELINE_MAINTENANCE, SQL\) |
| ↳ `autoterminationMinutes` | number | Minutes of inactivity before auto-termination \(0 = disabled\) |
| ↳ `startTime` | number | Cluster start timestamp \(epoch ms\) |
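A common follow-up on this output is a quick audit: how many clusters are in each state, and which running clusters have auto-termination disabled (`autoterminationMinutes` of 0) and so will keep billing until stopped manually. A sketch over the output shape above (field names from the table; the sample data is invented):

```python
from collections import Counter

def cluster_state_summary(clusters):
    """Count clusters by state and flag running ones without auto-termination."""
    states = Counter(c["state"] for c in clusters)
    no_autoterm = [c["clusterName"] for c in clusters
                   if c["state"] == "RUNNING"
                   and c.get("autoterminationMinutes", 0) == 0]
    return states, no_autoterm

states, always_on = cluster_state_summary([
    {"clusterName": "etl", "state": "RUNNING", "autoterminationMinutes": 0},
    {"clusterName": "adhoc", "state": "RUNNING", "autoterminationMinutes": 60},
    {"clusterName": "old", "state": "TERMINATED"},
])
```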