Commit 7f212c0

Check in UC quickstart project (#542)
Co-authored-by: Abhidatabricks <abhishekpratap.singh@databricks.com>
Parent: 1913e64


71 files changed: +7319 −1 lines

.gitignore (7 additions, 1 deletion)

@@ -183,4 +183,10 @@ cython_debug/
 
 coverage.txt
 
-*_arm64
+*_arm64
+
+**/.terraform/**
+**/*.tfstate
+**/*.tfstate.*
+**/terraform.tfstate.d/**
+**/*.tfvars

CODEOWNERS (1 addition, 0 deletions)

@@ -19,4 +19,5 @@ runtime-packages @nfx @alexott
 sql_migration_copilot @robertwhiffin
 tacklebox @Jonathan-Choi
 uc-catalog-cloning @esiol-db @vasco-lopes
+uc-quickstart @abhidatabricks @louiscsq
 .github @nfx @alexott @gueniai

uc-quickstart/README.md (122 additions, 0 deletions)
@@ -0,0 +1,122 @@
1+
# Databricks Unity Catalog Quickstart 🌐🚀
2+
3+
**Accelerate Your Unity Catalog Setup with Optimized Terraform Automation!**
4+
5+
Welcome to the **databricks-uc-quickstart** repository! This project helps you deploy Unity Catalog (UC) on Databricks swiftly and efficiently, using Terraform scripts pre-configured with recommended settings. Eliminate tedious setup and configuration overhead to quickly launch your data governance initiatives.
6+
7+
## 📋 Best Practices Enforced
8+
9+
This quickstart enforces the following Unity Catalog best practices:
10+
11+
- **Catalog Design Defaults**: Pre-configured catalog structures optimized for data governance
12+
- **Workspace Defaults**: Standard workspace configurations for consistent deployments
13+
- **Role Permission Defaults**: Pre-defined role-based access controls following least privilege principles
14+
- **Storage Setup Defaults**: Optimized storage configurations for Unity Catalog
15+
- **Data Sharing Defaults**: Secure data sharing configurations ready for collaboration
16+
- **Research UC Default and Compatibility**: Ensures compatibility with existing Databricks features
17+
- **Volume Defaults**: Standard volume configurations for data storage
18+
- **Enable System Tables and Grant Role Access**: Automatic system table enablement with appropriate role permissions
19+
20+
Additionally, this quickstart includes **Industry Templates for ABAC** (Attribute-Based Access Control) implemented through Python and SQL notebooks, allowing users to leverage industry-ready functions and policies for fine-grained access control.
21+
22+
The Terraform configurations can be customized by modifying the variables in your deployment, while ABAC policies are managed through the provided notebooks.
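As a hedged sketch of such a customization, a `terraform.tfvars` override might look like the following. The variable names (`catalog_1`..`catalog_3`, `group_1`..`group_3`) assume the defaults defined in the AWS example's `variables.tf`; the values on the right are illustrative placeholders:

```hcl
# Hypothetical overrides: variable names assume aws/variables.tf defaults;
# the values are placeholders to adapt to your naming conventions.
catalog_1 = "analytics_prod"
catalog_2 = "analytics_dev"
catalog_3 = "analytics_sandbox"

group_1 = "analytics_prod_sp"
group_2 = "analytics_developers"
group_3 = "analytics_sandbox_users"
```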
## 🌟 Key Benefits

- **Automated Terraform Deployment**: Effortlessly set up and manage Unity Catalog.
- **Instant Setup**: Deploy UC with recommended default configurations.
- **Reduced Boilerplate**: Minimal setup, so you can focus on your core data projects.
- **Flexible & Customizable**: Easily adapt configurations to match your unique requirements.

## 🏗️ What Gets Deployed

This Terraform quickstart deploys a complete Unity Catalog environment with the following components:

### **Core Infrastructure**

- **3 Unity Catalog Environments**: Production, Development, and Sandbox catalogs
- **Cloud Storage**: Dedicated storage accounts/buckets for each catalog with proper IAM/RBAC
- **External Locations**: Secure storage credential and external location mappings
- **System Schemas**: Access, billing, compute, and storage system tables (if permissions allow)

### **Access Management**

- **User Groups**: Production service principals, developers, and sandbox users
- **Catalog Permissions**: Role-based access control with environment-specific privileges
- **System Schema Grants**: Appropriate permissions for monitoring and governance

### **Compute Resources**

- **Cluster Policies**: Environment-specific policies with cost controls and security settings
- **Clusters**: Pre-configured clusters for each environment with proper access controls

### **Cloud-Specific Resources**

**AWS Deployment:**
- S3 buckets with versioning and encryption
- IAM roles and policies for Unity Catalog access
- Cross-account trust relationships

**Azure Deployment:**
- Storage accounts with containers
- Managed identities and access connectors
- RBAC assignments for Databricks integration

## 🚀 Quick Start

Follow these steps to rapidly deploy Unity Catalog using Terraform:

### 📌 Prerequisites

Ensure you have:

- A Databricks account
- [Terraform Installed](https://developer.hashicorp.com/terraform/downloads)
- Basic knowledge of Databricks and Terraform

**Workspace Requirements:**
- An existing Databricks workspace is required
- The workspace ID must be provided in the Terraform configuration (see template.tfvars.example)
- The quickstart will configure Unity Catalog resources and permissions within your existing workspace

### 🛠 Installation Steps

1. **Clone this Repository:**

   ```bash
   git clone https://github.com/databrickslabs/sandbox.git
   cd sandbox/uc-quickstart/
   ```

2. **Choose Your Cloud Provider:**

   Navigate to the appropriate directory based on your cloud provider:

   **For AWS:**
   ```bash
   cd aws/
   ```

   **For Azure:**
   ```bash
   cd azure/
   ```

3. **Follow Cloud-Specific Setup:**

   Each cloud provider has specific prerequisites and configuration steps detailed in its respective README file:
   - [AWS Setup Instructions](aws/README.md)
   - [Azure Setup Instructions](azure/README.md)

### ✅ Verify Deployment

Once deployment is complete, verify the setup directly within your Databricks workspace to ensure all components are correctly configured.
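As a quick sanity check outside the workspace UI, a small shell helper can confirm that the expected catalogs show up in a catalog listing. This is a hedged sketch, not part of the quickstart: it assumes an authenticated Databricks CLI with a `databricks catalogs list` command, and the default catalog names (`prod`, `dev`, `sandbox`) from the AWS example.

```shell
# check_catalogs: reads a catalog listing on stdin and reports any expected
# catalog names that are missing. Returns non-zero if anything is absent.
check_catalogs() {
  listing=$(cat)
  missing=0
  for c in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qw -- "$c"; then
      echo "missing catalog: $c"
      missing=1
    fi
  done
  return "$missing"
}

# Usage sketch (assumes an authenticated Databricks CLI):
#   databricks catalogs list | check_catalogs prod dev sandbox && echo "all present"
```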
## 🔧 Need Help?

For cloud-specific troubleshooting and detailed configuration help:
- **AWS Issues:** See [AWS README](aws/README.md#troubleshooting)
- **Azure Issues:** See [Azure README](azure/README.md#troubleshooting)
- **General Questions:** Check the [main documentation](https://docs.databricks.com/en/data-governance/unity-catalog/index.html)

## 📄 License

This project is licensed under the Databricks License; see [LICENSE](../LICENSE) for full details.

uc-quickstart/aws/.terraform.lock.hcl (96 additions, 0 deletions; generated file, contents not rendered)

uc-quickstart/aws/README.md (90 additions, 0 deletions)
# AWS Unity Catalog Deployment

This directory contains AWS-specific Terraform configurations for deploying Unity Catalog.

## Prerequisites

### Required Permissions

- **Databricks Service Principal:**
  - Workspace Admin
  - Account Admin (for user groups)
  - Metastore Admin (for storage credentials and system schemas)

- **AWS Credentials:**
  - IAM permissions to create S3 buckets, IAM roles, and policies
  - Cross-account access for Unity Catalog integration

### Required Information

- Databricks account ID and workspace ID
- AWS account ID and region
- Service principal client ID and secret

## Configuration

1. **Copy the configuration template:**

   ```bash
   cp template.tfvars.example terraform.tfvars
   ```

2. **Update `terraform.tfvars` with your values:**

   ```hcl
   # Databricks credentials
   databricks_account_id    = "your-account-id"
   databricks_host          = "https://your-workspace.cloud.databricks.com"
   databricks_client_id     = "your-service-principal-id"
   databricks_client_secret = "your-service-principal-secret"
   databricks_workspace_id  = "your-workspace-id"

   # AWS credentials
   aws_access_key    = "your-aws-access-key"
   aws_secret_key    = "your-aws-secret-key"
   aws_account_id    = "your-aws-account-id"
   aws_session_token = "your-session-token" # Optional, for STS credentials
   ```
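If you prefer not to write secrets to disk, Terraform also reads any environment variable named `TF_VAR_<name>` as the value of the input variable `<name>`, so the sensitive values above can be exported instead of stored in `terraform.tfvars` (the variable names here match the template):

```shell
# Alternative to putting secrets in terraform.tfvars: Terraform treats any
# TF_VAR_<name> environment variable as the value of input variable <name>.
export TF_VAR_databricks_client_secret="your-service-principal-secret"
export TF_VAR_aws_secret_key="your-aws-secret-key"
export TF_VAR_aws_session_token="your-session-token"

# terraform plan / terraform apply will now pick these up automatically.
```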
3. **Customize catalog and group names (optional):**

   Edit `variables.tf` to modify the default names:
   ```hcl
   variable "catalog_1" { default = "prod" }
   variable "catalog_2" { default = "dev" }
   variable "catalog_3" { default = "sandbox" }

   variable "group_1" { default = "production_sp" }
   variable "group_2" { default = "developers" }
   variable "group_3" { default = "sandbox_users" }
   ```

## Deployment

Run the following commands in sequence:

```bash
terraform init
terraform validate
terraform plan
terraform apply
```

## AWS Resources Created

- **S3 Buckets:** One per catalog, with versioning and encryption
- **IAM Roles:** Unity Catalog cross-account access roles
- **IAM Policies:** Least-privilege policies for each catalog
- **Databricks Resources:** Catalogs, external locations, storage credentials

## Troubleshooting

**Authentication Issues:**
- Ensure the AWS CLI is configured: `aws configure`
- For STS credentials, include `aws_session_token`
- Verify the service principal has the required Databricks permissions

**Permission Errors:**
- Check that the IAM user has permissions to create S3 buckets and IAM resources
- Verify the service principal is a workspace admin in Databricks
- For group creation errors, ensure account admin permissions

**Resource Conflicts:**
- S3 bucket names must be globally unique
- Check for existing IAM roles with the same names
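Because S3 bucket names are global, a common convention for avoiding name collisions is suffixing bucket names with the AWS account ID. A hedged sketch of a hypothetical helper (not part of this quickstart) that builds such a name and normalizes it toward S3's lowercase, hyphenated naming rules:

```shell
# unique_bucket_name: hypothetical helper that appends the AWS account ID to a
# base name, lowercases it, and replaces underscores with hyphens.
unique_bucket_name() {
  printf '%s-%s\n' "$1" "$2" | tr 'A-Z_' 'a-z-'
}

# Example: unique_bucket_name uc-prod 123456789012  -> uc-prod-123456789012
```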
