A comprehensive development environment for Apache Polaris featuring LocalStack integration on k3s. This kit automates the setup of a complete Polaris environment with S3-compatible storage, authentication, and role-based access control.

Snowflake-Labs/polaris-local-forge

Apache Polaris (Incubating) Starter Kit with LocalStack on k3s

This starter kit provides a complete development environment for Apache Polaris with LocalStack integration running on k3s Kubernetes. It includes automated setup of PostgreSQL metastore, S3 integration via LocalStack, and all necessary configurations for immediate development use.

Key features:

  • 🚀 Automated k3s cluster setup with k3d
  • ☁️ Integrated LocalStack for AWS S3 emulation
  • 🗄️ PostgreSQL metastore configuration
  • 🤖 Task-based automation for easy management
  • 📓 Jupyter notebook for verification

🚀 Quick Start

Get Apache Polaris running locally in 3 steps:

1. Clone the Repository

git clone https://github.com/snowflake-labs/polaris-local-forge
cd polaris-local-forge

2. Install Prerequisites

Install Task (if not already installed):

# macOS
brew install go-task/tap/go-task

# Linux
curl -sL https://taskfile.dev/install.sh | sh
sudo mv bin/task /usr/local/bin/

# Windows (Scoop)
scoop install task

# Windows (Chocolatey)
choco install go-task

Setup Python environment:

# Install uv and setup Python environment
task setup:python

Note: Task commands automatically use the virtual environment. You only need to manually activate it if running Python/Jupyter commands directly:

source .venv/bin/activate    # On Unix-like systems
# .venv\Scripts\activate     # On Windows

3. Deploy Everything

task setup:all

This single command will:

  • ✅ Generate required configuration files
  • ✅ Create the k3s cluster with k3d
  • ✅ Deploy PostgreSQL and LocalStack
  • ✅ Deploy Apache Polaris
  • ✅ Create a demo catalog

That's it! ✨

After completion, open and run notebooks/verify_setup.ipynb to verify your setup.

What You Get

Once setup completes, you'll have the following services running:

| Service | URL | Credentials/Details |
|---|---|---|
| 🌟 Polaris API | http://localhost:18181 | See k8s/polaris/.bootstrap-credentials.env for login |
| ☁️ LocalStack | http://localhost:14566 | AWS S3 emulator; use test/test for credentials |

Quick access:

task urls  # Display all service URLs
task status  # Check deployment status
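To poke at the LocalStack endpoint from the host, a small wrapper around the AWS CLI can help. This is a minimal sketch, assuming the AWS CLI is installed locally; the function name `localstack_s3` is hypothetical, and the `us-east-1` region is borrowed from the manual setup section of this README.

```shell
#!/bin/sh
# Sketch: run `aws s3` subcommands against the LocalStack endpoint listed above.
# LOCALSTACK_URL can be overridden; the default is the URL from the services table.
LOCALSTACK_URL="${LOCALSTACK_URL:-http://localhost:14566}"

localstack_s3() {
  (
    # Work in a subshell so the caller's AWS settings stay untouched.
    unset AWS_PROFILE
    export AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=test
    export AWS_REGION=us-east-1 AWS_DEFAULT_REGION=us-east-1
    aws --endpoint-url "$LOCALSTACK_URL" s3 "$@"
  )
}
```

With this loaded, `localstack_s3 ls` should list the buckets created by the catalog setup.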

📋 Task Commands Reference

The project uses Task to automate common workflows. Here are the most useful commands:

Installation & Setup

task install:uv      # Install uv Python package manager
task setup:python    # Setup Python environment (installs uv + creates venv)
task setup:dnsmasq   # Configure DNSmasq for .localstack domain (macOS only)
task prepare         # Generate required configuration files

Essential Commands

task help            # List all available tasks
task setup:all       # Complete setup (prepare → cluster → deploy → catalog)
task reset:all       # Complete reset (delete cluster β†’ recreate everything)
task urls            # Show all service URLs and credentials
task status          # Check deployment status
task clean:all       # Delete cluster and all resources
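When `task setup:all` fails partway, running the stages one at a time makes it easier to see which one breaks. The sketch below assumes that `setup:all` corresponds to the individual tasks listed in this reference, in the order shown; the real Taskfile may chain them differently. `run_setup_steps` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: run the stages that `task setup:all` is described as chaining
# (prepare -> cluster -> deploy -> catalog), stopping at the first failure.
# The step list is an assumption based on the task reference, not the Taskfile.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_setup_steps() {
  for step in prepare cluster:create cluster:bootstrap-check \
              polaris:deploy cluster:polaris-check catalog:setup; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "task $step"
    else
      task "$step" || { echo "step failed: $step" >&2; return 1; }
    fi
  done
}
```

Running `DRY_RUN=1 run_setup_steps` first prints the planned sequence without touching the cluster.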

Cluster Management

task cluster:create           # Create k3d cluster
task cluster:bootstrap-check  # Wait for bootstrap deployments
task cluster:polaris-check    # Wait for Polaris deployment
task cluster:delete           # Delete the cluster
task cluster:reset            # Delete and recreate cluster with fresh catalog

Polaris Operations

task polaris:deploy     # Deploy Polaris to the cluster
task polaris:reset      # Purge and re-bootstrap Polaris
task polaris:purge      # Purge Polaris data
task polaris:bootstrap  # Bootstrap Polaris (run after purge)

Catalog Management

task catalog:setup    # Setup demo catalog (bucket, catalog, principal, roles)
task catalog:verify   # Generate verification notebook
task catalog:cleanup  # Cleanup catalog resources
task catalog:reset    # Cleanup and recreate catalog (keeps cluster running)

Logging & Troubleshooting

# View logs
task logs:polaris      # Stream Polaris server logs
task logs:postgresql   # Stream PostgreSQL logs
task logs:localstack   # Stream LocalStack logs
task logs:bootstrap    # View bootstrap job logs
task logs:purge        # View purge job logs

# Troubleshooting
task troubleshoot:polaris     # Diagnose Polaris issues
task troubleshoot:postgresql  # Check database connectivity
task troubleshoot:localstack  # Verify LocalStack connectivity
task troubleshoot:events      # Show recent events in polaris namespace

📦 Prerequisites

Before you begin, ensure you have the following tools installed:

Required Tools

Optional Tools

Important: Ensure all required tools are installed and on your PATH before running task setup:all.

Verify Prerequisites

# Check required tools
docker --version
kubectl version --client
k3d version
python3 --version
uv --version
task --version

# Check Docker is running
docker ps
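The version checks above stop being readable once one of them fails. As an alternative, a short helper can report only the tools that are missing from PATH. This is an illustrative sketch; `missing_tools` is a hypothetical name, and the tool list in the usage note mirrors the commands above.

```shell
#!/bin/sh
# Sketch: print each tool from the argument list that is not found on PATH.
# Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For this project the call would be `missing_tools docker kubectl k3d python3 uv task`; empty output means everything was found.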

🔧 Advanced Configuration

Environment Variables

The Taskfile automatically manages most environment variables. If you need to customize them, create a .env file:

# Optional: Override default values
export PROJECT_HOME="$PWD"
export KUBECONFIG="$PWD/.kube/config"
export K3D_CLUSTER_NAME=polaris-local-forge
export K3S_VERSION=v1.32.1-k3s1
export FEATURES_DIR="$PWD/k8s"

Tip: Use direnv to automatically load environment variables when entering the project directory.
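Without direnv, the same file can be loaded into the current shell by hand. The following is a minimal sketch using only POSIX shell features; `load_env_file` is a hypothetical helper and not part of the project.

```shell
#!/bin/sh
# Sketch: load a .env file into the current shell session.
# `set -a` exports every variable assigned while the file is sourced, so the
# file works with or without explicit `export` keywords.
load_env_file() {
  env_file="${1:-.env}"
  # A bare file name would be looked up on PATH by `.`, so anchor it to the cwd.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac
  [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
  set -a
  . "$env_file"
  set +a
}
```

Calling `load_env_file` from the project root picks up `.env`; a different path can be passed as the first argument.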

DNSmasq (Optional)

For seamless access to services, you can configure DNSmasq instead of editing /etc/hosts.

macOS Setup:

# Configure DNSmasq
echo "address=/.localstack/127.0.0.1" >> "$(brew --prefix)/etc/dnsmasq.conf"

# Add resolver (create the resolver directory first if it does not exist)
sudo mkdir -p /etc/resolver
sudo tee /etc/resolver/localstack <<EOF
nameserver 127.0.0.1
EOF

# Restart DNSmasq
sudo brew services restart dnsmasq

Or use Task:

task setup:dnsmasq  # macOS only
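One caveat with the manual route: appending with `>>` adds a duplicate line each time the command is re-run. A guard such as the one below avoids that. It is a sketch only; `append_once` is a hypothetical helper, and whether `task setup:dnsmasq` already does something similar is not stated here.

```shell
#!/bin/sh
# Sketch: append a line to a file only if that exact line is not present yet.
# grep flags: -q quiet, -x match the whole line, -F treat the pattern literally.
append_once() {
  line="$1"
  file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For the DNSmasq rule this would be `append_once "address=/.localstack/127.0.0.1" "$(brew --prefix)/etc/dnsmasq.conf"`.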

Custom Python Version

# Pin a different Python version
uv python pin 3.11  # or 3.13

# Recreate virtual environment
uv venv --force
source .venv/bin/activate
uv sync

✅ Verification

After running task setup:all, verify your setup:

1. Run the Verification Notebook

Activate the virtual environment (if not already activated) and open the notebook:

source .venv/bin/activate  # On Unix-like systems
# .venv\Scripts\activate   # On Windows

jupyter notebook notebooks/verify_setup.ipynb

The notebook will:

  • Create a test namespace
  • Create a test table
  • Insert sample data
  • Query the data back

2. Check LocalStack Storage

Open https://app.localstack.cloud/inst/default/resources/s3/polardb to view your Iceberg files. You should see the catalog structure with metadata and data files.

(Screenshots: LocalStack S3 browser showing the catalog, catalog metadata, and catalog data.)

3. Verify Deployments

# Check all deployments
task status

# Or manually
kubectl get all -n polaris
kubectl get all -n localstack

Expected output in the polaris namespace:

NAME                           READY   STATUS      RESTARTS   AGE
pod/polaris-694ddbb476-m2trm   1/1     Running     0          13m
pod/polaris-bootstrap-xxxxx    0/1     Completed   0          13m
pod/postgresql-0               1/1     Running     0          15m

NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP             PORT(S)          AGE
service/polaris         LoadBalancer   10.43.202.93   172.19.0.3,172.19.0.4   8181:32181/TCP   13m
service/postgresql      ClusterIP      10.43.182.31   <none>                  5432/TCP         15m
service/postgresql-hl   ClusterIP      None           <none>                  5432/TCP         15m
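Scanning that listing by eye works for three pods, but a filter is handy in scripts. The sketch below reads `kubectl get pods` output and prints only pods whose status is neither Running nor Completed. `unhealthy_pods` is a hypothetical name, and the filter assumes the default column layout shown above (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Sketch: read `kubectl get pods` output on stdin and print "<name> <status>"
# for every pod that is neither Running nor Completed.
# Assumes the default columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '$1 != "NAME" && NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

Usage: `kubectl get pods -n polaris | unhealthy_pods`. Feed it `get pods` rather than `get all`, since service rows use different columns.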

🔍 Troubleshooting

Quick Diagnostics

# Check deployment status
task status

# View events
task troubleshoot:events

# Check specific component
task troubleshoot:polaris
task troubleshoot:postgresql
task troubleshoot:localstack

Common Issues

1. Polaris Server Fails to Start

# Check Polaris logs
task logs:polaris

# Check pod status and events
task troubleshoot:polaris

2. LocalStack Not Accessible

# Verify LocalStack is running
kubectl get pods -n localstack

# Check connectivity
task troubleshoot:localstack

3. PostgreSQL Connection Issues

# Check PostgreSQL logs
task logs:postgresql

# Verify connectivity
task troubleshoot:postgresql

4. Bootstrap Job Fails

# View bootstrap logs
task logs:bootstrap

# Reset Polaris
task polaris:reset

Manual Troubleshooting Commands

If Task commands don't help, you can use these manual commands:

# Check events
kubectl get events -n polaris --sort-by='.lastTimestamp'

# Describe pods
kubectl describe pod -n polaris -l app=polaris

# Check logs
kubectl logs -f -n polaris deployment/polaris
kubectl logs -f -n polaris jobs/polaris-bootstrap
kubectl logs -f -n polaris statefulset/postgresql
kubectl logs -f -n localstack deployment/localstack

# Check services
kubectl get svc -n polaris
kubectl get svc -n localstack

# Verify PostgreSQL
kubectl exec -it -n polaris postgresql-0 -- pg_isready -h localhost

# Verify LocalStack
kubectl exec -it -n localstack deployment/localstack -- \
  aws --endpoint-url=http://localhost:4566 s3 ls

🧹 Cleanup & Reset

Reset Catalog Only (Keep Cluster Running)

Clean and recreate the catalog with fresh data:

task catalog:reset

Or just cleanup without recreating:

task catalog:cleanup

Reset Everything (Delete and Recreate Cluster)

Complete reset: deletes the cluster and recreates everything with a fresh catalog:

task reset:all
# Same as: task cluster:reset

Delete Everything

Delete the k3d cluster and all resources:

task clean:all

This removes:

  • k3d cluster
  • All Kubernetes resources
  • Catalog data in LocalStack
  • PostgreSQL data

Note: Your configuration files in k8s/polaris/ (credentials, secrets, keys) are preserved. Run task prepare to regenerate them if needed.

🛠️ What's Next?

Now that you have Apache Polaris running locally, you can:

  • Connect query engines: Use with Apache Spark, Trino, or RisingWave
  • Explore the API: Check the Polaris API documentation
  • Create more catalogs: Run task catalog:setup with custom parameters
  • Develop integrations: Use the LocalStack S3 endpoint for testing
  • Experiment with Iceberg: Create tables, partitions, and time-travel queries
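As a starting point for exploring the API from the command line, the sketch below requests an access token with curl. It rests on several assumptions: that Polaris serves the Iceberg REST OAuth token endpoint at `/api/catalog/v1/oauth/tokens`, that the `PRINCIPAL_ROLE:ALL` scope is accepted, and that the client ID and secret are read manually from k8s/polaris/.bootstrap-credentials.env (its format is not shown here). `polaris_token` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: request an OAuth token from the local Polaris API.
# The endpoint path and scope are assumptions; check the Polaris API docs.
# Usage: polaris_token <client-id> <client-secret>
POLARIS_URL="${POLARIS_URL:-http://localhost:18181}"

polaris_token() {
  client_id="$1"
  client_secret="$2"
  curl -s -X POST "$POLARIS_URL/api/catalog/v1/oauth/tokens" \
    -d "grant_type=client_credentials" \
    -d "client_id=$client_id" \
    -d "client_secret=$client_secret" \
    -d "scope=PRINCIPAL_ROLE:ALL"
}
```

If the assumptions hold, the response is a JSON document containing an access token that can be sent as a Bearer token on later requests.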

📚 Related Projects and Tools

Core Components

  • Apache Polaris - Data Catalog and Governance Platform
  • Apache Iceberg - Open table format for data lakes
  • PyIceberg - Python library to interact with Apache Iceberg
  • LocalStack - AWS Cloud Service Emulator
  • k3d - k3s in Docker
  • k3s - Lightweight Kubernetes Distribution

Development Tools

  • Docker - Container Platform
  • Kubernetes - Container Orchestration
  • kubectl - Kubernetes CLI
  • Task - Modern task runner and build tool
  • uv - Fast Python packaging tool
  • Ansible - Automation and configuration management

Documentation


📄 License

Copyright (c) Snowflake Inc. All rights reserved.
Licensed under the Apache 2.0 license.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


🔧 Advanced: Manual Setup

If you prefer to run commands manually instead of using Task, here's the step-by-step process:

1. Prepare Configuration Files

Generate required sensitive files from templates:

ansible-playbook polaris-forge-setup/prepare.yml

2. Create the Cluster

bin/setup.sh

Wait for bootstrap deployments:

ansible-playbook polaris-forge-setup/cluster_checks.yml --tags=bootstrap

3. Verify Base Components

PostgreSQL:

kubectl get pods,svc -n polaris

LocalStack:

kubectl get pods,svc -n localstack

4. Deploy Polaris

kubectl apply -k k8s/polaris

Wait for Polaris deployment:

ansible-playbook polaris-forge-setup/cluster_checks.yml --tags=polaris
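If Ansible is not available, the wait can be approximated with a plain polling loop. This is a generic sketch, not what the playbook does internally; `wait_until` is a hypothetical helper, and the deployment name in the usage note is taken from the `kubectl logs` commands in the troubleshooting section.

```shell
#!/bin/sh
# Sketch: retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Sleep only if another attempt is coming.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

For example, `wait_until 30 10 kubectl rollout status deployment/polaris -n polaris --timeout=10s` polls for roughly ten minutes before giving up.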

5. Setup Catalog

Export AWS environment variables:

unset AWS_PROFILE
export AWS_ENDPOINT_URL=http://localstack.localstack:4566
export AWS_ACCESS_KEY_ID=test
export AWS_SECRET_ACCESS_KEY=test
export AWS_REGION=us-east-1

Create catalog:

ansible-playbook polaris-forge-setup/catalog_setup.yml

6. Generate Verification Notebook

ansible-playbook polaris-forge-setup/catalog_setup.yml --tags=verify

7. Purge and Re-bootstrap (if needed)

Purge:

kubectl patch job polaris-purge -n polaris -p '{"spec":{"suspend":false}}'
kubectl wait --for=condition=complete --timeout=300s job/polaris-purge -n polaris
kubectl logs -n polaris jobs/polaris-purge

Re-bootstrap:

kubectl delete -k k8s/polaris/job
kubectl apply -k k8s/polaris/job
kubectl wait --for=condition=complete --timeout=300s job/polaris-bootstrap -n polaris
kubectl logs -n polaris jobs/polaris-bootstrap

8. Cleanup

Cleanup catalog:

ansible-playbook polaris-forge-setup/catalog_cleanup.yml

Delete cluster:

bin/cleanup.sh
