A comprehensive Infrastructure as Code (IaC) solution for deploying a production-ready Kubernetes cluster on Proxmox with monitoring, backup, and network scanning capabilities.
This project deploys a complete Kubernetes infrastructure stack including:
- Kubernetes Cluster: K3s-based cluster with control plane and worker nodes
- Load Balancing: MetalLB for bare-metal load balancing
- Monitoring Stack: Prometheus, Grafana, Loki, and Mimir for comprehensive observability
- Log Aggregation: Syslog receiver for OPNsense and external device logs
- Backup System: Automated backup solution with NFS storage
- Ingress: Nginx ingress controller with host-based routing
- Automation: N8N workflow automation platform
- Proxmox VE 7.0+ with API access
- Terraform 1.0+
- SSH key pair for VM access
- NFS server for backup storage (optional)
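Some of the prerequisites above can be checked before running anything. The following is a minimal sketch, not part of the repository: the function names `version_ge`, `require_file`, and `check_terraform_version` are hypothetical, and it assumes a `sort` that supports `-V` (GNU coreutils or a recent BSD `sort`).

```shell
# Hypothetical helpers for checking prerequisites; not shipped with this project.

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
version_ge() {
  # sort -V orders version strings; A >= B exactly when B sorts first (or ties).
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# require_file PATH: succeeds when PATH is an existing regular file (e.g. an SSH key).
require_file() {
  [ -f "$1" ]
}

# check_terraform_version VERSION: compares VERSION against the 1.0 minimum.
check_terraform_version() {
  if version_ge "$1" "1.0.0"; then
    echo "terraform $1: ok"
  else
    echo "terraform $1: too old (need 1.0+)"
    return 1
  fi
}
```

For example, `check_terraform_version 1.5.7` reports the version as acceptable, while `check_terraform_version 0.14.11` returns a non-zero status. The project's own `pre-flight-check.sh` may perform different or additional checks.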
git clone <repository-url>
cd kubernetes-proxmox-infrastructure
cp terraform.tfvars.example terraform.tfvars
Edit terraform.tfvars with your environment settings:
# Proxmox Configuration
proxmox_api_url = "https://your-proxmox:8006/api2/json"
proxmox_api_token_id = "your-token-id"
proxmox_api_token_secret = "your-token-secret"
# VM Configuration
ssh_public_key_path = "~/.ssh/id_ed25519.pub"
ssh_private_key_path = "~/.ssh/id_ed25519"
# Network Configuration
vm_network_bridge = "vmbr0"
vm_network_vlan = 100
# NFS Backup Configuration (optional)
nfs_server_ip = "192.168.1.100"
nfs_backup_path = "/data/kubernetes/backups"
# Pre-flight checks
./scripts/deployment/pre-flight-check.sh
# Deploy full stack
./scripts/deployment/deploy-full-stack.sh

├── docs/                          # Documentation
│   ├── deployment/                # Deployment guides
│   ├── backup/                    # Backup documentation
│   ├── monitoring/                # Monitoring setup
│   ├── troubleshooting/           # Troubleshooting guides
│   └── TERRAFORM_STRUCTURE.md     # Terraform organization guide
├── scripts/                       # Automation scripts
│   ├── deployment/                # Deployment scripts
│   ├── backup/                    # Backup and restore scripts
│   ├── maintenance/               # Maintenance scripts
│   └── troubleshooting/           # Troubleshooting scripts
├── configs/                       # Configuration files
│   ├── grafana/                   # Grafana dashboards
│   ├── prometheus/                # Prometheus configs
│   └── backup/                    # Backup configurations
├── *.tf                           # Terraform configuration files
└── terraform.tfvars               # Environment variables
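Because `terraform.tfvars` is edited by hand, a quick check for missing settings can catch mistakes before deployment. The sketch below is illustrative only: `missing_tfvars` and `REQUIRED_TFVARS` are hypothetical names, and treating every key from the configuration example above (except the optional NFS settings) as required is an assumption.

```shell
# Hypothetical check for unset keys in terraform.tfvars; not part of this project.

# Keys taken from the configuration example; which ones are mandatory is an assumption.
REQUIRED_TFVARS="proxmox_api_url proxmox_api_token_id proxmox_api_token_secret ssh_public_key_path ssh_private_key_path vm_network_bridge vm_network_vlan"

# missing_tfvars FILE KEY...: prints every KEY that has no `KEY = value` line in FILE.
missing_tfvars() {
  local file="$1"
  shift
  local key
  for key in "$@"; do
    # Anchoring at line start means commented-out assignments (# key = ...) do not count.
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file" || echo "$key"
  done
}

# Usage, from the repository root (word splitting of the key list is intentional):
#   missing_tfvars terraform.tfvars $REQUIRED_TFVARS
```

An empty result means every listed key has an assignment; it does not validate the values themselves.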
The Terraform configuration is split into files by concern:
- versions.tf: Terraform and provider version requirements
- providers.tf: Provider configurations
- variables.tf: Input variable declarations
- outputs.tf: Output value declarations
- backend.tf: Backend configuration

- infrastructure-proxmox.tf: VM definitions and provisioning
- infrastructure-network.tf: MetalLB and Ingress controller

- kubernetes-cluster.tf: Cluster bootstrapping and setup
- kubernetes-storage.tf: NFS storage configuration
- kubernetes-ingress.tf: Ingress resource definitions

- monitoring.tf: Prometheus, Grafana, Loki, Mimir stack
- backup.tf: Backup system and CronJobs
- opnsense-logging.tf: OPNsense log integration

- applications-media.tf: Media applications (Mylar)
- applications-automation.tf: Automation tools (N8N)
See docs/TERRAFORM_STRUCTURE.md for details.
- Prometheus: Metrics collection and alerting
- Grafana: Visualization and dashboards
- Loki: Log aggregation and analysis
- Mimir: Long-term metrics storage
- Automated Backups: Scheduled ETCD and application data backups
- Manual Backup Triggers: On-demand backup capabilities
- Restoration Testing: Comprehensive restore validation
- NFS Storage: Centralized backup storage with redundancy
- MetalLB: Layer 2 load balancing for bare-metal
- Traefik Ingress: HTTP/HTTPS routing with automatic SSL
- Network Policies: Secure inter-pod communication
# Full stack deployment
./scripts/deployment/deploy-full-stack.sh
# Step-by-step deployment
./scripts/deployment/deploy-step-by-step.sh
# Pre-flight checks
./scripts/deployment/pre-flight-check.sh
# Manual backup (all components)
./scripts/backup/trigger-manual-backup.sh
# Test backup restoration
./scripts/backup/test-backup-restoration.sh dry-run
# Restore specific component
./scripts/backup/test-individual-restore.sh grafana
# Check NFS permissions
./scripts/maintenance/test-nfs-permissions.sh
# Update Grafana dashboards
./scripts/maintenance/update-grafana-dashboards.sh
# Diagnose NFS access issues
./scripts/troubleshooting/diagnose-nfs-access.sh
# Fix kubeconfig secret encoding
./scripts/troubleshooting/fix-kubeconfig-secret.sh
- SSH Key Authentication: Password authentication disabled by default
- Network Segmentation: VLANs and network policies for isolation
- Secret Management: Kubernetes secrets for sensitive data
- Backup Encryption: Consider encrypting backup data at rest
- Access Control: RBAC policies for service accounts
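One point behind the secret-management and backup-encryption items above: values in a Kubernetes Secret's `data` field are base64-encoded, which is an encoding and not encryption. The sketch below illustrates this with hypothetical helper names (`encode_secret`, `decode_secret`); it assumes a `base64` utility that accepts `--decode`, and it is not how this project manages its secrets.

```shell
# Hypothetical helpers showing that base64 is reversible without any key.

# encode_secret VALUE: base64-encode VALUE, as stored in a Secret's `data` field.
encode_secret() {
  # printf avoids the trailing newline that echo would add to the encoded value.
  printf '%s' "$1" | base64
}

# decode_secret ENCODED: recover the original value; no key or password is needed.
decode_secret() {
  printf '%s' "$1" | base64 --decode
}

encoded="$(encode_secret 's3cret')"
echo "$encoded"            # prints czNjcmV0
decode_secret "$encoded"   # prints s3cret
echo
```

Since anyone who can read a Secret manifest or a backup copy of it can decode the value, unencrypted backups of secret data deserve the same protection as the plaintext. Note that some `base64` implementations wrap long output across lines, which this sketch does not handle.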
- Kubernetes Cluster Overview: Node and pod metrics
- Backup Monitoring: Backup status and performance
- Application Metrics: Component-specific dashboards
- Cluster resource utilization
- Backup success/failure rates
- Network device discovery status
- Application performance metrics
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check the docs/ directory for detailed guides
- Issues: Report bugs and feature requests via GitHub issues
- Troubleshooting: Use the troubleshooting scripts in scripts/troubleshooting/
Current version: 1.0.0
See CHANGELOG.md for version history and updates.
Note: This infrastructure is designed for production use but should be thoroughly tested in your environment before deployment. Always follow your organization's security and operational guidelines.