Ongoing · Production

Self-Hosted Homelab: Production Infrastructure

Role: Architect & Operator
Stack: Docker, Nginx, Linux, Tailscale, Proxmox
Started: 2024

Overview

I designed and deployed a bare-metal homelab that runs production-grade services 24/7. This isn't a Raspberry Pi experiment — it's a multi-VM, multi-container infrastructure with secure remote access, automated monitoring, and real web-facing services behind an Nginx reverse proxy. Every component was chosen to mirror production environments I'd manage professionally.

Hardware

Hypervisor: Proxmox VE
CPU: 8 Cores
RAM: 32 GB DDR4
Storage: 4 TB NAS
OS: Debian 12
Network: Tailscale Mesh

Architecture

The lab runs on Proxmox VE as the Type-1 hypervisor, hosting multiple Linux VMs. Each VM has a dedicated role — no monolithic servers. This forces me to think in terms of service boundaries and network segmentation, the same way production environments are designed.

Proxmox VE (Bare-Metal Hypervisor)
 └─ VM 01: Docker Host (Application Containers)
    └─ Nginx Reverse Proxy
    └─ Web Apps (Portfolio, BrightLayer)
    └─ Home Assistant Container
    └─ Plex Media Server
 └─ VM 02: NAS (4 TB Storage)
 └─ VM 03: Monitoring (Uptime, Logs)
 └─ Tailscale VPN (Mesh Network)
    └─ Encrypted remote access from any device
    └─ Exit node for mobile/automotive use

Key Components

Nginx Reverse Proxy

All web-facing services sit behind a single Nginx instance that handles SSL termination, virtual host routing, and rate limiting. I manage SSL certificates via Let's Encrypt with automated renewal. This setup lets me run multiple domains and subdomains from a single public IP without port conflicts.

Docker Compose Workflows

Services are defined in Docker Compose files with pinned versions, health checks, and restart policies. I use named volumes for persistent data and bridge networks to isolate services. The compose files are version-controlled in GitHub so I can rebuild the entire stack from scratch if needed.
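
To illustrate what "pinned versions" means in practice, here is a minimal sketch of a lint helper that flags image references without an explicit tag or digest. It is not part of the actual stack: the function names and the example image references are hypothetical, and it assumes the image strings have already been read out of the compose files.

```python
def is_pinned(image: str) -> bool:
    """Return True if an image reference has a digest or an explicit non-'latest' tag."""
    # A digest reference (name@sha256:...) always identifies one exact image.
    if "@" in image:
        return True
    # Only look for a tag in the last path segment, so a registry port
    # such as "localhost:5000/app" is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag means Docker falls back to "latest"
    tag = last_segment.split(":", 1)[1]
    return tag not in ("", "latest")


def unpinned_images(images: dict[str, str]) -> list[str]:
    """Return the names of services whose image reference is not pinned."""
    return sorted(name for name, image in images.items() if not is_pinned(image))


# Hypothetical service-to-image mapping, for illustration only.
services = {
    "proxy": "nginx:1.27.0",
    "app": "ghcr.io/example/app",
    "cache": "localhost:5000/redis:latest",
}
print(unpinned_images(services))  # ['app', 'cache']
```

A check like this could run in CI against the version-controlled compose files, so an unpinned image is caught before it reaches the Docker host.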

Tailscale VPN Mesh

Rather than opening management ports to the public internet, I route all administrative access through Tailscale. This gives me a WireGuard-based mesh that needs almost no manual configuration: each device authenticates with its own key pair, and traffic between devices is encrypted end to end. I access the NAS, monitoring dashboards, and management consoles exclusively through the tailnet, with no exposed management interfaces.

DNS & Uptime Monitoring

External DNS is managed through Netlify DNS with proper A/AAAA records, CNAMEs, and MX records for email. I run a lightweight uptime checker that requests each service over HTTP every 5 minutes and emails an alert on failure. This taught me more about DNS propagation, TTL tuning, and failover than any course could.
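
A checker of this kind can be very small. The sketch below is an assumption about its general shape, not the script running in the lab: `http_status` and `failing_services` are hypothetical names, the service URLs would come from configuration, and alert delivery and the 5-minute schedule are left to the caller (for example a timer that runs the script).

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, and similar


def failing_services(
    services: dict[str, str],
    fetch: Callable[[str], Optional[int]] = http_status,
) -> list[str]:
    """Return names of services whose check did not yield a 2xx or 3xx status."""
    failures = []
    for name, url in services.items():
        status = fetch(url)
        if status is None or not 200 <= status < 400:
            failures.append(name)
    return failures
```

Passing the fetch function as a parameter keeps the decision logic testable without network access: a test can supply a stub that returns fixed status codes.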

Lessons Learned

SSL renewal taught me cron + systemd

I originally ran certbot manually. When the first certificate expired at 3 AM, I learned to automate renewals with systemd timers. Now I treat everything as code — no manual steps in the entire stack.
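
Automated renewal is easier to trust when something also verifies the result. As a sketch of one way to do that (an addition for illustration, not the setup described above), the functions below read a certificate's expiry date and compute the days remaining. The function names are hypothetical; the `ssl` and `socket` calls are from the Python standard library.

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now: float) -> float:
    """Convert a certificate 'notAfter' string into days remaining at `now` (epoch seconds)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return (expires_at - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host over TLS and return the certificate's 'notAfter' field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def expires_soon(host: str, threshold_days: float = 14.0) -> bool:
    """Return True if the certificate served by host expires within threshold_days."""
    remaining = days_until_expiry(fetch_not_after(host), time.time())
    return remaining < threshold_days
```

Note that with the default context, a certificate that has already expired fails the handshake with an `ssl.SSLCertVerificationError` instead of returning a date, so a caller should treat that exception as an alert as well.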

Storage backup strategy matters

I lost a Docker volume once due to a bad update. Now I run nightly rsync jobs to the NAS with 7-day retention. The restore process is documented and tested.
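
The retention rule itself is simple date arithmetic. The sketch below assumes, hypothetically, that each nightly backup lands in a directory named by its ISO date (YYYY-MM-DD); the real layout on the NAS may differ. It only decides which backups fall outside the 7-day window and leaves the deletion to the caller.

```python
from datetime import date, timedelta


def expired_backups(names: list[str], today: date, keep_days: int = 7) -> list[str]:
    """Return backup names (YYYY-MM-DD) that fall outside the retention window."""
    # With keep_days=7, today and the six days before it are kept.
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            taken = date.fromisoformat(name)
        except ValueError:
            continue  # ignore anything that is not a dated backup
        if taken <= cutoff:
            expired.append(name)
    return sorted(expired)


backups = ["2025-03-01", "2025-03-03", "2025-03-04", "2025-03-10", "notes"]
print(expired_backups(backups, today=date(2025, 3, 10)))
# ['2025-03-01', '2025-03-03']
```

Keeping the selection separate from the deletion makes a dry run trivial: print the list first, and only remove directories once the output looks right.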

Segregation prevents cascading failures

Early on, everything ran on one VM. When Plex transcoded and ate all the RAM, every service went down. Splitting workloads across dedicated VMs with explicit CPU and memory limits fixed this, because a runaway process can now exhaust only its own VM. Setting resource limits in Proxmox is now my standard practice.

Monitoring is non-negotiable

I added uptime monitoring after discovering a service had been down for 3 days without me noticing. A simple HTTP check + email alert now catches issues within minutes. This is the kind of reliability ops teams care about.

What I'd Do Differently

If I rebuilt today, I'd adopt Kubernetes (K3s) for container orchestration and Terraform for infrastructure-as-code. The Docker Compose setup works well on a single host but doesn't schedule containers across multiple nodes. I'm currently evaluating K3s for the next iteration; the core Kubernetes concepts transfer directly to AWS EKS, which aligns with my AWS certification.
