Overview
I designed and deployed a bare-metal homelab that runs production-grade services 24/7. This isn't a Raspberry Pi experiment — it's a multi-VM, multi-container infrastructure with secure remote access, automated monitoring, and real web-facing services behind an Nginx reverse proxy. Every component was chosen to mirror production environments I'd manage professionally.
Hardware
Architecture
The lab runs on Proxmox VE as the Type-1 hypervisor, hosting multiple Linux VMs. Each VM has a dedicated role — no monolithic servers. This forces me to think in terms of service boundaries and network segmentation, the same way production environments are designed.
Key Components
Nginx Reverse Proxy
All web-facing services sit behind a single Nginx instance that handles SSL termination, virtual host routing, and rate limiting. I manage SSL certificates via Let's Encrypt with automated renewal. This setup lets me run multiple domains and subdomains from a single public IP without port conflicts.
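As a rough illustration of the routing described above, the per-domain server blocks could be generated from a small table instead of being written by hand. This is a minimal sketch, not the lab's actual configuration: the hostnames, upstream addresses, zone name (per_ip), and rate values are all hypothetical, and it assumes one Let's Encrypt certificate per hostname stored under certbot's usual /etc/letsencrypt/live/ layout.

```python
from string import Template

# Hypothetical hostnames and upstream addresses, for illustration only.
SITES = {
    "app.example.com": "http://10.0.0.11:8080",
    "status.example.com": "http://10.0.0.12:3001",
}

# Shared rate-limit zone keyed on client IP (values are assumptions).
RATE_LIMIT_ZONE = "limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;\n"

# "$$" is a literal "$" in string.Template, so Nginx variables survive rendering.
SERVER_BLOCK = Template("""\
server {
    listen 443 ssl;
    server_name $host;
    ssl_certificate     /etc/letsencrypt/live/$host/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/$host/privkey.pem;

    location / {
        limit_req zone=per_ip burst=20 nodelay;
        proxy_pass $upstream;
        proxy_set_header Host $$host;
        proxy_set_header X-Forwarded-For $$proxy_add_x_forwarded_for;
    }
}
""")


def render_config(sites: dict[str, str]) -> str:
    """Render one TLS-terminating server block per hostname."""
    blocks = [
        SERVER_BLOCK.substitute(host=host, upstream=upstream)
        for host, upstream in sorted(sites.items())
    ]
    return RATE_LIMIT_ZONE + "\n" + "\n".join(blocks)


if __name__ == "__main__":
    print(render_config(SITES))
```

The output is meant to be included inside Nginx's http context (for example from a conf.d file), since limit_req_zone is only valid there. Because each server block matches on server_name, any number of domains can share port 443 on one public IP.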
Docker Compose Workflows
Services are defined in Docker Compose files with pinned versions, health checks, and restart policies. I use named volumes for persistent data and bridge networks to isolate services. The compose files are version-controlled in GitHub so I can rebuild the entire stack from scratch if needed.
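The three conventions above (pinned versions, health checks, restart policies) are easy to enforce mechanically. The following is a minimal sketch of a lint pass over an already-parsed Compose file; the service names and images are hypothetical, and parsing the YAML itself is left out so the example needs only the standard library.

```python
# Hypothetical parsed Compose file (e.g. the result of loading the YAML).
COMPOSE = {
    "services": {
        "web": {
            "image": "nginx:1.27.0",
            "restart": "unless-stopped",
            "healthcheck": {"test": ["CMD", "curl", "-f", "http://localhost/"]},
        },
        "cache": {"image": "redis:latest"},
    }
}


def lint_service(name: str, service: dict) -> list[str]:
    """Return a list of convention violations for one service."""
    problems = []
    image = service.get("image", "")
    # Look for the tag only after the last "/", so a registry port
    # such as "registry:5000/app" is not mistaken for a tag.
    last_part = image.rsplit("/", 1)[-1]
    tag = last_part.split(":", 1)[1] if ":" in last_part else ""
    pinned_by_digest = "@sha256:" in image
    if not pinned_by_digest and tag in ("", "latest"):
        problems.append(f"{name}: image is not pinned to a version")
    if "healthcheck" not in service:
        problems.append(f"{name}: no healthcheck defined")
    if service.get("restart", "no") == "no":
        problems.append(f"{name}: no restart policy")
    return problems


def lint_compose(compose: dict) -> list[str]:
    """Lint every service in the file."""
    problems = []
    for name, service in compose.get("services", {}).items():
        problems.extend(lint_service(name, service))
    return problems


if __name__ == "__main__":
    for line in lint_compose(COMPOSE):
        print(line)
```

Run against the sample data, the web service passes and the cache service is flagged on all three counts. A check like this could run in CI on the repository that holds the Compose files.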
Tailscale VPN Mesh
Rather than opening ports to the public internet, I route all administrative access through Tailscale. This gives me a near-zero-config WireGuard mesh in which every device authenticates with its own key pair. I access the NAS, monitoring dashboards, and management consoles exclusively through the tailnet, so no management interface is exposed publicly.
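One way to back up the "tailnet only" rule in application code is to reject any client whose address is outside the tailnet. The sketch below assumes the tailnet uses Tailscale's default IPv4 range, 100.64.0.0/10; the function name is hypothetical, and IPv6 tailnet addresses are deliberately left out to keep it short.

```python
import ipaddress

# Tailscale hands out tailnet IPv4 addresses from the CGNAT range.
# Assumption: the default range has not been customised.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_tailnet_address(addr: str) -> bool:
    """Return True if addr is an IPv4 address inside the tailnet range."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not a valid IP address at all
    return ip in TAILNET_V4


if __name__ == "__main__":
    for candidate in ("100.101.102.103", "192.168.1.10", "203.0.113.7"):
        print(candidate, is_tailnet_address(candidate))
```

This is a defence-in-depth check rather than a replacement for binding management services to the Tailscale interface, because some ISPs also use the CGNAT range.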
DNS & Uptime Monitoring
External DNS is managed through Netlify DNS with proper A/AAAA records, CNAMEs, and MX records for email. I run a lightweight uptime checker that sends an HTTP request to each service every 5 minutes and emails me on failure. This taught me more about DNS propagation, TTL tuning, and failover than any course could.
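The core of such an uptime checker is small. Here is a minimal sketch using only the standard library; the function names are hypothetical, the alert is a plain callback rather than a real email sender, and the 5-minute schedule is assumed to come from an external scheduler such as a systemd timer or cron.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False


def run_checks(
    urls: Iterable[str],
    probe: Callable[[str], bool] = http_ok,
    alert: Callable[[str], None] = print,
) -> list[str]:
    """Probe every URL once, alert on each failure, return the failed URLs."""
    failed = [url for url in urls if not probe(url)]
    for url in failed:
        alert(f"DOWN: {url}")
    return failed


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the services to watch.
    run_checks(["https://example.com/"])
```

Passing the probe and alert functions in as parameters keeps the check testable without network access and makes it easy to swap print for an email or webhook sender.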
Services Running
- Personal Portfolio — edwardclark.shop (Netlify-hosted, proxied through the lab)
- BrightLayer Studio — brightlayer.netlify.app (client-facing business site)
- Plex Media Server — Media streaming for personal use via Tailscale auth
- Home Assistant — Home automation with custom integrations
- NAS — 4 TB network storage with SMB/NFS shares
- Uptime Monitor — Internal dashboard tracking service health
Lessons Learned
SSL renewal taught me cron + systemd
I originally ran certbot manually. When the first certificate expired at 3 AM, I learned to automate renewals with systemd timers. Now I treat everything as code — no manual steps in the entire stack.
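The decision a scheduled renewal job makes on each run can be sketched in a few lines. This is an illustration of the logic only, not a replacement for certbot renew, which performs its own expiry check; the 30-day window mirrors certbot's long-standing default and should be treated as an assumption, and the function names are hypothetical.

```python
import ssl
import time
from typing import Optional

RENEW_WINDOW_DAYS = 30  # assumed renewal window, in days
SECONDS_PER_DAY = 86400


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate expires.

    not_after uses the format found in a certificate's notAfter field,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / SECONDS_PER_DAY


def needs_renewal(not_after: str, now: Optional[float] = None) -> bool:
    """True once the certificate is inside the renewal window."""
    return days_remaining(not_after, now) < RENEW_WINDOW_DAYS


if __name__ == "__main__":
    print(needs_renewal("Jan  5 09:34:43 2018 GMT"))
```

Running a check like this daily from a timer means a failed renewal leaves about a month of retries before the certificate actually expires.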
Storage backup strategy matters
I lost a Docker volume once due to a bad update. Now I run nightly rsync jobs to the NAS with 7-day retention. The restore process is documented and tested.
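The retention half of that job can be sketched as a pure function. This assumes each nightly snapshot is stored in a directory named after its ISO date (for example 2030-01-09), which is a hypothetical layout; the function only decides what to delete and leaves the actual removal to the caller.

```python
from datetime import date, timedelta


def backups_to_prune(names: list[str], today: date, keep_days: int = 7) -> list[str]:
    """Return snapshot names that are more than keep_days days old."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for name in names:
        try:
            taken = date.fromisoformat(name)
        except ValueError:
            continue  # skip anything that is not a dated snapshot
        if taken < cutoff:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    snapshots = ["2030-01-01", "2030-01-03", "2030-01-09", "lost+found"]
    print(backups_to_prune(snapshots, today=date(2030, 1, 10)))
```

Keeping the decision separate from the deletion makes the retention rule easy to test, which matters for code whose failure mode is removing the wrong backup.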
Segregation prevents cascading failures
Early on, everything ran on one VM. When Plex transcoded and ate all the RAM, every service went down. Moving to per-service VMs with explicit CPU and memory limits fixed this, because a runaway workload can now exhaust only its own VM. Setting those limits in Proxmox is now my standard practice.
Monitoring is non-negotiable
I added uptime monitoring after discovering a service had been down for 3 days without me noticing. A simple HTTP check + email alert now catches issues within minutes. This is the kind of reliability ops teams care about.
What I'd Do Differently
If I rebuilt today, I'd adopt K3s (a lightweight Kubernetes distribution) for container orchestration and Terraform for infrastructure-as-code. The Docker Compose setup works well on a single host but has no built-in way to schedule containers across multiple nodes. I'm currently evaluating K3s for the next iteration. The Kubernetes concepts it teaches carry over to AWS EKS, which aligns with my AWS certification.