GF-approved geeked homelab k8s cluster deployed with Talos Linux; automated via FluxCD, Renovate and GitHub Actions 🤖

deedee-ops/home-ops

kubepepe
Art by @SkeletalGadget

My Home Operations Repository ☸

... managed with Flux, Komodo, OpenTofu, Renovate and GitHub Actions 🤖



💡 Overview

This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Komodo, OpenTofu, Kubernetes, Flux, Renovate, and GitHub Actions.

Note

The old ArgoCD setup is no longer maintained, but it is available here for reference.


🌱 Kubernetes

My Kubernetes cluster is deployed with Talos. This is a semi-hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate server with ZFS provides NFS/SMB shares, bulk file storage and backups.

There is a great template made by onedr0p if you want to try and follow along with some of the practices I use here.

Core Components

  • Networking & Service Mesh: cilium provides eBPF-based networking, cloudflared secures ingress traffic via Cloudflare, and external-dns keeps DNS records in sync automatically. All egress traffic is carefully filtered using network policies.
  • Security & Secrets: cert-manager automates SSL/TLS certificate management. For secrets, I use external-secrets with self-hosted HashiCorp Vault to inject secrets into Kubernetes.
  • Storage & Data Protection: rook provides distributed storage for persistent volumes, with volsync handling backups and restores. spegel improves reliability by running a stateless, cluster-local OCI image mirror.
  • Automation & CI/CD: actions-runner-controller runs self-hosted GitHub Actions runners directly in the cluster for continuous integration workflows.
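
As an illustration of the egress filtering and secret injection described above, the manifests below sketch a default-deny egress policy with a DNS exception, followed by an ExternalSecret backed by a Vault store. This is a minimal sketch, not the configuration used in this repository: the `example` namespace, the `vault` store name, the Vault path and all key names are assumptions. It also uses the standard Kubernetes NetworkPolicy API, whereas the cluster may well use Cilium-specific policies instead.

```yaml
# Hypothetical example: block all egress from a namespace except DNS lookups.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: example              # assumed namespace
spec:
  podSelector: {}                 # empty selector matches every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # where cluster DNS usually runs
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# Hypothetical example: copy one value from Vault into a Kubernetes Secret.
# The apiVersion depends on the installed external-secrets release.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: example-app
  namespace: example
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault                   # assumed store name
  target:
    name: example-app-secret      # Secret that external-secrets creates
  data:
    - secretKey: API_TOKEN        # key in the resulting Secret
      remoteRef:
        key: apps/example-app     # assumed Vault path
        property: api_token       # assumed field at that path
```

Any additional egress an application needs would then be allowed explicitly with further rules, which keeps the allowed traffic visible in Git.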

GitOps

Flux watches the clusters in my kubernetes folder (see Directories below) and makes the changes to my clusters based on the state of my Git repository.

The way Flux works for me here is that it recursively searches the kubernetes/clusters/<cluster name> folder until it finds the top-most kustomization.yaml in each directory and then applies all the resources listed in it. That kustomization.yaml generally contains only a namespace resource and the per-cluster Flux kustomizations for the subset of apps used in that cluster. Each of those Flux kustomizations in turn applies a HelmRelease or other resources related to the application.
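
The layering described above might look roughly like the following two files, shown together and separated by `---`. This is a sketch under stated assumptions: the file names, the `example-app` and `example-namespace` names, the `flux-system` source name and the app path are all hypothetical and not taken from this repository.

```yaml
# File 1 (hypothetical): kubernetes/clusters/<cluster name>/example-namespace/kustomization.yaml
# The top-most kustomization.yaml that Flux finds in this directory.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ./namespace.yaml              # the namespace resource
  - ./example-app.yaml            # the Flux Kustomization defined in file 2
---
# File 2 (hypothetical): kubernetes/clusters/<cluster name>/example-namespace/example-app.yaml
# A Flux Kustomization that points at the shared app template.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app
  namespace: flux-system          # assumed namespace for Flux objects
spec:
  interval: 1h
  path: ./kubernetes/apps/example-namespace/example-app   # assumed location of the HelmRelease
  prune: true                     # remove resources that disappear from Git
  sourceRef:
    kind: GitRepository
    name: flux-system             # assumed name of the Git source
  targetNamespace: example-namespace
  wait: true                      # wait for applied resources to become ready
```

Keeping only a namespace and Flux Kustomizations at the cluster level means each cluster selects which shared app templates it uses without duplicating them.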

Renovate watches my entire repository looking for dependency updates; when they are found, a PR is automatically created. Once a PR is merged, Flux applies the changes to my cluster.

Docker

Machines which are not feasible to maintain with Kubernetes (like the NAS) are managed by Komodo and Docker Compose files. Directories are organized similarly to the Flux flow: there are global stacks with application configuration meant to be shared among machines, and hosts configurations with fine-tuned per-machine options.
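
To make the stacks/hosts split more concrete, here is a rough sketch using plain Docker Compose multi-file merging. The two files are shown together and separated by `---`, but would live in separate files. The file paths, the `whoami` service and all values are hypothetical, and the way Komodo actually combines the files in this repository may differ.

```yaml
# File 1 (hypothetical): docker/stacks/whoami/compose.yaml
# Shared base definition meant to be reused on several machines.
services:
  whoami:
    image: traefik/whoami:latest
    restart: unless-stopped
---
# File 2 (hypothetical): docker/hosts/nas/whoami/compose.override.yaml
# Per-machine tuning layered on top of the shared stack.
services:
  whoami:
    ports:
      - "8080:80"                 # host port chosen for this machine only
    environment:
      TZ: Etc/UTC                 # example of a per-host setting
```

With plain Docker Compose, passing both files via repeated `-f` flags (for example `docker compose -f compose.yaml -f compose.override.yaml config`) prints the merged result, with later files overriding or extending earlier ones.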

Directories

This Git repository contains the following directories.

📁 bootstrap      # initial set of files necessary to kickstart the cluster
📁 docker
├── 📁 hosts      # per-host docker compose komodo configurations
└── 📁 stacks     # application templates with base rules shared among machines
📁 kubernetes
├── 📁 apps       # application templates with base rules shared among clusters
├── 📁 clusters   # per-cluster configurations of said apps
└── 📁 components # re-useable kustomize components
📁 opentofu       # opentofu plans for external services like cloudflare
📁 talos          # per-cluster talos configurations

😶 Cloud Dependencies

While most of my infrastructure and workloads are self-hosted, I do rely on the cloud for certain key parts of my setup. This saves me from having to worry about three things: (1) dealing with chicken/egg scenarios, (2) services I critically need whether my cluster is online or not, and (3) the "hit by a bus" factor: what happens to critical apps (e.g. Email, Password Manager, Photos) that my family relies on when I am no longer around.

| Service | Use | Cost |
|---|---|---|
| BorgBase | Borg Backups | $80/yr |
| Cloudflare | Services exposed externally | Free |
| GitHub | Hosting this repository and continuous integration/deployments | Free |
| healthchecks.io | Heartbeats monitoring | Free |
| Migadu | Email hosting | $19/yr |
| NextDNS | Ad filtering | ~$20/yr |
| Pushover | Kubernetes Alerts and application notifications | $5 OTP |
| | Total | ~$10/mo |

Networking

Click to see a high-level network diagram
graph TD
    %% Class Definitions
    classDef wan fill:#f87171,stroke:#fff,stroke-width:2px,color:#fff,font-weight:bold;
    classDef core fill:#60a5fa,stroke:#fff,stroke-width:2px,color:#fff,font-weight:bold;
    classDef agg fill:#34d399,stroke:#fff,stroke-width:2px,color:#fff,font-weight:bold;
    classDef switch fill:#a78bfa,stroke:#fff,stroke-width:2px,color:#fff,font-weight:bold;
    classDef device fill:#facc15,stroke:#fff,stroke-width:2px,color:#000,font-weight:bold;
    classDef vlan fill:#1f2937,stroke:#fff,stroke-width:1px,color:#fff,font-size:12px;

    %% Nodes
    WAN[🛜 netia<br/>1Gbps/300Mbps WAN]:::wan
    UCG[📦 UCG Ultra]:::core
    AGG[🔗 USW Pro Max 16 PoE]:::agg
    NAS[💾 NAS<br/>1 Server]:::device
    KUBE[☸️ Kubernetes<br/>3 Nodes]:::device
    SW[🔌 USW Flex 2.5G]:::switch
    DEV[💻 Devices]:::device
    WIFI[📶 WiFi Clients]:::device

    %% Subgraph for VLANs
    subgraph VLANs [VLANs]
        direction TB
        HOME[Home Network<br/>192.168.2.0/24]:::vlan
        IOTNOWAN["IoT Network (No WAN)<br/>192.168.3.0/24"]:::vlan
        IOTWAN["IoT Network (WAN)<br/>192.168.4.0/24"]:::vlan
        KUBERNETES[Kubernetes Network<br/>192.168.42.0/24]:::vlan
        VPN[VPN Network<br/>192.168.69.0/24]:::vlan
        GUEST[Guest Network<br/>192.168.99.0/24]:::vlan
        MGMT[Management Network<br/>192.168.254.0/24]:::vlan
    end

    style VLANs fill:#111,stroke:#fff,stroke-width:2px,rx:0,ry:0,padding:20px;

    %% Links
    WAN -.->|WAN| UCG
    UCG --> AGG
    AGG -- 2x10G LACP --- NAS
    AGG --> DEV
    AGG --> WIFI
    AGG -- 2.5G --- SW
    SW --> KUBE

    %% Style the bonded links thicker
    linkStyle 2 stroke-width:4px,stroke:#34d399;

    %% Move VLANs below the graph, to the middle
    KUBE -.-> KUBERNETES
    linkStyle 7 stroke-width:0,opacity:0;

🌎 DNS

My cluster runs two instances of ExternalDNS. One syncs private DNS records to my UCG Ultra using the ExternalDNS webhook provider for UniFi, while the other syncs public DNS records to Cloudflare. This setup is managed by creating ingresses with two specific classes: internal for private DNS and external for public DNS. Each external-dns instance then syncs the DNS records to its respective platform.
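
As a sketch of the class-based split described above, the two Ingress resources below expose a hypothetical application once privately and once publicly. Only the `internal` and `external` class names come from this README: the hostnames, namespace, service name and port are assumptions, and how each external-dns instance is restricted to its class is not shown.

```yaml
# Hypothetical example: private record, picked up by the UniFi-facing instance.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app-internal
  namespace: example
spec:
  ingressClassName: internal
  rules:
    - host: app.internal.example.com     # assumed private hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
---
# Hypothetical example: public record, picked up by the Cloudflare-facing instance.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app-external
  namespace: example
spec:
  ingressClassName: external
  rules:
    - host: app.example.com              # assumed public hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
```

Choosing the class per Ingress keeps the decision about public versus private exposure next to the application definition.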


⚙ Hardware

Click here to see my server rack
| Device | Num | OS Disk Size | Data Disk Size | RAM | OS | Function |
|---|---|---|---|---|---|---|
| Intel NUC12WSHi5 | 3 | 512GB NVME | 1TB (rook-ceph) | 64GB | Talos | Kubernetes |
| AMD Ryzen + GB B550I Aorus Pro AX | 1 | 1TB SSD | 2x26TB ZFS (mirrored) + 4TB SSD | 64GB | TrueNAS SCALE | NFS + Backup Server |
| JetKVM + AIMOS HDMI KVM Switch | 1 | - | - | - | - | KVM for Kubernetes |
| UniFi UCG Ultra | 1 | - | - | - | - | Router |
| UniFi USW-Pro-Max-16-PoE | 1 | - | - | - | - | 1Gb+2.5Gb PoE Switch |
| UniFi Flex Mini 2.5G | 1 | - | - | - | - | 2.5Gb k8s Switch |

🙏 Gratitude and Thanks

Thanks to all the people who donate their time to the Home Operations Discord community. Be sure to check out kubesearch.dev for ideas on what you could deploy and how to deploy it.
