Site Reliability Engineer | DevOps | Cloud Infrastructure
I build and operate reliable, scalable cloud-native platforms with a strong focus on automation, observability, and incident resilience.
My work centers on applying real-world SRE principles, not just on using tools.
- Cloud: Google Cloud Platform (GCP)
- Containers & Orchestration: Kubernetes, Docker
- Infrastructure as Code: Terraform
- CI/CD: GitHub Actions
- Automation & Scripting: Python, Bash
- Observability: Prometheus, Grafana
- Reliability Practices: SLOs, MTTR reduction, incident response
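The SLO practice listed above rests on a simple piece of arithmetic: an availability target implies an error budget, the number of failures a service can absorb in a window before it breaches the objective. Below is a minimal Python sketch of that calculation. It assumes a request-based availability SLO measured over a fixed window. The function name `error_budget_report` and the sample numbers are hypothetical and are not taken from any of the repositories described here.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget use for a hypothetical request-based availability SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1, e.g. 0.999")
    if total_requests <= 0 or failed_requests < 0 or failed_requests > total_requests:
        raise ValueError("invalid request counts")
    # Error budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (can exceed 1.0 if the SLO is breached).
    consumed = failed_requests / allowed_failures
    return {
        "availability": 1.0 - failed_requests / total_requests,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        # Remaining budget is floored at zero once the SLO is breached.
        "budget_remaining": max(0.0, 1.0 - consumed),
    }


if __name__ == "__main__":
    # Hypothetical window: 99.9% target, 1,000,000 requests, 250 failed.
    report = error_budget_report(0.999, 1_000_000, 250)
    print(f"{report['budget_consumed']:.0%} of the error budget consumed")
```

With a 99.9% target, one million requests allow about 1,000 failures, so 250 failures spend roughly a quarter of the budget. A team would typically use the remaining fraction to decide whether risky releases should proceed or pause.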
- Operated and supported production-grade Kubernetes platforms in cloud environments
- Served as an incident responder for SEV-1/SEV-2 issues, driving rapid recovery and root cause analysis
- Implemented monitoring, alerting, and dashboards to improve system visibility
- Automated infrastructure provisioning and deployments using Terraform and CI/CD pipelines
- Focused on reliability engineering outcomes: stability, predictability, and operational excellence
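MTTR, one of the reliability outcomes behind the incident-response work above, is commonly computed as the average time from detecting an incident to recovering from it. Below is a minimal Python sketch. It assumes each incident is recorded as a (detected, recovered) timestamp pair. The function `mean_time_to_recovery` and the sample incidents are hypothetical, and teams differ on which timestamps mark the start and end of an incident.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average (recovered - detected) across hypothetical incident records."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = timedelta(0)
    for detected, recovered in incidents:
        if recovered < detected:
            raise ValueError("recovery time precedes detection time")
        total += recovered - detected
    # timedelta supports division by an int, giving the mean duration.
    return total / len(incidents)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 10, 0)
    sample = [
        (start, start + timedelta(minutes=30)),  # hypothetical SEV-2
        (start, start + timedelta(minutes=90)),  # hypothetical SEV-1
    ]
    print(mean_time_to_recovery(sample))
```

Tracking this value per severity level, rather than as a single average, makes it easier to see whether recovery for the most serious incidents is actually improving.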
Repository: sre-integrated-project
A hands-on, end-to-end SRE implementation that demonstrates how modern reliability engineering is applied in real systems.
What this project showcases:
- Cloud infrastructure provisioning using Terraform
- Kubernetes-based service deployment with health checks
- Observability stack (metrics, dashboards, alerts)
- Incident-readiness and operational workflows
- Progressive delivery using Argo Rollouts (canary-based deployments)
This project is structured to reflect how SRE is practiced in production environments, from foundational setup to advanced deployment safety.
Source code: https://github.com/Nithish2699/sre-integrated-project
- LinkedIn: www.linkedin.com/in/nithish-sagiraju
- Email: nithishraju2699@gmail.com
If you're a recruiter or engineer reviewing this profile, note that my repositories focus on design decisions, reliability thinking, and production readiness.