We are looking for a seasoned Site Reliability Engineer (SRE) to join our team and lead the design, implementation, and maintenance of high-performance, scalable, and secure cloud infrastructure on Azure. As a senior technical leader, you’ll mentor DevOps engineers and guide best practices across the Cloud, Infrastructure, CI/CD, Monitoring, and Reliability Engineering domains.
Key Responsibilities
Design, implement, and manage robust and scalable infrastructure on Microsoft Azure.
Drive the reliability, performance, and scalability of cloud-native applications and services.
Lead architecture and governance for CI/CD pipelines using Azure DevOps.
Build and maintain tools for deployment, monitoring, and operations.
Collaborate with engineering teams to define SLAs, SLOs, and SLIs.
Proactively identify and mitigate system vulnerabilities and performance issues.
Lead incident response, postmortems, and root cause analysis to ensure continuous improvement.
Drive automation across infrastructure provisioning, configuration, and deployments.
Ensure security, compliance, and best practices across environments.
Requirements
Must-Have Qualifications
12+ years of experience in IT/Software Engineering.
10+ years of hands-on DevOps experience.
Expert-level skills in the Microsoft Azure cloud platform.
Deep understanding of Azure DevOps (Repos, Pipelines, Boards, Artifacts).
Strong experience in Infrastructure as Code (e.g., Bicep, Terraform, ARM templates).
Proficiency in containerization and orchestration tools (e.g., Docker, Kubernetes, AKS).
Solid scripting experience (PowerShell, Bash, Python).
In-depth knowledge of monitoring/logging tools (e.g., Prometheus, Grafana, Azure Monitor, Application Insights).
Experience with high availability, disaster recovery, and cost-optimized architecture design.