Principal Site Reliability Engineer (SRE)
The CHI Software team is not standing still. We love our work and give it one hundred percent! Every new project is a challenge that we meet successfully. The only thing that can stop us is… Wait, nothing can! The number of projects is growing, and our team is growing with it. And now we need a Principal Site Reliability Engineer (SRE).
About the client:
We are looking for a Principal Site Reliability Engineer to lead reliability, scalability, and resilience across our cloud platform and production systems.
This is a highly senior, hands-on technical leadership role responsible for defining SRE strategy, designing highly available distributed systems, and embedding reliability best practices across engineering teams.
You will work closely with platform, product, and engineering leadership to ensure our systems can scale reliably while maintaining strong performance, security, and cost efficiency.
Requirements:
- 8+ years of experience in SRE, DevOps, Platform Engineering, or Infrastructure Engineering;
- Deep production experience with Microsoft Azure, including services such as AKS, Azure Networking, Azure SQL, Cosmos DB, Storage, IAM (Azure AD/Entra ID), and monitoring;
- Expert-level Kubernetes experience in production environments;
- Strong Infrastructure-as-Code expertise (Terraform preferred; Bicep/ARM welcome);
- Proven track record designing and operating high-availability, distributed cloud systems;
- Experience building observability and incident management frameworks;
- Programming/scripting skills (Python, Go, PowerShell, Bash, etc.);
- Strong communication and leadership skills.
Responsibilities:
- As a Principal Site Reliability Engineer, you will be part of a cross-functional team or a practice team that enables site reliability engineering skills and capabilities across a whole domain;
- As an SRE enthusiast with a strong DevSecOps mindset and excellent collaboration skills, you will work with your team to deliver the best answers to our customer's needs and take full responsibility for its applications, from design to operation;
- You care diligently about the quality of your work, including proper documentation and security aspects;
- With your advanced skill set for understanding and solving problems, you are able to take full ownership of complex topics or multi-faceted initiatives and outcomes spanning your domain;
- You will use your deep technical skills to enable your team to deliver operational excellence and ensure and improve the reliability, performance, and maintainability of systems and services;
- You work closely with your team to understand the operational processes, and technical and business needs of the products and services your team is responsible for;
- You will be involved in raising operational readiness requirements as part of the development life cycle and in validating that software development and delivery are consistent and meet the specified requirements.
Our perks:
- Covered vacation period: 20 business days and 5 days off;
- Free English classes;
- Flexible working schedule;
- Truly friendly and supportive atmosphere;
- Working remotely or in one of our offices;
- Medical insurance for employees from Ukraine;
- Legal support.