Senior SRE & Production Reliability Lead – Build a New Operational Capability
Avaron AB
Stockholms län, Stockholm
Previous experience is desired
4 days left to apply for the job
At Avaron, you get the security of permanent employment combined with the variety of working on-site at different clients. We recruit specialists in everything from technology, IT, and industry to project management and business support – and regardless of the assignment, you have a consultant manager who is there for you and your development.
About the Role
You will step into a strategically important role in which you build a new operational capability in SRE (Site Reliability Engineering) and Production Reliability for modern, business-critical applications and platforms. The environment is technically advanced and characterized by high demands for availability, security, traceability, and compliance. Here, technology, processes, responsibility, and governance need to work together from the start.
This role suits you if you thrive at the intersection of change management and modern IT operations. You will lead the establishment of a way of working that functions both in daily operations and over the long term within a regulated banking context, in close collaboration with application teams, platform owners, operations functions, security functions, and external partners. This is an opportunity for those who want to combine strategic influence with operational execution in an environment where stability and quality truly matter.
Responsibilities
- You develop the target vision, establishment plan, and implementation structure for a new SRE and Production Reliability capability.
- You define the team's mission, responsibilities, service boundaries, and collaboration with application teams, service desk, platform owners, operations partners, and security functions.
- You build the team by defining roles, competencies, staffing, onboarding, training needs, and readiness models, and you also step in as interim team lead.
- You ensure that the team can work operationally in modern environments with container platforms, OpenShift, Kubernetes, Azure, pipelines, networking, IAM, secrets, storage, and databases.
- You establish ways of working for incident management, alert management, escalation, runbooks, problem management, improvement work, and operational follow-up.
- You introduce a model for production maturity and onboarding of applications to the SRE team, including readiness criteria, documentation requirements, observability, support boundaries, and recovery capabilities.
- You drive the development of monitoring, dashboards, observability, and metrics, preferably with support from the LGTM stack (Loki, Grafana, Tempo, Mimir) or similar tools.
- You contribute to reduced operational risk through better stability, shorter recovery times, fewer recurring incidents, more automation, and clearer operational governance.
Qualifications
- At least 8–10 years of experience in senior roles within IT operations, production operations, platform, cloud, infrastructure, or SRE.
- Documented experience in establishing, transforming, or leading operational capabilities or teams within operations, platform, or SRE.
- Documented experience working in complex, business-critical IT environments with many dependencies and stakeholders.
- Experience in defining or introducing operating models, responsibility distribution, collaboration forms, processes, and governance between multiple functions.
- Good understanding of container-based platforms such as Kubernetes and/or OpenShift.
- Good understanding of cloud environments, preferably Azure, as well as platform-adjacent areas such as networking, IAM, secrets, storage, and databases.
- Good understanding of CI/CD, pipelines, automation, and modern delivery flows.
- Experience with incident management, readiness, alert management, problem management, and operational follow-up.
- Experience in establishing or developing monitoring, observability, dashboards, alerts, and metrics.
- Experience working in environments with high demands for security, compliance, documentation, access control, and auditability.
- Experience in change management and implementation in organizations where new ways of working need to be anchored among several stakeholders.
- Excellent command of Swedish and English, both spoken and written.
- Experience from banking, finance, insurance, or other regulated activities.
- Experience in establishing or leading a 24/7 organization with on-call or readiness duties.
- Experience in introducing onboarding models or readiness criteria for applications and services to operations or SRE organizations.
- Experience with OpenShift in enterprise environments and with Azure including private endpoints, networking, key or secrets management, and relevant platform services.
- Experience with LGTM or another modern observability stack.
- Experience in vendor management and collaboration in ecosystems with internal and external operations partners.
- Experience with resilience, recovery capabilities, continuity, backup, recovery, or DR.
- Experience of having led similar initiatives from target vision to functioning operational delivery.
Benefits
- Permanent employment at Avaron AB
- Pension plan
- Wellness allowance of 5,000 SEK per year
We are hiring continuously – please apply as soon as possible.