Senior Site Reliability Engineer

Date: 23 Feb 2026

Location: Bellville, Western Cape, ZA

Company: Sanlam Group

Who are we?

Sanlam Fintech is a newly established, digital-first business within the Sanlam Group. Its mission is to democratise financial advice and solutions for everyone across the African continent. We exist to pioneer inclusive financial confidence, helping people build strong foundations to bridge the gap in generational wealth. Our culture is one of agility and continuous deployment: we believe in learning fast, learning cheap and learning forward. Our aim is to provide a work environment where knowledge workers can accelerate the development of their ideas and bring innovation to market. We also aim to offer a compelling career and development proposition that will enable them to realise their dreams.

Position Overview

The Site Reliability Engineer (SRE) at Sanlam Fintech is responsible for ensuring the reliability, scalability, and performance of our cloud-native infrastructure and services. This role bridges software engineering and operations, applying engineering principles to solve complex infrastructure challenges. The SRE will focus on building and maintaining resilient systems on AWS, implementing comprehensive observability solutions, and driving automation across the infrastructure lifecycle.

Operating in a DevOps environment, the SRE takes full ownership of the systems they build and operate, ensuring high availability and an optimal customer experience. They work closely with Software Engineers, Platform Engineers and DevSecOps teams to deliver infrastructure solutions that support Sanlam Fintech's business objectives and uphold our commitment to operational excellence.

What will you do?

Reliability & Resilience

Build highly available, fault-tolerant systems on AWS
Define SLIs, SLOs and error budgets to track and improve reliability
Plan and implement disaster recovery strategies (RTO/RPO)
Lead incident response and root cause analysis
Build self-healing systems with automated fixes for common failures
Run chaos engineering tests to find and fix weaknesses

Observability & Monitoring

Set up metrics, logs and traces for full system visibility
Build dashboards and alerts for fast incident detection
Implement distributed tracing to spot performance issues
Set monitoring standards and maintain operational runbooks
Publish regular uptime and operational metrics reports

Infrastructure Automation

Write and maintain Infrastructure as Code using Terraform and CloudFormation
Automate provisioning, configuration and deployments with DevOps/Platform teams
Build and manage CI/CD pipelines using GitHub Actions
Implement GitOps practices and self-service automation to reduce manual work

Cloud Infrastructure & Architecture

Design and optimise serverless solutions (Lambda, API Gateway, Step Functions)
Manage and optimise Kubernetes clusters
Implement cloud-native patterns like event-driven and microservices architectures
Optimise cloud costs and evaluate new AWS services

Software Engineering & Development

Build clean, well-structured automation tools and scripts
Apply Clean Architecture and Domain-Driven Design to infrastructure code
Improve internal tools to boost developer productivity
Use AI tools (Claude, GPT) to automate routine tasks

Collaboration & Knowledge Sharing

Work with cross-functional teams using Jira, Confluence and Jira Service Management (JSM)
Participate in on-call rotations and incident handoffs
Mentor junior engineers in SRE practices
Document decisions, procedures and run blameless postmortems

Qualification and Experience

Required Experience

5+ years of experience in systems engineering, DevOps, or site reliability engineering roles
3+ years of hands-on experience with AWS cloud services in production environments
2+ years of experience with Infrastructure as Code (Terraform and/or CloudFormation)
Demonstrated experience in incident management and on-call responsibilities
Track record of implementing automation that reduced operational toil

Educational Background

Bachelor's degree in Computer Science, Information Technology, Engineering or related field; or equivalent practical experience
Relevant professional certifications are advantageous but not required

What will make you successful in this role?

Cloud Platforms & Infrastructure

Strong expertise in AWS services including EC2, ECS, EKS, Lambda, API Gateway, Step Functions, S3, RDS, DynamoDB, CloudWatch and networking services such as VPC, Route 53 and ALB/NLB
Deep understanding of serverless architecture patterns and best practices
Experience with Kubernetes cluster management, deployment strategies and service mesh concepts
Knowledge of cloud security best practices including IAM, security groups and encryption

Infrastructure as Code & Automation

Proficiency in Terraform for multi-environment infrastructure management
Experience with AWS CloudFormation for native AWS resource provisioning
Strong scripting skills in Python for automation and tooling development
Experience with configuration management tools and practices

Observability & Monitoring

Expertise in Datadog, CloudWatch and OpenTelemetry (OTel) for full-stack observability, including APM, infrastructure monitoring, log management, and synthetic testing and monitoring
Experience designing and implementing SLI/SLO frameworks
Proficiency in creating effective dashboards, alerts and runbooks
Understanding of distributed tracing and correlation across services

Development & Version Control

Strong experience with GitHub for version control, code review and CI/CD workflows
Understanding of Clean Architecture principles and their application to infrastructure code
Familiarity with Domain-Driven Design concepts for complex system design
Experience building and maintaining CI/CD pipelines using GitHub Actions

Tools & Platforms

Proficiency with the Atlassian suite (Jira, Confluence) for project management and documentation
Experience leveraging AI tools (Claude, GPT) for code generation, documentation, and problem-solving
Familiarity with containerisation technologies (Docker) and orchestration platforms
Experience with Linux system administration and troubleshooting

Nice To Have Skills

The following skills are desirable and will strengthen a candidate's application:

Experience with additional cloud providers (Azure and GCP) for multi-cloud strategies
Knowledge of FinOps practices and cloud cost optimisation techniques
Experience with chaos engineering tools (AWS Fault Injection Simulator, Gremlin and Chaos Monkey)
Familiarity with service mesh technologies (Istio and AWS App Mesh)
Experience with database reliability engineering and performance tuning
Knowledge of compliance frameworks relevant to financial services (POPIA and PCI-DSS)
Contributions to open-source projects or community involvement
AWS certifications (Solutions Architect, DevOps Engineer or SysOps Administrator)
Kubernetes certifications (CKA and CKAD)
Experience with event-driven architectures using AWS EventBridge, SNS, SQS or Kafka

Knowledge and Skills

IT Data Analysis

IT product enhancements

Software design and deployments

Platform management and integration

Business Requirements

Personal Attributes

Organisational savvy - Contributing through others

Manages complexity - Contributing through others

Plans and aligns - Contributing through others

Optimises work processes - Contributing through others

Build a successful career with us

We’re all about building strong, lasting relationships with our employees. We know that you have hopes for your future: your career, your personal development and achieving great things. We pride ourselves on helping our employees to realise their worth. Through its five business clusters (Sanlam Fintech, Sanlam Life and Savings, Sanlam Investment Group, Sanlam Allianz and Santam), as well as MiWay and the Group Office, the group provides many opportunities for growth and development.

Core Competencies

Being resilient - Contributing through others

Collaborates - Contributing through others

Cultivates innovation - Contributing through others

Customer focus - Contributing through others

Drives results - Contributing through others

Turnaround time

The shortlisting process will only start once the application due date has been reached. The time taken to complete this process will depend on how far you progress and the availability of managers.

Our commitment to transformation

The Sanlam Group is committed to achieving transformation and embraces diversity. This commitment is what drives us to achieve a diverse, inclusive and equitable workplace as we believe that these are key components to ensuring a thriving and sustainable business in South Africa. The Group's Employment Equity plan and targets will be considered as part of the selection process.