Site Reliability Engineer (ML Ops)
What we do at Codat
Our mission is to make life easier for the lifeblood of economies globally: small and medium-sized businesses. Codat is a universal API for consented business financial data, powering the next generation of products and services for this historically underserved market.
We have offices in London, New York, and Sydney, and a San Francisco office will be opening soon. We are a privately held company and have recently closed our Series B, funded by Index Ventures, Tiger Global, American Express, PayPal and a line-up of world-class angel investors.
We live by our values of being united as a single team, building a product that is useful to our clients and their customers alike, with a focus and urgency that makes us unstoppable.
What we’re up to right now
Our teams are currently focusing on:
- Building high coverage for Codat's API across third-party software platforms. This includes building new data formats and sources into our financial data standard.
- Reliably and quickly processing huge amounts of financial data into our data cache and query engine.
- Building additional products on top of our core API to unlock the power of our clients' data, through the movement of data between sources, data visualisation, or insights.
- Building best-in-class monitoring and operations tools that allow both us and our clients to find and fix data issues early.
What you will be doing
We're looking for hands-on engineers with experience building and operating Python microservices and data pipelines reliably in production. You'll be working with other SREs, data engineers, and data scientists, across multiple development teams to provide guidance and improve our service and model reliability, observability, performance, scalability, and security.
- Working across development teams to support new features and deploy services.
- Building tools and instrumenting code using OpenTelemetry SDKs to improve our platform and model observability.
- Automating zero-downtime deployments through CI/CD pipelines and infrastructure-as-code.
- Troubleshooting incidents and ensuring lessons are learnt by following up with effective post-mortems.
- Contributing to our platform architecture roadmap to ensure our Azure infrastructure continues to scale and meet our needs.
Our system is entirely hosted on Microsoft's Azure cloud platform, and is built with a mixture of modern .NET and Python, utilising the latest language and runtime features. Our system is service-based, and leverages Azure Service Bus, Azure Storage, and SQL Server to ingest, process, and surface large amounts of data reliably and efficiently.
Our engineers operate in small, focused, multidisciplinary, and highly autonomous teams of around 4 to 6 people. Our teams tend to include:
- A hands-on Lead Engineer who spends most of their time focusing on people and coding, and a bit of time on product alignment and technical alignment along with a product owner and delivery manager.
- A QA Engineer who represents quality throughout the team and encourages critical thinking as well as supporting automation.
- Software Engineers who help with business analysis, writing tests and code, and operating the components that the team owns.
- Data scientists responsible for data analysis, exploration, and building models to support business requirements. You will be working with them quite closely across the model lifecycle.
- Other specialists will sometimes join a team for certain work. This might be a designer or front-end engineer for a UI product. We also have a team of platform engineers who might get involved early on with a new feature, or during a scale activity. They might be a DBA or an SRE like you, who can make sure that the team is building something they can support and run in production successfully.
All our engineers have end-to-end responsibility. They’re involved from early on in the product design process all the way through to monitoring and operation. With this responsibility comes trust. Our engineers are empowered to use the best tool for the job. We encourage engineers to be innovative, always thinking about the best ways to give value to our clients.
Our teams are always finding ways to make themselves more agile, and most teams use a “Scrumban” style of work, with stand-ups, retrospectives, planning, and refining. Teams move rapidly, releasing at least once a day. We practise continuous integration and testing, and sometimes continuous delivery. Testing is very important to our process, and we strive for high levels of unit and end-to-end test coverage. This is helping us work towards our goal of continuous flow.
No matter what we’re doing - whether we’re speaking to customers, partners or to each other - we live by our values.
We believe in delivering useful technology that solves real problems for real businesses. We genuinely want to do the work that isn't always “cool” but makes a difference.
We believe that the people in the best teams push and enable each other to excel. We’re united when we have each other’s backs - when something goes wrong, we don’t blame, we work together to fix it. We embrace differences of opinion to end up with better outcomes. We don’t let our egos win.
We believe that an unstoppable drive towards a single, clearly stated goal is the best way to build great things. We are biased towards action - we make informed decisions and then we act. There is no such thing as an impossible problem, just a great challenge to sink our teeth into.
What excites us
- We use a mix of technologies at Codat, but most are services supplied by Azure and leveraged using Python for data tasks. We like platform engineers who have an interest in good design for tooling in a Python environment.
- Our business apps work extensively with the .NET web stack and so some knowledge of the Microsoft stack and Azure would be helpful.
- Clear and concise communicators: we expect engineers to collaborate with other team members, including data scientists, developers, quality assurance engineers and product stakeholders outside of Engineering. We really like platform engineers who like to collaborate with other engineering teams on operational concerns early in the process.
- We have a passion for testing and write behaviour-driven and data-driven unit tests for our data product. We expect our engineers to have an active hand in developing integration tests with our quality assurance engineers.
- We really like engineers who are self-motivated and have a logical and systematic approach to problem solving, but who understand that things work best when we work as a team and are approachable, open-minded, and patient.
- At Codat we handle huge volumes of sensitive data. We are currently logging over 5 billion trace events every week and crunching terabytes of data in our models. We really like platform engineers who are excited by these scale problems and think in terms of security first.
£70,000 - £90,000 / year
If you are excited about applying for this role but aren't certain you meet 100% of the criteria, we'd still love to hear from you.