Staff Site Reliability Engineer (m/f/d)

About the Role

As a Staff Site Reliability Engineer at ARX Robotics, your mission is to transform our central Cloud & IT services into highly reliable, observable, and automated products. You will take ownership of the critical infrastructure that our engineering teams depend on every day, including Vault/PKI, CI/CD systems, monitoring platforms, and other self-hosted tools.

This role is for you if you are driven by a deep need to automate, have strong opinions about backups because you’ve had to restore from them, and believe that doing a task manually more than once is a bug. You’ll be at the core of our engineering ecosystem, ensuring the systems that carry the company are robust, resilient, and always improving.

What You’ll Build

Clear service ownership, SLOs, and incident response workflows for our shared platform services.
A comprehensive observability practice with meaningful metrics, logs, alerts, and operational dashboards.
Resilient and automated patterns for deployment, monitoring, backup, and recovery.
Pragmatic automations that eliminate recurring operational work and unblock engineering teams.
Highly available and secure shared services like Vault/PKI, build infrastructure, and CI/CD support systems.
Actionable runbooks and operational documentation that empower teams to respond with confidence.
Strong partnerships with engineering teams to establish clear ownership boundaries and improve service handoffs.
A close collaboration with Backend Engineering to ensure new internal applications are operable from day one.
A culture of reliability, built through participation in incident response, recovery drills, and blameless post-mortems.

What You Bring

A deep-seated passion for reliability and automation, likely demonstrated by personal projects, a homelab, or a history of automating your own workflows.
Proven experience in a Site Reliability, DevOps, or Platform Engineering role where you were responsible for production systems.
Hands-on experience operating and improving shared services like CI/CD, secrets management, or monitoring platforms.
An automation-first mindset, with the scripting skills (e.g., Python, Go, or shell) to back it up.
A strong understanding of observability principles and experience building out monitoring for production services.
The ability to write clear and concise documentation, especially for runbooks and incident procedures.
A proactive, collaborative approach to problem-solving and a commitment to operational excellence.

Please note: You do not need to meet every single requirement to apply. We welcome motivated candidates who are eager to grow into the role and develop their expertise further.
