Munich | Partly remote | Full-time

Staff Site Reliability Engineer (m/f/d)

About the Role

As a Staff Site Reliability Engineer at ARX Robotics, your mission is to transform our central Cloud & IT services into highly reliable, observable, and automated products. You will take ownership of the critical infrastructure that our engineering teams depend on every day, including Vault/PKI, CI/CD systems, monitoring platforms, and other self-hosted tools.

This role is for you if you are driven by a deep need to automate, have strong opinions about backups because you’ve had to restore from them, and believe that doing a task manually more than once is a bug. You’ll be at the core of our engineering ecosystem, ensuring the systems that carry the company are robust, resilient, and always improving.

What You’ll Build

  • Clear service ownership, SLOs, and incident response workflows for our shared platform services.
  • A comprehensive observability practice with meaningful metrics, logs, alerts, and operational dashboards.
  • Resilient and automated patterns for deployment, monitoring, backup, and recovery.
  • Pragmatic automations that eliminate recurring operational work and unblock engineering teams.
  • Highly available and secure shared services like Vault/PKI, build infrastructure, and CI/CD support systems.
  • Actionable runbooks and operational documentation that empower teams to respond with confidence.
  • Strong partnerships with engineering teams to establish clear ownership boundaries and improve service handoffs.
  • A close collaboration with Backend Engineering to ensure new internal applications are operable from day one.
  • A culture of reliability, built through participation in incident response, recovery drills, and blameless post-mortems.

What You Bring

  • A deep-seated passion for reliability and automation, likely demonstrated by personal projects, a homelab, or a history of automating your own workflows.
  • Proven experience in a Site Reliability, DevOps, or Platform Engineering role where you were responsible for production systems.
  • Hands-on experience operating and improving shared services like CI/CD, secrets management, or monitoring platforms.
  • An automation-first mindset, with the scripting skills (e.g., Python, Go, or shell) to back it up.
  • A strong understanding of observability principles and experience building out monitoring for production services.
  • The ability to write clear and concise documentation, especially for runbooks and incident procedures.
  • A proactive, collaborative approach to problem-solving and a commitment to operational excellence.

Please note: You do not need to meet every single requirement to apply. We welcome motivated candidates who are eager to grow into the role and develop their expertise further.