Membership in the Touchpoint Analytics Platform Team, which is responsible for the development, maintenance, and operation of highly scalable data processing systems hosted on GCP
Significant influence on data infrastructure architecture solutions in accordance with enterprise architecture principles and coding standards
Ensure reliable delivery of infrastructure improvements and platform enhancements
Help manage our backlog focusing on infrastructure stability, monitoring, and system performance
Build and maintain our data platform infrastructure using Infrastructure as Code
Promote infrastructure best practices and help teams implement them properly
Support product teams with GCP data platform setup, monitoring solutions, and deployment pipelines
Actively contribute to our internal cloud infrastructure and platform engineering community
Implement and maintain monitoring, alerting, and self-healing systems to ensure platform reliability and minimal downtime
Apply Site Reliability Engineering (SRE) principles to improve system resilience and automate operational tasks
Develop and optimize CI/CD pipelines for infrastructure and data platform deployments
Your profile
Strong interest in secure, highly scalable, multi-region data processing and analytics systems
Fundamental knowledge of data architectures, streaming protocols, and distributed systems (desirable)
Profound expertise in the configuration and maintenance of GCP data platform services (Dataflow, Pub/Sub, BigQuery) using Infrastructure as Code
Experience with Terraform for infrastructure provisioning and management (mandatory)
Familiarity with common data engineering and DevOps technologies (Kubernetes, GitHub, Apache Beam, GitOps, etc.)
Experience with monitoring and observability tools (Grafana, Prometheus), alerting systems (PagerDuty, OpsGenie), and performance optimization, with a focus on SLI/SLO definition and implementation
Good knowledge of programming languages, especially Python and Java, for infrastructure automation and tooling
Experience with data platform operations, monitoring, and troubleshooting
Experience in cross-functional teams within agile enterprise environments (advantageous)
Fluency in English (German language skills are a plus)
Strong DevOps mindset with hands-on experience in CI/CD pipeline development and automated deployment strategies
Solid understanding of Site Reliability Engineering (SRE) principles, including error budgets, incident response, and post-mortem culture
Experience with automated remediation and self-healing systems to reduce manual operational overhead
Proficiency in containerization, container orchestration, and service mesh technologies (Docker, Kubernetes, Istio) for scalable platform operations