Staff Machine Learning Engineer, AI Applications, Berlin
About the role
As a Staff Machine Learning Engineer, you will operate at the intersection of advanced AI, knowledge systems, and Bolt's production support ecosystem, setting the technical direction for how we measure, improve, and scale AI quality across conversational experiences. You will help ensure our systems are accurate, helpful, grounded, and aligned with business and customer needs.
This is a high-impact, high-scope role where you will define the quality and learning strategy behind our AI systems, partner closely with engineering, product, operations, and domain experts, and turn real-world support interactions into durable improvements across models, retrieval, and workflows.
You will work closely with engineers and cross-functional partners building Bolt's AI automation stack, helping ensure our systems are not only scalable and reliable, but also grounded, aligned, and continuously improving in production.
Main tasks and responsibilities:
- Lead alignment of Customer Support AI system behaviour to Bolt's business goals, customer experience standards, and policy requirements, improving answer quality, consistency, grounding, and escalation behaviour.
- Build and scale knowledge mining capabilities that turn conversations, support tickets, workflows, policies, and operational data into reusable knowledge assets and improvement signals.
- Own the evaluation strategy for LLM-powered customer support systems, including offline benchmarks, golden datasets, human review workflows, online experimentation, and production quality metrics.
- Develop feedback loops that connect production failures and human corrections back into prompts, retrieval, benchmarks, and model behaviour, enabling continuous improvement of automation quality.
- Partner closely with engineers working on AI application infrastructure, contributing ML expertise to areas such as tool-use quality, routing behaviour, memory quality, and multi-step task performance.
- Influence roadmap and technical direction across teams by bringing clarity to trade-offs in model behaviour, retrieval quality, experimentation, and business impact.
We are looking less for Node.js experts than for cross-language engineers who treat a specific stack as a tool for solving a problem, not as the solution itself.
About you:
- You have hands-on experience with retrieval-augmented systems, search, ranking, or knowledge-based AI applications, and understand what drives high-quality grounded responses.
- You have worked on knowledge-heavy or policy-constrained domains where correctness, consistency, and trust matter.
- You have experience building feedback loops from user behaviour, QA signals, annotations, or operational outcomes into system improvements.
- You are able to operate at both strategic and hands-on levels: defining long-term direction while also diving into data, experiments, and failure analysis.
- You have strong communication and stakeholder management skills, and can align engineering, product, and operations around quality strategy and trade-offs.
- Experience with LLM-based pipelines or agentic AI systems is a strong plus, especially in areas such as tool use, routing, memory, multi-step reasoning, or human-AI orchestration.
Experience is great, but what we really look for is drive, intelligence, and integrity. So even if you don't tick every box, please consider applying if you feel you're the kind of person described above!