Senior AI Engineer
Start: ASAP / to be discussed
Location: Amstelveen
Work type: Hybrid (2 days per week onsite, rest remotely)
Language: Dutch
The Digital Engineering Platform
Digital Engineering (DE) is our internal platform for building, shipping and operating production-grade AI agents. Every agent project starts from a shared, production-ready template and is built with our SpectrumAI agent framework - a composable, governed foundation for LLM-powered agents that use tools, follow policy, respect tenant boundaries and run reliably in secure and regulated environments.
As a Senior AI Engineer you are the developer who turns LLM capabilities into trustworthy, observable services: from first prototype, through hardening (security, governance, multi-tenancy), to containerized deployment on Kubernetes - and the day-2 operations that keep agents healthy.
Role Summary
As a Senior AI Engineer you design, build and operate AI agents on the DE platform. You own agents across their full lifecycle and make them production-ready: secure, governed, observable and resilient. You are as comfortable writing clean Python and tests as you are reasoning about prompt injection, policy-as-code, multi-tenancy and zero-downtime rollouts.
You work in an autonomous scrum team with strong ownership, alongside the Product Owner, AI Infrastructure Architects and platform engineering teams. You balance time-to-market, feature development and operational excellence.
We are growing the team and are looking for additional AI developers who want to build real, production agents - not demos.
Key Responsibilities
Agent Development & Delivery
- Design and build production-ready AI agents on the SpectrumAI framework - defining tools, state, prompts, and structured (Pydantic-validated) outputs.
- Build CI/CD/CT pipelines for agent and prompt deployment, embedding generation, and version management.
- Automate staged rollouts and zero-downtime deployments with regulatory-grade auditability.
- Standardize the interfaces between tools, data pipelines, model/agent registries, inference runtimes, and agentic workflows.
Governance, Compliance & Auditability
- Implement governance-as-code with OPA/Rego so every agent action is authorized and logged.
- Enforce multi-tenancy and tenant isolation across storage, state and caches.
- Maintain lineage, provenance, versioning and reproducibility; keep an approved model/prompt catalogue with review workflows and validation checkpoints.
- Ensure tamper-evident audit trails exist across tools, inference endpoints and autonomous agent actions.
Reliability & Observability
- Build production-grade tracing (Langfuse), metrics, alerting and logging across all AI service layers.
- Engineer for high availability, performance, run-time stability and capacity planning of AI workloads.
- Implement security defenses (against prompt injection, plus input and tool-argument validation), rate limiting, recursion control, cost controls, quotas and resource governance.
Operational Excellence
- Maintain robust runbooks, operational guidelines and monitoring dashboards for the platform.
- Containerize agents (Docker) and operate them on Kubernetes; own day-2 operations and incident response.
- Collaborate with the team to keep environments secure, compliant and efficient, and work with fellow engineers on deployment patterns, agent-behaviour monitoring and RAG workflow stability.
Experience & Skills
Must have
- Strong Python engineering background, with experience shipping and operating software in secure or regulated environments.
- Hands-on experience building LLM-powered or agentic applications (LangGraph/LangChain or comparable), including tool use and prompt design.
- Solid grasp of production delivery: testing (pytest), containerization (Docker), and deployment to Kubernetes, with CI/CD and infrastructure-as-code.
- Understanding of the AI lifecycle: observability, security, and reliability of AI systems in production.
Nice to have
- Experience with OPA/Rego, OpenBao/Vault, Langfuse/OpenTelemetry, or Pydantic structured output.
- Knowledge of OWASP LLM Top 10, multi-tenancy, and immutable audit logging.
- Multi-agent orchestration (supervisor/worker) and RAG experience.
Technical Stack and Tooling Experience
You don't need every item on day one, but you should be strong in several of these and eager to grow into the rest. This stack reflects how we actually build agents on the DE platform.
Languages & core engineering
- Python - primary language; modern typing, async/await, packaging.
- Pydantic - BaseModel schemas for structured, validated agent output (JSON Schema / response_format).
- Git & trunk-based collaboration on GitHub (Enterprise).
- Comfortable on Linux/Bash and Windows/PowerShell developer environments.
Agent & LLM development
- LangGraph - graph-based agent architecture (state graphs, tool nodes, the ReAct loop).
- LangChain / langchain-core - the @tool decorator, structured tools, LLM interfaces.
- Working with multiple LLM providers (OpenAI, Anthropic, Azure OpenAI, Mistral, Google, Ollama) through a unified gateway, with LiteLLM as a multi-provider proxy.
- Multi-agent orchestration - supervisor/worker patterns, delegation tools, agent registries.
- RAG (retrieval-augmented generation) and tool-using agents.
- Our SpectrumAI agent framework (built on LangGraph), composed from mixins for security, tenancy, audit, rate limiting, governance and observability - you'll pick this up fast if you know the building blocks above.
Security & governance
- AI/LLM security: prompt-injection defense, input sanitization, tool-argument validation (path traversal, SQLi, command injection, SSRF), and the OWASP Top 10 for LLM Applications.
- Open Policy Agent (OPA) / Rego - policy-as-code for governed, auditable agent decisions.
- Multi-tenancy & isolation - tenant-scoped storage, cache and state.
- Audit & compliance - immutable, hash-chained audit logging; provenance and reproducibility.
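
To give candidates a feel for what "hash-chained" audit logging means in practice, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not how the DE platform or SpectrumAI implements it: the function names, the entry layout and the all-zero genesis value are hypothetical.

```python
import hashlib
import json

# Assumption: the first entry chains from a fixed all-zero "previous hash".
GENESIS_HASH = "0" * 64


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal events hash equally.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> None:
    """Append an event whose hash covers both the event and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append(
        {"event": event, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, event)}
    )


def verify_chain(log: list) -> bool:
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash depends on the one before it, changing an earlier entry invalidates every later one, which is what makes the trail tamper-evident rather than merely append-only. A production design would additionally need durable storage and some external anchoring of the latest hash.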
Reliability, observability & operations
- Langfuse - LLM tracing, prompt management and analytics.
- OpenTelemetry (with Prometheus/Grafana) for distributed tracing and metrics.
- Resilience patterns: retries, fallback chains, rate limiting, recursion/loop control, graceful degradation.
- Health/readiness endpoints, capacity planning, cost control and quota management for AI workloads.
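
As an illustration of the "retries" and "fallback chains" patterns named above, the following sketch tries each provider a few times with exponential backoff before moving to the next one. It is a simplified example under assumptions: `call_with_fallback` and its parameters are hypothetical names, providers are modelled as plain callables, and a real gateway would distinguish retryable from non-retryable errors.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_provider: int = 2,
    base_delay_s: float = 0.0,
) -> str:
    """Try providers in order; retry each one before falling back to the next."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except Exception as exc:  # assumption: every failure is treated as retryable
                last_error = exc
                # Exponential backoff: base, 2x base, 4x base, ...
                time.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error
```

The ordering of `providers` encodes the fallback chain, for example a primary model first and a cheaper or local model last, so graceful degradation becomes a matter of configuration rather than code.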
Secrets, packaging & delivery
- OpenBao / HashiCorp Vault (and/or Azure Key Vault) for secrets management and rotation.
- Docker - multi-stage builds, non-root images, health checks.
- Kubernetes - Deployments, Services, ConfigMaps/Secrets, HPA, PodDisruptionBudgets, liveness/readiness probes, rolling updates. (Helm is a plus.)
- CI/CD - automated pipelines (e.g. GitHub Actions) for test, build and staged rollout.
Testing & quality
- pytest (incl. pytest-asyncio, coverage), mock LLMs/tools, and a layered testing pyramid that keeps fast unit tests separate from slow live-LLM tests.