
Artificial Intelligence Engineer

Empiric • Amstelveen, NL • 1 day ago

Senior AI Engineer

Start: asap / to discuss

Location: Amstelveen

Work type: Hybrid (2 days per week onsite, rest remotely)

Language: Dutch

The Digital Engineering Platform

Digital Engineering (DE) is our internal platform for building, shipping and operating production-grade AI agents. Every agent project starts from a shared, production-ready template and is built with our SpectrumAI agent framework - a composable, governed foundation for LLM-powered agents that use tools, follow policy, respect tenant boundaries and run reliably in secure and regulated environments.

As a Senior AI Engineer you are the developer who turns LLM capabilities into trustworthy, observable services: from first prototype, through hardening (security, governance, multi-tenancy), to containerized deployment on Kubernetes - and the day-2 operations that keep agents healthy.

Role Summary

As a Senior AI Engineer you design, build and operate AI agents on the DE platform. You own agents across their full lifecycle and make them production-ready: secure, governed, observable and resilient. You are as comfortable writing clean Python and tests as you are reasoning about prompt injection, policy-as-code, multi-tenancy and zero-downtime rollouts.

You work in an autonomous scrum team with strong ownership, alongside the Product Owner, AI Infrastructure Architects and platform engineering teams. You balance time-to-market, feature development and operational excellence.

We are growing the team and are looking for additional AI developers who want to build real, production agents - not demos.

Key Responsibilities

Agent Development & Delivery

Design and build production-ready AI agents on the SpectrumAI framework - defining tools, state, prompts, and structured (Pydantic-validated) outputs.
Build CI/CD/CT pipelines for agent and prompt deployment, embedding generation, and version management.
Automate staged rollouts and zero-downtime deployments with regulatory-grade auditability.
Standardize the interfaces between tools, data pipelines, model/agent registries, inference runtimes, and agentic workflows.

Governance, Compliance & Auditability

Implement governance-as-code with OPA/Rego so every agent action is authorized and logged.
Enforce multi-tenancy and tenant isolation across storage, state and caches.
Maintain lineage, provenance, versioning and reproducibility; keep an approved model/prompt catalogue with review workflows and validation checkpoints.
Ensure tamper-evident audit trails exist across tools, inference endpoints and autonomous agent actions.

Reliability & Observability

Build production-grade tracing (Langfuse), metrics, alerting and logging across all AI service layers.
Engineer for high availability, performance, run-time stability and capacity planning of AI workloads.
Implement security defenses (prompt-injection, input/tool validation), rate limiting, recursion control, cost controls, quotas and resource governance.

Operational Excellence

Maintain robust runbooks, operational guidelines and monitoring dashboards for the platform.
Containerize agents (Docker) and operate them on Kubernetes; own day-2 operations and incident response.
Collaborate with the team to keep environments secure, compliant and efficient, and work with fellow engineers on deployment patterns, agent-behaviour monitoring and RAG workflow stability.

Experience & Skills

Must have

Strong Python engineering background, with experience shipping and operating software in secure or regulated environments.
Hands-on experience building LLM-powered or agentic applications (LangGraph/LangChain or comparable), including tool use and prompt design.
Solid grasp of production delivery: testing (pytest), containerization (Docker), and deployment to Kubernetes, with CI/CD and infrastructure-as-code.
Understanding of the AI lifecycle: observability, security, and reliability of AI systems in production.

Nice to have

Experience with OPA/Rego, OpenBao/Vault, Langfuse/OpenTelemetry, or Pydantic structured output.
Knowledge of OWASP LLM Top 10, multi-tenancy, and immutable audit logging.
Multi-agent orchestration (supervisor/worker) and RAG experience.

Technical Stack and Tooling Experience

You don't need every item on day one, but you should be strong in several of these and eager to grow into the rest. This stack reflects how we actually build agents on the DE platform.

Languages & core engineering

Python - primary language; modern typing, async/await, packaging.
Pydantic - BaseModel schemas for structured, validated agent output (JSON Schema / response_format).
Git & trunk-based collaboration on GitHub (Enterprise).
Comfortable on Linux/Bash and Windows/PowerShell developer environments.

Agent & LLM development

LangGraph - graph-based agent architecture (state graphs, tool nodes, the ReAct loop).
LangChain / langchain-core - the @tool decorator, structured tools, LLM interfaces.
Working with multiple LLM providers through a unified gateway: OpenAI, Anthropic, Azure OpenAI, Mistral, Google, Ollama, and LiteLLM as a multi-provider proxy.
Multi-agent orchestration - supervisor/worker patterns, delegation tools, agent registries.
RAG (retrieval-augmented generation) and tool-using agents.
Our SpectrumAI agent framework (built on LangGraph), composed from mixins for security, tenancy, audit, rate limiting, governance and observability - you'll pick this up fast if you know the building blocks above.

Security & governance

AI/LLM security: prompt-injection defense, input sanitization, tool-argument validation (path traversal, SQLi, command injection, SSRF), and the OWASP Top 10 for LLM Applications.
Open Policy Agent (OPA) / Rego - policy-as-code for governed, auditable agent decisions.
Multi-tenancy & isolation - tenant-scoped storage, cache and state.
Audit & compliance - immutable, hash-chained audit logging; provenance and reproducibility.
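
To give a feel for what "immutable, hash-chained audit logging" can look like, here is a minimal sketch using only the Python standard library. It is an illustration, not the SpectrumAI implementation: the `AuditLog` class, its field names and the `GENESIS_HASH` constant are all assumptions. Each entry stores the hash of its predecessor, so editing any earlier entry breaks verification of the chain.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical placeholder used as the "previous hash" of the very first entry.
GENESIS_HASH = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal payloads hash equally.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Illustrative in-memory audit log; a real one would persist entries."""

    entries: list = field(default_factory=list)

    def append(self, tenant_id: str, action: str, detail: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        payload = {"tenant_id": tenant_id, "action": action, "detail": detail}
        entry = {**payload, "prev_hash": prev_hash, "hash": _digest(prev_hash, payload)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash from the start; any edited entry breaks the chain.
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            payload = {k: entry[k] for k in ("tenant_id", "action", "detail")}
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, payload):
                return False
            prev_hash = entry["hash"]
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append("tenant-a", "tool_call", {"tool": "search"})
    log.append("tenant-a", "llm_response", {"tokens": 120})
    print(log.verify())  # chain is intact at this point
```

An in-memory chain like this only makes tampering detectable. Keeping entries immutable additionally needs append-only storage, which is out of scope for the sketch.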

Reliability, observability & operations

Langfuse - LLM tracing, prompt management and analytics.
OpenTelemetry (with Prometheus/Grafana) for distributed tracing and metrics.
Resilience patterns: retries, fallback chains, rate limiting, recursion/loop control, graceful degradation.
Health/readiness endpoints, capacity planning, cost control and quota management for AI workloads.
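
As a rough illustration of the resilience patterns listed above, the sketch below combines retries with exponential backoff and a fallback chain across providers. It is a simplified example under stated assumptions, not platform code: `call_with_fallback`, `AllProvidersFailed` and the provider callables are hypothetical names, and each provider is modelled as a plain function from prompt to response.

```python
import time
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has used up its attempts."""


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each provider in order, retrying with exponential backoff."""
    failures = 0
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(prompt)
            except Exception:  # a real agent would catch narrower error types
                failures += 1
                # Back off before retrying the same provider, but not
                # before moving on to the next one in the chain.
                if attempt < max_attempts - 1:
                    sleep(base_delay * 2 ** attempt)
    raise AllProvidersFailed(
        f"{failures} failed attempts across {len(providers)} providers"
    )


if __name__ == "__main__":
    def primary(prompt: str) -> str:
        raise TimeoutError("primary provider unavailable")

    def secondary(prompt: str) -> str:
        return "echo: " + prompt

    print(call_with_fallback([primary, secondary], "hello", max_attempts=2))
```

Passing `sleep` as a parameter keeps unit tests fast, in line with separating quick unit tests from slow live-LLM tests.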

Secrets, packaging & delivery

OpenBao / HashiCorp Vault (and/or Azure Key Vault) for secrets management and rotation.
Docker - multi-stage builds, non-root images, health checks.
Kubernetes - Deployments, Services, ConfigMaps/Secrets, HPA, PodDisruptionBudgets, liveness/readiness probes, rolling updates. (Helm is a plus.)
CI/CD - automated pipelines (e.g. GitHub Actions) for test, build and staged rollout.

Testing & quality

pytest (incl. pytest-asyncio, coverage), mock LLMs/tools, and a layered testing pyramid that keeps fast unit tests separate from slow live-LLM tests.
