About the role (LLMs, Agents & Retrieval · Autonomous SOC · Remote / Hybrid)
At Imperum, we are building an agentic AI platform for security operations – an autonomous SOC that detects, investigates, contains, and remediates threats with minimal human intervention. We are hiring an AI Engineer to build the brain of that system: the agents, retrieval pipelines, reasoning layers, and evaluation loops that turn raw telemetry into autonomous investigations and response.
This is an applied, production-focused role. You will not be training foundation models or writing papers. You will be choosing the right models for each task, designing agents that behave reliably on real security data, building retrieval and knowledge systems that ground them in facts, and setting up the evaluations that let us trust them in front of customers. You will ship on the same 5-day idea-to-release cadence as the rest of engineering.
What you’ll do
• Design and build the agentic core of Imperum: planning agents, investigation agents, response agents, and the orchestration between them.
• Build and evolve our retrieval and knowledge layer: RAG pipelines over security telemetry, runbooks, threat intel, and case history; and more advanced patterns (knowledge-graph-augmented generation, hybrid retrieval, structured memory) where they actually improve results.
• Own model selection and routing: pick the right model (frontier, open-weight, local) for each task based on cost, latency, quality, and privacy trade-offs.
• Design tool use and function calling that agents can rely on: clean tool contracts, safe defaults, deterministic failure modes, and strong audit trails.
• Build evaluations that matter: golden datasets, LLM-as-judge where appropriate, regression suites, and online metrics that tell us if agents are actually getting better.
• Harden against real failure modes: hallucinations, loops, prompt injection, unsafe tool calls, jailbreaks, data exfiltration – especially in a security context where the stakes are high.
• Work closely with full-stack engineers to ship agent features end-to-end: API, UI, data, telemetry, and observability, not just the model call.
• Deliver on our strict 5-day idea-to-release cycle: scope, build, harden, and ship in one working week, every week.
What we’re looking for
Must-have
• EU nationality (or existing right to work in the EU without sponsorship). We are not able to sponsor work visas or relocation from outside the EU for this role.
• Fluent English (spoken and written) is a must – it is our working language across code, docs, meetings, and async communication.
• Minimum 5 years of software engineering experience, with at least 2 years building LLM-powered products in production (not just demos, notebooks, or hackathons).
• Strong Python engineer. You write clean, testable, production-grade Python. Familiarity with Go is a plus since parts of our stack are in Go.
• Solid foundations in machine learning. You understand embeddings, tokenization, attention, transformer architectures, fine-tuning vs. prompting trade-offs, and the basics of evaluation and statistics. You do not need a PhD; you need to know what you’re doing.
• Deep hands-on experience with LLMs and agent systems. You have built production systems using frontier APIs (Claude, GPT, Gemini) and/or open-weight models (Llama, Mistral, Qwen, DeepSeek). You are fluent in prompt engineering, tool use / function calling, multi-step planning, and multi-agent orchestration.
• RAG as a core skill. You have designed and shipped real retrieval systems: chunking strategies, embeddings, vector databases (pgvector, Qdrant, Weaviate, or similar), hybrid search (BM25 + dense), re-ranking, and evaluation. You know why most RAG systems fail and how to fix them.
• Familiarity with advanced retrieval and augmentation patterns beyond vanilla RAG – knowledge-graph-augmented generation (KAG), graph RAG, structured memory, agentic retrieval, query rewriting, and context compression – and a clear view of when each is worth the complexity.
• Evaluation discipline. You build evals before (or alongside) features, not after. You know the limits of LLM-as-judge, how to build golden datasets, and how to catch regressions before customers do.
• AI-native developer. You use AI coding tools (Claude Code, Cursor, or equivalent) daily and have a clear view on how to make yourself and the team measurably faster with them.
• Security-aware mindset. You understand prompt injection, data exfiltration via tools, jailbreaks, and the OWASP Top 10 for LLMs. Ideally, you also have working knowledge of SOC / Incident Response / Forensics concepts – or a strong willingness to learn them fast, because our product lives in that world.
• Clear communicator. You can turn a vague product idea into an agent design, explain trade-offs without jargon, and write docs and prompts that other humans (and other agents) can act on.
Nice-to-have
• Experience with MCP (Model Context Protocol) servers and custom tool development for AI agents.
• Experience fine-tuning or distilling open-weight models (LoRA, QLoRA, PEFT) for domain-specific tasks.
• Experience running local or self-hosted inference (vLLM, llama.cpp, Ollama, TGI) and thinking about the cost / latency / privacy trade-offs.
• Background in cybersecurity – detection engineering, threat intel, DFIR, or operator time served in a SOC.
• Experience with graph databases (Neo4j, TigerGraph) or knowledge graphs applied to real product problems.
• Public track record: talks, writing, open-source contributions, or side projects in the LLM / agent / RAG space.
How we work
• Our stack: Python and Go for backend services, React for the frontend. AI engineering work is primarily in Python.
• AI-first, not AI-only: every engineer is accountable for the code that ships. AI accelerates us; it does not excuse us.
• Strict 5-day idea-to-release cycles. Scope gets cut before the deadline slides.
• Small, senior team. Short feedback loops. High autonomy and high ownership.
• We invest in tooling: prompt libraries, shared conventions, custom MCP servers, eval harnesses, and internal agents that make everyone faster.
• We pay for the best AI tools and models so you always have the right tool for the job.
• Async-friendly, remote-first, with regular in-person time when it matters.
What we offer
• Compensation package combining competitive base salary, performance bonus, and RSUs (Restricted Stock Units).
• Top-tier hardware and a generous budget for AI tools, courses, conferences, and model API credits.
• Flexible remote or hybrid setup.
• A team that treats AI engineering as a craft worth mastering – you will ship real, hard, meaningful work, fast.
How to apply
• Send us your CV or LinkedIn, your GitHub or portfolio, and a short note on two things: (1) an LLM-powered system you built or shipped – what it did, what went wrong, and what you learned, and (2) how you think about evaluating an agent you cannot fully predict. Bonus points if you have a favorite failure mode story.