Founding RL Researcher

Lanturn • Amsterdam Area, NL • 1m ago

Founding Research Scientist (Long-Horizon RL) at Lanturn

Location: San Francisco (preferred) / Remote (US)

Compensation: $200K base + 0.5–1% equity

Type: Full-time · Founding Team

At Lanturn, we are building the next generation of reinforcement learning systems for real-world agents. Our focus is on enabling AI systems to learn from behavioral data and long-horizon workflows, through:

High-fidelity RL environments
Synthetic data generation
Closed-loop training systems

We are looking for a Founding RL Researcher to push the frontier of:

Long-horizon RL
Environment design
Post-training for agents

About us:

Lanturn is building the end-to-end behavioral learning stack for AI systems. We believe current approaches to RL and post-training are limited by short-horizon optimization, weak or proxy reward signals, and a lack of grounded environments. Our approach is to build closed-loop RL systems in which environments, data, training, and evaluation are tightly integrated and grounded in real-world behavioral data.

The role:

As a Founding RL Researcher, you will lead efforts to develop novel reinforcement learning algorithms and environments for training autonomous agents. You will work across:

Algorithm design
Environment modeling
Training systems
Evaluation frameworks

This role sits at the intersection of:

Frontier-lab-style RL research (environments + algorithms)
Modern LLM post-training (RLHF, preference optimization)

Key responsibilities:

Design and implement RL systems for long-horizon tasks (10–100+ steps)
Develop and extend modern post-training methods: PPO, DPO, ORPO, GRPO / GRPO++, and ranking-based optimization methods
Build RL environments grounded in real-world workflows
Work on meta-RL and adaptive learning systems: generalization across tasks and rapid adaptation to new environments
Design reward systems for behavioral correctness, efficiency, and robustness
Develop evaluation frameworks aligned with real-world outcomes
Collaborate with engineering teams to scale training systems

Ideal candidate:

You are a researcher with strong theoretical grounding and real-world system intuition, capable of working on open-ended problems in RL. You thrive in environments where:

Problems are not well-defined
Systems must be built from first principles
Research directly translates into deployed systems

Minimum qualifications:

Experience at a top-tier AI lab or company: OpenAI, DeepMind, Anthropic, FAIR, or equivalent
Strong background in reinforcement learning and post-training systems
Experience training large-scale models (LLMs or similar)
Strong programming skills (Python, PyTorch/JAX)

Preferred qualifications:

Experience with long-horizon RL or sequential decision-making systems
Experience designing or working with RL environments
Familiarity with preference optimization (DPO, ORPO), RLHF pipelines, and automated RL environment generation
Experience with meta-RL / adaptive learning systems
Strong publication record in top-tier ML conferences

Core technical skills:

Deep understanding of policy gradient methods (PPO and beyond), KL-regularized optimization, and credit assignment in long-horizon settings
Experience with cascading RL pipelines (SFT → RL → evaluation), distributed training systems, and stability and scaling challenges
Strong intuition for exploration vs exploitation, reward shaping vs reward learning, and trajectory-level optimization

What makes this role unique?

Focus on long-horizon behavioral learning, not short-form RLHF
Environment design and generation treated as a first-class problem
Opportunity to define GRPO++-style next-generation algorithms and publish at NeurIPS

Why join Lanturn?

Founding ownership (0.5–1% equity)
Work on unsolved problems in RL and agent systems
High autonomy and research freedom
Direct impact on how real-world AI systems are trained
Work directly with second-time founders who have worked with various big tech companies and enterprises

If you’ve worked on RL at a top lab or have production RL experience, and you want to push beyond current paradigms into real-world, long-horizon intelligence, this is your opportunity.