Saruman - Dorian Benhamou Goldfajn

Saruman

Python Updated: January 2026

View Code on GitHub →

About This Project

Saruman is an AI red teaming platform that automatically probes an LLM's defenses against social engineering attacks.

Configure a defender with secrets to protect, then unleash a squad of attacker personas. Watch how the model holds up in real-time. .

How It Works

Define secrets – SSNs, API keys—whatever your AI shouldn't leak
Choose your defense – Pick a personality template or write custom system prompts
Run the gauntlet – distinct attack strategies probe for weaknesses
See the results – Leak detection, security scoring, full conversation logs

Why Saruman?

Diverse defense + attack personas – Direct requests, gaslighting, fake authority, friendship manipulation, context poisoning, and more
Benchmarking experiments – Run controlled trials across all persona combinations
Multi-provider support – Test language models from various providers
Real-time streaming – Watch attacks unfold live

Built for AI safety researchers.

Technical Stack

FastAPI – Backend framework
React – Frontend interface
LiteLLm - Multi-provider LLM support