About This Project
Saruman is an AI red teaming platform that automatically probes an LLM's defenses against social engineering attacks.
Configure a defender with secrets to protect, then unleash a squad of attacker personas. Watch how the model holds up in real-time. .
How It Works
- Define secrets – SSNs, API keys—whatever your AI shouldn't leak
- Choose your defense – Pick a personality template or write custom system prompts
- Run the gauntlet – distinct attack strategies probe for weaknesses
- See the results – Leak detection, security scoring, full conversation logs
Why Saruman?
- Diverse defense + attack personas – Direct requests, gaslighting, fake authority, friendship manipulation, context poisoning, and more
- Benchmarking experiments – Run controlled trials across all persona combinations
- Multi-provider support – Test language models from various providers
- Real-time streaming – Watch attacks unfold live
Built for AI safety researchers.
Technical Stack
- FastAPI – Backend framework
- React – Frontend interface
- LiteLLm - Multi-provider LLM support