About This Project

Saruman is an AI red teaming platform that automatically probes an LLM's defenses against social engineering attacks.

Configure a defender with secrets to protect, then unleash a squad of attacker personas. Watch how the model holds up in real-time. .

How It Works

  • Define secrets – SSNs, API keys—whatever your AI shouldn't leak
  • Choose your defense – Pick a personality template or write custom system prompts
  • Run the gauntlet – distinct attack strategies probe for weaknesses
  • See the results – Leak detection, security scoring, full conversation logs

Why Saruman?

  • Diverse defense + attack personas – Direct requests, gaslighting, fake authority, friendship manipulation, context poisoning, and more
  • Benchmarking experiments – Run controlled trials across all persona combinations
  • Multi-provider support – Test language models from various providers
  • Real-time streaming – Watch attacks unfold live

Built for AI safety researchers.

Technical Stack

  • FastAPI – Backend framework
  • React – Frontend interface
  • LiteLLm - Multi-provider LLM support