The Prompt Injection Test (PINT) Benchmark is an evaluation framework designed to measure the efficacy of prompt injection detection systems. It allows red teamers and AI security researchers to test guardrails against a diverse dataset of adversarial prompts and jailbreaks without relying on over-fit public datasets.
Key Features
- Diverse Dataset: Includes 4,314 inputs covering English and non-English content across various attack vectors.
- Granular Categorization: Specific test sets for prompt injections, jailbreaks, and "hard negatives" designed to trigger false positives.
- Comparative Scoring: Standardized performance metrics compared against industry leaders like AWS Bedrock Guardrails and Azure AI Prompt Shield.
- Automated Evaluation: Includes a Jupyter Notebook for testing custom classifiers and detection functions against the dataset.
Use Cases
- Red Team LLM Auditing: Quantify the robustness of an organization's LLM defenses against sophisticated injection patterns.
- Security Product Benchmarking: Perform head-to-head comparisons of AI security vendors to determine the highest detection rate.
- Guardrail Tuning: Use the included "hard negative" samples to refine detection logic and minimize false positive rates in production environments.




