PromptBench is a PyTorch-based Python package designed for evaluating Large Language Models (LLMs). It offers user-friendly APIs for researchers to conduct thorough evaluations. Key features include:
- Quick Model Performance Assessment: Easily build models, load datasets, and evaluate performance (see the quick-start sketch after this list).
- Prompt Engineering: Implements techniques such as few-shot Chain-of-Thought, EmotionPrompt, and Expert Prompting.
- Adversarial Prompt Evaluation: Integrates prompt attacks to assess model robustness under perturbed prompts (see the attack sketch below).
- Dynamic Evaluation: Uses DyVal to generate evaluation samples on the fly, mitigating test data contamination.
- Efficient Multi-Prompt Evaluation: Integrates PromptEval to estimate performance across many prompts from only a small number of model evaluations.
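
For the core workflow (build a model, load a dataset, evaluate), a minimal sketch following the package's quick-start pattern is shown below. The class and method names used here (`pb.DatasetLoader`, `pb.LLMModel`, `pb.Prompt`, `pb.InputProcess`, `pb.OutputProcess`, `pb.Eval`) should be verified against the current promptbench documentation.

```python
import promptbench as pb

# Load a supported dataset (here: SST-2 sentiment classification).
dataset = pb.DatasetLoader.load_dataset("sst2")

# Build a model; model names and generation arguments follow the promptbench docs.
model = pb.LLMModel(model="gpt-3.5-turbo", max_new_tokens=10, temperature=0.0001)

# Define one or more prompt templates to evaluate.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
])

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        # Fill the prompt template with the current example.
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        # Map the raw model output to a class label.
        preds.append(pb.OutputProcess.cls(raw_pred, model.model_name))
        labels.append(data["label"])

    # Report accuracy for this prompt.
    print(f"{pb.Eval.compute_cls_accuracy(preds, labels):.3f}  {prompt}")
```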
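
For adversarial prompt evaluation, the sketch below illustrates the intended usage: an attack perturbs the prompt while an evaluation callback measures how much the score degrades. The `pb.Attack` constructor signature, the `"stresstest"` attack name, and the list of unmodifiable words are assumptions based on the package's examples and should be checked against the docs.

```python
import promptbench as pb

dataset = pb.DatasetLoader.load_dataset("sst2")
model = pb.LLMModel(model="gpt-3.5-turbo", max_new_tokens=10, temperature=0.0001)
prompt = "Classify the sentence as positive or negative: {content}"

def eval_func(prompt, dataset, model):
    # Accuracy of the model on the dataset under the given prompt;
    # the attack tries to drive this score down.
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, model.model_name))
        labels.append(data["label"])
    return pb.Eval.compute_cls_accuracy(preds, labels)

# Words the attack must leave intact: the class labels and the
# template placeholder (assumed convention).
unmodifiable_words = ["positive", "negative", "content"]

# "stresstest" is assumed to be one of the supported attack names.
attack = pb.Attack(model, "stresstest", dataset, prompt, eval_func,
                   unmodifiable_words, verbose=True)
print(attack.attack())
```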