Patronus Datasets

We provide a number of off-the-shelf sample datasets that have been vetted for quality. Each dataset consists of 10-100 samples and assesses agents on common use cases, including PII leakage and performance on real-world domains.

We currently support the following off-the-shelf datasets:

  • legal-confidentiality-1.0.0: Legal prompts that check whether an LLM understands the concept of confidentiality in legal document clauses
  • model-origin-1.0.0-small: OWASP security assessment checking whether LLMs leak information about model origins
  • pii-questions: PII-eliciting prompts
  • toxic-prompts: Toxic prompts that an LLM might respond offensively to
  • owasp-llm01-prompt-injection: Prompt injection tests
  • owasp-llm02-insecure-outputs: Prompts to test whether a model will produce insecure code or text
  • owasp-llm07-data-leakage: Prompts to test for data leakage, including PII and model or training details
  • owasp-llm08-excessive-agency: Prompts to test whether a model has excessive agency
  • halubench-drop: Comprehension-based QA for testing the faithfulness of a model
  • halubench-covidqa: Medical questions related to COVID-19 that can be used to test the faithfulness of a model
  • halubench-pubmedqa: PubMedQA split of HaluBench for testing the faithfulness of a model
  • financebench: Questions over financial documents along with ground truth answers (see the sketch after this list)
  • toxic-prompts-*: Toxic prompts in multiple languages: en, de, pt, pl
  • exaggerated-safety-tests: Test set to identify exaggerated safety behaviors in a model
  • story-writing-prompts: Creative writing prompts for models
  • criminal-planning-prompts: Prompts that elicit help with planning a crime
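
The QA-style sets above (halubench-*, financebench) pair each question with a ground-truth answer, so a model's output can be scored against that label. The sketch below is only a rough illustration: the field names, sample content, and naive containment check are hypothetical placeholders, not the actual dataset schema or the Patronus evaluator.

```python
# Rough illustration of checking a QA-style sample for faithfulness.
# The field names ("question", "context", "gold_answer") and the sample
# content are hypothetical; inspect the downloaded dataset for its real schema.
sample = {
    "question": "What was the company's reported Q4 revenue?",
    "context": "The 10-K states that Q4 revenue was $1.2 billion...",
    "gold_answer": "$1.2 billion",
}

# Replace this with your model's answer to sample["question"] given sample["context"].
model_answer = "Q4 revenue was $1.2 billion."

# Naive containment check; a real faithfulness evaluator would be far more robust.
is_faithful = sample["gold_answer"].lower() in model_answer.lower()
print("faithful:", is_faithful)
```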

You can download any of these datasets with Actions -> Download Dataset.
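
Once exported, a dataset can be loaded with standard tooling. The sketch below assumes the export is a CSV file with a prompt column; the actual file name and column layout depend on the dataset you download.

```python
# Minimal sketch: load an exported dataset and run each prompt through your
# own model or agent. Assumes a CSV export named "pii-questions.csv" with a
# "prompt" column; adjust the path and column name to match your download.
import csv


def my_model(prompt: str) -> str:
    """Placeholder for your model or agent call."""
    return "model response for: " + prompt


with open("pii-questions.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    response = my_model(row["prompt"])
    # Inspect or score each response here, e.g. flag outputs that echo PII.
    print(response[:80])
```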

We are actively working on providing more datasets for additional use cases. If there are off-the-shelf datasets you'd like to see added to this list, please reach out to us!