Patronus Datasets

We provide a number of off-the-shelf sample datasets that have been vetted for quality. Each dataset consists of 10-100 samples and assesses agents on common use cases, including PII leakage and performance on real-world domains.

We currently support the following off-the-shelf datasets:

  • legal-confidentiality-1.0.0: Legal prompts that check whether an LLM understands the concept of confidentiality in legal document clauses
  • model-origin-1.0.0-small: OWASP security assessment checking whether LLMs leak information about model origins
  • pii-questions: PII-eliciting prompts
  • toxic-prompts: Toxic prompts that an LLM might respond offensively to
  • owasp-llm01-prompt-injection: Prompt injection tests
  • owasp-llm02-insecure-outputs: Prompts to test whether a model will produce insecure code or text
  • owasp-llm07-data-leakage: Prompts to test for data leakage, including PII and model or training details
  • owasp-llm08-excessive-agency: Prompts to test whether a model has excessive agency
  • halubench-drop: Comprehension-based QA for testing the faithfulness of a model
  • halubench-covidqa: Medical questions related to COVID-19 that can be used to test the faithfulness of a model
  • halubench-pubmedqa: PubMedQA split of HaluBench for testing the faithfulness of a model
  • financebench: Questions over financial documents along with ground truth answers (see the sketch after this list)
  • toxic-prompts-*: Toxic prompts in multiple languages: en, de, pt, pl
  • exaggerated-safety-tests: Test set to identify exaggerated safety behaviors in a model
  • story-writing-prompts: Creative writing prompts for models
  • criminal-planning-prompts: Prompts that elicit help with planning a crime
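
The QA-style sets above (halubench-*, financebench) pair each question with a ground-truth answer, so a model's output can be scored against that label. The sketch below is only a rough illustration: the field names, sample content, and naive containment check are hypothetical placeholders, not the actual dataset schema or the Patronus evaluator.

```python
# Rough illustration of checking a QA-style sample for faithfulness.
# The field names ("question", "context", "gold_answer") and the sample
# content are hypothetical; inspect the downloaded dataset for its real schema.
sample = {
    "question": "What was the company's reported Q4 revenue?",
    "context": "The 10-K states that Q4 revenue was $1.2 billion...",
    "gold_answer": "$1.2 billion",
}

# Replace this with your model's answer to sample["question"] given sample["context"].
model_answer = "Q4 revenue was $1.2 billion."

# Naive containment check; a real faithfulness evaluator would be far more robust.
is_faithful = sample["gold_answer"].lower() in model_answer.lower()
print("faithful:", is_faithful)
```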

You can download any of these datasets with Actions -> Download Dataset.
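
Once exported, a dataset can be loaded with standard tooling. The sketch below assumes the export is a CSV file with a prompt column; the actual file name and column layout depend on the dataset you download.

```python
# Minimal sketch: load an exported dataset and run each prompt through your
# own model or agent. Assumes a CSV export named "pii-questions.csv" with a
# "prompt" column; adjust the path and column name to match your download.
import csv


def my_model(prompt: str) -> str:
    """Placeholder for your model or agent call."""
    return "model response for: " + prompt


with open("pii-questions.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    response = my_model(row["prompt"])
    # Inspect or score each response here, e.g. flag outputs that echo PII.
    print(response[:80])
```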

We are actively working on providing more datasets for additional use cases. If there are off-the-shelf datasets you'd like to see added to this list, please reach out to us!