Meta releases ‘Purple Llama’ AI security suite to meet White House commitments

On Dec. 7, Meta released a suite of tools for securing and benchmarking generative artificial intelligence (AI) models.

Dubbed “Purple Llama,” the toolkit is designed to help developers build safely and securely with generative AI tools, such as Meta’s open-source model, Llama 2.

Announcing Purple Llama — A new project to help level the playing field for building safe & responsible generative AI experiences.

Purple Llama includes permissively licensed tools, evals & models to enable both research & commercial use.

More details ➡️ https://t.co/k4ezDvhpHp

— AI at Meta (@AIatMeta) December 7, 2023

AI purple teaming

According to a blog post from Meta, the “Purple” in “Purple Llama” refers to a combination of “red teaming” and “blue teaming.”

Red teaming is a practice in which developers or internal testers deliberately attack an AI model to see whether they can provoke errors, faults, or unwanted outputs and interactions. This lets developers build resilience strategies against malicious attacks and safeguard against security and safety faults.

Blue teaming is the defensive counterpart. Here, developers or testers respond to red-teaming attacks to determine the mitigation strategies needed to counter real threats to production, consumer, or client-facing models.

Per Meta:

“We believe that to truly mitigate the challenges that generative AI presents, we need to take both attack (red team) and defensive (blue team) postures. Purple teaming, composed of both red and blue team responsibilities, is a collaborative approach to evaluating and mitigating potential risks.”

Safeguarding models

The release, which Meta claims is the “first industry-wide set of cyber security safety evaluations for Large Language Models (LLMs),” includes:

Metrics for quantifying LLM cybersecurity risk
Tools to evaluate the frequency of insecure code suggestions
Tools to evaluate LLMs, making it harder for them to generate malicious code or aid in carrying out cyberattacks

The idea is to integrate these tools into model pipelines to reduce unwanted outputs and insecure code while limiting how useful model exploits are to cybercriminals and other bad actors.

“With this initial release,” writes the Meta AI team, “we aim to provide tools that will help address risks outlined in the White House commitments.”