Meta launches Purple Llama project to help AI developers test safety risks in models

Meta has launched Purple Llama, a project that provides open-source tools for developers to assess and improve trust and safety in their generative AI models. The project involves collaboration with other AI application developers, cloud platforms, chip designers, and software businesses.

The first package released under Purple Llama includes tools to test cybersecurity issues in code-generating models and a language model that classifies inappropriate or violent text. In initial tests, large language models suggested vulnerable code 30 percent of the time. The CyberSec Eval tool lets developers run benchmark tests to check the security of their AI models, while Llama Guard is a language model trained to classify text and flag sexually explicit, offensive, harmful, or unlawful content. Developers can test their models by running both input prompts and output responses through Llama Guard.

Purple Llama takes a two-pronged approach to security and safety, covering both the inputs and outputs of AI systems. The project aims to create a center of mass for open trust and safety in AI development.
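The two-pronged pattern can be sketched in a few lines. This is a hypothetical illustration, not Meta's code: the `classify` function here is a keyword stub standing in for a real safety classifier such as Llama Guard, and `guarded_generate` and `echo_model` are names invented for this example.

```python
# Sketch of the two-pronged input/output safety-check pattern.
# classify() is a stub; a real deployment would call a trained
# classifier such as Llama Guard instead of keyword matching.

UNSAFE_KEYWORDS = {"violence", "explicit"}  # stand-in for a trained model

def classify(text: str) -> str:
    """Stub safety classifier: returns 'safe' or 'unsafe'."""
    return "unsafe" if any(k in text.lower() for k in UNSAFE_KEYWORDS) else "safe"

def guarded_generate(prompt: str, model) -> str:
    # Prong 1: screen the user's input before it reaches the model.
    if classify(prompt) == "unsafe":
        return "[input blocked]"
    response = model(prompt)
    # Prong 2: screen the model's output before it reaches the user.
    if classify(response) == "unsafe":
        return "[output blocked]"
    return response

# Usage with a trivial stand-in "model":
echo_model = lambda p: f"Echo: {p}"
print(guarded_generate("Tell me a joke", echo_model))    # passes both checks
print(guarded_generate("Describe violence", echo_model)) # blocked at input
```

The point of the pattern is that the guard model wraps the generative model on both sides, so neither an unsafe prompt nor an unsafe completion reaches the other party unfiltered.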
