Lakera Launches Open-Source Security Benchmark for LLM backends in AI Agents

India, 10th November, 2025 – Check Point Software Technologies Ltd. (NASDAQ: CHKP), a global leader in cybersecurity solutions, and Lakera, a leading AI-native security platform for Agentic AI applications, in collaboration with researchers from the UK AI Security Institute (AISI), today announced the release of the Backbone Breaker Benchmark (b3) an open-source security evaluation specifically designed for assessing the security of large language models (LLMs) within AI agents.

The b3 benchmark introduces a novel concept called threat snapshots. Rather than simulating an entire AI agent workflow, threat snapshots focus on critical points where vulnerabilities in LLMs are most likely to occur. By testing models at these precise moments, developers and model providers can evaluate how well their systems withstand realistic adversarial challenges without the complexity of modeling full agent operations.

“We built the b3 benchmark because today’s AI agents are only as secure as the LLMs that power them,” said Mateo Rojas-Carulla, Co-Founder and Chief Scientist at Lakera, a Check Point company. “Threat snapshots allow us to systematically uncover vulnerabilities that have until now remained hidden in complex agent workflows. By making this benchmark open-source, we aim to equip developers and model providers with a practical way to measure and enhance their security posture.”

The benchmark combines 10 representative agent “threat snapshots” with a high-quality dataset of 19,433 crowdsourced adversarial attacks, collected via the gamified red-teaming platform Gandalf: Agent Breaker. It evaluates susceptibility to attacks such as system prompt exfiltration, phishing link insertion, malicious code injection, denial-of-service, and unauthorized tool calls.

Initial tests on 31 popular LLMs revealed several key insights:

Enhanced reasoning capabilities significantly improve security.
Model size does not necessarily correlate with security performance.
Closed-source models generally outperform open-weight models, though top open models are narrowing the gap.

Gandalf: Agent Breaker is a hacking simulator game designed to challenge players to exploit AI agents in realistic scenarios. The game features ten GenAI applications, each simulating real-world AI agent behaviors with multiple difficulty levels, layered defenses, and diverse attack surfaces, ranging from prompt engineering to code-level attacks, file processing, memory manipulation, and external tool usage.

Gandalf was originally developed during an internal hackathon at Lakera, where blue and red teams competed to defend and attack an LLM holding a secret password. Since its public release in 2023, Gandalf has become the world’s largest red-teaming community, generating over 80 million data points. Initially conceived as a game, Gandalf has proven invaluable in revealing real-world vulnerabilities in GenAI applications, highlighting the critical need for AI-first security.

Lakera Launches Open-Source Security Benchmark for LLM backends in AI Agents

Unlocking the Future: What 6G Technology Holds for 2026 and Beyond

Mumbai’s New Beacon of Innovation: Decimal Point Analytics Establishes Premier AI and Analytics Headquarters

AD Ports Group Unveils AI Blueprint to Redefine Global Port Operations

Leave a Reply Cancel reply

Unlocking the Future: What 6G Technology Holds for 2026 and Beyond

Praan Health Secures INR 8.5 Cr Seed Funding to Transform Chronic Care

IIT Indore & DHN Announce Leading 5 Startups for HealthTech Innovation Challenge 2025

Speed of Disbursal, Digital Ease Drive Festive Loan Choices: Paisabazaar Survey

Fierce Fashion Ahead: Aza Fashions Introduces ‘ROAR’, a Dynamic Collection Curated by Shilpa Shetty Kundra

Why local US newspapers are sounding the alarm

Searching for the ‘angel’ who held me on Westminster Bridge

All you need to know about penalty shootouts

The man who saved thousands of people from HIV

Searching for the forgotten heroes of World War Two

Unlocking the Future: What 6G Technology Holds for 2026 and Beyond

Praan Health Secures INR 8.5 Cr Seed Funding to Transform Chronic Care

IIT Indore & DHN Announce Leading 5 Startups for HealthTech Innovation Challenge 2025

Speed of Disbursal, Digital Ease Drive Festive Loan Choices: Paisabazaar Survey

Fierce Fashion Ahead: Aza Fashions Introduces ‘ROAR’, a Dynamic Collection Curated by Shilpa Shetty Kundra

You may have missed

Unlocking the Future: What 6G Technology Holds for 2026 and Beyond

Praan Health Secures INR 8.5 Cr Seed Funding to Transform Chronic Care

IIT Indore & DHN Announce Leading 5 Startups for HealthTech Innovation Challenge 2025

Speed of Disbursal, Digital Ease Drive Festive Loan Choices: Paisabazaar Survey

Fierce Fashion Ahead: Aza Fashions Introduces ‘ROAR’, a Dynamic Collection Curated by Shilpa Shetty Kundra

About Online Media Cafe

More Stories

Leave a Reply Cancel reply

You may have missed

About Online Media Cafe