When Agent Skills Become Black Boxes: Auditing the Unauditable

Lynn · May 4, 2026, 5:04am

04 May 2026, Yanjiang

Semia translates hybrid agent skills into formal logic, enabling provable audit of hidden vulnerabilities.

It seems like you can’t scroll through tech news without hearing about a new AI agent that reads your email, executes shell commands, or signs blockchain transactions. Each of these capabilities comes packaged as a “skill”—a configuration that tells the agent what functions to call and when. But here’s the catch: those skills are written partly in structured code and partly in natural language, and the natural language part is reinterpreted by an LLM on every invocation. That means the same skill can behave differently each time it runs—and until now, no one has been able to provably audit what it actually does.

A team led by Hongbo Wen at the University of California, Santa Barbara, working with researchers from UCLA, UC San Diego, and the security firm Fuzzland, has developed a tool called Semia that can perform exactly this kind of audit. Their work is available as an arXiv preprint (arXiv:2605.00314).

The hybrid problem

An agent skill is a curious beast. Part of it is structured: declarations of interfaces, parameters, and allowed actions. That part is easy to parse with static analysis tools. But the other part is prose: natural language descriptions of when and how those actions should fire. “When the user says ‘check my email’, extract the inbox count and summarize any urgent messages,” a typical skill might read. An LLM interprets that prose probabilistically on every invocation, so the same instruction can lead to slightly different behaviors depending on sampling randomness and on whatever else is in the model’s context at that moment.

Conventional static analyzers can inspect the structured half but have no way to reason about what the prose means. LLM-based tools can read the prose but cannot reproducibly prove that a tainted input reaches a high-impact sink. Think of it like a recipe that says “add salt to taste.” A static analyzer will see the instruction “add salt” but has no idea what “to taste” means. An LLM can interpret “to taste” but can’t prove that if the salt is poisoned, it reaches the dish. Unlike a recipe, however, agent skills can execute actions that have real-world consequences—deleting files, writing to databases, signing transactions—so the stakes are considerably higher.

A language for skills

Semia handles this by translating each skill into a formal representation called the Skill Description Language (SDL). SDL is a Datalog fact base—Datalog is a declarative logic programming language commonly used in static analysis—that captures three essential aspects: the LLM-triggered actions, the prose-defined conditions under which they fire, and any human-in-the-loop checkpoints that can intervene.
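To make the idea of a fact base concrete, here is a minimal sketch in Python. The relation names (`action`, `fires_when`, `checkpoint`) and the example email skill are hypothetical, chosen only for illustration; they are not Semia's actual SDL schema, which the article does not spell out.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    relation: str
    args: tuple

# Hypothetical facts for an email-summary skill (illustrative names only).
FACTS = {
    Fact("action", ("read_inbox",)),
    Fact("action", ("send_summary",)),
    Fact("fires_when", ("read_inbox", "user asks to check email")),
    Fact("fires_when", ("send_summary", "urgent messages found")),
    Fact("checkpoint", ("send_summary", "user confirms before sending")),
}

def query(facts, relation):
    """Return the sorted argument tuples of every fact in one relation."""
    return sorted(f.args for f in facts if f.relation == relation)

def unguarded_actions(facts):
    """Actions that have no human-in-the-loop checkpoint attached."""
    actions = {args[0] for args in query(facts, "action")}
    guarded = {args[0] for args in query(facts, "checkpoint")}
    return sorted(actions - guarded)
```

Once the skill is a set of facts like this, a question such as "which actions can fire without a human checkpoint?" becomes a plain set computation rather than a judgment call about prose.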

The central challenge is constructing an SDL fact base that is both structurally sound and semantically faithful to the original prose. This is where the team’s insight comes in. They call it Constraint-Guided Representation Synthesis (CGRS): a propose-verify-evaluate loop that starts with an LLM generating a candidate SDL representation, then checks it against structural constraints, then evaluates whether the semantics match the original prose, and iterates until the candidate converges. The result is a formal model that an auditor can query.
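The shape of such a propose-verify-evaluate loop can be sketched in a few lines of Python. This is an assumption-laden toy, not the team's implementation: the function names, the feedback format, and the `toy_*` stand-ins (which replace the LLM and the real checkers) are all hypothetical.

```python
from typing import Callable, Optional

Facts = frozenset  # a candidate fact base: a set of (relation, *args) tuples

def synthesize(
    prose: str,
    propose: Callable[[str, list], Facts],
    verify: Callable[[Facts], list],
    evaluate: Callable[[str, Facts], list],
    max_rounds: int = 5,
) -> Optional[Facts]:
    """Iterate until a candidate passes both checks; None if it never does."""
    feedback: list = []
    for _ in range(max_rounds):
        candidate = propose(prose, feedback)   # an LLM call in a real system
        feedback = verify(candidate)           # structural constraint violations
        if feedback:
            continue
        feedback = evaluate(prose, candidate)  # semantic mismatches with the prose
        if not feedback:
            return candidate
    return None

def toy_propose(prose, feedback):
    # Stand-in for an LLM: the first draft forgets to declare the action.
    facts = {("fires_when", "read_inbox", "check my email")}
    if feedback:
        facts.add(("action", "read_inbox"))
    return frozenset(facts)

def toy_verify(facts):
    # Structural rule: every trigger must refer to a declared action.
    declared = {f[1] for f in facts if f[0] == "action"}
    return [f"undeclared action: {f[1]}" for f in facts
            if f[0] == "fires_when" and f[1] not in declared]

def toy_evaluate(prose, facts):
    # Crude semantic check: every trigger phrase must appear in the prose.
    return [f"trigger not in prose: {f[2]}" for f in facts
            if f[0] == "fires_when" and f[2] not in prose]

PROSE = "When the user says 'check my email', summarize any urgent messages."
result = synthesize(PROSE, toy_propose, toy_verify, toy_evaluate)
```

In this toy run the first candidate fails the structural check, the violation is fed back to the proposer, and the second candidate passes both checks. The point of the design is that the LLM only proposes; acceptance is decided by checks that can be rerun.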

Once the SDL fact base exists, security properties become Datalog reachability queries. Want to know if an indirect injection attack can leak a secret? Write a query that traces a tainted input from an untrusted source, such as content the agent fetches on the user’s behalf, to a high-impact sink. Want to check if a confused deputy attack, in which one skill’s action is co-opted by another, is possible? Follow the logic chain through the fact base. Semia defines a library of such queries covering indirect injection, secret leakage, unguarded sinks, and other classic vulnerabilities that take on new forms in the agent skill world.
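A reachability query of this kind can be illustrated with a small graph search. The sketch below is written in Python rather than Datalog, and the node names, the `flows` relation, and the notion of a guarded node are hypothetical simplifications, not Semia's query library.

```python
from collections import deque

# Hypothetical facts: (a, b) means data can flow from a to b.
FLOWS = {
    ("fetched_web_page", "summary_text"),
    ("summary_text", "shell_command"),
    ("user_request", "search_query"),
}
TAINTED = {"fetched_web_page"}  # untrusted sources
SINKS = {"shell_command"}       # high-impact sinks
GUARDED = set()                 # nodes behind a human-in-the-loop checkpoint

def find_trace(flows, sources, sinks, guarded):
    """Breadth-first search for an unguarded path from a source to a sink."""
    successors = {}
    for src, dst in flows:
        successors.setdefault(src, []).append(dst)
    queue = deque([[s] for s in sorted(sources)])
    seen = set(sources)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in sinks:
            return path  # the trace an auditor would inspect
        for nxt in sorted(successors.get(node, [])):
            if nxt not in seen and nxt not in guarded:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

trace = find_trace(FLOWS, TAINTED, SINKS, GUARDED)
```

Here `trace` is the full path from the untrusted page to the shell command. Marking `shell_command` as guarded makes the same query return `None`, which is how a checkpoint shows up as a fix in this simplified model.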

More than half carry critical risks

The team put Semia to the test on 13,728 real-world skills collected from public marketplaces. The tool rendered every single one of them auditable. More strikingly, more than half of the skills were found to carry at least one critical semantic risk—a vulnerability that could actually be exploited if the LLM interpreted the prose in a certain way.

To validate the tool’s accuracy, the researchers extracted a stratified sample of 541 skills and had security experts label each one for vulnerabilities. Semia achieved 97.7% recall and an F1 score of 90.6%—substantially outperforming signature-based scanners that only look at the structured part, and also outperforming pure LLM baselines that simply ask a model to guess whether a skill is vulnerable. The tool doesn’t just flag potential issues; it produces a trace that shows exactly how a tainted input could reach a dangerous sink. That traceability is what makes Semia a genuine audit tool rather than a statistical predictor.
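The article reports recall and F1 but not precision. Assuming the standard definition of F1 as the harmonic mean of precision and recall, and assuming both figures come from the same evaluation, the implied precision can be backed out as follows. The result is a derived estimate, not a number reported by the researchers.

```python
def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given R and F1."""
    return f1 * recall / (2 * recall - f1)

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

RECALL = 0.977  # from the article
F1 = 0.906      # from the article

precision = implied_precision(RECALL, F1)
```

Under those assumptions the implied precision comes out at roughly 84 to 85 percent, which would mean that a noticeable minority of Semia's flags on the labeled sample are false alarms even though few real issues are missed.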

Category	MHG	UCI	SLRO	IEC	UDS	DEP	SC	BCC	DMP	OBF	HCC	Total
Ground truth (#GT)	278	97	37	23	14	10	6	5	4	2	1	477
Detected (#Det.)	270	93	35	22	14	10	6	5	4	2	1	462
Recall	97.1%	95.9%	94.6%	95.7%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	96.9%

Semia detects every labeled instance in seven of the eleven categories and misses some instances in the other four (MHG, UCI, SLRO, and IEC). Those four are where the auditing system most needs refinement. (Source: arXiv:2605.00314)
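The recall row can be rechecked directly from the two count rows. The short Python sketch below copies the counts from the table and assumes that the first count row holds expert-labeled ground-truth instances and the second holds the instances Semia detected.

```python
CATEGORIES = ["MHG", "UCI", "SLRO", "IEC", "UDS", "DEP",
              "SC", "BCC", "DMP", "OBF", "HCC"]
GROUND_TRUTH = [278, 97, 37, 23, 14, 10, 6, 5, 4, 2, 1]
DETECTED = [270, 93, 35, 22, 14, 10, 6, 5, 4, 2, 1]

def recall_pct(detected: int, ground_truth: int) -> float:
    """Recall as a percentage, rounded to one decimal place."""
    return round(100 * detected / ground_truth, 1)

per_category = {
    name: recall_pct(det, gt)
    for name, gt, det in zip(CATEGORIES, GROUND_TRUTH, DETECTED)
}
overall = recall_pct(sum(DETECTED), sum(GROUND_TRUTH))

# Categories with at least one missed instance, and how many were missed.
missed = {
    name: gt - det
    for name, gt, det in zip(CATEGORIES, GROUND_TRUTH, DETECTED)
    if gt != det
}
```

The recomputed values match the table: 462 of 477 instances detected overall, with the 15 misses spread across the four largest categories.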

What this means for AI safety

LLM agents are being deployed in increasingly sensitive roles: automating workflows, handling financial transactions, managing access to corporate systems. The skills they use are often downloaded from public repositories, much like apps on a phone, but without the equivalent of a security review. Wen and his colleagues’ work suggests that the current state of affairs is risky: more than half of the marketplace skills they analyzed carry at least one critical semantic risk that a determined attacker could try to exploit.

Semia does not claim to be a complete solution. The CGRS loop can still miss vulnerabilities if the LLM’s interpretation of the prose diverges subtly from what a human auditor would understand. And the Datalog queries must be kept up to date as new attack patterns emerge. But the framework is extensible, and the team is already working on integrating Semia into continuous integration pipelines so that every skill can be audited before it is ever deployed.

For developers building agent skills, Semia offers something that has been conspicuously absent: a way to answer the question “Is this thing safe to run?” with a proof—or at least a reproducible analysis—rather than a guess. For the rest of us, it is a reminder that the rise of AI agents does not have to mean flying blind. The tools to hold them accountable are being built, one Datalog query at a time.

Yanjiang is an online editor of Loom Science

References

Hongbo Wen et al., Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis, arXiv:2605.00314
