Agent skill marketplace supply chain attack: 121 skills across 7 repos vulnerable to GitHub username hijacking, 5 scanners disagree by 10x on malicious skill rates (arXiv:2603.16572)

A newly documented supply chain attack targets AI agent skill marketplaces: an adversary registers an abandoned GitHub username and replaces the legitimate skills that marketplaces still link to. The study identified 121 vulnerable skills across 7 repositories. It also found automated scanners unreliable, with malicious-skill rates differing by an order of magnitude between scanners, and related evaluations found conventional prompt-level defenses largely ineffective. The most effective mitigations identified are strict tool whitelisting and privilege separation for agent frameworks.


Radar Rating

Each dimension scored 1 (low) to 5 (high).

TR (Threat Realism): How real is this attack today?
DU (Defensive Urgency): How urgently should defenders act?
NO (Novelty): How new is this attack class?
RM (Research Maturity): How solid is the evidence?

This Week's Signal

The agent skill supply chain is broken, and automated scanners cannot tell you how. Five scanners disagree by an order of magnitude on malicious skill rates, but real supply-chain hijacking via abandoned GitHub usernames is exploitable today (121 skills across 7 vulnerable repositories), and greybox fuzzing of agent frameworks renders prompt-level defences largely ineffective. In separate evaluations, tool whitelisting (RAP-2026-009) and privilege separation (RAP-2026-008) were the only controls that materially reduced attack success (RAP-2026-005).

Single-source telemetry has structural limits that no detection tuning can overcome. The best single log source covers less than 40% of advanced supply-chain attack steps; complementary two-source pairing lifts reconstruction to ~64%. Model-layer attacks are invisible to conventional host and network telemetry altogether (RAP-2026-006).

Mechanistic understanding of AI safety failures is catching up to the attacks. In VLMs, jailbreaks are now measurable as a distinct internal state, not a perception failure, which enables targeted inference-time defences (RAP-2026-010). Multimodal safety gaps tend to widen with capability upgrades (RAP-2026-011), and concept unlearning controls remain vulnerable to image-modality bypass (RAP-2026-007). Across agent frameworks, privilege separation (0% ASR on LLMail-Inject, RAP-2026-008) and tool filtering (17.4% ASR on AgentDojo, RAP-2026-009) outperform prompt-level defences by wide margins.
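As a rough illustration of the tool-filtering idea mentioned above, the following is a minimal sketch of a default-deny allow-list applied to an agent's proposed tool calls. It is not the mechanism evaluated in RAP-2026-009; the `ToolCall` type, the `filter_tool_calls` function, and the tool names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical representation of one tool invocation proposed by an agent."""
    name: str
    arguments: dict = field(default_factory=dict)


# Hypothetical allow-list: only the tools the current task actually needs.
ALLOWED_TOOLS = frozenset({"read_file", "search_docs"})


def filter_tool_calls(calls, allowed=ALLOWED_TOOLS):
    """Split calls into (permitted, blocked) by exact tool name; default deny."""
    permitted, blocked = [], []
    for call in calls:
        if call.name in allowed:
            permitted.append(call)
        else:
            # Anything not explicitly listed is refused, including tools
            # introduced by a newly installed or replaced skill.
            blocked.append(call)
    return permitted, blocked


calls = [
    ToolCall("read_file", {"path": "notes.txt"}),
    ToolCall("run_shell", {"cmd": "id"}),
]
permitted, blocked = filter_tool_calls(calls)
print([c.name for c in permitted], [c.name for c in blocked])
```

The design choice worth noting is that the decision is made outside the model, on the tool name alone, so an injected prompt cannot argue its way past the filter.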
ACT NOW

Malicious Or Not: Adding Repository Context to Agent Skill Classification

Authors: Florian Holzbauer, David Schmidt, Gabriel Gegenhuber, Sebastian Schrittwieser, Johanna Ullrich
ArXiv: 2603.16572
Stream: S2 - Agent Security | RAXE ID: RAP-2026-005

Executive Takeaway

Current automated scanners flag as many as 41.93% of marketplace skills as malicious, but the paper's repository-aware re-scoring leaves only 15 of 2,887 scanner-flagged skill-repository combinations (0.52%) in malicious-flagged repositories. More urgently, the paper identifies a real, previously undocumented supply-chain attack: adversaries can hijack abandoned GitHub repositories indexed by skill marketplaces, silently replacing legitimate skills before they are downloaded.

Core Finding

The largest empirical study of the AI agent skill ecosystem to date collected 238,180 unique skills from ClawHub, Skills.sh, SkillsDirectory, and GitHub (§3.1, Table 1). The paper asks whether the high malicious-classification rates reported by individual marketplaces reflect real risk or scanner artefacts.

On classification rates, the paper finds the answer is mostly artefact. Across five scanners, fail rates ranged from 3.79% (Snyk on Skills.sh) to 41.93% (the OpenClaw scanner on ClawHub), a tenfold spread (§5, Table 2). Cross-scanner consensus was negligible: "only 33 out of 27,111 skills (0.12%) are flagged as malicious by all five scanners" (§5, Cross-Scanner Agreement). When the authors applied repository-context scoring to the 2,887 skills flagged by both the Cisco Skill Scanner and their LLM classifier, "only 0.52% remain in malicious flagged repositories" (§6, Takeaway).

In parallel, the paper identifies two structural attack vectors that are both real and underreported: repository hijacking and an API information disclosure bug in ClawHub (§4.0.2).

Technical Mechanism

Repository-context scoring operates in two stages. A codebase score (weighted 70%) uses an LLM to assess whether a skill's description aligns with the surrounding repository's code, README, and documentation. A metadata score (weighted 30%) estimates repository maturity through signals such as age, star count, fork count, and issue activity (§3.3). The composite penalises repositories that appear aligned but exhibit other suspicious characteristics.

Repository hijacking exploits the link-out distribution model used by Skills.sh and SkillsDirectory. Both platforms index skills by pointing to GitHub repository URLs rather than hosting files directly. When an original repository owner renames their GitHub account, the previous username becomes available for registration. An adversary who claims that username and recreates the repository will intercept future skill downloads. The authors found "121 skills that forward to seven vulnerable repositories," with the most-downloaded hijackable skill having reached 2,032 downloads (§4.0.2). ClawHub is not affected because it hosts skills directly.

The ClawHub API disclosure is a separate finding. The marketplace's API returns the GitHub-linked email address for each skill owner, even though this field is exposed neither in the ClawHub web interface nor on default GitHub profiles (§4.0.2). A static secrets scan using ...
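The link-out hijacking condition described above can be sketched as a simple check. This is an illustrative sketch, not the paper's detection method: it assumes the caller has already resolved where each indexed GitHub URL currently leads and has separately determined whether the old username can be registered again. The function names and the `old_owner_is_free` parameter are hypothetical.

```python
from urllib.parse import urlparse


def github_owner_repo(url: str) -> tuple[str, str]:
    """Return (owner, repo) from a github.com repository URL, lowercased."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no owner/repo in URL: {url}")
    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # GitHub treats owner and repository names case-insensitively.
    return parts[0].lower(), repo.lower()


def is_hijack_candidate(indexed_url: str, resolved_url: str,
                        old_owner_is_free: bool) -> bool:
    """Flag a link-out skill whose indexed owner was renamed away.

    indexed_url:       URL stored by the marketplace (assumed input).
    resolved_url:      URL the indexed link currently leads to (assumed input).
    old_owner_is_free: whether the indexed username can be re-registered
                       (assumed to be determined by the caller).
    """
    indexed_owner, _ = github_owner_repo(indexed_url)
    resolved_owner, _ = github_owner_repo(resolved_url)
    renamed = indexed_owner != resolved_owner
    return renamed and old_owner_is_free


print(is_hijack_candidate(
    "https://github.com/old-name/skill",
    "https://github.com/new-name/skill",
    old_owner_is_free=True,
))
```

A marketplace that links out could run a check of this shape periodically and pin or delist any skill it flags, since the exposure exists only while the marketplace keeps pointing at the old owner's URL.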
