Red-Run - Claude CTF Automation

What: Introduces red-run, a tool for automating offensive security testing
Impact: Aims to improve efficiency and consistency in security testing processes

Products and Tools

red-run
All work and no tokens makes Claude a dull boy...

Kevin O'Riley, Mar 10, 2026

Regardless of how it may be portrayed on screen or in print, Offensive Security Testing can be extremely tedious and unforgiving. It requires organization, discipline, patience, system-of-systems thinking, and a multi-threaded intellect. Offensive Security Engineers have always pushed to automate at least a portion of their test methodologies for a cleaner, more consistent, and detail-oriented approach. To that end, a thriving community has produced amazing tooling over the years; game-changing work that includes NMAP, Hydra, Metasploit, BurpSuite, aMASS, Impacket, masscan, Ghidra, Nikto, and the list goes on. We have heard whispers of “fully automated penetration tests” and “fully automated red teaming”, but nothing has ever really materialized and impacted our community in the same way as the semi-autonomous but ultimately operator-driven tools that we all use every day.

Then came the LLMs.

Multiple companies and individuals hit the ground running with LLM-augmented and fully-automated test suites. Some have even had a significant degree of success [1][2]. Many of us in this community have built our livelihoods around providing Offensive Security Testing, usually with really smart humans supported by well-built tools. It feels like something very exciting is happening right now with the tools that support those humans, though. Agentic coding can turn a simple chat-based LLM into a partner that lives in your terminal with you and can run your entire stack, as long as you can get past that whole “existential threat” question. LLMs are now and will continue to be incredible catalysts for change, but with that change inevitably come complex and gnarly new problems to solve.
In the spirit of building, breaking, and bending new technologies to our will, a BLS operator has created red-run. It is an Offensive Security Testing Framework designed to run on top of Claude Code. It took ~2 weeks to build and required a shitload of tokens and at least one all-nighter. If we learned anything, it’s that the next few years are going to be exciting (and terrifying). As a working prototype, it’s far more capable than any of us thought it would be.

red-run is a Claude Code project that combines skills, MCP servers, and agents with routing logic that guides Claude and an operator through the phases of a targeted attack against IT infrastructure. It is an offensive security toolkit that no doubt pales in comparison to the sophisticated LLM-powered tooling that nation-state level threat actors already have in their arsenal.

Why?

But wait… Claude Code can already do this, with no skills required. Why make red-run?

red-run levels up Claude Code for Offensive Security operations:

- Customizable skill library with semantic RAG retrieval.
- Automated engagement state tracking, logging, and evidence gathering.
- Persistent shell and interactive tool sessions that can be shared between agents.
- Headless browser automation with Playwright.
- Offsec-aware agent routing and task parallelization suggestions.
- Self-improvement through retrospectives.

Plus, it is just so damn fun to hack and iterate with Claude Code. It is an accelerator.

Tools like Claude Code and other “AI” coding agents will likely become requirements for any serious Offensive Security team. Without them, you will simply fall behind. Remember - the bad guys have this stuff too.

What?

Let’s zoom out for a moment. A Large Language Model’s (LLM) context window is the amount of text that it can consider in its memory at one time. Think of the context window like volatile memory that is measured in tokens rather than gigabytes. A single token is roughly equivalent to three-quarters of a word [3].
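To make the token rule of thumb concrete, here is a minimal sketch that estimates how much of a context window a piece of text would occupy. It assumes only the "three-quarters of a word per token" approximation quoted above; the function names and the window size in the usage note are hypothetical and not part of red-run or Claude Code. Real tokenizers will give different counts.

```python
import math

# Rule of thumb from the text: one token is roughly 0.75 words.
# This is an approximation, not how any real tokenizer works.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Approximate the token count of text from its whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fraction_of_window(text: str, window_tokens: int) -> float:
    """Estimate what share of a context window (measured in tokens) the text would fill."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return estimate_tokens(text) / window_tokens
```

For example, under this estimate a 1,500-word scan log would cost about 2,000 tokens, so `fraction_of_window(log_text, 200_000)` would report roughly 1% of a hypothetical 200,000-token window. Long engagements accumulate many such outputs, which is why the context limit matters.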
Claude skills are markdown files that are loaded into context when called upon. Skills tell Claude how to do things the way you want them done. Claude already knows how to do just about everything. It can research. It can reason. It can troubleshoot. It can iterate. It can hack. The trick is getting it to do things in the correct way, in the proper sequence, and with accountability.

When a Claude Code session approaches its context limit, the context window is automatically compacted (summarized, essentially). This is a problem for extended sessions: you have gained initial access, moved laterally, and started privilege escalation when, suddenly, your context window is compacted and critical earlier information is lost.

red-run attempts to solve this problem with the orchestrator skill - the single skill that is loaded into context at startup. orchestrator acts as the main function and is intended to run on the Opus model with adaptive thinking enabled. First and foremost, orchestrator is responsible for tracking the overall state of the e...
