NASA CFITSIO Fuzzing: Memory Corruptions and a Codex-Assisted Pipeline

What: Researchers performed fuzzing on the CFITSIO library and found memory corruption issues
Impact: Highlights security testing in scientific software

We are security engineers who break bits and tell stories. Visit us: doyensec.com. Follow us: @doyensec. Engage us: info@doyensec.com. © 2026 Doyensec LLC

Have you ever wondered how those amazing space photos are taken? Are they exclusive to the big telescopes floating in space, or can you take one from your backyard? What does it take to extract hydrogen colors out of a seemingly black sky?

Those are great questions, but you won't find the answers here. Instead, I'll show how I set up and performed fuzzing of the CFITSIO library, which is how those space photos are usually processed. I'll show how the bugs were triaged at scale, and how Codex was used to unblock the fuzzing and to develop the initial security fixes.

Note: the work described in this blog post used GPT-5-Codex, which was the latest model I had access to at the time.

The Flexible Image Transport System (FITS) is a data standard created in the late 1970s by NASA, ESA, and the broader astronomy community. It started as a way to exchange telescope imagery across heterogeneous systems, but it evolved into a container for complex datasets: primary images, binary/ASCII tables, compressed tiles, world coordinate metadata, and instrument-specific headers. Today, most observatories, satellite missions, and even backyard observatories output FITS directly, so the ecosystem of tools is rich. Under the hood, FITS is far more than a simple image file: it routinely carries gigabyte-scale mosaics, time-series cubes, and calibration tables. The current FITS standard lives in a dense spec, and most of it addresses astronomy beyond typical astrophotography. Radio, infrared, X-ray, time-series, and polarization data with all their metadata are first-class in the spec, while backyard imaging uses only a small slice.

Once telescopes and CCD cameras got cheap enough for hobbyists, the community needed tooling that already worked, so adopting FITS was the obvious shortcut.
The format was battle-tested and carried all the metadata serious imaging needed. Ultimately, hobbyists inherited a rather complex data format that rarely changes because backward compatibility with old files is still mandatory.

There are several different libraries that claim to support the FITS format. Usually, though, that only means some subset of the spec. CFITSIO is the most complete implementation, and the library is used by numerous great pieces of astronomy software, so it piqued my interest.

For my fuzzing corpus, I used some of my own astrophotos along with several public samples. I'm sure the coverage could be vastly improved with the right set of specialized data.

Initially, I began fuzzing using the standard AFL++ workflow: harness code, a testing corpus, some optimizations, and several sessions running over two weeks. This resulted in a security advisory consisting of six different bugs. It was a quick experiment to see how fruitful the fuzzing could be and how communication with the NASA team works. Fortunately, the cooperation was great and the issues were quickly addressed by the HEASARC team.

Having the setup ready to go, I decided to give it another shot. Testing was performed against cfitsio-4.6.3, which included fixes for the previously reported issues. This time, I focused exclusively on the Extended Filename Syntax (EFS), which had caught my interest earlier. It's a set of filters, enclosed in square brackets, that can be used to modify the raw file in various ways before it is opened and read by the application. Although EFS looks like a filename parser on the surface, it's effectively a mini-language: image slicing, histogram generation, filters, pixel expressions, region filtering, arithmetic expressions, and the entire parser stack behind them.

An example FITS filename can look like this:

myfile.fits[EVENTS][col Rad = sqrt(X**2 + Y**2)]

This opens a FITS file, selects the EVENTS extension, and creates a new column computed from existing data.
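To make the bracket structure of that example concrete, here is a minimal Python sketch that splits such a name into its base file and its top-level bracket groups. The function name `split_efs` is hypothetical, and this is not how CFITSIO itself parses names; it only illustrates that each bracket pair is a separate unit of work handed to the library.

```python
from typing import List, Tuple


def split_efs(name: str) -> Tuple[str, List[str]]:
    """Split 'base[spec1][spec2]' into ('base', ['spec1', 'spec2']).

    Illustrative only: this is NOT CFITSIO's parser. It tracks bracket
    depth so nested brackets stay inside their enclosing group, and it
    ignores any text that appears between top-level groups.
    """
    start = name.find("[")
    if start == -1:
        # No opening bracket at all: treat the whole string as the base name.
        return name, []

    base, rest = name[:start], name[start:]
    groups: List[str] = []
    current: List[str] = []
    depth = 0

    for ch in rest:
        if ch == "[":
            if depth > 0:
                current.append(ch)  # nested bracket belongs to the group
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets")
            if depth == 0:
                groups.append("".join(current))
                current = []
            else:
                current.append(ch)
        elif depth > 0:
            current.append(ch)

    if depth != 0:
        raise ValueError("unbalanced brackets")
    return base, groups


base, groups = split_efs("myfile.fits[EVENTS][col Rad = sqrt(X**2 + Y**2)]")
print(base)    # myfile.fits
print(groups)  # ['EVENTS', 'col Rad = sqrt(X**2 + Y**2)']
```

For the example above, this yields the base name myfile.fits and two bracket groups: one that selects the EVENTS extension and one that defines the Rad column.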
The library does all of that before the application sees a single byte. The filename alone triggers extension lookup, column arithmetic, and a temporary file copy. Each bracket pair activates a different parser subsystem inside CFITSIO.

This represents a very interesting attack surface, and it's exposed in more places than people might think. Many applications accept filenames directly from external callers without realizing that CFITSIO will interpret them through EFS whenever fits_open_file or a similar function is called (a non-EFS alternative, fits_open_diskfile, also exists). If those filenames come from untrusted input, the attack path is open.

This time, as I didn't have much dedicated time, I relied heavily on help from GPT/Codex. First, it generated the harness code and some helpful cleanup utilities. The harness itself is minimal: it reads a filename string from a file, passes it to fits_open_file in read-only mode, then exits. That's enough to exercise the entire EFS parsing and evaluation pipeline (or most of it, as I learned later), without needing...
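On the earlier point about untrusted filenames reaching fits_open_file, a caller-side check might look roughly like the Python sketch below. It is an assumption-laden illustration, not a complete filter: it only looks for the square brackets described in this post, the helper names `contains_efs_brackets` and `check_untrusted_filename` are hypothetical, and EFS may accept other forms that this check does not cover. Where the application never needs EFS, calling the non-EFS fits_open_diskfile on the C side is likely the more robust choice.

```python
def contains_efs_brackets(name: str) -> bool:
    """Return True if the name contains a square bracket.

    Assumption: only the bracket form of EFS described in the post is
    checked here; this is not a full description of the syntax.
    """
    return "[" in name or "]" in name


def check_untrusted_filename(name: str) -> str:
    """Return the name unchanged if it looks like a plain path, else raise.

    Hypothetical guard meant to run before a filename from an external
    caller is handed to an EFS-aware open call.
    """
    if not name:
        raise ValueError("empty filename")
    if contains_efs_brackets(name):
        raise ValueError("filename contains square brackets; refusing to open it")
    return name


# A plain name passes through unchanged.
safe = check_untrusted_filename("myfile.fits")

# The example name from the post is rejected.
try:
    check_untrusted_filename("myfile.fits[EVENTS][col Rad = sqrt(X**2 + Y**2)]")
    rejected = False
except ValueError:
    rejected = True
```

Rejecting outright, rather than trying to strip or escape the brackets, keeps the sketch simple and avoids guessing at how the library would interpret a partially cleaned name.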
