from Hacker News

Kaitai Struct: declarative binary format parsing language

by djoldman on 10/14/25, 2:51 PM with 45 comments

  • by mturk on 10/23/25, 8:31 PM

    Kaitai is absolutely one of my favorite projects. I use it for work (parsing scientific formats, prototyping and exploring those formats, etc.) as well as for fun (reverse engineering games, formats for DOSBox core dumps, etc.).

    I gave a guest lecture in a friend's class last week where we used Kaitai to back out the file format used in "Where in Time is Carmen Sandiego", and it was a total blast. (For me. Not sure the class agreed? Maybe.) The Web IDE made this super easy: https://ide.kaitai.io/

    (On my YouTube page I've got recordings of streams where I work with Kaitai on projects like these, but somehow I can't work up the courage to link them here.)

  • by depierre on 10/24/25, 7:25 AM

    One of my personal favorites. I've used it for parsing SAP's RPC network protocol, reverse-engineering Garmin apps [0], and more recently in a CTF challenge that involved an unknown file format, among others. It's surprisingly quick to pick up once you get the hang of the syntax.

    The serialization branch for Python [1] (I haven't tried the Java one) has generally done the job for me, though I've had to patch a few edge cases.

    One feature I've often wished for is access to physical offsets within the file being parsed (e.g. being able to tell that a field foo that you just parsed starts at offset 0x100 from the beginning of the file). As far as I know, you only get relative offsets to the parent structure.

    0: https://github.com/anvilsecure/garmin-ciq-app-research/blob/...

    1: https://doc.kaitai.io/serialization.html
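    On the absolute-offset wish: one way to get it in a hand-rolled parser is to have every substream remember the absolute offset it was carved from, so that absolute offset = substream base + relative position. The sketch below is plain Python (stdlib `io`/`struct` only), not Kaitai's API; the `SubStream` and `Field` names and the byte layout are hypothetical.

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: int
    abs_offset: int  # offset from the start of the whole file


class SubStream:
    """Byte stream that remembers where its byte 0 sits in the original file."""

    def __init__(self, data: bytes, base: int = 0) -> None:
        self._io = io.BytesIO(data)
        self.base = base  # absolute offset of this stream's first byte

    def read_u2be(self, name: str) -> Field:
        rel = self._io.tell()  # position relative to this substream only
        (value,) = struct.unpack(">H", self._io.read(2))
        return Field(name, value, self.base + rel)

    def substream(self, size: int) -> "SubStream":
        rel = self._io.tell()
        # The child's base is our base plus where the child starts inside us.
        return SubStream(self._io.read(size), self.base + rel)


# Hypothetical file: u2 magic, u2 body_len, then a body holding u2 pad, u2 foo.
data = bytes.fromhex("CAFE 0004 0001 0100")
root = SubStream(data)
magic = root.read_u2be("magic")
body_len = root.read_u2be("body_len")
body = root.substream(body_len.value)
pad = body.read_u2be("pad")
foo = body.read_u2be("foo")
print(foo.name, hex(foo.abs_offset))
```

    Here `foo` sits at relative offset 2 inside a body that starts at file offset 4, so its absolute offset comes out as 6.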

  • by dgan on 10/23/25, 10:22 PM

    Wow, this is good. My only complaint is the annoyingly verbose YAML. What if I wanted to use Kaitai instead of protobufs? My .proto file is already a thousand lines, and splitting each of those lines into 3-4 indented YAML lines hurts readability.

  • by carom on 10/24/25, 1:59 AM

    My dream for a parsing library / language is that it would be able to read, manipulate, and then re-serialize the data. I'm sure there are a ton of edge cases there, but the round trip would be so useful for fuzzing and program analysis.
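    To illustrate the read, manipulate, re-serialize round trip, here is a minimal hand-rolled Python sketch on a hypothetical record format (4-byte magic `REC1`, a big-endian u16 count, then that many big-endian u32 values). It is not Kaitai's serialization API. One edge case it handles explicitly: the derived `count` field is recomputed on write instead of being copied from the parsed input, so editing `values` cannot leave the header stale.

```python
import struct
from dataclasses import dataclass, field

MAGIC = b"REC1"                 # hypothetical format signature
HEADER = struct.Struct(">4sH")  # magic, u16 value count (6 bytes, no padding)


@dataclass
class Record:
    values: list[int] = field(default_factory=list)

    @classmethod
    def parse(cls, data: bytes) -> "Record":
        if len(data) < HEADER.size:
            raise ValueError("truncated header")
        magic, count = HEADER.unpack_from(data, 0)
        if magic != MAGIC:
            raise ValueError(f"bad magic: {magic!r}")
        if len(data) != HEADER.size + 4 * count:
            raise ValueError("payload length does not match count")
        values = struct.unpack_from(f">{count}I", data, HEADER.size)
        return cls(list(values))

    def serialize(self) -> bytes:
        # Derived field: count is recomputed from values, never copied from input.
        header = HEADER.pack(MAGIC, len(self.values))
        return header + struct.pack(f">{len(self.values)}I", *self.values)


original = Record([1, 2, 3]).serialize()
rec = Record.parse(original)
assert rec.serialize() == original  # lossless round trip
rec.values.append(0xDEADBEEF)       # manipulate a field
mutated = rec.serialize()
print(Record.parse(mutated).values)
```

    Re-serializing an unmodified parse reproduces the input byte for byte, which is the property a fuzzer needs before it starts mutating fields.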
  • by okanat on 10/23/25, 9:05 PM

    Even if you don't want to use it because it isn't as efficient as a hand-written, specialized parser, Kaitai Struct gives you a perfect way to document file formats. I love the idea and every bit of the project!

  • by whitten on 10/24/25, 2:25 AM

    To quote from the page: id: flags type: u1

    This seems to say flags is a sort of unsigned integer.

    Is there a way to break the flags into big-endian bits, where the first two bits are either 01 or 10 (but not 00 or 11), with 01 meaning DATA and 10 meaning POINTER, the next five bits are a segment counter, and the last bit selects whether the default is BLACK or WHITE?
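    As a rough answer, here is how that layout decodes by hand in Python, reading the byte MSB-first (bits 7-6 kind, bits 5-1 segment counter, bit 0 default color). The question doesn't say which bit value means BLACK, so this sketch assumes 1 = WHITE and 0 = BLACK; the `Kind`, `Flags` and `parse_flags` names are hypothetical. In a .ksy spec the same split would presumably use Kaitai's bit-sized integer types (b2, b5, b1) instead of a single u1; check the Kaitai user guide for the exact syntax and validation options.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    DATA = 0b01
    POINTER = 0b10


@dataclass
class Flags:
    kind: Kind
    segments: int        # 5-bit counter, 0..31
    default_white: bool  # assumption: bit set = WHITE, clear = BLACK


def parse_flags(byte: int) -> Flags:
    """Split one u1 byte into MSB-first (big-endian) bit fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("flags must fit in one byte")
    kind_bits = (byte >> 6) & 0b11     # bits 7..6
    segments = (byte >> 1) & 0b11111   # bits 5..1
    color_bit = byte & 0b1             # bit 0
    if kind_bits not in (0b01, 0b10):  # 00 and 11 are invalid
        raise ValueError(f"invalid kind bits: {kind_bits:02b}")
    return Flags(Kind(kind_bits), segments, bool(color_bit))


# kind=01 (DATA), segments=00011, color bit=1
print(parse_flags(0b01_00011_1))
```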

  • by theLiminator on 10/23/25, 8:31 PM

    Is the main difference from https://github.com/google/wuffs that Kaitai is declarative?

  • by zzlk on 10/23/25, 7:43 PM

    I wanted to use this a long time ago, but the Rust support wasn't there. I see it's now on the front page with apparently first-class support, so it looks like I can give it another go.

  • by Everdred2dx on 10/23/25, 11:34 PM

    I had a ton of fun using Kaitai to write an unpacking script for a video game's proprietary pack file format. Super cool project.

    I did NOT have fun trying to use Kaitai to pack the files back together. Not sure if this has improved at all, but a year or so ago you had to build the dependencies yourself, and the process was so cumbersome that it ended up being easier to just write the imperative packing code myself.

  • by metaPushkin on 10/24/25, 9:59 AM

    Enjoyable tool. When I developed my text RPG game, I prepared a Kaitai specification for the save file data format so that it would be easy to create third-party software for viewing and modifying it =)

  • by pabs3 on 10/24/25, 2:49 AM

    Kaitai is one of many different tools that do this, there is a list of them here:

    https://github.com/dloss/binary-parsing

    Personally I like GNU Poke.

  • by somethingsome on 10/24/25, 6:38 AM

    I didn't check exactly what Kaitai does, but MPEG uses a custom SDL for its binary syntax: https://mpeggroup.github.io/mpeg-sdl-editor/ Just sharing in case someone is interested :)

  • by bburky on 10/24/25, 12:18 AM

    Kaitai is pretty nice. Hex editors with structure parsing support used to be more rare than they are now, so I've used https://ide.kaitai.io/ instead a few times.

    Also, the newest Kaitai release added (long awaited) serialization support! I haven't had a chance to try it out.

    https://kaitai.io/news/2025/09/07/kaitai-struct-v0.11-releas...

  • by jdp on 10/23/25, 7:58 PM

    I also like Protodata [1]. It's complementary as an exploration and transformation tool when working with binary data formats.

    [1]: https://github.com/evincarofautumn/protodata

  • by Locutus_ on 10/24/25, 6:58 AM

    How is the write support nowadays? Is it production quality now?

    I used Kaitai in an IoT project for building data ingress parsers and it was great. But not having write support was a bummer.

  • by Rucadi on 10/24/25, 7:07 AM

    The most success I've had so far on a project involving binary data parsing was with Deku in Rust. I'd give this a try if I have the opportunity.

  • by kodachi on 10/24/25, 2:59 AM

    The recent 0.11 release marks the inclusion of the long-awaited serialization feature, for Python and Java only for now. I've been using it with Python for a while, and although it has some rough edges, it works pretty well and I'm super excited for the project.

  • by woodruffw on 10/23/25, 8:05 PM

    Kaitai Struct is really great. I've used it several times over the years to quickly pull in a parser that I'd otherwise have to hand-roll (and almost certainly get subtly wrong).

    Their reference parsers for Mach-O and DER work quite nicely in abi3audit[1].

    [1]: https://github.com/pypa/abi3audit/tree/main/abi3audit/_vendo...

  • by setheron on 10/23/25, 8:35 PM

    Great timing! I just published https://github.com/fzakaria/nix-nar-kaitai-spec and contributed the Kaitai C++/STL runtime to nixpkgs: https://github.com/NixOS/nixpkgs/pull/454243

  • by sitkack on 10/23/25, 9:14 PM

    What was the Python based binary parsing library from around 2010? Hachoir?

    https://hachoir.readthedocs.io/en/latest/index.html

  • by ginko on 10/23/25, 10:13 PM

    No pure C backend?

  • by layoric on 10/23/25, 8:46 PM

    I discovered this project recently and used it for Himawari Standard Data format and it made it so much easier. Definitely recommend using this if you need to create binary readers for uncommon formats.

  • by lzcdhr on 10/24/25, 1:36 AM

    Does it support incremental parsing? For example, when I am parsing a network protocol, can it still consume some data from the head of the buffer even if the data is incomplete? This would not only avoid multiple attempts to restart parsing from the beginning but also prevent the buffer from growing excessively.
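    For comparison, here is the usual hand-rolled pattern for incremental parsing: append each incoming chunk to a buffer, consume every complete frame from the head, and leave any partial frame buffered until more bytes arrive. This is a plain-Python sketch on a hypothetical framing (a big-endian u16 length followed by the payload); it makes no claim about whether Kaitai itself supports this mode.

```python
import struct


class FrameDecoder:
    """Incrementally split a byte stream into length-prefixed frames.

    Hypothetical wire format: big-endian u16 payload length, then the payload.
    """

    HEADER = struct.Struct(">H")

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add a chunk and return every frame that is now complete."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buf, 0)
            end = self.HEADER.size + length
            if len(self._buf) < end:
                break  # payload incomplete: keep the partial frame buffered
            frames.append(bytes(self._buf[self.HEADER.size:end]))
            del self._buf[:end]  # drop consumed bytes so the buffer stays small
        return frames

    @property
    def pending(self) -> int:
        """Number of buffered bytes belonging to an unfinished frame."""
        return len(self._buf)


dec = FrameDecoder()
print(dec.feed(b"\x00\x03ab"))    # header plus 2 of 3 payload bytes
print(dec.feed(b"c\x00\x02hi"))   # completes the first frame and a second one
```

    Only the current partial frame is ever re-examined, so a slow sender never forces a restart from the beginning of the stream.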
  • by casey2 on 10/24/25, 11:36 AM

    https://www.erlang.org/doc/system/bit_syntax.html

    Highly recommended if you like functional languages.

  • by imtringued on 10/23/25, 10:15 PM

    https://en.wikipedia.org/wiki/Data_Format_Description_Langua...

    DFDL is heavily encroaching on Kaitai Struct's territory.