It works. The radio expansion works.

Left off last entry with this hanging in the air:

As long as it works……..

Well. It works.

I can’t believe it myself. Honestly, most of the thanks here goes to the FOXDIE team, and to Claude, who is obviously much more versed in C code than I am.

I’ll let Claude walk us through most of the story here, but I’ll interject again a bit later with some current updates.

The radio-expand PR

If you read the last entry, the problem was that RADIO.DAT’s binary format uses 16-bit unsigned integers to store the byte-length of each call container. That’s a hard ceiling of 65,535 bytes per call. The staff calls in MGS Integral — with their enormous banks of random trivia — were already brushing that limit in the original Japanese, and injecting English text was going to push several of them right over it.

The nuclear option I floated was: modify the game’s executable to read 4-byte lengths instead of 2-byte ones, and update the recompiler to write them. It seemed insane at the time. Two months of sitting with it later, it’s merged.
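
To make the ceiling concrete, here is a minimal sketch of what writing a length field at either width looks like. This is not the code from RadioDatRecompiler.py: the function name pack_length, the long_format parameter, and the big-endian byte order are all assumptions made for illustration.

```python
import struct

U16_MAX = 0xFFFF  # 65,535: the largest value a 2-byte unsigned field can hold


def pack_length(length: int, long_format: bool = False) -> bytes:
    """Encode a container byte-length as a 2-byte or 4-byte unsigned field.

    Hypothetical helper. Big-endian order is an assumption of this sketch.
    """
    if long_format:
        # 4-byte unsigned field: room for lengths up to 4,294,967,295
        return struct.pack(">I", length)
    if length > U16_MAX:
        # This is the overflow case that the 2-byte format cannot represent
        raise OverflowError(f"{length} bytes does not fit in a 16-bit length field")
    return struct.pack(">H", length)


print(pack_length(65535).hex())        # ffff
print(pack_length(70000, True).hex())  # 00011170
```

A 70,000-byte container simply has no 2-byte representation, which is why the reader on the game side and the writer on the tool side both have to agree on the wider field.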

PR #8 (radio-expand) landed on March 8th. The stats alone tell the story — 482 lines changed in RadioDatRecompiler.py, 162 in RadioDatTools.py. The new --long flag tells the recompiler to write 4-byte length fields for every container that needs it. It’s already wired into the build script:

python3 myScripts/RadioDatRecompiler.py --long workingFiles/jpn-d1/radio/RADIO-merged.xml ...

The game reads it fine. The binary compiles, the calls load, the codec windows open. It’s not the cleanest solution in the world — shipping with a patched executable raises some questions I’ll need to work through before this is truly “done” — but the gains are real and the alternative (cataloguing and trimming 80+ staff calls worth of trivia content) would have taken months.

Implications

So, here’s the tricky thing. Initial tests were really encouraging. It took AGES for Claude and me to find the right place in the Japanese binaries to stick the modified code behind trampoline jumps. After much testing, we now have a working modifier for the binary. But several things came out of this.

  • I always wondered why Integral specifically uses 0x800 block offsets when the original releases don’t. It’s likely due to the sheer size of the data, and because Integral carries both English and Japanese subtitles. It threw me for a few loops, though. A: Integral’s STAGE.DIR stores offsets in 0x800 blocks (like other files do) rather than the literal offsets the USA / JPN versions use, so the patcher had to be modified. B: Going back through the parser code, I knew there were staff calls, but like an idiot I forgot the other new frequency in Integral: Healing Radio! Yes, if you didn’t know there was one, calling 140.66 plays one of three tracks, including an MGS1 soundfont version of Theme of Solid Snake.
  • Allegedly we hid the code in unused portions of the binary, but we’ll need to test thoroughly for crashes.
  • Luckily, as this doesn’t affect stage overlays, the only change needed is to the main binary on discs 1-2.
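
The difference between 0x800-block offsets and literal byte offsets from the first bullet can be sketched in a few lines. 0x800 is 2,048 bytes, which matches the user-data size of a CD-ROM sector. The helper names below are hypothetical, and zero-byte padding is an assumption, not something taken from the patcher.

```python
BLOCK_SIZE = 0x800  # 2,048 bytes per block


def block_to_byte_offset(block_index: int) -> int:
    """Convert an offset stored in 0x800 blocks to a literal byte offset."""
    return block_index * BLOCK_SIZE


def byte_offset_to_block(offset: int) -> int:
    """Convert a literal byte offset to a block index; it must be block-aligned."""
    if offset % BLOCK_SIZE:
        raise ValueError(f"offset {offset:#x} is not aligned to {BLOCK_SIZE:#x}")
    return offset // BLOCK_SIZE


def pad_to_block(data: bytes) -> bytes:
    """Pad data up to the next block boundary (zero fill is an assumption)."""
    remainder = len(data) % BLOCK_SIZE
    if remainder == 0:
        return data
    return data + b"\x00" * (BLOCK_SIZE - remainder)


print(hex(block_to_byte_offset(3)))    # 0x1800
print(byte_offset_to_block(0x1800))    # 3
print(len(pad_to_block(b"x" * 2049)))  # 4096
```

A tool that expects one scheme and gets the other will read every entry from the wrong place, which is the kind of mismatch that forces a patcher change.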

The kanji cleanup (PR #7)

The week before also brought in a smaller but satisfying cleanup: PR #7 (integral-claude) pulled in a new mgs_font_text.py utility module (196 lines) and cleaned up characters.py significantly. The main win here was using Claude to systematically compare the kanji present in the ABST stage data against the base font — basically running a diff of “what characters does the game’s font actually contain” against “what characters are in our dictionary.” Found and removed a bunch of phantom entries that were never in the font to begin with, and filled in a few real ones that were missing.
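
The comparison itself boils down to two set differences. The sketch below is an illustration only, not the code in mgs_font_text.py or characters.py: diff_characters and the sample strings are made up for the example.

```python
def diff_characters(font_chars, dictionary_chars):
    """Compare the characters the font contains against the dictionary.

    Returns (phantom, missing):
      phantom: in the dictionary but not in the font
      missing: in the font but not in the dictionary
    """
    font = set(font_chars)
    known = set(dictionary_chars)
    phantom = sorted(known - font)
    missing = sorted(font - known)
    return phantom, missing


# Made-up sample data: the font has four kanji, the dictionary has three
phantom, missing = diff_characters("雷電蛇核", "蛇核狐")
print("phantom:", phantom)
print("missing:", missing)
```

Phantom entries are safe to delete. Missing ones are the characters that would otherwise render as garbage.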

Not glamorous, but missing characters are one of those bugs that only surfaces mid-playthrough when a codec window just… renders garbage.

Radio debugging and cleanup (Mar 10)

After the PR landed there was still some rough edge work. A missed call in the graphics extraction logic needed a tweak in RadioDatTools.py — the kind of thing that only shows up when you actually run the extraction on the full disc and spot the one call that came through wrong. Added some debugging output for the jpn radio patch specifically; helpful for tracing injection issues without having to stare at raw hex. Claude also ripped out a bunch of leftover logging scaffolding that had accumulated — the file’s cleaner for it.

ZMOVIE fix (Mar 12)

The movie splitter had a bug in the two-block case: specifically, in how it built the subtitle dict when the graphics data spanned block0 and block1. The indices were getting confused between the two chunks, so some entries weren’t being picked up in the extraction. Fixed now, and it ties back to what’s in the memory notes about zmovie-02 being the only movie with chunk_count == 2. The fix was a refactor of the dict logic rather than a one-liner: roughly a 33/31 line swap across the block, and cleaner to follow.

The translation sprint

This is the one that feels genuinely big. Between March 10th and the 14th, the story calls for disc 1 got finished.

That’s the full codec library — all the story calls Snake can place, through the end of the disc. The torture sequence was translated. The Meryl and Wolf sections came in. And then on the 14th, the commit just says: Story calls disc 1 done.

That’s been a long road. The last few months have had a lot of tooling work that was necessary before this could even be attempted with full confidence: the round-trip testing, the recompiler bug, the overflow problem, the expansion. Now the translation is actually running into that work and benefiting from it.

A chunk of VOX entries for in-game dialogue also came in on the 14th, which is a different system but part of the same push. VOX covers the voice cue data for lines that play during gameplay (not codec, not cutscene). Getting those indexed and translated is its own project.

Also: a “Fix johnny line” commit on the 16th. You hate to see it. You love to see it.

VOX offset adjuster (Mar 16)

The last thing this week was a bigger piece of work than the commit message suggests. voxOffsetAdjuster.py is a new 149-line script in StageDirTools/. The problem it solves: when voxTextInjector.py injects new text into a VOX block, the block can grow. If a block grows, all the offsets downstream in STAGE.DIR that point to those blocks are now wrong. Without adjustment, the game can’t find them.
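
The core idea is that a block which grows leaves its own start where it was but pushes everything after it forward. A rough sketch of that bookkeeping follows. It is not voxOffsetAdjuster.py: the function name, the shape of the growth map, and the sample numbers are assumptions for illustration, and it works in whatever unit the offsets are stored in as long as the growth uses the same unit.

```python
from bisect import bisect_left
from itertools import accumulate


def adjust_offsets(offsets, growth):
    """Shift offsets to account for blocks that grew.

    offsets: original offsets that pointed at block starts
    growth:  hypothetical map of {original block start: amount it grew}
    """
    starts = sorted(growth)
    # Running total of growth, in order of block position
    cumulative = list(accumulate(growth[s] for s in starts))
    adjusted = []
    for off in offsets:
        # Only blocks that start strictly before this offset push it forward
        i = bisect_left(starts, off)
        shift = cumulative[i - 1] if i else 0
        adjusted.append(off + shift)
    return adjusted


# Two blocks each grew by 0x800; the third offset moves by both amounts
new = adjust_offsets([0x0, 0x1000, 0x1800], {0x0: 0x800, 0x1000: 0x800})
print([hex(o) for o in new])  # ['0x0', '0x1800', '0x2800']
```

Skip this step and every pointer past the first expanded block lands in the middle of the wrong data.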

The demo equivalent (demoOffsetAdjuster.py) already existed. VOX now has parity. The rejoiner was updated to track the offset map, the injector was updated to report expansion size correctly, and the build script now calls the adjuster in the right order:

python3 myScripts/voxTools/voxTextInjector.py
python3 myScripts/voxTools/voxRejoiner.py ...
python3 myScripts/StageDirTools/voxOffsetAdjuster.py -f -o ... -n ... -s ...

That -f flag (finalize) writes the adjusted STAGE.DIR in place. Before this, VOX injection was effectively untested in the full pipeline — the offsets would have been wrong in-game. Now it’s wired properly.

Where things stand

The tooling is genuinely mature at this point. RADIO, DEMO, VOX, ZMOVIE all have full extract/inject/recompile cycles. Round-trip tests pass on jpn and usa discs. The build script runs end-to-end on disc 1. The overflow problem that felt like a project-killer two weeks ago is resolved.

What remains is mostly translation work and some unresolved edge cases:

  • Disc 2 demo translation is barely started
  • The Integral disc RADIO.DAT still has the null byte padding issue (round-trip fails, not a crash — but it’s a loose end)
  • STAGE.DIR still needs an extraction script; packing still requires wine + ninja.exe
  • ZMOVIE subtitles are in progress but not complete

But disc 1 story calls being done is a real milestone. The skeleton is there. Playtesting can actually happen now in a meaningful way.

-J-Rush