RADIO.DAT contains the Codec call dialogue, along with the scripts the game uses to conduct each call. These include cues for the audio files (with references to their offsets in VOX.DAT), commands that add someone’s Codec number to your memory, logic checks for different dialogue trees, and so on. The subtitle text for Codec dialogue is read from here, though the lip syncing is read from VOX.DAT.

How to Extract

RADIO.DAT has a structure that must be preserved when its contents are extracted. For that reason, the decoded file is kept as XML so the structure isn’t broken. XML is not easy to work with for translation, so I’ve also set it to export a json similar to the one from Iseeeva’s tool (credit to Iseeeva: https://github.com/iseeeva, whose project is no longer online).

To extract, use RadioDatTools.py.

Linux/Mac:

python Scripts/RadioDatTools.py path/to/RADIO.DAT -xz radio-out

You’ll get two files:

radio-out.xml
radio-out.json

The json is the easier way to modify data, and there is a script to inject the translation into the XML (which preserves the rest of the binary data). There are 4 sections:

Section » What is contained here

  • calls » Call dialogue. Keys are call offsets (where each call is found in the original file), which serve as unique identifiers for the data. Dialogue lines also use their byte offset as a unique identifier.
  • saves » Save file titles. These are the names of the save locations that are written to the memory card. They are usually in double-wide English characters, but either double-wide or single-wide characters can work. Switching between the two may involve manually modifying the recompiler script. There are multiple entries because the titles are referenced several times. It’s best to modify one and copy-paste the value to the rest of the keys. I think there was a script that could do that automatically.
  • freqAdd » The names of the callers that are saved to the Codec Memory. Adding a name is scripted into calls. The names are also stored in the save data, so if you save a Codec number before translating it, it will stay that way in your save file’s memory. Each entry has its own offset, because the same number can be saved in more than one call.
  • prompts » The “Prompt” type, which is used when calling Mei Ling and being asked whether or not to save. There are multiple entries because the prompts are referenced several times. It’s best to modify one and copy-paste the value to the rest of the keys. I think there was a script that could do that automatically.
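The copy-and-paste step for repeated entries could be scripted along these lines. This is only a minimal sketch, not the project's own script: it assumes each section of the exported json is a flat object mapping an offset key to a text string, which may not match the real layout exactly. The function name and the sample keys and values are made up for illustration.

```python
import json

def propagate_translations(section, translations):
    """Return a copy of `section` in which every value listed in
    `translations` is replaced by its translated text.
    Keys (offsets) are never changed."""
    return {key: translations.get(text, text) for key, text in section.items()}

# Hypothetical excerpt of the exported json (keys and values are made up).
data = {
    "saves": {"1000": "ヘリポート", "2000": "ヘリポート", "3000": "ドック"},
}

# Translate each distinct title once...
translations = {"ヘリポート": "Heliport"}

# ...and apply it to every key that shares the same original text.
data["saves"] = propagate_translations(data["saves"], translations)
print(json.dumps(data, ensure_ascii=False, indent=2))
```

Entries without a translation are left as they are, so the script can be re-run as more titles are translated.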

Injecting the json into the XML

Use xmlModifierTools.py to inject the json translation into the XML. The command will look like this:

Mac/Linux:

python xmlModifierTools.py inject radio-out-Iseeeva.json radio-out.xml 

Windows:

python.exe xmlModifierTools.py inject radio-out-Iseeeva.json radio-out.xml 

The output is always named filename-merged.xml (radio-out-merged.xml in this example) so that we don’t overwrite the original. In the future this will be changed to overwrite the file, so make sure you back up the original!

Making your own RADIO.DAT file

The final step is taking the XML file and recompiling it into a binary file. To do so, use the RadioDatRecompiler.py script. This command recompiles only:

Linux/Mac:

python Scripts/RadioDatRecompiler.py radio-out-merged.xml new-RADIO.DAT 

To make a usable file, though, there are two extra components:

  1. We need to use the -p flag (for prepare), which recalculates the lengths of all the text that was modified. Because the file is structured, lengths are recorded in bytes in several places, and each one needs to be updated. The -p flag recalculates them for all of the modifiable values above.
  2. If any Radio call changes in length (which basically any edit will cause), we need to adjust the offsets for the call scripts as they appear in STAGE.DIR (which holds the GCL scripts that call the scripts here). For this, use the -s and -S flags for the original and new STAGE.DIR files, respectively.

The final command will look like this:

Linux/Mac:

python Scripts/RadioDatRecompiler.py -p radio-out-merged.xml new-RADIO.DAT -s build-src/MGS/STAGE.DIR -S ./new-STAGE.DIR

Windows:

python.exe .\mgs1-scripts\RadioDatRecompiler.py .\Radio-out-merged.xml new-RADIO.DAT -p -s '.\Metal Gear Solid (USA) (Disc 1) (v1.0)\MGS\STAGE.DIR' -S new-STAGE.DIR

This will output the following files:

  • new-RADIO.DAT » Your new radio call file.
  • new-STAGE.DIR » Stage file with modified offsets so that calls are played properly.

Known issues

Here are some known issues or limitations that you need to be aware of when changing Radio Dialogue.

1. False flags for calls

When we recompile, some keys will be reported as invalid. I haven’t yet written the scripts to report which stage’s GCL they come from, but these came from the USA version of Metal Gear Solid:

Finished checking for calls in STAGE.DIR! Ready to proceed.
ERROR! Offset invalid! Key: 63067457 returned ('12967', '000032a7')
ERROR! Offset invalid! Key: 63073563 returned ('45618', '0a00b232')
ERROR! Offset invalid! Key: 63073706 returned ('42407', '0000a5a7')
ERROR! Offset invalid! Key: 63073852 returned ('42407', '0000a5a7')
ERROR! Offset invalid! Key: 63073919 returned ('42407', '0000a5a7')
ERROR! Offset invalid! Key: 63074027 returned ('42869', '0100a775')
ERROR! Offset invalid! Key: 63074067 returned ('44726', '0000aeb6')
ERROR! Offset invalid! Key: 63074107 returned ('45037', '0100afed')
ERROR! Offset invalid! Key: 63074147 returned ('45466', '0000b19a')

As we can see, they’re mostly in the same range. I can confirm these come from GCL in stage s00aa, which was a demo/unused stage in the final release. These errors are normal. The offset replacement is quick and dirty, but it does not modify anything else, so it is safe to use. However, it cannot tell you which stage was modified. In the future the stage tools will be more robust. It also reads the file sequentially, which makes processing take a long time.
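To see at a glance which keys were flagged, the error lines can be collected with a short script. This is a minimal sketch that relies only on the log line format shown above; the function name is hypothetical and it is not part of the tools.

```python
import re

# Matches lines such as:
# ERROR! Offset invalid! Key: 63067457 returned ('12967', '000032a7')
LINE_RE = re.compile(
    r"ERROR! Offset invalid! Key: (\d+) returned \('(\d+)', '([0-9a-fA-F]+)'\)"
)

def parse_invalid_offsets(log_text):
    """Return a list of (key, decimal_value, raw_hex) tuples from the log."""
    return [
        (int(key), int(dec), raw_hex)
        for key, dec, raw_hex in LINE_RE.findall(log_text)
    ]

# Three of the lines from the output above.
log = """\
ERROR! Offset invalid! Key: 63067457 returned ('12967', '000032a7')
ERROR! Offset invalid! Key: 63073563 returned ('45618', '0a00b232')
ERROR! Offset invalid! Key: 63074147 returned ('45466', '0000b19a')
"""

entries = parse_invalid_offsets(log)
keys = [key for key, _, _ in entries]
print(f"{len(entries)} invalid keys, from {min(keys)} to {max(keys)}")
```

Keys that fall well outside the range seen here would be the ones worth a closer look.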

2. The Mei Ling problem

Call scripts and most of their components have lengths baked in: each one records how many bytes follow it. Unfortunately, they only use a two-byte integer, so the maximum size for a call script is 0xFFFF, or 65535 bytes. When you inject text that is longer than the original, you run the risk of pushing a call’s script size beyond this fixed limit.

Why is it a Mei Ling problem? Well, recall the Save blocks that hold the location used as the title of the save file (e.g. Heliport). In Japanese these are maybe 2-3 Japanese characters, each being two bytes. But in English, Heliport is 8 characters, and others (Comm Tower A) are much longer.

On its own this is not a big deal, but these are repeated multiple times, and Mei Ling calls in particular have many branching paths all contained in the same call. So while the English version doesn’t have any significantly large scripts, the Japanese version has some that are already very close to the limit, and replacing the save blocks and titles adds far too much.
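The arithmetic behind the limit can be sketched as follows. The call size and the number of repeated titles are made-up figures, the byte order (big-endian) is an assumption for this sketch, and the English title is counted as single-byte characters; double-wide characters would grow the call even more.

```python
import struct

MAX_CALL_LENGTH = 0xFFFF  # largest value a two-byte unsigned integer can hold

def fits_in_two_bytes(length):
    """True if `length` can be stored in a two-byte unsigned integer."""
    return 0 <= length <= MAX_CALL_LENGTH

def pack_length(length):
    """Pack a length as a two-byte unsigned integer.
    Big-endian (">H") is an assumption made for this sketch."""
    if not fits_in_two_bytes(length):
        raise ValueError(f"length {length} does not fit in two bytes")
    return struct.pack(">H", length)

# A 3-character title at two bytes per character, replaced by "Heliport"
# stored as single-byte characters.
old_size = 3 * 2
new_size = len("Heliport".encode("ascii"))
growth_per_title = new_size - old_size

# Hypothetical call: 65500 bytes originally, with the title repeated 30 times.
call_length = 65500 + 30 * growth_per_title
print(call_length, fits_in_two_bytes(call_length))
```

A couple of bytes per title is enough to push a call that is already near the limit over it once the title is repeated across many branches.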

If you see an error like this, you’ll know a call has exceeded the limit:

CALL AT OFFSET 887492 HAS A LENGTH THAT IS TOO LONG! Please fix!
Offset 887530 has failed check!!!

This isn’t a guarantee the game will crash, but if you have a call that has overrun the limit, then most likely that call will crash the game.

This is an issue I’ve not fixed yet. It may take manually splitting that call into two separate calls that appear in different levels. That way all content is preserved, but how it is triggered will be slightly different. Many calls, including Mei Ling’s save proverb calls, are re-used in different stages.

3. We can’t repeatedly re-use the same STAGE.DIR file.

Calls and dialogue are identified by the offset where they appeared in the original game. You can modify an already modified game ISO, but you must always go back to the original files. Extracting the radio dialogue uses the offsets where the dialogue is found, and if the file has already been modified, those numbers will have changed.

This is also true for the STAGE.DIR file. The fix for offsets does this:

  1. Scan the provided STAGE.DIR for any offsets
  2. Match them against the offsets found in the original XML.
  3. Replace them with the NEW offsets calculated with the new lengths of characters at recompile.
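The three steps above can be sketched as follows. This is not the recompiler's actual code: it assumes offsets are stored as 4-byte big-endian integers, which may not be how STAGE.DIR encodes them, and the function name is hypothetical.

```python
import struct

def replace_offsets(stage_data, offset_map):
    """Return a copy of `stage_data` with old offsets swapped for new ones.

    Assumption for this sketch: offsets are stored as 4-byte big-endian
    integers. The real STAGE.DIR encoding may differ.
    """
    patched = bytearray(stage_data)
    for old, new in offset_map.items():
        needle = struct.pack(">I", old)
        replacement = struct.pack(">I", new)
        start = 0
        # Search the ORIGINAL bytes, so a value that was just written is
        # never matched again by a later entry in the map.
        while (pos := stage_data.find(needle, start)) != -1:
            patched[pos:pos + 4] = replacement
            start = pos + 4
    return bytes(patched)

# Two fake offsets embedded in some filler bytes.
original = b"\x00\x00" + struct.pack(">I", 1000) + b"\xff" + struct.pack(">I", 2000)

# The new value for 1000 happens to equal another OLD offset (2000).
offset_map = {1000: 2000, 2000: 2500}
patched = replace_offsets(original, offset_map)
```

Searching the untouched input rather than the patched copy is what makes this depend on a clean file: if the input had already been patched, old and new values could no longer be told apart.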

So, if you are working with the original, always start from the original. If you modify someone else’s mod, always start with a clean extract from their mod.
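One way to make sure you are starting from a clean file is to record a checksum of it once and compare before each recompile. This is a suggestion rather than something the tools do, and the function names are hypothetical.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_clean_extract(path, expected_digest):
    """True if the file at `path` still matches the recorded digest."""
    return sha256_of(path) == expected_digest
```

Run sha256_of once on the freshly extracted STAGE.DIR and RADIO.DAT, keep the digests next to your project, and check them with is_clean_extract before passing the files to the recompiler.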

4. Discs one and two are same but different

Codec dialogue is mirrored between discs 1 and 2. For that reason, you can use the same json for both. However, the XML files are different for each disc.

Some calls are not used on both discs. If a call on disc 1 contains dialogue that is only used on disc 2, all of its “VOX_CUES” will have null audio offsets. In other words, it will not pull the audio from VOX.DAT, because the audio is not on that disc.

You can reuse the same json and inject it into xml exported from both discs to work around the issue. So far most of the versions I have reviewed are the same except for those offsets.