DEMO.DAT contains the polygon demo files (cutscenes) and the audio they use. Editing them is not especially hard.

While it is possible (and quite easy) to simply replace the Japanese DEMO audio with the audio files from the US version, the subtitles will then be out of sync because the timings do not match exactly. Also, the Japanese version was released earlier and is a bit rougher, so some camera angles are slightly different. I chose early on to keep the Japanese demos and modify only the dialogue, to stay as close to the original experience as possible.

Extracting the Demo Text/Files

Currently many of the scripts are hard-coded. This means you just run each script with no command line arguments, but the downside is that changing where the output goes requires editing the script file itself.

In the future these will work a bit differently, but for now here is the order of operations:

Extract each demo to its own file.
For each extracted file, extract the dialogue.
Write all dialogue to a json file, along with the timings.

Timings consist of two numbers. The first number is the frame number where the dialogue appears. The second number is the number of frames the text stays onscreen for.
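
As a small illustration of the timing pair, the sketch below parses a "start,duration" string such as "913,138" (the form shown in the json further down) and derives the frame at which the subtitle would go away. The helper name `parse_timing` is hypothetical and not part of the repository's scripts, and the end-frame arithmetic assumes the duration is counted from the appear frame.

```python
def parse_timing(timing: str) -> tuple[int, int]:
    """Split a "start,duration" string into two integers (hypothetical helper)."""
    start, duration = (int(part) for part in timing.split(","))
    return start, duration


start, duration = parse_timing("913,138")
# Assumption: the subtitle is visible for `duration` frames starting at `start`.
end = start + duration
print(start, duration, end)
```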

Unlike RADIO, which has 4 lines, Demo dialogue has only two lines to work with. Anything beyond that will be drawn offscreen at best, or crash the game at worst.

Splitting files:

If you want to change the directories that is fine, but here’s where the mappings are currently:

demoSplitter.py

version = "jpn"
disc = 1
filename = f"build-src/{version}-d{disc}/MGS/DEMO.DAT"
outputDir = f"workingFiles/{version}-d{disc}/demo/bins"
offsetJson = f"workingFiles/{version}-d{disc}/demo/bins/demoOffsets.json"

Because DEMO.DAT is completely different on discs 1 and 2, the two discs are kept separate. You can move these directories anywhere you want, just know that the later scripts reference the same paths.

Mac/Linux:

python demoSplitter.py

Extracting text

Mac/Linux:

python demoTextExtractor.py

Settings:

version = "jpn"
disc = 1
inputDir = f'workingFiles/{version}-d{disc}/demo/bins'
outputDir = f'workingFiles/{version}-d{disc}/demo/texts'
outputJsonFile = f"workingFiles/{version}-d{disc}/demo/demoText-{version}.json"

Two things are output:

For each demo, a file is written in the texts folder. This has literally one line of dialogue per line, and no timings.
The json file that contains text and timings. It’s best to just work with this.

Modifying text

Version 1

Originally, texts and timings were listed under each demo. Each timing is a pair: the appear frame, then the number of frames to show (the duration):

{
    "demo-01": [
        {
            "00": "In Alaska's Fox Archipelago,｜on Shadow Moses Island...",
            "01": "FOX-HOUND, along with the｜Next-Generation Special Forces...",
            "02": "...led a revolt and successfully｜took control of the",
            "03": "Nuclear Weapons Disposal Facility｜located on the island.",
            "04": "They're demanding that the government",
            "05": "turn over the remains of Big Boss.",
            "06": "They say that if their demands｜are not met within 24 hours,",
            "07": "they'll launch a nuclear weapon.",
            "08": "You'll have two mission objectives.",
            "09": "Infiltrate the Facility...",
            "10": "Rescue hostages DARPA chief｜Donald Anderson...",
            "11": "...and President of ArmsTech,｜Kenneth Baker.",
            "12": "Secondly, determine whether｜or not the terrorists",
            "13": "have the ability to launch a nuclear device",
            "14": "and stop them if they do.",
            "15": "What's the insertion method?",
            "16": "We'll approach the disposal facility｜by sub.",
            "17": "And then?",
            "18": "We'll launch a one-man SDV｜(swimmer delivery vehicle).",
            "19": "After the SDV gets as close as it can,｜dispose of it.",
            "20": "From there on you'll have to swim."
        },
        {
            "0": "913,138",
            "1": "1059,109",
            "2": "1169,36",
            "3": "1209,33",
            "4": "1278,58",
            "5": "1339,34",
            "6": "1411,85",
            "7": "1505,78",
            "8": "1636,57",
            "9": "1705,41",
            "10": "1753,105",
            "11": "1866,102",
            "12": "2000,45",
            "13": "2050,43",
            "14": "2102,60",
            "15": "2180,29",
            "16": "2211,72",
            "17": "2286,26",
            "18": "2314,53",
            "19": "2412,89",
            "20": "2537,36"
        }
    ],

This was advantageous for translation when the lines matched, but two things made it difficult:

Copying multiple lines at once is convenient, but sometimes the dialogue is broken in different places. For example, “Snake” or “I see…” is part of a longer line in the English version, but the Japanese version makes it a separate line.
Once lines are merged or split like that, the numbering no longer lines up.

Hence I made a second version, seen below:

Version 2

{
    "demo-01": {
        "897": {
            "duration": "96",
            "text": "The nuclear weapons disposal｜facility on Shadow Moses Island"
        },
        "996": {
            "duration": "51",
            "text": "in Alaska's Fox Archipelago"
        },
        ...

The json has each demo listed as a key, with each subtitle laid out like so (based on the example above):

“897”: Each key is a single subtitle. The key itself is the appear frame, here frame 897.
“duration”: The number of frames the line of dialogue stays on screen.
“text”: The subtitle text. Line breaks are currently marked with the ｜ character, which is a fullwidth (double-wide) version of the | pipe character. To make sure you use the correct character for line breaks, always copy it from another line.

U+FF5C : FULLWIDTH VERTICAL LINE
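
Since Demo subtitles are limited to two lines and the line break is this one specific character, a quick check before re-injecting can catch mistakes. The following is a minimal sketch, assuming the Version 2 layout shown above. The function names are hypothetical, and the second sample entry is made up purely to trigger the check.

```python
LINE_BREAK = "\uFF5C"  # FULLWIDTH VERTICAL LINE, the subtitle line-break marker
MAX_DEMO_LINES = 2     # Demo dialogue only has two lines to work with


def count_lines(text: str) -> int:
    """Number of on-screen lines a subtitle string would occupy."""
    return text.count(LINE_BREAK) + 1


def too_many_lines(subtitles: dict) -> list:
    """Return the appear-frame keys of entries that exceed two lines."""
    return [frame for frame, entry in subtitles.items()
            if count_lines(entry["text"]) > MAX_DEMO_LINES]


sample = {
    "897": {"duration": "96",
            "text": "The nuclear weapons disposal\uFF5Cfacility on Shadow Moses Island"},
    # Made-up entry with three lines, only here to show a failing case.
    "9999": {"duration": "10", "text": "one\uFF5Ctwo\uFF5Cthree"},
}
print(too_many_lines(sample))
```

Note that an ordinary ASCII | is not counted as a line break here, which is the same reason to copy the character from an existing line.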

Version 2 makes it easier to insert lines without renumbering the lines that follow. However, copying several matching lines at once is more difficult.
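
To make the relationship between the two layouts concrete, here is a rough sketch of a Version 1 to Version 2 conversion. It is only an illustration based on the two excerpts above, not the logic of the repository's converter script, and `v1_to_v2` is a hypothetical name.

```python
def v1_to_v2(v1: dict) -> dict:
    """Turn {demo: [texts, timings]} into {demo: {frame: {duration, text}}}."""
    v2 = {}
    for demo, (texts, timings) in v1.items():
        # In the Version 1 excerpt, text keys are zero-padded ("00") while
        # timing keys are not ("0"), so pair them by numeric value.
        timing_by_index = {int(key): value for key, value in timings.items()}
        entries = {}
        for key, text in texts.items():
            start, duration = timing_by_index[int(key)].split(",")
            entries[start] = {"duration": duration, "text": text}
        v2[demo] = entries
    return v2


v1_sample = {
    "demo-01": [
        {"00": "In Alaska's Fox Archipelago,｜on Shadow Moses Island...",
         "01": "FOX-HOUND, along with the｜Next-Generation Special Forces..."},
        {"0": "913,138", "1": "1059,109"},
    ]
}
v2_sample = v1_to_v2(v1_sample)
print(v2_sample["demo-01"]["913"])
```

Keying by appear frame is what removes the renumbering problem: inserting a subtitle only adds one key and leaves every other entry untouched.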

If you want to convert between the two versions, use the converter script with the json file as its argument:

Mac/Linux:

python DemoTools/demoJsonConverter.py demo-text.json

Re-injecting texts

At time of writing, re-injecting texts requires the Version 1 layout of the json file.

Settings:

inputDir = f'workingFiles/{version}-d{disc}/demo/bins'
outputDir = f'workingFiles/{version}-d{disc}/demo/newBins'
injectJson = f'build-proprietary/demo/demoText-{version}-undub-output.json'

Currently the way this script works:

Iterate through the demo keys in the json file (so any demos you leave out will not be modified)
Open the original .dmo binary file
Inject the new dialogue
Write the file to the newBins folder.

I believe the script gives a warning if a new file ends up longer than the original (in blocks). If that happens, STAGE.DIR will need to be modified.
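
The kind of check described here can be sketched as below. This is not taken from the injector script. The 0x800-byte block size is an assumption (this guide does not state it), and both function names are hypothetical.

```python
BLOCK_SIZE = 0x800  # Assumption: 2048-byte blocks; not stated in this guide


def blocks_needed(size_in_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Round a byte count up to a whole number of blocks."""
    return -(-size_in_bytes // block_size)  # integer ceiling division


def grew_in_blocks(original: bytes, modified: bytes) -> bool:
    """True if the modified demo occupies more blocks than the original."""
    return blocks_needed(len(modified)) > blocks_needed(len(original))


# A demo can grow by a few bytes without crossing a block boundary.
print(grew_in_blocks(b"\x00" * 100, b"\x00" * 2000))   # same block count
print(grew_in_blocks(b"\x00" * 2048, b"\x00" * 2049))  # one block more
```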

Outputting the DEMO.DAT file

Mac/Linux:

python DemoTools/demoRejoiner.py

Settings:

# Directory configs
inputDir = f'workingFiles/{version}-d{disc}/demo/bins'
outputDir = f'workingFiles/{version}-d{disc}/demo/newBins'
outputDemoFile = f'workingFiles/{version}-d{disc}/demo/new-DEMO.DAT'
offsetDump = f'workingFiles/{version}-d{disc}/demo/newDemoOffsets.json'
os.makedirs(outputDir, exist_ok=True)

How this script works:

For each demo listed (getting names from inputDir):
- If a new bin exists, append it to the running output data.
- If it does not, append the original instead.
Output the total file.
Also output offsets (used with original offsets to modify STAGE.DIR if necessary)
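
The steps above can be sketched roughly as follows. This is an illustration rather than the code in demoRejoiner.py. It assumes the split files use a .dmo extension, that sorting by filename reproduces the original order, and that no padding is needed between demos. The function name `rejoin` is hypothetical.

```python
from pathlib import Path


def rejoin(input_dir: str, new_dir: str) -> tuple[bytes, dict]:
    """Concatenate demos, preferring modified bins, and record each start offset."""
    data = bytearray()
    offsets = {}
    # Assumption: filename order matches the original order inside DEMO.DAT.
    for original in sorted(Path(input_dir).glob("*.dmo")):
        replacement = Path(new_dir) / original.name
        source = replacement if replacement.exists() else original
        offsets[original.stem] = len(data)  # byte offset where this demo starts
        data += source.read_bytes()
    return bytes(data), offsets


# Example use (paths follow the settings above):
# data, offsets = rejoin("workingFiles/jpn-d1/demo/bins",
#                        "workingFiles/jpn-d1/demo/newBins")
# Path("workingFiles/jpn-d1/demo/new-DEMO.DAT").write_bytes(data)
```

Recording the offsets while concatenating is what makes the later STAGE.DIR comparison possible, since any demo that grew shifts every demo after it.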

If it is necessary to modify STAGE.DIR, we will need to run stageDirTools/demoOffsetAdjuster.py.

Fixing STAGE.DIR offsets:

If some demos came out longer than the originals, the offsets must be adjusted, as we do for Radio. The script needs three files: the original offsets (output by demoSplitter.py), the new offsets (output by demoRejoiner.py), and the original STAGE.DIR file (or a file we’ve already been working with for other adjustments).

Like the Radio modifier, it should not affect anything but Demo offsets. However, once a file has been modified, the script cannot be re-run on it, so keep the original on hand.
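
As a rough idea of what comparing the two offset files involves, the sketch below builds an old-to-new offset map for the demos that moved. It assumes both json files map a demo name to a single offset value, which this guide does not confirm. It does not touch STAGE.DIR itself, `build_offset_map` is a hypothetical name, and the offsets in the example are made up.

```python
def build_offset_map(old_offsets: dict, new_offsets: dict) -> dict:
    """Map each old offset to its new value, for demos whose position changed."""
    return {
        old_offsets[name]: new_offsets[name]
        for name in old_offsets
        if name in new_offsets and old_offsets[name] != new_offsets[name]
    }


# Made-up offsets: demo-02 grew, so demo-03 starts later than before.
old = {"demo-01": 0, "demo-02": 4096, "demo-03": 8192}
new = {"demo-01": 0, "demo-02": 4096, "demo-03": 10240}
print(build_offset_map(old, new))
```

An empty map would mean no demo moved, in which case STAGE.DIR should not need any changes for the demos.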

Settings:

# Necessary files
oldOffsetFile = "workingFiles/jpn-d1/demo/bins/offsets.json"
newOffsetFile = "workingFiles/jpn-d1/demo/newDemoOffsets.json"
stageFile = "workingFiles/jpn-d1/stage/STAGE-j1.DIR"

Mac/Linux:

python stageDirTools/demoOffsetAdjuster.py