Letting Claude poke around the code

So I’ve been aware for a while that there were a handful of rough edges in the scripts — things I kept meaning to go back and clean up but never prioritized because they technically “worked.” Today I decided to actually sit down and work through some of them with Claude Code, which turned into a pretty productive session.

The audit

The first thing I did was have Claude read through demoClasses.py and demoManager.py and just tell me what it saw. These are the newer libraries I’ve been building for working with DEMO.DAT and VOX files — they’re meant to be a cleaner, more reusable foundation than the older extraction scripts.

The list it came back with was longer than I expected. The highlights:

  • The demo class has an XML element initialization path that’s explicitly marked # TODO: THIS IS UNFINISHED! — it sets a few attributes but never populates self.segments
  • demo.toBytes() references self.items which doesn’t exist — it should be self.segments. Also has an integer vs length bug in the padding check.
  • captionChunk.toBytes() is just pass with a TODO comment
  • Both audioChunk.toBytes() and demoChunk.toBytes() have a += + self.content typo that would throw a TypeError at runtime
  • splitVagChannels() has uninitialized variables and uses 'rb' (read mode) where it should be 'wb' for writing
  • parseDemoFile() in demoManager has a bug where the last demo gets the wrong slice of data
  • The createXMLDemoData() fallback case calls root.append("unknownChunk", {...}) which isn’t valid for ElementTree
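For reference, here’s roughly what corrected versions of three of those patterns could look like. This is a minimal sketch, not the actual demoClasses/demoManager code: split_by_offsets, add_unknown_chunk, and join_chunks are hypothetical names, and I’m assuming the demo offsets are a sorted list of start positions into the raw DEMO.DAT bytes.

```python
import xml.etree.ElementTree as ET


def split_by_offsets(data: bytes, offsets: list[int]) -> list[bytes]:
    """Slice raw file data into pieces, one per start offset (hypothetical helper).

    Assumes `offsets` are sorted start positions. Each piece ends where the
    next one begins; the LAST piece runs to the end of the data, which is the
    case a parseDemoFile-style loop can easily get wrong.
    """
    pieces = []
    for i, start in enumerate(offsets):
        end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
        pieces.append(data[start:end])
    return pieces


def add_unknown_chunk(root: ET.Element, attrib: dict[str, str]) -> ET.Element:
    """Attach an <unknownChunk> child to `root` (hypothetical helper).

    Element.append() only accepts an Element, so passing a tag name plus a
    dict fails. ET.SubElement creates the child and appends it in one call.
    """
    return ET.SubElement(root, "unknownChunk", attrib)


def join_chunks(header: bytes, content: bytes) -> bytes:
    """Build a chunk's bytes (hypothetical helper)."""
    out = b""
    out += header
    # Not `out += + content`: unary + isn't defined for bytes, so that raises TypeError
    out += content
    return out
```

The slicing helper keeps the "last item goes to the end" rule in one place, so joining the pieces back together should reproduce the input exactly.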

None of these are surprising — this is actively being built out — but it was useful to have them all listed in one place. I added everything to the Active Issues Kanban.

Documentation review

I also pointed Claude at the docs and asked it to flag anything out of date. A few things came up:

  • The README was listing splitDemoFiles.py as the demo splitting script, but the actual file has always been DemoTools/demoSplitter.py
  • The Demo instructional had an offset filename inconsistency (offsets.json vs demoOffsets.json) in the STAGE.DIR section
  • The Code Documentation for demoClasses was mostly a stub — the class attribute table was missing modified and segments
  • The Main App Documentation file had literally one unfinished sentence
  • Some chunk type descriptions in the DEMO-VOX-ZMOVIE technical doc didn’t match how the code was actually treating them (the ImHex struct shows u24 for the length field, but the code reads it as u16 plus a padding byte)
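To make the u24 vs u16 mismatch concrete: reading a 3-byte length field as a u24 and reading it as a u16 plus a padding byte only agree when that third byte is zero. A minimal sketch using Python’s struct module, assuming little-endian byte order (an assumption here, not something I’ve confirmed against the ImHex pattern); the function names are hypothetical.

```python
import struct


def read_len_u24(buf: bytes, offset: int = 0) -> int:
    """Read a 3-byte little-endian unsigned length (what the ImHex struct describes)."""
    return int.from_bytes(buf[offset:offset + 3], "little")


def read_len_u16_pad(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Read a 2-byte little-endian length followed by one padding byte (what the code does)."""
    length, pad = struct.unpack_from("<HB", buf, offset)
    return length, pad
```

For the bytes 34 12 00, both readings give 0x1234. For 34 12 01, the u24 reading gives 0x011234 while the u16 reading still gives 0x1234 with a pad byte of 1, so checking whether that third byte is ever non-zero in real files is a quick way to settle which description is right.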
  • The Tools kanban still had “Radio.dat Documentation” as a TODO, even though the complete guide has been in Technical Docs for a while now

All of those got logged in the kanban as well. I also had Claude generate a conventions document based on a survey of the codebase — basically a list of how I tend to write Python: how I handle imports, type hints, file I/O, naming, and so on. That’s now saved in Technical Docs as conventions.md. Mostly useful as context for future AI assistance, but also just good to have written down.

Actual fixes

After the audit we branched off main to start working through the issues. We didn’t get to the demoClasses bugs yet — those need more thought. But we did clean up two categories of problems across the broader codebase.

Bare except: clauses

There were five of these in the active scripts (ignoring Old vers/, which is deprecated):

  • RadioDatTools.py — a dict lookup that should have been except KeyError:
  • mgs-data-splitter.py — two subprocess calls catching everything, and the error message was print(Exception) (the class, not the instance) so it would never actually print the error
  • translation/radioDict.py — two places where dict.get() returns None and then None gets concatenated to a string, causing a TypeError. Rather than catching TypeError, we just check for None before concatenating.
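The before/after for those three cases looks roughly like this. It’s a simplified sketch rather than the actual code: lookup_name, run_tool, and build_line are hypothetical stand-ins, and the command passed to run_tool would be whatever the real script calls.

```python
import subprocess


def lookup_name(table: dict[str, str], key: str) -> str:
    # Catch only the error a missing key actually raises, instead of a bare `except:`
    try:
        return table[key]
    except KeyError:
        return f"<unknown {key}>"


def run_tool(cmd: list[str]) -> bool:
    # check=True raises CalledProcessError on a non-zero exit code;
    # a missing executable raises FileNotFoundError, which is an OSError subclass.
    try:
        subprocess.run(cmd, check=True)
        return True
    except (subprocess.CalledProcessError, OSError) as e:
        print(f"Command failed: {e}")  # print the caught instance, not the Exception class
        return False


def build_line(speakers: dict[str, str], code: str, text: str) -> str:
    # dict.get() returns None for a missing key; check for that up front
    # instead of letting `None + str` raise TypeError and catching it.
    speaker = speakers.get(code)
    if speaker is None:
        return text
    return speaker + ": " + text
```

Narrow exception types mean an unexpected failure (a typo in a variable name, say) still surfaces as a traceback instead of being swallowed.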

File handle management

This one was more widespread. The scripts had a mix of open() with explicit .close() calls and with open() context managers, which is inconsistent. More importantly, a couple of spots were missing the .close() entirely:

  • demoSplitter.py: demoFile was opened to read the source data and never closed. The offsetFile written during splitting also had no close call.
  • RadioDatTools.py: radioFile read the binary data and was never closed.

Everything got standardized to with open() across RadioDatTools.py, RadioDatRecompiler.py, xmlModifierTools.py, demoSplitter.py, demoRejoiner.py, and demoTextInjector.py. The one exception is the log file handle in RadioDatTools.py, which is intentionally kept open for the life of the script.
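In sketch form, the pattern everything moved to looks like this. read_bytes and write_bytes are hypothetical helper names; the point is that with closes the handle even if something inside the block raises, and that output files are opened with 'wb' rather than 'rb' (the same mix-up flagged in splitVagChannels() earlier).

```python
def read_bytes(path: str) -> bytes:
    # The file is closed when the block exits, even if read() raises
    with open(path, "rb") as f:
        return f.read()


def write_bytes(path: str, data: bytes) -> None:
    # "wb" creates or truncates the file for binary writing
    with open(path, "wb") as f:
        f.write(data)
```

Opening a path with 'rb' when you meant to write either raises FileNotFoundError (if the file doesn’t exist yet) or gives you a read-only handle whose write() raises io.UnsupportedOperation.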

What’s next

There are still a few items from the conventions review we didn’t get to — the global state situation and the hardcoded paths mixed with argparse. And of course, the actual demoClasses bugs that need real fixes. Those are all sitting in the kanban now.
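One common way to untangle hardcoded paths from argparse is to make the old hardcoded value the argument’s default, then pass the parsed arguments into functions instead of reading module-level globals. This is only a sketch of that general approach, not a decision about these scripts: the flag names and the "output" default are hypothetical.

```python
import argparse
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:]; passing a list makes it testable
    parser = argparse.ArgumentParser(description="Split DEMO.DAT into demos (sketch)")
    parser.add_argument("input", help="path to DEMO.DAT")
    # Hypothetical flag: what used to be a hardcoded path becomes the default
    parser.add_argument("-o", "--output-dir", default="output",
                        help="directory for split files")
    return parser.parse_args(argv)
```

The returned namespace can then be handed to whatever does the work, which keeps the path in exactly one place.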

The fixes we landed today are on a claude-fixes branch on GitHub, not yet merged to main. Before that happens I want to do some actual testing to make sure nothing regressed, particularly the demo pipeline, since the file-handle changes touched demoSplitter.py, demoRejoiner.py, and demoTextInjector.py.
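One cheap regression check for the splitter and rejoiner is to compare hashes of the original file and the rebuilt one. This sketch assumes an unmodified split-then-rejoin should reproduce the original DEMO.DAT byte for byte, which I’d still want to confirm; sha256_of and same_file_contents are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str) -> str:
    # Hash in 1 MiB blocks so large files don't need to fit in memory at once
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def same_file_contents(original: str, rebuilt: str) -> bool:
    # True only if both files hash identically
    return sha256_of(original) == sha256_of(rebuilt)
```

Running this on the output of main and of the claude-fixes branch for the same input would also show whether the file-handle changes altered any output.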

Didju Riket

I’m still hesitant about letting Claude go wild across the codebase. I need to understand how everything works as I keep building it. Even so, it’s encouraging to have what’s essentially a coding underling that can present its work for me to approve.

More soon. Actually, most of this post was written by Claude as a summary of what I did this evening. Personally, I’ve been super excited about using Claude, but my enthusiasm for working on the undub itself has been waning. That’s not for lack of interest; it’s mostly because I’m very tired from being a parent and getting woken up at 5 every morning, and also because my codebase is very large. There are a lot of branches to this project. I always want to pick one branch and stick with it, but I often get a little stuck, and swapping to another branch helps me recharge and get back on track. That always comes with some slippage in my memory of what I was working on, though.

I do promise, though, that work is moving forward. When I’m tired I tend to gravitate toward rote translation work and making progress on the actual translation. When I can focus on code fixes, that’s where I go.

And I promise this section was written by me :) If you guys have issues with me using Claude to summarize my work, let me know, but it does help me keep better track of what I’ve done.

Anywho, onto a bit more code review for the evening and then turning in. Talk soon.

J-Rush