Add hook debug tooling and refine RT3 atlas

2026-04-08 16:31:33 -07:00 · 2026-04-08 16:31:33 -07:00 · 57bf0666e0
commit 57bf0666e0
parent 860d1aed90
38 changed files with 14437 additions and 873 deletions
--- a/docs/debug-load-workflow.md
+++ b/docs/debug-load-workflow.md
@@ -0,0 +1,353 @@
+# Debug Load Workflow
+
+Use this when comparing:
+
+- one successful manual load of `hh`
+- one failing hook-driven auto-load attempt
+
+The goal is to compare the real successful owner path above `0x00445ac0` against the failing hook-driven path.
+
+## Current Findings
+
+From the current logs:
+
+- successful manual load now has a grounded pre-call site at `0x004390cb` with:
+  - `ECX = 0x02af5840`
+  - `[0x006cec78] = 0x02af5840`
+  - `[0x006cec74] = 0x01d81230`
+  - top-of-stack dwords:
+    - `arg1 = 0x01db4739`
+    - `arg2 = 4`
+    - `arg3 = 0x0022fb50`
+    - next dword = `0x026d7b88`
+- the subsequent successful `0x00445ac0` entry still has:
+  - `ret = 0x004390d0`
+  - `arg1 = 0x01db4739`
+  - `arg2 = 4`
+  - `arg3 = 0x0022fb50`
+- older failing auto-load attempts never reached `0x00445ac0`
+- the earlier failing breakpoint was `0x00517cf0` with:
+  - `[0x006cec78] = 0`
+  - `[0x006cec74] = 0x01d81230`
+- the staged request globals at `0x006ce9b8..0x006ce9c4` and `0x006d1270..0x006d127c` are zero on the successful manual path
+
+That older `0x00517cf0` result is no longer the current blocker. The hook now reaches the real coordinator entry, so the remaining gap is later shell timing or re-entrancy, not request-latch shape.
+
+The disassembly at `0x004390b0..0x004390cb` is now the strongest grounded manual-load branch:
+
+- it writes `[0x006cec74+0x6c] = 1`
+- it computes `arg1` from `([0x006cec7c] + 0x11)`
+- it pushes `arg2 = 4`
+- it passes `arg3 = &out_success`
+- and then calls `0x00445ac0`
+
+So any hook experiment that does not reproduce that exact shape is no longer a plausible match for the successful manual path.
+
+## Latest Auto-Load Comparison
+
+The newest hook-driven debugger run now reaches `0x00445ac0` directly.
+
+At the auto-load `0x00445ac0` breakpoint:
+
+- stack:
+  - `ret = 0x7650505c` inside `dinput8`
+  - `arg1 = 0x01db4739`
+  - `arg2 = 4`
+  - `arg3 = 0x0022fcf8`
+- globals:
+  - `[0x006cec74] = 0x01d81230`
+  - `[0x006cec7c] = 0x01db4728`
+  - `[0x006cec78] = 0x026d7b88`
+
+Compared to the successful manual path:
+
+- `arg1` matches exactly: `0x01db4739`
+- `arg2` matches exactly: `4`
+- `[0x006cec74]` matches exactly: `0x01d81230`
+- `[0x006cec7c]` still matches the same runtime-profile base used to derive `arg1`
+- `[0x006cec78]` is now non-null and published before entry
+
+So the hook is no longer missing the coordinator entry shape. The remaining question is no longer "can we reach `0x00445ac0`?" but "does the live non-debugger call return successfully and trigger the actual restore transition?"
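+
+A minimal Python sketch of that comparison, using only the values captured above. The dictionary keys and helper names (`derive_arg1`, `differing_fields`) are hypothetical labels for this note, not names from RT3 or `rrt-hook`. The manual globals come from the `0x004390cb` pre-call site. The sketch checks that `arg1` equals the runtime-profile base plus `0x11` and lists the captured fields that still differ between the two runs:
+
+```python
+# Captured state, copied from the breakpoint dumps in this document.
+MANUAL = {
+    "arg1": 0x01DB4739,
+    "arg2": 4,
+    "arg3": 0x0022FB50,
+    "g_006cec74": 0x01D81230,
+    "g_006cec78": 0x02AF5840,
+}
+AUTO = {
+    "arg1": 0x01DB4739,
+    "arg2": 4,
+    "arg3": 0x0022FCF8,
+    "g_006cec74": 0x01D81230,
+    "g_006cec78": 0x026D7B88,
+    "g_006cec7c": 0x01DB4728,
+}
+
+
+def derive_arg1(profile_base):
+    """arg1 is computed as [0x006cec7c] + 0x11 at 0x004390b0..0x004390cb."""
+    return profile_base + 0x11
+
+
+def differing_fields(manual, auto):
+    """Return sorted names of fields captured in both runs whose values differ."""
+    shared = manual.keys() & auto.keys()
+    return sorted(name for name in shared if manual[name] != auto[name])
+
+
+if __name__ == "__main__":
+    assert derive_arg1(AUTO["g_006cec7c"]) == AUTO["arg1"]
+    for name in differing_fields(MANUAL, AUTO):
+        print(f"{name}: manual={MANUAL[name]:#010x} auto={AUTO[name]:#010x}")
+```
+
+With these captures only `arg3` and `[0x006cec78]` differ. `arg3` is the address of the caller's `out_success` slot, so a different stack address there is not surprising on its own.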
+
+## Latest Live Crash
+
+The latest non-debugger auto-load run now reaches:
+
+- `rrt-hook: auto load ready gate passed`
+- `rrt-hook: auto load restore calling`
+
+and then crashes at:
+
+- `0x0053fea6`
+
+The local disassembly around `0x0053fe90` shows a shell-side list traversal over `[this+0x74]` that walks linked entries and calls a virtual method on each. The crash instruction at `0x0053fea6` dereferences one traversed entry:
+
+- `mov eax, DWORD PTR [esi]`
+
+That strongly suggests the current hook is invoking the restore from the right call shape but on the wrong shell-pump turn. The active hypothesis is now timing or re-entrancy:
+
+- the hook detects readiness and fires restore on the same shell-pump turn
+- RT3 later re-enters shell object traversal in a phase where one list entry is still invalid
+
+So the next experiment is to defer the actual restore by additional ready shell-pump turns instead of firing on the first ready turn.
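+
+The deferral idea can be sketched as a small counter gate. This is an illustration in Python, not the `rrt-hook` implementation: the class name, the `extra_turns` parameter, and the choice to reset the count when a turn is not ready are all assumptions made for the sketch.
+
+```python
+class DeferredRestoreGate:
+    """Fire the restore once, only after extra ready shell-pump turns have passed."""
+
+    def __init__(self, extra_turns):
+        self.extra_turns = extra_turns  # ready turns to wait beyond the first one
+        self.ready_turns = 0
+        self.fired = False
+
+    def on_pump_turn(self, ready):
+        """Return True exactly once, on the turn where the restore should be called."""
+        if self.fired:
+            return False
+        if not ready:
+            # Assumption: a non-ready turn restarts the count, so only
+            # consecutive ready turns are counted.
+            self.ready_turns = 0
+            return False
+        self.ready_turns += 1
+        if self.ready_turns > self.extra_turns:
+            self.fired = True
+            return True
+        return False
+
+
+if __name__ == "__main__":
+    gate = DeferredRestoreGate(extra_turns=2)
+    for turn, ready in enumerate([True, False, True, True, True, True]):
+        print(turn, gate.on_pump_turn(ready))
+```
+
+With `extra_turns=0` the gate fires on the first ready turn, which matches the current behavior described above, so the same code path can compare both settings.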
+
+## Manual Owner Tail
+
+The branch at `0x004390b0..0x004390ea` now has a grounded post-call tail too:
+
+- `0x004390cb` calls `0x00445ac0`
+- `0x004390d0` immediately calls `0x004834e0(0, 1)` on `0x006cec74`
+- if `out_success != 0` or `esi != 0`, `0x004390ea` calls `0x004384d0`
+- then `0x004390ef` calls `0x0053f310` on `0x00ccbb20`
+- then `0x00439104` calls `0x004834e0(0, 1)` again
+
+The successful manual breakpoint at `0x004390cb` shows `ESI = 0` and `EDI = 1`, so the manual load branch only forces the `0x004384d0` post-load pipeline when `out_success` comes back nonzero.
+
+That makes the current hook gap narrower still: even with the correct `0x00445ac0` arguments, returning directly into `dinput8` skips RT3's own owner-tail work unless we mirror it ourselves.
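+
+To make the ordering explicit, here is a hedged Python sketch of the owner tail as a sequence of calls. The function `run_owner_tail` and the `call` callback are hypothetical. Only the call order and the `out_success`/`esi` condition come from the disassembly notes above, and the arguments of `0x004384d0` and `0x0053f310` are not grounded here, so they are left out.
+
+```python
+def run_owner_tail(call, out_success, esi=0):
+    """Replay the post-call order observed at 0x004390d0..0x00439104.
+
+    `call` receives the target address as a string plus any known arguments.
+    """
+    call("0x004834e0", 0, 1)      # 0x004390d0, on 0x006cec74
+    if out_success != 0 or esi != 0:
+        call("0x004384d0")        # 0x004390ea, post-load pipeline
+    call("0x0053f310")            # 0x004390ef, on 0x00ccbb20
+    call("0x004834e0", 0, 1)      # 0x00439104
+
+
+if __name__ == "__main__":
+    trace = []
+    run_owner_tail(lambda *args: trace.append(args), out_success=1)
+    for entry in trace:
+        print(entry)
+```
+
+A hook that mirrors the tail would need to issue the same sequence after its own `0x00445ac0` call, since returning into `dinput8` does not run any of it.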
+
+## Owner Xrefs Above `0x00438890`
+
+The containing owner at `0x00438890` is now grounded as a larger `thiscall` shell owner with two stack arguments. Current xrefs found in local disassembly are:
+
+- `0x00443b57`
+- `0x00446d7f`
+- `0x0046b8bc`
+- `0x004830ca`
+
+The strongest caller so far is `0x004830ca`:
+
+- it publishes `0x006cec78 = eax`
+- then calls `0x00438890` as `thiscall(active_mode, 1, 0)`
+- it sits inside `shell_transition_mode`
+- it is the branch that constructs `LoadScreen.win` through `0x004ea620`
+- and it continues through shell-window follow-up on `0x006d401c` after the `0x00438890` call
+
+The surrounding mode map is tighter now too:
+
+- mode `1` = `Game.win`
+- mode `2` = `Setup.win`
+- mode `3` = `Video.win`
+- mode `4` = `LoadScreen.win`
+- mode `5` = `Multiplayer.win`
+- mode `6` = `Credits.win`
+- mode `7` = `Campaign.win`
+
+That makes `0x00438890(active_mode, 1, 0)` the strongest current RT3-native entry candidate for reproducing the successful manual load branch, because it owns the internal dispatch that later reaches `0x004390cb`.
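+
+For hook-side logging it can help to carry the mode map as data. The following Python sketch just restates the list above. The enum and dictionary names are hypothetical and do not come from RT3.
+
+```python
+from enum import IntEnum
+
+
+class ShellMode(IntEnum):
+    """Shell mode ids as listed in the mode map above."""
+    GAME = 1
+    SETUP = 2
+    VIDEO = 3
+    LOAD_SCREEN = 4
+    MULTIPLAYER = 5
+    CREDITS = 6
+    CAMPAIGN = 7
+
+
+# Window resource constructed for each mode.
+WINDOW_FOR_MODE = {
+    ShellMode.GAME: "Game.win",
+    ShellMode.SETUP: "Setup.win",
+    ShellMode.VIDEO: "Video.win",
+    ShellMode.LOAD_SCREEN: "LoadScreen.win",
+    ShellMode.MULTIPLAYER: "Multiplayer.win",
+    ShellMode.CREDITS: "Credits.win",
+    ShellMode.CAMPAIGN: "Campaign.win",
+}
+
+
+if __name__ == "__main__":
+    print(WINDOW_FOR_MODE[ShellMode(4)])
+```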
+
+Current static xrefs also tighten the broader ownership split:
+
+- `0x00443b57` calls `0x00438890` from the world-entry side, but with `(0, 0)` after dismissing the current shell detail panel and servicing `0x004834e0(0, 0)`
+- `0x00446d7f` calls it from the saved-runtime restore side with the same `(0, 0)` shape before immediately building `.smp` bundle payloads through `0x00530c80/0x00531150/0x00531360`
+- `0x0046b8bc` calls it from the multiplayer preview family before a later `0x00445ac0` call
+- `0x004830ca` calls it from the shell-side active-mode branch with the clearest `(1, 0)` setup
+
+So the function is no longer just a guessed hook target. It is now a real shared owner above world-entry, saved-runtime restore, multiplayer preview, and shell-side active-mode startup branches.
+
+The internal selector split inside `0x00438890` is tighter now too:
+
+- `[0x006cec7c+0x01]` is a startup-profile selector, not the shell mode id
+- selector values `1` and `7` share the tutorial lane at `0x00438f67`, which writes
+  `[0x006cec74+0x6c] = 2` and loads `Tutorial_2.gmp` or `Tutorial_1.gmp`
+- selector `2` is a world-root initialization lane at `0x00438fbe` that allocates `0x0062c120`
+  when needed, runs `0x0044faf0`, and then forces the selector to `3`
+- selector `4` is a setup-side world reset or regeneration lane at `0x00439038` that rebuilds
+  `0x0062c120` from setup globals `0x006d14cc/0x006d14d0`, then runs `0x00535100` and `0x0040b830`
+- selector values `3`, `5`, and `6` collapse into the same profile-seeded file-load lane at
+  `0x004390b0..0x004390ea`
+- selector `6` is the one variant that explicitly writes `[0x006cec74+0x6c] = 1` before the
+  shared file-load call
+
+Current grounded writers now tighten those values too:
+
+- `Campaign.win` writes selector `6` at `0x004b8a2f`
+- `Multiplayer.win` writes selector `3` on one pending-status branch at `0x004f041e`
+- the larger `Setup.win` dispatcher around `0x005033d0..0x00503b7b` writes selectors `2`, `3`, `4`,
+  and `5` on several validated launch branches
+- so the shared file-load lane is now best read as one reused profile-file startup family rather
+  than one owner-specific manual-load path
+
+That means the successful manual-load branch is not the whole function. It is one three-selector
+subfamily inside a broader startup dispatcher that also owns tutorial and fresh-world setup lanes.
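+
+The selector split can be summarized as a lookup table. This Python sketch only restates the lanes listed above. The lane labels and helper names are hypothetical shorthand, and the sketch does not model what happens after selector `2` forces the selector to `3`.
+
+```python
+# Startup-profile selector ([0x006cec7c+0x01]) -> (lane label, lane address).
+LANE_FOR_SELECTOR = {
+    1: ("tutorial", "0x00438f67"),
+    7: ("tutorial", "0x00438f67"),
+    2: ("world_root_init", "0x00438fbe"),
+    4: ("setup_world_reset", "0x00439038"),
+    3: ("file_load", "0x004390b0"),
+    5: ("file_load", "0x004390b0"),
+    6: ("file_load", "0x004390b0"),
+}
+
+
+def lane_for(selector):
+    """Return the lane label for a selector, or None if it is not grounded yet."""
+    entry = LANE_FOR_SELECTOR.get(selector)
+    return entry[0] if entry else None
+
+
+def writes_load_flag(selector):
+    """Selector 6 is the one variant that writes [0x006cec74+0x6c] = 1 first."""
+    return selector == 6
+
+
+if __name__ == "__main__":
+    file_load = sorted(s for s in LANE_FOR_SELECTOR if lane_for(s) == "file_load")
+    print("file-load selectors:", file_load)
+```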
+
+The multiplayer preview side is also tighter now:
+
+- `0x0046b8bc` publishes `0x006cec78`
+- calls `0x00438890` as `thiscall(active_mode, 0, 0)`
+- clears `[0x006cec74+0x6c]`
+- and only then calls `0x00445ac0(0x006ce630, [0x006ce9c0], 0)`
+
+That makes the preview relaunch path clearly different from the manual load branch, not just a differently staged copy of it.
+
+## Latest Headless Debugger Result
+
+The scripted auto-load debugger run is now useful without manual interaction:
+
+- all breakpoints were set successfully:
+  - `0x00438890`
+  - `0x004390cb`
+  - `0x00445ac0`
+  - `0x0053fea6`
+- but only `0x0053fea6` actually fired in the captured run
+
+So the current non-interactive path is good enough to gather repeatable crash-side state. It also shows that the auto-load code path does not yet appear to traverse the larger-owner breakpoints under `winedbg`. The next step is therefore more hook-side logging around the `0x00438890` call itself rather than more manual debugger work.
+
+The latest static pivot also means the next reverse-engineering step does not require a live run:
+
+- compare the mode-`4` `LoadScreen.win` owner path at `0x004830ca` against the world-entry and
+  saved-runtime callers of `0x00438890`
+- compare how the `(1, 0)` `LoadScreen.win` lane diverges from the `(0, 0)` world-entry and
+  saved-runtime lanes before control reaches the shared `0x004390b0` manual-load branch
+- only then return to hook experiments
+
+## Launchers
+
+Manual debugger run:
+
+```bash
+tools/run_rt3_winedbg.sh
+```
+
+Auto-load debugger run:
+
+```bash
+tools/run_hook_auto_load_winedbg.sh hh
+```
+
+Both scripts use `/opt/wine-stable/bin/winedbg` explicitly, so they do not depend on `winedbg` being on `PATH`.
+They also default to:
+
+- their matching command file in [tools/](/home/jan/projects/rrt/tools)
+- a logfile in the repo root:
+  - [rt3_manual_load_winedbg.log](/home/jan/projects/rrt/rt3_manual_load_winedbg.log)
+  - [rt3_auto_load_winedbg.log](/home/jan/projects/rrt/rt3_auto_load_winedbg.log)
+
+To save the full interactive debugger session to a file, set `RRT_WINEDBG_LOG`:
+
+```bash
+RRT_WINEDBG_LOG=/tmp/rt3-manual-load-winedbg.log tools/run_rt3_winedbg.sh
+```
+
+or:
+
+```bash
+RRT_WINEDBG_LOG=/tmp/rt3-auto-load-winedbg.log tools/run_hook_auto_load_winedbg.sh hh
+```
+
+Those wrappers use `script`, so both the commands you type and the debugger output are captured.
+
+`winedbg` under `/opt/wine-stable` also supports command files directly, so each launcher runs its default command file without extra arguments:
+
+```bash
+tools/run_rt3_winedbg.sh
+```
+
+and:
+
+```bash
+tools/run_hook_auto_load_winedbg.sh hh
+```
+
+Override either default if needed. For example, to redirect the logfile:
+
+```bash
+RRT_WINEDBG_LOG=/tmp/rt3-manual-load-winedbg.log tools/run_rt3_winedbg.sh
+```
+
+Ready-made debugger command files are also provided:
+
+- [winedbg_manual_load_445ac0.cmd](/home/jan/projects/rrt/tools/winedbg_manual_load_445ac0.cmd)
+- [winedbg_auto_load_compare.cmd](/home/jan/projects/rrt/tools/winedbg_auto_load_compare.cmd)
+
+If you do not use `RRT_WINEDBG_CMD_FILE`, you can still open those files and paste their contents into the debugger manually.
+
+Both scripts rebuild `rrt-hook`, copy `dinput8.dll` into the Wine RT3 directory, and launch RT3 under `winedbg`.
+
+## Successful Manual Load
+
+1. Launch:
+
+```bash
+tools/run_rt3_winedbg.sh
+```
+
+2. The default command file now breaks on both:
+   - `0x004390cb` first
+   - `0x00445ac0` second
+
+3. In RT3, load save `hh` manually.
+
+4. The command file will dump:
+   - registers
+   - top-of-stack dwords
+   - `0x006cec74`
+   - `0x006cec7c`
+   - `0x006cec78`
+   - `0x006ce9b8..0x006ce9c4`
+   - `0x006d1270..0x006d127c`
+   - backtrace
+
+Focus on:
+
+- whether the first hit is `0x004390cb` or `0x00445ac0`
+- caller address
+- `ecx`
+- the three stack arguments
+- `0x006cec74`
+- `0x006cec7c`
+- `0x006cec78`
+- `0x006ce9b8..0x006ce9c4`
+- `0x006d1270..`
+
+## Failing Auto-Load Run
+
+1. Launch:
+
+```bash
+tools/run_hook_auto_load_winedbg.sh hh
+```
+
+2. The default command file now scripts a fuller non-interactive capture sequence:
+   - `0x00438890`
+   - `0x004390cb`
+   - `0x00445ac0`
+   - `0x0053fea6`
+
+3. Let the hook run.
+
+4. The command file will dump the same register, stack, global, and backtrace state at the first hit.
+
+5. Compare that output directly against the successful manual run.
+
+So the current auto debugger path is now mostly headless:
+
+- launch `tools/run_hook_auto_load_winedbg.sh hh`
+- let the scripted breakpoints run
+- inspect [rt3_auto_load_winedbg.log](/home/jan/projects/rrt/rt3_auto_load_winedbg.log)
+
+Manual typing is no longer required for the main auto-load comparison path unless we need an additional ad hoc breakpoint.
+
+If the run still crashes and you need state from earlier in the failing path, add one temporary breakpoint manually at:
+
+- `0x00517cf0`
+
+## Optional Host-Side GDB Fallback
+
+If `winedbg` is too clumsy for repeated crashes, attach host `gdb` to the crashing Wine process after RT3 starts:
+
+```bash
+pgrep -af 'wine.*RT3.exe'
+gdb -p <pid>
+```
+
+Useful commands in `gdb`:
+
+```gdb
+set pagination off
+handle SIGSEGV stop print
+continue
+bt
+info registers
+x/16wx $esp
+```
+
+This is mainly for cleaner backtraces after the fault PC is already known from `winedbg`.