# Debug Load Workflow
Use this when comparing:
- one successful manual load of `hh`
- one failing hook-driven auto-load attempt
The goal is to compare the real successful owner path above `0x00445ac0` against the failing hook-driven path.
## Current Findings
From the current logs:
- successful manual load now has a grounded pre-call site at `0x004390cb` with:
  - `ECX = 0x02af5840`
  - `[0x006cec78] = 0x02af5840`
  - `[0x006cec74] = 0x01d81230`
  - top-of-stack dwords:
    - `arg1 = 0x01db4739`
    - `arg2 = 4`
    - `arg3 = 0x0022fb50`
    - next dword = `0x026d7b88`
- the subsequent successful `0x00445ac0` entry still has:
  - `ret = 0x004390d0`
  - `arg1 = 0x01db4739`
  - `arg2 = 4`
  - `arg3 = 0x0022fb50`
- older failing auto-load attempts never reached `0x00445ac0`
- the earlier failing breakpoint was `0x00517cf0` with:
  - `[0x006cec78] = 0`
  - `[0x006cec74] = 0x01d81230`
- the staged request globals at `0x006ce9b8..0x006ce9c4` and `0x006d1270..0x006d127c` are zero on the successful manual path
That older `0x00517cf0` result is no longer the current blocker. The hook now reaches the real coordinator entry, so the remaining gap is later shell timing or re-entrancy, not request-latch shape.
The disassembly at `0x004390b0..0x004390cb` is now the strongest grounded manual-load branch:
- it writes `[0x006cec74+0x6c] = 1`
- it computes `arg1` from `([0x006cec7c] + 0x11)`
- it pushes `arg2 = 4`
- it passes `arg3 = &out_success`
- and then calls `0x00445ac0`
So any hook experiment that does not reproduce that exact shape is no longer a plausible match for the successful manual path.
## Latest Auto-Load Comparison
The newest hook-driven debugger run now reaches `0x00445ac0` directly.
At the auto-load `0x00445ac0` breakpoint:
- stack:
  - `ret = 0x7650505c` inside `dinput8`
  - `arg1 = 0x01db4739`
  - `arg2 = 4`
  - `arg3 = 0x0022fcf8`
- globals:
  - `[0x006cec74] = 0x01d81230`
  - `[0x006cec7c] = 0x01db4728`
  - `[0x006cec78] = 0x026d7b88`
Compared to the successful manual path:
- `arg1` matches exactly: `0x01db4739`
- `arg2` matches exactly: `4`
- `[0x006cec74]` matches exactly: `0x01d81230`
- `[0x006cec7c]` still matches the same runtime-profile base used to derive `arg1`
- `[0x006cec78]` is now non-null and published before entry
So the hook is no longer missing the coordinator entry shape. The remaining question is no longer "can we reach `0x00445ac0`?" but "does the live non-debugger call return successfully and trigger the actual restore transition?"
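The comparison above can be sketched as a small script. This is only an illustration: the dictionaries hold the values quoted in this document, while the helper names `diff_states` and `derived_arg1` are hypothetical and not part of `rrt-hook`.

```python
# Values copied from the debugger notes in this document.
MANUAL = {
    "arg1": 0x01DB4739,
    "arg2": 4,
    "arg3": 0x0022FB50,
    "ret": 0x004390D0,
    "[0x006cec74]": 0x01D81230,
    "[0x006cec78]": 0x02AF5840,
}
AUTO = {
    "arg1": 0x01DB4739,
    "arg2": 4,
    "arg3": 0x0022FCF8,
    "ret": 0x7650505C,
    "[0x006cec74]": 0x01D81230,
    "[0x006cec78]": 0x026D7B88,
    "[0x006cec7c]": 0x01DB4728,
}


def diff_states(a, b):
    """Return {field: (a_value, b_value)} for shared fields whose values differ."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


def derived_arg1(profile_base):
    """arg1 is computed as [0x006cec7c] + 0x11, kept to 32 bits."""
    return (profile_base + 0x11) & 0xFFFFFFFF


if __name__ == "__main__":
    for field, (manual, auto) in sorted(diff_states(MANUAL, AUTO).items()):
        print(f"{field}: manual={manual:#010x} auto={auto:#010x}")
    print(f"derived arg1 = {derived_arg1(AUTO['[0x006cec7c]']):#010x}")
```

The fields that differ are `arg3`, `ret`, and `[0x006cec78]`. Since `arg3` is `&out_success`, a different stack address is expected there; the return address and the published `0x006cec78` value are the differences worth tracking.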
## Latest Live Crash
The latest non-debugger auto-load run now reaches:
- `rrt-hook: auto load ready gate passed`
- `rrt-hook: auto load restore calling`
and then crashes at:
- `0x0053fea6`
The local disassembly around `0x0053fe90` shows a shell-side list traversal over `[this+0x74]` that walks linked entries and calls a virtual method on each. The crash instruction at `0x0053fea6` dereferences one traversed entry:
- `mov eax, DWORD PTR [esi]`
That strongly suggests the current hook is invoking the restore from the right call shape but on the wrong shell-pump turn. The active hypothesis is now timing or re-entrancy:
- the hook detects readiness and fires restore on the same shell-pump turn
- RT3 later re-enters shell object traversal in a phase where one list entry is still invalid
So the next experiment is to defer the actual restore by additional ready shell-pump turns instead of firing on the first ready turn.
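A minimal model of that deferral, assuming the hook sees one callback per shell-pump turn with a boolean readiness flag. The class name, the `extra_ready_turns` parameter, and the choice to reset the count on a non-ready turn are all assumptions for illustration, not the current `rrt-hook` behavior.

```python
class DeferredRestoreGate:
    """Fire restore only after N additional consecutive ready turns (hypothetical)."""

    def __init__(self, extra_ready_turns):
        self.extra_ready_turns = extra_ready_turns
        self.ready_turns_seen = 0
        self.fired = False

    def on_shell_pump_turn(self, ready):
        """Return True exactly once, on the turn where restore should fire."""
        if self.fired:
            return False
        if not ready:
            # Assumption: a non-ready turn invalidates earlier ready turns.
            self.ready_turns_seen = 0
            return False
        self.ready_turns_seen += 1
        if self.ready_turns_seen > self.extra_ready_turns:
            self.fired = True
            return True
        return False


if __name__ == "__main__":
    gate = DeferredRestoreGate(extra_ready_turns=2)
    turns = [False, True, True, True, True]
    print([gate.on_shell_pump_turn(ready) for ready in turns])
```

With `extra_ready_turns=0` the gate fires on the first ready turn, which corresponds to the behavior that currently crashes, so the experiment amounts to raising that value and watching whether `0x0053fea6` still faults.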
## Manual Owner Tail
The branch at `0x004390b0..0x004390ea` now has a grounded post-call tail too:
- `0x004390cb` calls `0x00445ac0`
- `0x004390d0` immediately calls `0x004834e0(0, 1)` on `0x006cec74`
- if `out_success != 0` or `esi != 0`, `0x004390ea` calls `0x004384d0`
- then `0x004390ef` calls `0x0053f310` on `0x00ccbb20`
- then `0x00439104` calls `0x004834e0(0, 1)` again
The successful manual breakpoint at `0x004390cb` shows `ESI = 0` and `EDI = 1`, so the manual load branch only forces the `0x004384d0` post-load pipeline when `out_success` comes back nonzero.
That makes the current hook gap narrower still: even with the correct `0x00445ac0` arguments, returning directly into `dinput8` skips RT3's own owner-tail work unless we mirror it ourselves.
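The owner tail can be written down as an ordered call list, which is roughly what a hook would have to mirror. This is a sketch of the control flow described above only; the function name is hypothetical, and argument passing and `this` pointers are deliberately left out.

```python
def manual_owner_tail(out_success, esi=0):
    """Return the ordered call targets starting at the 0x004390cb call site."""
    calls = [
        0x00445AC0,  # coordinator entry, called from 0x004390cb
        0x004834E0,  # called at 0x004390d0 as (0, 1) on 0x006cec74
    ]
    if out_success != 0 or esi != 0:
        calls.append(0x004384D0)  # post-load pipeline, called from 0x004390ea
    calls.append(0x0053F310)  # called at 0x004390ef on 0x00ccbb20
    calls.append(0x004834E0)  # called again at 0x00439104 as (0, 1)
    return calls


if __name__ == "__main__":
    for out_success in (0, 1):
        tail = manual_owner_tail(out_success)
        print(out_success, [f"{addr:#010x}" for addr in tail])
```

The `esi=0` default matches the `ESI = 0` observed at the successful manual breakpoint, so in that run only `out_success` decides whether `0x004384d0` is called.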
## Owner Xrefs Above `0x00438890`
The containing owner at `0x00438890` is now grounded as a larger `thiscall` shell owner with two stack arguments. Current xrefs found in local disassembly are:
- `0x00443b57`
- `0x00446d7f`
- `0x0046b8bc`
- `0x004830ca`
The strongest caller so far is `0x004830ca`:
- it publishes `0x006cec78 = eax`
- then calls `0x00438890` as `thiscall(active_mode, 1, 0)`
- it sits inside `shell_transition_mode`
- it is the branch that constructs `LoadScreen.win` through `0x004ea620`
- and it continues through shell-window follow-up on `0x006d401c` after the `0x00438890` call
The surrounding mode map is tighter now too:
- mode `1` = `Game.win`
- mode `2` = `Setup.win`
- mode `3` = `Video.win`
- mode `4` = `LoadScreen.win`
- mode `5` = `Multiplayer.win`
- mode `6` = `Credits.win`
- mode `7` = `Campaign.win`
That makes `0x00438890(active_mode, 1, 0)` the strongest current RT3-native entry candidate for reproducing the successful manual load branch, because it owns the internal dispatch that later reaches `0x004390cb`.
Current static xrefs also tighten the broader ownership split:
- `0x00443b57` calls `0x00438890` from the world-entry side, but with `(0, 0)` after dismissing the current shell detail panel and servicing `0x004834e0(0, 0)`
- `0x00446d7f` calls it from the saved-runtime restore side with the same `(0, 0)` shape before immediately building `.smp` bundle payloads through `0x00530c80/0x00531150/0x00531360`
- `0x0046b8bc` calls it from the multiplayer preview family before a later `0x00445ac0` call
- `0x004830ca` calls it from the shell-side active-mode branch with the clearest `(1, 0)` setup
So the function is no longer just a guessed hook target. It is now a real shared owner above world-entry, saved-runtime restore, multiplayer preview, and shell-side active-mode startup branches.
The internal selector split inside `0x00438890` is tighter now too:
- `[0x006cec7c+0x01]` is a startup-profile selector, not the shell mode id
- selector values `1` and `7` share the tutorial lane at `0x00438f67`, which writes `[0x006cec74+0x6c] = 2` and loads `Tutorial_2.gmp` or `Tutorial_1.gmp`
- selector `2` is a world-root initialization lane at `0x00438fbe` that allocates `0x0062c120` when needed, runs `0x0044faf0`, and then forces the selector to `3`
- selector `4` is a setup-side world reset or regeneration lane at `0x00439038` that rebuilds `0x0062c120` from setup globals `0x006d14cc/0x006d14d0`, then runs `0x00535100` and `0x0040b830`
- selector values `3`, `5`, and `6` collapse into the same profile-seeded file-load lane at `0x004390b0..0x004390ea`
- selector `6` is the one variant that explicitly writes `[0x006cec74+0x6c] = 1` before the shared file-load call
Current grounded writers now tighten those values too:
- `Campaign.win` writes selector `6` at `0x004b8a2f`
- `Multiplayer.win` writes selector `3` on one pending-status branch at `0x004f041e`
- the larger `Setup.win` dispatcher around `0x005033d0..0x00503b7b` writes selectors `2`, `3`, `4`, and `5` on several validated launch branches
- so the shared file-load lane is now best read as one reused profile-file startup family rather than one owner-specific manual-load path
That means the successful manual-load branch is not the whole function. It is one three-selector subfamily inside a broader startup dispatcher that also owns tutorial and fresh-world setup lanes.
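As a quick reference, the selector split can be captured in a lookup table. The constant and function names below are hypothetical labels for the lanes described above; selector values not mentioned in this document map to `None` rather than to a guessed lane.

```python
# Lane entry addresses inside 0x00438890, as described in this section.
TUTORIAL_LANE = 0x00438F67
WORLD_ROOT_INIT_LANE = 0x00438FBE
SETUP_RESET_LANE = 0x00439038
FILE_LOAD_LANE = 0x004390B0

# Startup-profile selector ([0x006cec7c+0x01]) -> lane entry address.
SELECTOR_LANES = {
    1: TUTORIAL_LANE,
    7: TUTORIAL_LANE,
    2: WORLD_ROOT_INIT_LANE,
    4: SETUP_RESET_LANE,
    3: FILE_LOAD_LANE,
    5: FILE_LOAD_LANE,
    6: FILE_LOAD_LANE,
}


def lane_for_selector(selector):
    """Return the lane address for a selector, or None if it is not grounded yet."""
    return SELECTOR_LANES.get(selector)


def selectors_for_lane(lane):
    """Return the sorted selector values that dispatch into the given lane."""
    return sorted(s for s, addr in SELECTOR_LANES.items() if addr == lane)


if __name__ == "__main__":
    print(selectors_for_lane(FILE_LOAD_LANE))
    print(selectors_for_lane(TUTORIAL_LANE))
```

The table does not model the fact that the selector `2` lane forces the selector to `3` afterwards; whether control then continues into the file-load lane is not established here.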
The multiplayer preview side is also tighter now:
- `0x0046b8bc` publishes `0x006cec78`
- calls `0x00438890` as `thiscall(active_mode, 0, 0)`
- clears `[0x006cec74+0x6c]`
- and only then calls `0x00445ac0(0x006ce630, [0x006ce9c0], 0)`
That makes the preview relaunch path clearly different from the manual load branch, not just a differently staged copy of it.
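The four known callers and their stack-argument shapes can be kept side by side in a small table. The structure and names are illustrative only; the shapes are the ones listed in this section.

```python
# Call site -> (description, stack arguments passed to 0x00438890).
CALLERS = {
    0x00443B57: ("world-entry", (0, 0)),
    0x00446D7F: ("saved-runtime restore", (0, 0)),
    0x0046B8BC: ("multiplayer preview", (0, 0)),
    0x004830CA: ("shell active-mode, LoadScreen.win", (1, 0)),
}


def callers_with_shape(shape):
    """Return the sorted call sites that pass the given stack-argument shape."""
    return sorted(addr for addr, (_, args) in CALLERS.items() if args == shape)


if __name__ == "__main__":
    for shape in ((1, 0), (0, 0)):
        print(shape, [f"{addr:#010x}" for addr in callers_with_shape(shape)])
```

Only `0x004830ca` uses the `(1, 0)` shape, which is why it stands out as the entry candidate for reproducing the manual load branch.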
## Latest Headless Debugger Result
The scripted auto-load debugger run is now useful without manual interaction:
- all breakpoints were set successfully:
- `0x00438890`
- `0x004390cb`
- `0x00445ac0`
- `0x0053fea6`
- but only `0x0053fea6` actually fired in the captured run
So the current non-interactive path is good enough to gather repeatable crash-side state, but it also tells us that the current auto-load code path is still not obviously traversing the larger-owner breakpoints under `winedbg`. The next step is therefore more hook-side logging around the `0x00438890` call itself rather than more manual debugger work.
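Checking which breakpoints fired can be automated against the captured log. The sketch below assumes that `winedbg` reports a hit with a line of the form `Stopped on breakpoint N at 0x...`; that pattern is an assumption and should be checked against a real log before relying on it. The function names are hypothetical.

```python
import re

# Breakpoints set by the scripted auto-load run.
EXPECTED = (0x00438890, 0x004390CB, 0x00445AC0, 0x0053FEA6)

# Assumed hit-line format; adjust after inspecting an actual winedbg log.
HIT_RE = re.compile(r"Stopped on breakpoint \d+ at (0x[0-9a-fA-F]+)")


def breakpoint_hits(log_text):
    """Return the addresses of all breakpoint hits found in the log, in order."""
    return [int(match.group(1), 16) for match in HIT_RE.finditer(log_text)]


def missing_breakpoints(log_text, expected=EXPECTED):
    """Return the expected breakpoint addresses that never fired."""
    hits = set(breakpoint_hits(log_text))
    return [addr for addr in expected if addr not in hits]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    for addr in missing_breakpoints(text):
        print(f"never fired: {addr:#010x}")
```

Run it with the path to `rt3_auto_load_winedbg.log` as the only argument. Lines that merely confirm a breakpoint was set do not match the pattern, so only actual stops are counted.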
The latest static pivot also means the next reverse-engineering step does not require a live run:
- compare the mode-`4` `LoadScreen.win` owner path at `0x004830ca` against the world-entry and saved-runtime callers of `0x00438890`
- compare how the `(1, 0)` `LoadScreen.win` lane diverges from the `(0, 0)` world-entry and saved-runtime lanes before control reaches the shared `0x004390b0` manual-load branch
- only then return to hook experiments
## Launchers
Manual debugger run:
```bash
tools/run_rt3_winedbg.sh
```
Auto-load debugger run:
```bash
tools/run_hook_auto_load_winedbg.sh hh
```
Both scripts use `/opt/wine-stable/bin/winedbg` explicitly, so they do not depend on `winedbg` being on `PATH`.
They also default to:
- their matching command file in [tools/](/home/jan/projects/rrt/tools)
- a logfile in the repo root:
- [rt3_manual_load_winedbg.log](/home/jan/projects/rrt/rt3_manual_load_winedbg.log)
- [rt3_auto_load_winedbg.log](/home/jan/projects/rrt/rt3_auto_load_winedbg.log)
To save the full interactive debugger session to a file, set `RRT_WINEDBG_LOG`:
```bash
RRT_WINEDBG_LOG=/tmp/rt3-manual-load-winedbg.log tools/run_rt3_winedbg.sh
```
or:
```bash
RRT_WINEDBG_LOG=/tmp/rt3-auto-load-winedbg.log tools/run_hook_auto_load_winedbg.sh hh
```
Those wrappers use `script`, so both the commands you type and the debugger output are captured.
`winedbg` under `/opt/wine-stable` also supports command files directly, so the plain invocations run their default command file without any typing:
```bash
tools/run_rt3_winedbg.sh
```
and:
```bash
tools/run_hook_auto_load_winedbg.sh hh
```
Override either default if needed: `RRT_WINEDBG_CMD_FILE` for the command file, `RRT_WINEDBG_LOG` for the logfile. For example:
```bash
RRT_WINEDBG_LOG=/tmp/rt3-manual-load-winedbg.log tools/run_rt3_winedbg.sh
```
Ready-made debugger command files are also provided:
- [winedbg_manual_load_445ac0.cmd](/home/jan/projects/rrt/tools/winedbg_manual_load_445ac0.cmd)
- [winedbg_auto_load_compare.cmd](/home/jan/projects/rrt/tools/winedbg_auto_load_compare.cmd)
If you do not use `RRT_WINEDBG_CMD_FILE`, you can still open those files and paste their contents into the debugger manually.
Both scripts rebuild `rrt-hook`, copy `dinput8.dll` into the Wine RT3 directory, and launch RT3 under `winedbg`.
## Successful Manual Load
1. Launch:
   ```bash
   tools/run_rt3_winedbg.sh
   ```
2. The default command file now breaks on both:
   - `0x004390cb` first
   - `0x00445ac0` second
3. In RT3, load save `hh` manually.
4. The command file will dump:
   - registers
   - top-of-stack dwords
   - `0x006cec74`
   - `0x006cec7c`
   - `0x006cec78`
   - `0x006ce9b8..0x006ce9c4`
   - `0x006d1270..0x006d127c`
   - backtrace
Focus on:
- whether the first hit is `0x004390cb` or `0x00445ac0`
- caller address
- `ecx`
- the three stack arguments
- `0x006cec74`
- `0x006cec7c`
- `0x006cec78`
- `0x006ce9b8..0x006ce9c4`
- `0x006d1270..`
## Failing Auto-Load Run
1. Launch:
   ```bash
   tools/run_hook_auto_load_winedbg.sh hh
   ```
2. The default command file now scripts a fuller non-interactive capture sequence, with breakpoints at:
   - `0x00438890`
   - `0x004390cb`
   - `0x00445ac0`
   - `0x0053fea6`
3. Let the hook run.
4. The command file will dump the same register, stack, global, and backtrace state at the first hit.
5. Compare that output directly against the successful manual run.
So the current auto debugger path is now mostly headless:
- launch `tools/run_hook_auto_load_winedbg.sh hh`
- let the scripted breakpoints run
- inspect [rt3_auto_load_winedbg.log](/home/jan/projects/rrt/rt3_auto_load_winedbg.log)
Manual typing is no longer required for the main auto-load comparison path unless we need an additional ad hoc breakpoint.
If the run still crashes and you need even earlier crash-side inspection after that, add one temporary extra breakpoint manually for:
- `0x00517cf0`
## Optional Host-Side GDB Fallback
If `winedbg` is too clumsy for repeated crashes, attach host `gdb` to the crashing Wine process after RT3 starts:
```bash
pgrep -af 'wine.*RT3.exe'
gdb -p <pid>
```
Useful commands in `gdb`:
```gdb
set pagination off
handle SIGSEGV stop print
continue
bt
info registers
x/16wx $esp
```
This is mainly for cleaner backtraces after the fault PC is already known from `winedbg`.