Reverse-Engineering Workflow

Goal

Produce durable, version-safe analysis that first explains high-level control loops and subsystem handoffs, then feeds the function-by-function Rust rewrite and later DLL-based replacement work.

Standard Loop

  1. Confirm the target binary and record its hash.
  2. Refresh the exported baseline artifacts.
  3. Update the relevant file under docs/control-loop-atlas/ with newly grounded loop roots, dispatchers, cadence points, state anchors, or subsystem handoffs, and touch docs/control-loop-atlas.md only if the top-level section layout changes.
  4. Analyze in Ghidra.
  5. Cross-check suspicious findings in Rizin or with CLI tools.
  6. Update the function map with names, prototypes, ownership, confidence, and loop-relevant notes.
  7. Commit regenerated exports and notes that would help future sessions.
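
Step 1 of the loop can be scripted. The following is a minimal sketch, assuming SHA-256 is the hash of record (the workflow does not name an algorithm); `sha256_of_file` is a hypothetical helper and not one of the committed tools.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file(Path("rt3_wineprefix/drive_c/rt3/RT3.exe"))` yields a value that can be noted next to the export directory, so later sessions can confirm they are looking at the same binary.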

Baseline Export

Use the committed helper:

python3 tools/py/collect_pe_artifacts.py \
  rt3_wineprefix/drive_c/rt3/RT3.exe \
  artifacts/exports/rt3-1.06

This export pass is expected to produce:

  • binary-summary.json
  • sections.csv
  • imported-dlls.txt
  • imported-functions.csv
  • interesting-strings.txt
  • subsystem-inventory.md
  • function-map.csv

For the startup-init milestone, run the Ghidra headless export as well:

python3 tools/py/export_startup_map.py \
  rt3_wineprefix/drive_c/rt3/RT3.exe \
  artifacts/exports/rt3-1.06

Optional flags:

python3 tools/py/export_startup_map.py \
  rt3_wineprefix/drive_c/rt3/RT3.exe \
  artifacts/exports/rt3-1.06 \
  --depth 2 \
  --root entry:0x005a313b \
  --root bootstrap:0x00484440

This startup pass is expected to add:

  • ghidra-startup-functions.csv
  • startup-call-chain.md

The raw CSV now includes root provenance columns:

  • root_name
  • root_address

Context Export

For branch-deepening passes after the initial root mapping, use the committed context exporter:

python3 tools/py/export_analysis_context.py \
  rt3_wineprefix/drive_c/rt3/RT3.exe \
  artifacts/exports/rt3-1.06 \
  --addr 0x00444dd0 \
  --addr 0x00508730 \
  --addr 0x00508880 \
  --string gpdLabelDB \
  --string gpdCityDB \
  --string 2DLabel.imb \
  --string 2DCity.imb \
  --string "Geographic Labels"

This pass is expected to add:

  • analysis-context-functions.csv
  • analysis-context-strings.csv
  • analysis-context.md

The function CSV captures target function metadata plus caller, callee, and data-ref summaries. The string CSV captures matched strings plus their code or data xrefs. The Markdown report keeps the human-readable disassembly excerpts that are useful for the next naming pass.

Use this exporter to close missing edges in the atlas before using it for leaf-function refinement.

Branch RE Kit

For deeper branch work after the atlas identifies a narrow unknown, use the CLI RE kit:

python3 tools/py/rt3_rekit.py \
  pending-template-store \
  rt3_wineprefix/drive_c/rt3/RT3.exe \
  artifacts/exports/rt3-1.06

Optional seed override:

python3 tools/py/rt3_rekit.py \
  pending-template-store \
  rt3_wineprefix/drive_c/rt3/RT3.exe \
  artifacts/exports/rt3-1.06 \
  --seed-addr 0x0059c470 \
  --seed-addr 0x0059c540

This pass is expected to add:

  • pending-template-store-functions.csv
  • pending-template-store-record-kinds.csv
  • pending-template-store-management.md

The function CSV captures the seed cluster plus adjacent discovered helpers in the same branch. The record-kinds CSV captures the pending-template dispatch-record destructor switch cases and their inferred payload cleanup shapes. The Markdown dossier groups the branch into lifecycle buckets such as init, destroy, lookup, prune, and dispatch.

This branch dossier is intentionally narrower than the atlas. Reach for it only when the broad loop map is already clear enough that a missing branch blocks the next high-level conclusion.

Ghidra Workflow

  • Create a local project for the canonical 1.06 executable.
  • Name the project after the binary version, not just RT3, so address notes stay version-safe.
  • Import the executable without modifying repo-tracked files.
  • Treat Ghidra as the primary source for function boundaries, control flow, and decompilation.
  • Local launcher on this host: ~/software/ghidra/ghidraRun
  • Local headless entrypoint on this host: ~/software/ghidra/support/analyzeHeadless
  • Headless project state should live under ghidra_projects/ and remain untracked.
  • The committed wrapper defaults to the entry and bootstrap roots but can be pointed at additional roots when a milestone needs them.

Rizin Workflow

Use Rizin as the fast second opinion when you need to:

  • check section layout, entrypoints, and imports from the CLI
  • confirm function boundaries or calling conventions
  • script quick address-oriented inspections without reopening the GUI
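
A quick CLI inspection can be wrapped in a few lines. The sketch below assumes `rizin` is on `PATH` and that the `-q` (quiet, exit after commands) and `-c` (run command) flags and the `iS`, `ie`, and `ii` info commands behave as in a stock Rizin install; the helper names are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Assumed Rizin info commands: sections, entrypoints, imports.
DEFAULT_COMMANDS = ("iS", "ie", "ii")


def build_rizin_argv(binary_path: str, commands: Sequence[str] = DEFAULT_COMMANDS) -> List[str]:
    """Build a one-shot Rizin invocation that runs the commands and exits."""
    return ["rizin", "-q", "-c", "; ".join(commands), binary_path]


def run_rizin(binary_path: str, commands: Sequence[str] = DEFAULT_COMMANDS) -> str:
    """Run Rizin once and return its standard output as text."""
    result = subprocess.run(
        build_rizin_argv(binary_path, commands),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the argv builder separate from the runner makes it easy to paste the exact command into notes alongside the finding it supports.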

Runtime Debugging

Static analysis comes first. Use winedbg only after the local Wine runtime is confirmed to work with the project prefix and a 32-bit target process. Runtime traces should be recorded back into the function map as corroborating evidence, not treated as a replacement for static exports.

Current host note:

  • env WINEPREFIX=/home/jan/projects/rrt/rt3_wineprefix winedbg --help works.
  • RT3 launches successfully under /opt/wine-stable/bin/wine when the current directory is rt3_wineprefix/drive_c/rt3.
  • Launching from the wrong working directory can make the process exit cleanly because the game expects its relative asset paths to resolve under C:\rt3.

That means runtime work can proceed, but startup commands should always be recorded with the working directory included.
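
One way to keep the working directory attached to every startup command is to build the launch record first and run from it. This is a sketch only: `build_launch_record` and `run_recorded` are hypothetical helpers, and the record layout is an assumption rather than an existing repo format.

```python
import os
import shlex
import subprocess
from pathlib import Path
from typing import Dict, List


def build_launch_record(repo_root: Path, argv: List[str]) -> Dict[str, object]:
    """Describe a launch with its working directory and Wine prefix made explicit."""
    prefix = repo_root / "rt3_wineprefix"
    return {
        "cwd": str(prefix / "drive_c" / "rt3"),  # the game resolves assets relative to this
        "env": {"WINEPREFIX": str(prefix)},
        "argv": list(argv),
        "command_line": shlex.join(argv),  # copy-pasteable form for notes
    }


def run_recorded(record: Dict[str, object]) -> int:
    """Run the recorded command from its recorded directory; return the exit code."""
    env = {**os.environ, **record["env"]}
    completed = subprocess.run(record["argv"], cwd=record["cwd"], env=env, check=False)
    return completed.returncode
```

The record can be dumped to JSON next to any runtime trace, so the trace and the exact launch conditions travel together.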

Naming Rules

  • Names should prefer behavior over implementation detail when behavior is known.
  • Prefer the shape owner_verb_object[_qualifier].
  • Prefer one primary verb, one primary object, and at most one qualifier.
  • If behavior is only partly known, keep a neutral prefix such as subsystem_ or unk_.
  • Address-derived placeholder names are acceptable, but only as temporary rows.
  • Every renamed function should keep a short note explaining why the name is justified.
  • For high-level passes, prioritize names that clarify loop role, ownership, or handoff semantics over names that only describe a local helper's mechanics.
  • Prefer try_ for best-effort helpers that may fall through without mutation or publication.
  • Prefer apply_ when a helper commits one selected policy or state transition.
  • Reserve evaluate_ for read-heavy helpers that classify or score state without committing the later action themselves.
  • Prefer one stable family noun once a transient runtime structure is grounded.
  • Use queue_node for transient linked-list allocations, and reserve record for persisted rows or document-style payloads.
  • Prefer startup_company over company_start when the object is the newly started company.
  • Prefer participial qualifiers such as _ignoring_territories over _with_*_ignored once the side condition is grounded.
  • Drop filler tails such as _lanes once a broader owner is grounded well enough to carry the family directly.
  • Prefer _and_optionally_ over _with_optional_ when a helper may take one secondary path but the main owner is still singular.
  • Treat _and_, _with_, _if_, and _via_ as fallback tools for still-uncertain seams, not as the default naming shape.
  • Raw offset tails such as field_0xNN are acceptable for accessors and low-confidence rows, but should be dropped once a stable semantic field meaning is grounded.
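
Some of these rules are mechanical enough to lint. The sketch below is a hypothetical checker, not a committed tool, and covers only three of the rules above: fallback connectors, raw offset tails, and the `_with_optional_` phrasing. It returns advisory warnings rather than rejecting names, since each flagged pattern remains acceptable for uncertain rows.

```python
import re
from typing import List

FALLBACK_CONNECTORS = ("_and_", "_with_", "_if_", "_via_")
RAW_OFFSET_TAIL = re.compile(r"field_0x[0-9a-fA-F]+$")


def naming_warnings(name: str) -> List[str]:
    """Return advisory notes for a candidate function name."""
    warnings: List[str] = []
    for connector in FALLBACK_CONNECTORS:
        if connector in name:
            warnings.append(f"uses fallback connector {connector}")
    if RAW_OFFSET_TAIL.search(name):
        warnings.append("ends in a raw offset tail")
    if "_with_optional_" in name:
        warnings.append("prefer _and_optionally_ over _with_optional_")
    return warnings
```

A name that returns an empty list is not necessarily good; it has merely avoided the patterns this sketch knows about.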

Confidence Rules

  • 1: address exists, purpose unknown
  • 2: rough subsystem guess only
  • 3: behavior inferred from control flow or strings
  • 4: prototype or side effects mostly understood
  • 5: confirmed by multiple sources or runtime evidence
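
The scale maps naturally onto an integer enum when validating function-map rows. A minimal sketch, with member names invented here for illustration:

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Confidence levels from the list above; member names are hypothetical."""

    ADDRESS_ONLY = 1          # address exists, purpose unknown
    SUBSYSTEM_GUESS = 2       # rough subsystem guess only
    BEHAVIOR_INFERRED = 3     # inferred from control flow or strings
    PROTOTYPE_UNDERSTOOD = 4  # prototype or side effects mostly understood
    CONFIRMED = 5             # multiple sources or runtime evidence


def parse_confidence(value: str) -> Confidence:
    """Parse a CSV cell; raises ValueError for non-integers or values outside 1..5."""
    return Confidence(int(value))
```

Because `IntEnum` members compare as integers, a filter such as `parse_confidence(cell) >= Confidence.BEHAVIOR_INFERRED` works directly.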

Export Policy

Commit exports that are cheap to diff and useful to reuse:

  • JSON, CSV, TXT, and Markdown summaries
  • function maps and subsystem inventories
  • small command outputs that anchor a finding
  • raw startup discovery exports from headless Ghidra

Keep these local-only:

  • Ghidra projects and caches
  • repo-local Ghidra runtime state under .ghidra/
  • Rizin databases and ephemeral sessions
  • temporary dumps and scratch notebooks that have not been curated

Keep the ownership split explicit:

  • raw Ghidra or Rizin discovery output is derived data
  • function-map.csv is the curated ledger and may intentionally diverge from auto-generated names
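
The commit/local-only split can be approximated with a path check before staging. This is a rough sketch under stated assumptions: the suffix set mirrors the formats listed above, the directory names come from the Ghidra notes in this document, and `is_committable` is a hypothetical helper that does not replace human curation.

```python
from pathlib import PurePosixPath

# Formats this policy treats as cheap to diff.
COMMITTABLE_SUFFIXES = {".json", ".csv", ".txt", ".md"}
# Directories this document says must stay untracked.
LOCAL_ONLY_DIRS = {".ghidra", "ghidra_projects"}


def is_committable(repo_relative_path: str) -> bool:
    """Return True if the path looks like a diff-friendly export outside local-only dirs."""
    path = PurePosixPath(repo_relative_path)
    if LOCAL_ONLY_DIRS.intersection(path.parts):
        return False
    return path.suffix.lower() in COMMITTABLE_SUFFIXES
```

The check is deliberately conservative: anything with an unlisted suffix is treated as local-only until someone decides otherwise.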

Exit Criteria For The Broad-Mapping Milestone

The current breadth-first milestone is complete when the repo has:

  • a stable starter map for the canonical binary
  • a control-loop atlas covering the major top-level loops and handoff points
  • named anchors for startup, shell/UI, frame/presentation, simulation, map/load, input, save/load, and multiplayer/network flow
  • enough notes and exports that a future session can continue without rediscovery