Deepen engine type parser semantics

Jan Petykiewicz 2026-04-21 22:44:51 -07:00
commit 1bd4158c0c
7 changed files with 297 additions and 33 deletions


@@ -4,6 +4,9 @@ This directory preserves older queue snapshots and long-form implementation note
useful as evidence, but should not stay in the short active queue file.
- `archive-2026-04-19.md`: preserved detailed queue snapshot from the pre-index cleanup.
- `engine-types-parser-semantics-2026-04-21.md`: current static parser frontier for the
`engine_types` family, including the grounded `.car` fixed slots, guarded `.lco` companion/body
slots, and the remaining semantic questions around `.cgo`.
- `format-inventory-2026-04-21.md`: current file-format inventory under `rt3/` and `rt3_105/`,
including the RT3-native families we still do not parse.
- `locomotive-descriptor-tails-2026-04-21.md`: checked `.gms + .gmx` local locomotive catalog


@@ -0,0 +1,70 @@
# EngineTypes Parser Semantics (2026-04-21)
This note preserves the current static parser frontier for the `engine_types` family after the
first `.car` / `.lco` / `.cgo` / `.cct` inspector pass landed.
## Grounded Fixed Lanes
- `.car` is no longer just a three-string header:
  - `0x0c`: primary display name
  - `0x48`: content name
  - `0x84`: internal stem
  - `0xa2`: second fixed stem slot
  - `0xc0`: side-view resource name such as `CarSideView_1.imb`
- The checked 1.05 corpus (`145` `.car` files) carries all five of those `.car` slots on every
file inspected so far.
- `.lco` carries one always-present primary stem at `0x04`.
- `.lco` carries meaningful secondary slots only when that leading stem slot is zero-padded:
  - `0x0c`: conditional companion stem such as `VL80T` or `Zephyr`
  - `0x12`: conditional body label such as `Loco`
- The checked 1.05 corpus (`66` `.lco` files) shows why the guard matters: long primary stems
such as `AtlanticL` naturally spill across `0x0c`, so `0x0c` and `0x12` are not independent
fixed fields unless the earlier slot is actually zero-padded.
- `.cgo` looks structurally narrow right now: the checked 1.05 corpus has `37` files, all exactly
`25` bytes long, each carrying one leading scalar lane plus an inline content stem at `0x04`.
- `.cct` remains the least ambiguous sidecar: current shipped files still look like narrow one-row
text metadata.
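
As a rough illustration of the fixed slots and the `.lco` guard listed above, the following Python sketch reads them from raw bytes. The helper names (`read_cstr`, `parse_car_slots`, `parse_lco_stems`), the NUL-terminated Latin-1 encoding, the 6-byte width of the `0x0c` slot (inferred from the gap to `0x12`), and the `0x20` upper bound on string reads (inferred from the start of the numeric lane block) are assumptions made for illustration, not details taken from the actual parser.

```python
from typing import Optional

def read_cstr(data: bytes, offset: int, max_len: Optional[int] = None) -> str:
    """Read a NUL-terminated string at `offset`, bounded by `max_len` bytes if given."""
    end = len(data) if max_len is None else min(len(data), offset + max_len)
    raw = data[offset:end]
    nul = raw.find(b"\x00")
    if nul != -1:
        raw = raw[:nul]
    return raw.decode("latin-1")  # assumed encoding

# Offsets come from the note; the field names are hypothetical labels.
CAR_SLOTS = {
    "display_name": 0x0C,
    "content_name": 0x48,
    "internal_stem": 0x84,
    "aux_stem": 0xA2,
    "side_view": 0xC0,
}

def parse_car_slots(data: bytes) -> dict:
    """Read each of the five `.car` fixed slots as a NUL-terminated string."""
    return {name: read_cstr(data, off) for name, off in CAR_SLOTS.items()}

def parse_lco_stems(data: bytes) -> dict:
    """Read the `.lco` primary stem, plus the guarded slots when the guard holds."""
    primary = read_cstr(data, 0x04, 0x20 - 0x04)  # assumed upper bound
    slot = data[0x04:0x0C]
    # Guard: the primary stem must end before 0x0c and the rest of its
    # slot must be zero, otherwise 0x0c/0x12 are just spillover bytes.
    padded = len(primary) < 8 and not any(slot[len(primary):])
    return {
        "primary": primary,
        "companion": read_cstr(data, 0x0C, 6) if padded else None,
        "body_label": read_cstr(data, 0x12, 0x20 - 0x12) if padded else None,
    }
```

The bounded read at `0x0c` matters in this sketch because a six-character companion stem such as `Zephyr` would fill the assumed slot with no terminator and otherwise run into the body label.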
## What The Current Parser Now Owns
- `.car`
  - primary display name
  - content name
  - internal stem
  - auxiliary stem slot
  - side-view resource name
- `.lco`
  - full internal stem
  - conditional companion stem slot
  - conditional body-type label
  - early raw numeric lane block `0x20..0x54`
- `.cgo`
  - leading scalar lane
  - content stem
- `.cct`
  - tokenized identifier/value row
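
The `.cgo` shape is small enough to sketch in full. The following Python example assumes the leading scalar lane is 4 bytes wide (because the stem starts at `0x04`), little-endian, and of unknown type, so it exposes both an unsigned-integer and a float view. The function name `parse_cgo`, the result keys, and the Latin-1 decoding are hypothetical.

```python
import struct

CGO_SIZE = 25  # every checked 1.05 `.cgo` file is exactly this long

def parse_cgo(data: bytes) -> dict:
    """Split a `.cgo` blob into its leading scalar lane and inline content stem."""
    if len(data) != CGO_SIZE:
        raise ValueError(f"expected {CGO_SIZE} bytes, got {len(data)}")
    # The scalar's real type is still open, so keep both raw views.
    (as_u32,) = struct.unpack_from("<I", data, 0)
    (as_f32,) = struct.unpack_from("<f", data, 0)
    # Content stem: NUL-terminated text starting at 0x04.
    stem = data[0x04:].split(b"\x00", 1)[0].decode("latin-1")
    return {"scalar_u32": as_u32, "scalar_f32": as_f32, "content_stem": stem}
```

Rejecting any other length keeps the sketch honest about what the corpus has shown so far; a file of a different size would be new evidence rather than something to parse silently.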
## Remaining Static Questions
- `.car`
  - what the `0xa2` auxiliary stem really represents across locomotive, tender, and freight-car
    families: alias root, image key, or alternate content stem
  - whether the trailing side-view resource can be tied cleanly to `.imb` metadata without
    inventing frontend semantics
- `.lco`
  - whether the guarded companion-stem slot is a tender/fallback display family, a foreign reuse
    key, or only a subset authoring convenience
  - how much of the early numeric lane block can be promoted from raw `u32/f32` views into stable
    typed semantics without dynamic evidence
- `.cgo`
  - whether the leading scalar is enough to justify a named typed field, or whether it should stay
    a conservative raw scalar until more binary/code correlation exists
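
For the `.lco` numeric lane question, a conservative raw view might look like the sketch below. It assumes `0x20..0x54` is a half-open range of 4-byte little-endian lanes (13 lanes in total); if `0x54` is actually the last lane offset, the range end would need to move. The name `raw_lane_views` is hypothetical.

```python
import struct

LANE_START, LANE_END = 0x20, 0x54  # end treated as exclusive (assumption)

def raw_lane_views(data: bytes) -> list:
    """Return (offset, u32 view, f32 view) for each 4-byte lane in the block."""
    lanes = []
    for off in range(LANE_START, LANE_END, 4):
        (as_u32,) = struct.unpack_from("<I", data, off)
        (as_f32,) = struct.unpack_from("<f", data, off)
        lanes.append((off, as_u32, as_f32))
    return lanes
```

Keeping both views side by side defers the integer-versus-float decision for each lane until there is evidence for one or the other.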
## Next Static Parser Work
- keep extending `engine_types` instead of creating a parallel parser family
- prefer fixed-slot promotion only when the corpus proves the slot is independent rather than a
spillover from an earlier variable-width stem
- treat `.cgo` as parser-complete structurally unless a clearer gameplay consumer appears
- keep the broader remaining unparsed-family list in [RT3 format inventory](format-inventory-2026-04-21.md)
rather than duplicating it here
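
The promotion rule above can be checked mechanically across a corpus. The Python sketch below is one possible shape for that check, using the `.lco` offsets `0x04` and `0x0c` as defaults. The function names are hypothetical, and the test it applies (a NUL byte somewhere before the next slot) is an assumption about what "independent" should mean here.

```python
from typing import Iterable

def stem_spills(data: bytes, start: int = 0x04, next_slot: int = 0x0C) -> bool:
    """True when the string at `start` has no NUL terminator before `next_slot`."""
    return b"\x00" not in data[start:next_slot]

def slot_is_independent(corpus: Iterable[bytes],
                        start: int = 0x04, next_slot: int = 0x0C) -> bool:
    """True only if no file in the corpus spills from `start` into `next_slot`."""
    return not any(stem_spills(blob, start, next_slot) for blob in corpus)
```

On a corpus that contains even one long stem such as `AtlanticL`, `slot_is_independent` returns `False`, which is the case where a guarded read is needed instead of fixed-slot promotion.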