Tales of Pirates on x64: porting the client + all 4 servers from 32-bit (a field guide)


From the Chaos Pirates server development, by Panda. Code below is verbatim from our tree.

Almost every live ToP/PKO build is still Win32. On Chaos Pirates we took the MindPower3D client and all four daemons (Account / Game / Gate / Group) to native x64 and got back to full parity with the 32-bit build — login → world → combat → skills → audio, all confirmed in-game. This is the honest field report: the mental model, the migration order, the landmines (with our actual code), how to debug them when you have no debugger attached, and the myths that'll waste your time.

TL;DR — the two things that actually cost real time:
  1. Binary files written by 32-bit tools that embed pointers (.lgo/.lab models, encrypted .bin tables). A void* is 4 bytes on disk but 8 in memory → a raw fread desyncs the file → a count reads as garbage → multi-GB allocation → crash or a 30-second freeze. Fix with "disk-mirror" structs.
  2. Virtual override signature drift (DWORD vs std::uint32_t) → methods silently stop overriding → "the engine fires but nothing happens."
Everything else is the standard LLP64 checklist below.




1. Why bother
The 32-bit ~2 GB user address space is the real ceiling behind the classic "client distribution / texture" pain (our Client/texture alone is ~1.2 GB). x64 buys content and address headroom and a modern, supported toolchain. Raw perf is roughly a wash — headroom is the prize.

2. The mental model: Windows x64 is LLP64, and that's gentle
This is the single most important thing to internalize, because it turns a "rewrite" into an "audit":
  • int, long, DWORD, float, BOOL — stay 4 bytes (unchanged).
  • pointer, size_t, ptrdiff_t, INT_PTR, LPARAM — widen 4 → 8.
Only pointers and size_t-family widen. Two consequences:
  • Your network protocol stays wire-compatible with 32-bit clients — as long as packets serialize field-by-field and never embed a pointer or size_t. long staying 32-bit is what saves you. We ran the x64 client against the live 32-bit server through the entire bring-up, and you can run a mixed 32/64 server cluster during migration.
  • Most game structs don't change size. The work is finding the handful of places that touch pointer-width values on disk, in vtables, or in the Win32 API.
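
To make the wire-compatibility point concrete, here is a minimal sketch of field-by-field serialization with fixed-width types. MovePacket, put_u32, get_u32 and serialize are hypothetical names for illustration, not code from the Chaos Pirates tree, and the little-endian byte order is an assumption. What matters is that nothing on the wire depends on sizeof(void*) or sizeof(size_t), so a 32-bit and a 64-bit build produce the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet, for illustration only (not a real ToP/PKO message).
struct MovePacket {
    std::uint32_t char_id;
    std::uint32_t x;
    std::uint32_t y;
};

// Append a 32-bit value in little-endian byte order, one byte at a time.
inline void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
}

// Read a little-endian 32-bit value starting at byte offset `off`.
inline std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t off) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[off + i]) << (8 * i);
    return v;
}

// Field-by-field: no pointer, no size_t, no sizeof(struct) on the wire.
inline std::vector<std::uint8_t> serialize(const MovePacket& p) {
    std::vector<std::uint8_t> out;
    put_u32(out, p.char_id);
    put_u32(out, p.x);
    put_u32(out, p.y);
    return out;
}
```

A packet built this way is 12 bytes on every target. The thing to grep for in a real tree is any send path that does a memcpy or send of a whole struct containing a pointer or a size_t.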

3. Recommended migration order
  1. Add x64 build configs to every project; get them to compile (the mechanical swaps in §6).
  2. Build the shared libs x64 first (crypto, network, utils, localization, Lua) — the servers reuse them.
  3. Vendor x64 dependency binaries (§7) and actually link the CRT.
  4. Boot it and fix the binary-loader desyncs (§4) — this is the real work, and it's where it'll "launch then crash/freeze."
  5. Runtime parity pass against the 32-bit build as your oracle (§9 debugging) until rendering/combat/audio match.
Keep every x64-specific change behind #ifdef _WIN64 so the proven 32-bit build stays byte-identical.

4. The big enemy: binary files written by 32-bit tools that embed pointers
The .lgo/.lab models and the encrypted .bin data tables were authored by 32-bit tools, where a struct's void* field occupies 4 bytes on disk. On x64 that member is 8 bytes and forces 8-byte struct alignment, so a raw fread of the record over-reads and desyncs the file position. A later count field then reads as garbage (we saw vertex_num == 0xFFFFFFFF), the engine does new lwVector3[0xFFFFFFFF] ≈ a 21–51 GB allocation, and you get either a bad_alloc crash or a ~30-second "Not Responding" freeze (the freeze is allocation-failure latency × ~156 bad meshes — not slow code, which matters in §10).
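
One cheap way to turn that allocation storm into an immediate, readable failure is to sanity-check a count against the bytes left in the file before allocating. The sketch below is an assumption about how such a guard could look, not something from our tree; count_fits_in_file is a hypothetical helper, and it relies on ftell, so the file must be opened in binary mode and be smaller than 2 GB on Windows (where long is 32-bit).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Returns true if `count` records of `elem_size` bytes can still fit between
// the current file position and the end of the file. The file position is
// restored before returning. Hypothetical helper, for illustration only.
inline bool count_fits_in_file(std::FILE* fp, std::uint64_t count, std::uint64_t elem_size) {
    if (elem_size == 0) return false;
    long here = std::ftell(fp);
    if (here < 0) return false;
    if (std::fseek(fp, 0, SEEK_END) != 0) return false;
    long end = std::ftell(fp);
    if (std::fseek(fp, here, SEEK_SET) != 0) return false;
    if (end < here) return false;
    std::uint64_t remaining = static_cast<std::uint64_t>(end - here);
    // Divide instead of multiplying so a garbage count cannot overflow.
    return count <= remaining / elem_size;
}
```

Called right before something like new lwVector3[vertex_num], a check of this kind rejects 0xFFFFFFFF instantly, and logging the failing file name at that point tells you which loader desynced.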

Audit, don't guess. We checked every fread in the whole model/mesh/material/bone/anim load path, and in the current format only one pointer-bearing struct is read directly: lwTexInfo (the legacy lwTexInfo_0001 is the second case, covered further down). Its data member is the 4-byte on-disk slot described above, so it is the record that over-reads on x64. Every other directly-read struct is POD with an identical layout on both architectures, and the Win32 build keeps the original raw fread byte-for-byte.
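
A compile-time check can do part of this audit for you. The sketch below uses two hypothetical structs (TexRecord and TexRecord_Disk are illustrative, not the real lwTexInfo) and pins the on-disk size with static_assert, then compares it with the in-memory size. It is a sketch of the idea under those assumptions, not our actual audit tooling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record shaped like the problem case: fixed-width fields plus a raw pointer.
struct TexRecord {
    std::uint32_t stage;
    void*         data;   // 4 bytes in a file written by a 32-bit tool
    std::uint32_t flags;
};

// Explicit on-disk layout: the pointer becomes a 4-byte slot.
struct TexRecord_Disk {
    std::uint32_t stage;
    std::uint32_t data;
    std::uint32_t flags;
};

// The disk layout must not change with the target architecture.
static_assert(sizeof(TexRecord_Disk) == 12, "on-disk record must stay 12 bytes");
static_assert(offsetof(TexRecord_Disk, flags) == 8, "flags must sit at byte 8 on disk");

// True when a raw fread into T would consume exactly the on-disk record size.
template <typename T, typename Disk>
constexpr bool raw_read_is_safe() {
    return sizeof(T) == sizeof(Disk);
}
```

On a 32-bit target raw_read_is_safe<TexRecord, TexRecord_Disk>() is true; on x64 the pointer and its padding grow the struct and it turns false. Putting a static_assert(sizeof(T) == N) beside each raw fread, with N taken from the 32-bit build, makes the x64 compile fail at exactly the structs that need a mirror.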

The "disk-mirror" fix (verbatim). Declare an explicit 32-bit on-disk layout where the pointer is a 4-byte slot, fread into it, copy the fields across:

Code:
struct lwTexInfo_Disk
{
    DWORD               stage;
    DWORD               level;
    DWORD               usage;
    D3DFORMAT           format;
    D3DPOOL             pool;
    DWORD               byte_alignment_flag;
    DWORD               type;
    DWORD               width;
    DWORD               height;
    DWORD               colorkey_type;
    lwColorValue4b      colorkey;
    char                file_name[LW_MAX_NAME];
    DWORD               data;            // was void* (a null 4-byte slot on the 32-bit file)
    lwRenderStateAtom   tss_set[LW_TEX_TSS_NUM];
};

static void lwReadTexInfoSeq_x64(lwTexInfo* dst, DWORD count, FILE* fp)
{
    for(DWORD i = 0; i < count; ++i)
    {
        lwTexInfo_Disk d;
        fread(&d, sizeof(d), 1, fp);

        dst[i].stage               = d.stage;
        dst[i].level               = d.level;
        dst[i].usage               = d.usage;
        dst[i].format              = d.format;
        dst[i].pool                = d.pool;
        dst[i].byte_alignment_flag = d.byte_alignment_flag;
        dst[i].type                = d.type;
        dst[i].width               = d.width;
        dst[i].height              = d.height;
        dst[i].colorkey_type       = d.colorkey_type;
        dst[i].colorkey            = d.colorkey;
        memcpy(dst[i].file_name, d.file_name, sizeof(d.file_name));
        dst[i].data                = 0;
        memcpy(dst[i].tss_set, d.tss_set, sizeof(d.tss_set));
    }
}

Both the struct and helper live under #ifdef _WIN64; at each read site you branch #ifdef _WIN64 → lwReadTexInfoSeq_x64(...) #else → fread(...) so the 32-bit path is untouched.
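
If you are writing a mirror reader from scratch, it may be worth checking the fread result so a truncated file stops the loop instead of copying an uninitialized record. The following is a generic, self-contained sketch of that variant; Rec, Rec_Disk and read_recs are hypothetical stand-ins, and this is not the code from our tree (which is the verbatim block above).

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

struct Rec {            // in-memory record (hypothetical)
    std::uint32_t id;
    void*         data;
    std::uint32_t width;
};

struct Rec_Disk {       // layout as written by the 32-bit tool
    std::uint32_t id;
    std::uint32_t data; // 4-byte slot where the pointer used to be
    std::uint32_t width;
};
static_assert(sizeof(Rec_Disk) == 12, "disk record must be 12 bytes");

// Reads up to `count` records through the mirror.
// Returns how many records were read completely.
inline std::uint32_t read_recs(Rec* dst, std::uint32_t count, std::FILE* fp) {
    for (std::uint32_t i = 0; i < count; ++i) {
        Rec_Disk d;
        if (std::fread(&d, sizeof(d), 1, fp) != 1)
            return i;               // short read: stop instead of copying garbage
        dst[i].id    = d.id;
        dst[i].data  = nullptr;     // the on-disk slot is never a valid pointer
        dst[i].width = d.width;
    }
    return count;
}
```

Because the mirror has the same layout on both architectures, a reader like this is also correct on a 32-bit target; branching on _WIN64 as described above is purely about keeping the proven 32-bit path byte-identical.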

Audit every version of the format. This bit us twice. The legacy material format lwTexInfo_0001 (MTLTEX_VERSION0001) ALSO has a void* data member, so a raw fread over-reads on x64 exactly like lwTexInfo above — that was the scene-prop desync behind the char-select freeze and the missing buildings. Mirror it the same way, with an identical lwTexInfo_0001_Disk + lwReadTexInfo0001Seq_x64. (lwTexInfo_0000 is pointer-free, so it stays a plain fread.)

The .bin tables are the same bug, with a sharp edge. The encrypted data tables store a record whose stride includes a void* (CRawDataInfo::pData), so the stride is wrong on x64. Read the stride-correct 32-bit layout instead (we mirror it the same way). Critically: never let the x64 client WRITE a shared .bin. It would emit the 64-bit record layout, and your 32-bit client would then read garbage and flash-crash on load ("NULL GUI", because the login form can't build). Our actual guard (verbatim, TableData.cpp):

Code:
void CRawDataSet::_WriteRawDataInfo_Bin(const char* pszFileName) {
#ifdef _WIN64
	// x64 must NEVER write a table .bin: it would be in 64-bit record layout and corrupt the file
	// for the shared 32-bit client. x64 reads the 32-bit .bin via the mirror in _LoadRawDataInfo_Bin.
	return;
#endif
	// ... 32-bit writer continues ...

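(Bonus trap: a .gitattributes "* text=auto" can CRLF-mangle a binary .bin that git misdetects as text, so git is not a clean source for them. Regenerate from .txt, and mark the extension as binary in .gitattributes, e.g. "*.bin binary", so future commits are left alone.)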

5. The sneaky one: virtual-override signature drift → silent no-ops
If you're on a "modernized" codebase, watch for this — it cost us days and bit us twice. The base class had been retyped to std::uint32_t, but every derived override still used DWORD (which is unsigned long). They're both 32-bit but are distinct types, so the override silently doesn't match → the compiler adds a new vtable slot and your call resolves to the base no-op. Here's the actual base, after the fix, with the comment we left documenting the trap (STStateObj.h):

Code:
	// NOTE: these MUST be DWORD to match every derived state's override. They were briefly
	// std::uint32_t (from the modern refactor); since DWORD == unsigned long != unsigned int, that
	// breaks the override (derived declares a new vtable slot), routing attack/skill
	// keyframe events to these base no-ops -> no FX/sound and no ActionEnd (freeze).
	virtual void ActionBegin(DWORD pose_id) {
	}

	virtual void ActionFrame(DWORD pose_id, int key_frame) {
	}

	virtual void ActionEnd(DWORD pose_id) {
	}

Symptom before the fix: "the engine fires but nothing happens" — skill effects didn't render, combat SFX were silent, characters froze until cooldown, some labels drew the wrong color. All one root cause. Lesson: an x64 silent-no-op in polymorphic code = suspect base-vs-derived signature drift (DWORD vs uint32_t, or vs a pointer-width type). Add the override keyword everywhere and let the compiler flag the rest.
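
Here is the trap reduced to a few lines, as a sketch rather than our real state classes. DWORD_t and U32_t are stand-in aliases (unsigned long for the Windows DWORD, unsigned int for what std::uint32_t resolves to on MSVC), and StateBase, DriftedState, FixedState and fire are hypothetical names. The base counts calls only so the effect is observable.

```cpp
#include <cassert>

using DWORD_t = unsigned long;  // stand-in for the Windows DWORD typedef
using U32_t   = unsigned int;   // what std::uint32_t resolves to on MSVC

struct StateBase {
    int base_calls = 0;
    virtual ~StateBase() = default;
    // Base "no-op" (it only counts, so the test can see it being hit).
    virtual void ActionBegin(DWORD_t /*pose_id*/) { ++base_calls; }
};

// Drifted signature: this declares a NEW virtual function that hides the base
// one instead of overriding it. Adding `override` here would not compile.
struct DriftedState : StateBase {
    int derived_calls = 0;
    virtual void ActionBegin(U32_t /*pose_id*/) { ++derived_calls; }
};

// Matching signature plus `override`: the compiler verifies the match.
struct FixedState : StateBase {
    int derived_calls = 0;
    void ActionBegin(DWORD_t /*pose_id*/) override { ++derived_calls; }
};

// The engine only knows the base type, as with keyframe callbacks.
inline void fire(StateBase& s) { s.ActionBegin(DWORD_t{1}); }
```

Calling fire on a DriftedState lands in the base function and the derived one never runs, which is the "fires but nothing happens" symptom. With override on every derived method, the drifted version is a compile error instead of a runtime mystery.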

6. The mechanical swaps (quick, but you'll hit every one)
  • SetWindowLong / GWL_WNDPROC → SetWindowLongPtr / GWLP_WNDPROC
  • SetClassLong / GCL_HCURSOR → SetClassLongPtr / GCLP_HCURSOR
  • dialog proc returns BOOL → returns INT_PTR
  • ODBC SDWORD / SQLINTEGER length/indicator → SQLLEN / SQLULEN
  • (int)pPtr / storing a pointer in DWORD → store in a pointer-width type (INT_PTR / void*)
  • Inline asm (x64 MSVC forbids it): __asm int 3 → __debugbreak(); rewrite real x86 asm (FPU/rotate/xor helpers) as portable C; for Crypto++, define CRYPTOPP_DISABLE_ASM and let the C++ fallbacks run.
  • UCRT heads-up: modern UCRT fast-fails (0xC0000409, FAST_FAIL_INVALID_ARG) on some legacy patterns that old msvcr71 tolerated — e.g. passing a negative char (GBK byte) into ctype/CRT functions. If you get fast-fail crashes deep in init, that's a prime suspect.
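
Three of the swaps above have portable shapes that can be sketched without any Windows headers. The helpers below are hypothetical illustrations, not functions from our tree: a rotate written in plain C++ in place of an x86 rol helper, the unsigned char cast that keeps a GBK byte from reaching a ctype function as a negative value, and a pointer round-trip through std::uintptr_t (the standard counterpart of the pointer-width Win32 types such as INT_PTR).

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>

// Portable replacement for an x86 `rol` helper. Masking keeps both shift
// amounts in 0..31, so n == 0 and n == 32 are well defined.
inline std::uint32_t rotl32(std::uint32_t v, unsigned n) {
    n &= 31u;
    return (v << n) | (v >> ((32u - n) & 31u));
}

// ctype functions take an int that must be EOF or representable as
// unsigned char. A GBK lead byte held in a signed char is negative,
// so cast before the call.
inline bool is_space_byte(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Round-trip a pointer through an integer that is wide enough to hold it.
// A DWORD or int would truncate the upper 32 bits on x64.
inline std::uintptr_t ptr_to_int(void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
inline void* int_to_ptr(std::uintptr_t v) {
    return reinterpret_cast<void*>(v);
}
```

Modern compilers typically recognize the rotl32 pattern and emit a single rotate instruction, so dropping the inline asm should not cost anything measurable.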

7. Dependencies & build config
  • Vendor x64 builds of everything: DirectX9 x64 import libs (deprecated, not in the modern Windows SDK — pull from an x64 DX9 dep pack), LuaJIT rebuilt as x64 lua51.dll/.lib (build in an x64 dev env so its minilua/buildvm host tools run), SDL2/SDL2_mixer (API-compatible enough with SDL 1.2.7 that no wrapper code changed — only build wiring), discord-rpc x64.
  • Actually link the CRT. The legacy config omitted the default lib name and ignored MSVCRT.lib → nothing linked. Set RuntimeLibrary=MultiThreadedDLL and stop ignoring it.
  • Check your optimization flag. Ours had no <Optimization> element in any project, so every Release|x64 binary silently built /Od (debug-speed). Set MaxSpeed (/O2) centrally in a Directory.Build.props. (Note: /O2 will not fix the §4 freeze — see §10.)

8. Server side
The four daemons mostly reuse the client's shared x64 libs — one props file feeds Common/Network/Utils/crypto/lua51 + Ws2_32/Winmm/Iphlpapi to all of them. Because of LLP64 the protocol stays compatible, so a mixed 32/64 cluster runs fine during migration. The genuine LLP64 server fixes were few — the memorable one was a SQLBindCol arg-6 SQLINTEGER* → SQLLEN*. And a myth to retire: the "ICU x64 blocker" wasn't real for us — the tree used its own localization lib, not ICU. Most server effort was just rebuilding the shared libs and the ODBC width fixes.

9. How to debug this with no debugger attached
You usually won't have windbg/cdb on the box, and the crashes are deep in init. What worked:
  • Windows Event Viewer → "Application Error" gives you the faulting module + exception code for a bare crash. 0xC0000409 = a fast-fail; a giant private-bytes spike before death = the §4 allocation storm.
  • Instrument down the call chain with fopen("x64diag.log","a")+fprintf and bisect where x64 diverges from 32-bit. Trap: if your engine is a DLL, log singletons created in the EXE are a dead sink inside the DLL — DLL log lines silently vanish. Use a raw fopen from inside the DLL (cwd = your Client/ root).
  • Filter and cap every probe so the hot idle/run paths don't drown the log. This is how we proved a keyframe callback fired but the virtual dispatched to the base no-op (§5).
  • Use the 32-bit build as a perfect oracle. Same assets, same scripts, same server. Anything that differs is your x64 delta — instrument both, diff, fix.
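
A capped probe of the kind described above can be very small. This is a minimal sketch, not our actual logging code: diag_log is a hypothetical helper, the counter is owned by the call site (a function-local static works), and it is not thread-safe. It opens and closes the file on every call so that nothing is lost if the process dies right after the line is written.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Appends one formatted line to `path` unless this probe has already logged
// `cap` lines. Returns true if a line was written.
inline bool diag_log(const char* path, int& counter, int cap, const char* fmt, ...) {
    if (counter >= cap) return false;       // probe exhausted: stay silent on hot paths
    std::FILE* fp = std::fopen(path, "a");  // raw fopen, no dependency on EXE-side singletons
    if (!fp) return false;
    ++counter;
    va_list args;
    va_start(args, fmt);
    std::vfprintf(fp, fmt, args);
    va_end(args);
    std::fputc('\n', fp);
    std::fclose(fp);                        // close every time so a crash loses nothing
    return true;
}
```

A call site would look like: static int hits = 0; diag_log("x64diag.log", hits, 50, "ActionFrame pose=%lu key=%d", pose_id, key_frame); with any filtering (for example, only attack poses) done in an if around the call. Add the same probes to the 32-bit build and diff the two logs.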

10. What was NOT the problem (don't waste weeks here)
  • DX9 is fine — do not jump to DX11/DX12 to "fix" x64. The 32-bit client runs the same DX9, assets, and logic smoothly, which proves the graphics API isn't your problem. An API port is a far bigger project than the x64 port, for zero gain on these symptoms.
  • /O2 won't fix the char-select freeze. A ~30s freeze that's constant regardless of optimization is an allocation-failure / blocking-wait signature, not compute. Don't chase it with the optimizer — fix the §4 desync.
  • ICU isn't a blocker (see §8). Check what your tree actually links before believing the folklore.

11. Result
One source tree, client + all four servers native x64, running end-to-end at parity with the 32-bit build, protocol-compatible with 32-bit during the transition. The whole effort really reduces to: the LLP64 checklist (mechanical), the binary-loader desyncs (the disk-mirror pattern), and the vtable signature drift. Get those and you're most of the way there.

Happy to share more detail on any section — ask away.

— Panda, Chaos Pirates
 