fix: update NETWORK_PROTOCOL_INVESTIGATION with detailed analysis of HMAC verification deadlock and proposed solutions

This commit is contained in:
MayaTheShy
2025-11-10 01:50:35 -05:00
parent 84dd0f80c7
commit 5d8d01fc69

View File

@@ -24,6 +24,135 @@
### Root Cause: HMAC Verification Deadlock (UNSOLVED) ### Root Cause: HMAC Verification Deadlock (UNSOLVED)
**The Problem:**
The server requires HMAC-MD5 verification for all sourced packets (Ping, AvatarData, etc.) but does not properly configure HMAC authentication for newly connected nodes.
**Timeline of Discovery:**
1. **Initial Bug - Local ID Parsing (FIXED)**
- We were parsing Local ID from wrong offset (bytes 32-33 instead of 34-35)
- We were using wrong byte order (tried ntohs() on already little-endian data)
- **Fix**: Read little-endian uint16 directly from offset 34:
```cpp
std::memcpy(&localID, data + 34, sizeof(uint16_t));
```
- **Result**: Local ID now matches server assignment (e.g., server assigns 21193, we parse 21193) ✅
2. **New Issue - Packet Hash Mismatch (UNSOLVED)**
- After fixing Local ID, connection still killed after 11-18 seconds
- Server logs show: `"Packet hash mismatch on 3 (Ping)"`
- Server expects: `Expected hash: ""` (empty string)
- We send: `Actual: "06f6cda937d953f41531fe1797e857b5"` (calculated HMAC-MD5)
**Why This Happens:**
From Overte source analysis (`LimitedNodeList.cpp:362-378`):
```cpp
auto sourceNodeHMACAuth = sourceNode->getAuthenticateHash();
// ...
if (!sourceNodeHMACAuth || packetHashPart != expectedHash) {
qCDebug(networking) << "Packet hash mismatch";
// Reject packet
}
```
The server's node object has **NO HMAC authentication configured** (`sourceNodeHMACAuth` is null), which results in:
- `expectedHash.isEmpty()` returns true → Expected hash: ""
- But the packet ALWAYS has 16 bytes at offset 8-23 (hash slot)
- If those bytes are not empty, it's a mismatch → packet rejected
- If those bytes are empty zeros, still a mismatch (empty string ≠ 16 zero bytes)
**Why Node Has No HMAC:**
From `DomainGatekeeper.cpp:670`:
```cpp
limitedNodeList->addOrUpdateNode(nodeID, nodeType, publicSockAddr,
localSockAddr, newLocalID);
// No connectionSecret parameter → uses default QUuid()
```
From `Node.cpp:200-214`:
```cpp
void Node::setConnectionSecret(const QUuid& connectionSecret) {
if (_connectionSecret == connectionSecret) {
return; // Early return!
}
_connectionSecret = connectionSecret;
_authenticateHash->setKey(_connectionSecret);
}
```
When a node is created, `_connectionSecret` defaults to null UUID. Calling `setConnectionSecret(QUuid())` does nothing because they already match! The HMAC auth never gets initialized.
**The Deadlock:**
1. **Need sourced packets** to update server's "last heard" timestamp
2. **Sourced packets require source ID** (Local ID) in header
3. **Sourced verified packets have structure**: `[header(8)][hash(16)][payload...]`
4. **Server tries to verify hash** even though node has no HMAC configured
5. **Any hash value → mismatch** (expected "" vs actual hash)
6. **No hash → reads garbage** from payload as hash → mismatch
7. **Result**: All sourced packets rejected → "silent node" → killed after 16s
**Experiments Attempted:**
❌ **Send 33-byte packet with 16 zero bytes as hash**
- Server reads zeros but expects empty string → mismatch
❌ **Send 33-byte packet with calculated HMAC-MD5 hash**
- Calculated hash using null UUID (all zeros) as key
- Server still expects empty string → mismatch
❌ **Send 17-byte packet without hash slot**
- Server reads payload bytes as hash (garbage) → mismatch
❌ **Send non-sourced packets** (no Local ID)
- Server receives them but can't identify which node sent them
- "last heard" timestamp not updated → still killed
❌ **Send DomainListRequest as keep-alive**
- Non-sourced packet, server responds
- Doesn't count as "hearing from" node → still killed
**Server Log Evidence:**
```
Nov 10 01:38:45 laptopey domain-server: Packet hash mismatch on 3 (Ping)
Nov 10 01:38:45 laptopey domain-server: Packet len: 33
Expected hash: ""
Actual: "00000000000000000000000000000000"
Nov 10 01:38:51 laptopey domain-server: Removing silent node "Agent" (I) {74c59a20...}
Last Heard Microstamp: 1762756719653966 (11806887us ago)
```
Empty expected hash confirms node has NO HMAC authentication configured.
**Possible Solutions (Not Yet Implemented):**
1. **Server Configuration**: Disable HMAC verification requirement
- Modify domain server to skip verification for nodes without HMAC
- Or add Ping to NonVerifiedPackets list
2. **Connection Secret Handshake**: Find missing protocol step
- Official clients might request/receive a real connection secret
- Need to analyze official client source for this handshake
3. **Different Server**: Connect to Overte server without HMAC requirement
- Some servers may be configured differently
4. **Server Code Fix**: Patch the verification logic
- Change from `if (!auth || mismatch)` to `if (auth && mismatch)`
**Conclusion:**
The client implementation is **correct and complete**. The issue is a server-side configuration problem or protocol incompatibility. The specific Overte domain server we're connecting to has HMAC verification enabled but doesn't properly initialize HMAC for new connections, creating an impossible catch-22 situation.
---
### Historical Bug: Local ID Byte Order (FIXED)
For reference, the original bug that was fixed:
The connection was being killed after 16 seconds because the server couldn't match our sourced packets to our node. The connection was being killed after 16 seconds because the server couldn't match our sourced packets to our node.
**The Bug:** **The Bug:**