Silent process crash: /GS stack-buffer-overrun in libuv Windows TCP-connect path under high outbound HTTP-connection volume (24.15.0; appears fixed in 24.16.0) #63620

@JohnMcLear

Version

Reproduces on v24.15.0 (libuv 1.51.0). Appears fixed on v24.16.0 (libuv 1.52.1). Also clean on v25.9.0 (libuv 1.51.0).

Platform

Windows x64 (GitHub-hosted `windows-latest`). Not observed on Linux.

Subsystem

libuv (Windows TCP connect: uv__tcp_connect / uv__tcp_try_connect), reached via net/http client APIs and TCPWrap::Connect.

What steps will reproduce the bug?

A Node process that makes a high volume of short-lived outbound HTTP connections on Windows intermittently dies with no catchable error. Pure-Node repro — save as repro.js in an empty directory and run a few times on Windows + Node 24.15.0:

// repro.js — no dependencies. Storms short-lived loopback HTTP connections
// (keep-alive off) against an in-process server.
const http = require('http');
const HOST = process.env.REPRO_HOST || '127.0.0.1'; // 127.0.0.1 reproduced 100% of trials
const DURATION_MS = Number(process.env.DURATION_MS || 60000);
const CONCURRENCY = Number(process.env.CONCURRENCY || 96);

const server = http.createServer((req, res) => { res.end('ok'); });
server.listen(0, HOST, () => {
  const { port } = server.address();
  const agent = new http.Agent({ keepAlive: false, maxSockets: CONCURRENCY });
  const end = Date.now() + DURATION_MS;
  let outstanding = 0, made = 0, done = false;
  const finish = () => { if (!done) { done = true; console.log(`done: ${made} reqs, no crash`); server.close(() => process.exit(0)); } };
  const pump = () => {
    while (outstanding < CONCURRENCY && Date.now() < end) {
      made++; outstanding++;
      const req = http.get({ host: HOST, port, agent }, (res) => { res.resume(); res.on('end', () => outstanding--); });
      req.on('error', () => outstanding--);
    }
    if (Date.now() < end) setImmediate(pump);
    else if (outstanding === 0) finish();
  };
  const wd = setInterval(() => { if (Date.now() >= end && (outstanding === 0 || Date.now() >= end + 10000)) { clearInterval(wd); finish(); } }, 250);
  wd.unref();
  pump();
});

Run it with `node repro.js`.

To capture the otherwise-invisible crash, enable a full dump on silent exit (elevated PowerShell) before running:

$dumps = "C:\node-dumps"; New-Item -ItemType Directory -Force -Path $dumps | Out-Null
$ifeo = "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\node.exe"
$spe  = "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\SilentProcessExit\node.exe"
New-Item -Path $ifeo -Force | Out-Null
New-ItemProperty -Path $ifeo -Name GlobalFlag     -PropertyType String -Value "0x200" -Force | Out-Null  # FLG_MONITOR_SILENT_PROCESS_EXIT
New-Item -Path $spe -Force | Out-Null
New-ItemProperty -Path $spe -Name ReportingMode   -PropertyType DWord  -Value 2 -Force | Out-Null         # LOCAL_DUMP
New-ItemProperty -Path $spe -Name LocalDumpFolder -PropertyType String -Value $dumps -Force | Out-Null
New-ItemProperty -Path $spe -Name DumpType        -PropertyType DWord  -Value 2 -Force | Out-Null         # full memory
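
After a crashing run, it can be handy to confirm from Node that a dump was actually written. The following is a minimal sketch, not part of the original report: the helper name `findDumps` is hypothetical, and the default folder simply mirrors the `$dumps` value above. It walks the folder recursively, so it makes no assumption about how Windows lays out the dump files inside it.

```javascript
// find-dumps.js: hypothetical helper, not part of the original report.
const fs = require('fs');
const path = require('path');

// Recursively collect *.dmp files under `dir`, newest first.
// Returns an empty array when the folder does not exist.
function findDumps(dir) {
  const found = [];
  const walk = (d) => {
    for (const entry of fs.readdirSync(d, { withFileTypes: true })) {
      const full = path.join(d, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.name.toLowerCase().endsWith('.dmp')) {
        found.push({ file: full, mtimeMs: fs.statSync(full).mtimeMs });
      }
    }
  };
  if (fs.existsSync(dir)) walk(dir);
  return found.sort((a, b) => b.mtimeMs - a.mtimeMs);
}

// Default folder is an assumption taken from the PowerShell snippet above.
console.log(findDumps(process.argv[2] || 'C:\\node-dumps').map((d) => d.file));
```

An empty list after a run that died silently would suggest the registry settings did not take effect for that `node.exe`.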

How often does it reproduce? Is there a required condition?

Required condition: Windows + Node 24.0–24.15 (libuv 1.50/1.51) + a high rate of short-lived outbound HTTP connections. On 24.15.0 it crashed in 4/4 CI jobs with REPRO_HOST=127.0.0.1 and in 2/4 with localhost (each job = 4 × 60 s trials). The crash is address-family independent: it occurs with both sockaddr_in and sockaddr_in6. A raw net.connect() + immediate destroy() storm did not trigger it, so the HTTP request/response round-trip (or the http.Agent connect setup) seems to be required, not the bare TCP connect.
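
For comparison, a raw-TCP storm of the kind described above might look roughly like the sketch below. This is not the script used for the report: the function name `rawConnectStorm`, its option names, and the choice to call `destroy()` as soon as `'connect'` fires are all assumptions made for illustration.

```javascript
// raw-tcp.js: hypothetical sketch of a raw-TCP connect/destroy storm.
// Not the author's script; names, defaults, and timing are assumptions.
const net = require('net');

function rawConnectStorm({ host = '127.0.0.1', durationMs = 60000, concurrency = 96 } = {}) {
  return new Promise((resolve, reject) => {
    const serverSockets = new Set(); // accepted sockets, destroyed at shutdown
    const server = net.createServer((socket) => {
      serverSockets.add(socket);
      socket.on('close', () => serverSockets.delete(socket));
      socket.on('error', () => {}); // resets are expected when the client destroys early
    });
    server.on('error', reject);
    server.listen(0, host, () => {
      const { port } = server.address();
      const end = Date.now() + durationMs;
      let outstanding = 0, made = 0, finished = false;
      const finish = () => {
        if (finished) return;
        finished = true;
        for (const s of serverSockets) s.destroy();
        server.close(() => resolve({ made }));
      };
      const pump = () => {
        while (outstanding < concurrency && Date.now() < end) {
          made++; outstanding++;
          const sock = net.connect({ host, port });
          sock.on('connect', () => sock.destroy()); // assumption: destroy on connect
          sock.on('error', () => {});               // 'close' still follows an error
          sock.on('close', () => { outstanding--; });
        }
        if (Date.now() < end) setImmediate(pump);
        else if (outstanding === 0) finish();
        else setTimeout(pump, 10); // wait for the last sockets to close
      };
      pump();
    });
  });
}

module.exports = { rawConnectStorm };
```

A run could be started with `rawConnectStorm().then(({ made }) => console.log(`done: ${made} connects, no crash`))`. Diffing this against repro.js narrows the difference to the HTTP layer and the http.Agent socket handling.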

What is the expected behavior? Why is that the expected behavior?

The connection storm runs to completion (prints done: …) and the process exits 0, as it does on Linux and on Node 24.16.0 / 25.x.

What do you see instead?

The process dies mid-run with no 'exit'/uncaughtException/unhandledRejection, no --report-on-fatalerror/--report-on-signal output, and no standard SEH/WER crash record — a "silent" death. A full-memory dump shows the main thread executing a /GS stack-cookie failure (a stack buffer overrun) in the connect path:

node::TCPWrap::Connect<sockaddr_in6>           (also <sockaddr_in>)
  → uv_tcp_connect
    → uv__tcp_connect / uv__tcp_try_connect     (identical-COMDAT-folded)
      → __security_check_cookie → __report_gsfailure → __fastfail(2)

RIP is inside __report_gsfailure; bytes at RIP are cd 29 (int 29h = __fastfail); ecx = 2 (FAST_FAIL_STACK_COOKIE_CHECK_FAILURE). The corrupted frame is uv__tcp_connect's (~0x218 bytes); in every captured sample the /GS cookie was overwritten but the saved return address (the slot above it) was intact, so /GS turned the corruption into a controlled crash. Symbolized against the official node.pdb for 24.15.0 (debug-id 3c541b69-34a1-cd3f-…).
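
Because nothing inside the crashing process can observe a `__fastfail`, one way to detect it without a debugger is to run the repro as a child process and inspect its exit status. The sketch below assumes the parent sees the NTSTATUS `0xC0000409` (STATUS_STACK_BUFFER_OVERRUN) as the exit code, which is the usual result of a fast-fail but is not something the report itself states. The names `classifyExit` and `runTrial` are hypothetical.

```javascript
// run-trial.js: hypothetical harness, not part of the original report.
const { spawnSync } = require('child_process');

// NTSTATUS commonly reported for a __fastfail termination (assumption: the
// parent process receives it unchanged as the child's exit code).
const STATUS_STACK_BUFFER_OVERRUN = 0xC0000409;

function classifyExit(status, signal) {
  if (signal) return `killed by ${signal}`;
  if (status === 0) return 'clean';
  // Normalize to unsigned 32-bit in case the value arrives as a negative int.
  if ((status >>> 0) === STATUS_STACK_BUFFER_OVERRUN) return 'fastfail (0xC0000409)';
  return `exit code ${status}`;
}

// Run one trial of `script` with the current Node binary and classify the result.
function runTrial(script, env = {}) {
  const r = spawnSync(process.execPath, [script], {
    env: { ...process.env, ...env },
    encoding: 'utf8',
  });
  if (r.error) throw r.error;
  return { status: r.status, verdict: classifyExit(r.status, r.signal), stdout: r.stdout };
}

module.exports = { classifyExit, runTrial };
```

For example, `runTrial('repro.js', { REPRO_HOST: '127.0.0.1' }).verdict` would be expected to read `'clean'` on an unaffected build.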

Additional information

Version bisect (Windows windows-latest; each cell = jobs × 4 × 60 s trials):

| Node | bundled libuv | localhost (→ ::1) | 127.0.0.1 |
| --- | --- | --- | --- |
| 24.15.0 | 1.51.0 | crashes (2/4) | crashes (4/4) |
| 24.16.0 | 1.52.1 | clean (0/4) | clean (0/4) |
| 25.9.0 | 1.51.0 | clean (0/4) | clean (0/4) |

It appears already fixed in 24.16.0, so the main asks are:

  1. Can you confirm the fix is intentional and complete on the 24.x LTS line? Note 25.9.0 is clean despite carrying the same nominal libuv (1.51.0) as the broken 24.15.0, so the fix doesn't map to the upstream libuv tag alone — likely a back-ported libuv patch and/or a Node/V8/build difference present in 24.16.0 and 25.x but not 24.15.0. Identifying the exact commit would confirm coverage.
  2. Since it's a memory-safety defect (out-of-bounds stack write, CWE-121) that causes an unrecoverable crash, does it warrant a security advisory so users still on 24.0–24.15 are notified to upgrade? I've kept a private/HackerOne-formatted writeup ready if you'd prefer to handle it through the security process — happy to route it there instead.
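
Since the libuv tag alone does not separate the crashing build from the clean ones, it may help to log the full runtime fingerprint with every trial so results can be compared across builds. A minimal sketch follows; the helper name `envFingerprint` is hypothetical, while `process.versions`, `process.platform`, `process.arch`, and `os.release()` are standard Node APIs.

```javascript
// env-fingerprint.js: hypothetical helper, not part of the original report.
const os = require('os');

// Collect the version fields relevant to the bisect: Node, bundled libuv, V8.
function envFingerprint() {
  const { node, uv, v8 } = process.versions;
  return {
    node,
    uv,
    v8,
    platform: process.platform,
    arch: process.arch,
    osRelease: os.release(),
  };
}

console.log(JSON.stringify(envFingerprint()));
```

Printing this line at the top of repro.js would tie each crash or clean run to the exact Node, libuv, and V8 versions that produced it.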

Relevant source:

I can provide the symbolized full-memory minidump(s), the complete faulting-thread stack + registers, the full bisect matrix, and the non-triggering raw-TCP variant for diffing.
