Agent-Logs-Url: https://github.com/aceld/zinx/sessions/004c8faa-6151-4a22-bacc-73d7310cb665 Co-authored-by: aceld <7778936+aceld@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Implement graceful shutdown for WebSocket server" to "Fix graceful shutdown for WebSocket server" on May 4, 2026.
`Server.Stop()` in WebSocket-only mode did not stop the HTTP listener: `ListenWebsocketConn` called `http.ListenAndServe` directly with no shutdown handle, leaving the port bound indefinitely after `Stop()` returned.

Changes
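As a rough illustration of the shutdown pattern this PR adopts, here is a minimal, self-contained Go sketch. It is not the zinx code: `demoServer`, `newDemoServer`, `runDemo`, and `shutdownTimeout` are hypothetical names, and the handler is a placeholder. It only assumes the standard `net/http` behaviour that `Shutdown` closes the listener and falls back to `Close` when the context expires.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// shutdownTimeout plays the role the PR gives to wsShutdownTimeout (assumed value).
const shutdownTimeout = 5 * time.Second

// demoServer is a hypothetical stand-in for znet.Server, reduced to the
// fields that matter for shutdown.
type demoServer struct {
	httpSrv  *http.Server
	exitChan chan struct{}
	stopOnce sync.Once
}

func newDemoServer() *demoServer {
	mux := http.NewServeMux() // local mux, so http.DefaultServeMux stays untouched
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	return &demoServer{
		httpSrv:  &http.Server{Handler: mux},
		exitChan: make(chan struct{}),
	}
}

// serve runs the HTTP server on ln and blocks until Stop is called,
// then shuts the server down and waits for Serve to return.
func (s *demoServer) serve(ln net.Listener) {
	served := make(chan struct{})
	go func() {
		defer close(served)
		if err := s.httpSrv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) {
			fmt.Println("serve error:", err)
		}
	}()

	<-s.exitChan // unblocked by close(exitChan) in Stop

	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
	defer cancel()
	if err := s.httpSrv.Shutdown(ctx); err != nil {
		_ = s.httpSrv.Close() // graceful shutdown failed or timed out: force close
	}
	<-served // Serve closes the listener before it returns
}

// Stop is idempotent: sync.Once guarantees exitChan is closed exactly once.
func (s *demoServer) Stop() {
	s.stopOnce.Do(func() { close(s.exitChan) })
}

// runDemo starts the server, stops it twice, and checks the port can be rebound.
func runDemo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // let the OS pick a free port
	if err != nil {
		return err
	}
	addr := ln.Addr().String()

	s := newDemoServer()
	done := make(chan struct{})
	go func() {
		defer close(done)
		s.serve(ln)
	}()

	s.Stop()
	s.Stop() // second call is a no-op

	select {
	case <-done:
	case <-time.After(3 * time.Second):
		return errors.New("serve did not return after Stop")
	}

	ln2, err := net.Listen("tcp", addr)
	if err != nil {
		return fmt.Errorf("port not released: %w", err)
	}
	return ln2.Close()
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("stopped cleanly; port released")
}
```

Waiting for `Serve` to return before `serve` exits is a choice made for this sketch so that the rebind check is deterministic; whether the real implementation waits in the same way is not stated here.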
`znet/server.go`
- `ListenWebsocketConn`: replaced bare `http.ListenAndServe*` with an explicit `*http.Server` using a local `http.NewServeMux()` (avoids polluting the global `DefaultServeMux`); stores the instance in `s.wsServer`; starts the listener in a goroutine; blocks on `<-s.exitChan` (consistent with the TCP/KCP pattern); on exit, calls `Shutdown(ctx)` with `wsShutdownTimeout` (5 s), falling back to `Close()`
- `Server` struct: added `wsServer *http.Server` and `stopOnce sync.Once`
- `Stop()`: wraps `close(exitChan)` in `stopOnce.Do`, which makes repeated `Stop()` calls safe without panic
- `wsShutdownTimeout` package-level constant

`znet/server_test.go`
- `TestWebsocketServerGracefulStop`: starts a WS-only server, calls `Stop()`, asserts it returns within 3 s and the port is subsequently rebindable
- `TestWebsocketServerStopIdempotent`: verifies that a double `Stop()` does not panic
- Helpers (`waitForPort`, `waitForPortListening`) with named constants (`retryInterval`, `dialTimeout`) instead of fixed sleeps

Original prompt
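Goal: on the `master` branch of the `aceld/zinx` repository, implement graceful shutdown in WebSocket mode, so that calling `(*znet.Server).Stop()` actually stops and closes the WebSocket HTTP server (releasing the port) rather than merely not blocking.

Background: the earlier PR #399 fixed `Stop()` blocking forever in websocket-only mode, where nothing received the `exitChan <- struct{}{}` send (it now only calls `close(exitChan)`). However, `ListenWebsocketConn()` still calls `http.ListenAndServe`/`ListenAndServeTLS`, so there is no controllable `http.Server` instance and `Stop()` cannot stop the WebSocket listener.

Required work:
- Replace `http.ListenAndServe*` with an explicitly created and stored `*http.Server` (for example in a `Server` struct field such as `wsServer *http.Server`), using a dedicated `ServeMux` (to avoid polluting the global `http.DefaultServeMux`).
- In `Server.Stop()`: in addition to `exitChan`, add shutdown logic for the websocket server: prefer `Shutdown(ctx)` (with a timeout context) and fall back to `Close()` if necessary.
- Initialize the relevant fields in `Start()`; nil-check them in `Stop()`.
- `Stop()` can be called repeatedly without panicking (suggestion: use `sync.Once` or an atomic state flag so that `close(exitChan)` runs only once; reuse an existing mechanism if the repository already has one).

Acceptance criteria:
- `Stop()`: does not block, and the HTTP server exits and releases the listening port.
- `go test ./...` (if tests exist) passes; CI should not fail because of this change.

Related PRs:

PR title (English translation): translate the Chinese "解决WebSocket优雅停服的问题" into English as the PR title. Suggested translation: `Fix graceful shutdown for WebSocket server` or `Fix WebSocket graceful shutdown` (choose whichever is more natural).

The following is the prior conversation context from the user's chat exploration (may be truncated):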
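User: @aceld/zinx/pull/399 Please help me verify the correctness of this PR against the latest zinx master code.
Assistant: Conclusion: verified against the latest master, the change in PR #399 is correct and fixes the problem the author describes, namely `Stop()` blocking forever in websocket-only mode; the merge risk is low.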
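It removes the signal send on `exitChan` from `Stop()` and keeps only `close(exitChan)`, so `Stop()` no longer hangs when nothing is receiving.

PR Summary (#399)
- `fix: prevent exitChan blocking in websocket mode`
- `aceld/zinx:master` (current PR base sha: `6fb5bd58...`)
- `mergeable=true`, `mergeable_state=clean` (no conflicts)
- no `<-s.exitChan` receiver, so `s.exitChan <- struct{}{}` blocks forever

(By checking the current state of `znet/server.go` on master, I confirmed that `Stop()` on your master does contain this blocking send.)

Core Changes
Changed file: `znet/server.go`

The current master logic of `Stop()` is:
1. `s.ConnMgr.ClearConn()`
2. `s.exitChan <- struct{}{}` (this may block)
3. `close(s.exitChan)`

PR #399 removes the send in step 2 and keeps only the close:
- `close(ch)` semantics: it immediately wakes every receiver blocked on `<-ch` (each gets the zero value with `ok=false`) and never blocks.
- `ch <- x` blocks indefinitely if there is no receiver (unless `ch` is buffered and has room).

In your implementation:
- `ListenTcpConn()`/`ListenKcpConn()` do end with `select { case <-s.exitChan: ... }`, so there is a receiver.
- `ListenWebsocketConn()` contains no `<-s.exitChan` at all, and its call to `http.ListenAndServe` blocks internally; so when only websocket mode is started, the send in `Stop()` can hang (exactly the problem the PR describes).

So removing the send is reasonable, and close-only still works for TCP/KCP (their `<-s.exitChan` is triggered by the close).

Diff (key code, per the PR patch)

     func (s *Server) Stop() {
     	zlog.Ins().InfoF("[STOP] Zinx server , name %s", s.Name)

     	// Clear other connection information or other information that needs to be cleaned up
     	// (将其他需要清理的连接信息或者其他信息 也要一并停止或者清理)
     	s.ConnMgr.ClearConn()
    -	s.exitChan <- struct{}{}
     	close(s.exitChan)
     }

Merge Readiness & Risk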
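I think it can be merged (low risk), because:
- `close(exitChan)` is enough to trigger the exit branch for TCP/KCP; it introduces no new blocking.
- Compared with "send+close", a lone `close` better matches the common Go pattern of broadcasting an exit signal.

One edge case to note (not introduced by this PR; it already exists in the original code and is worth hardening):
- If `Stop()` is called more than once, `close(s.exitChan)` panics (double close). This PR neither fixes nor worsens that; if you expect Stop to be idempotent, you can later add `sync.Once`, or `defer recover`/atomic-state protection.

How I suggest you verify on master (local reproduction / regression points)
You said "verify with the latest zinx master code"; the most direct verification criteria are:
- websocket-only mode: calling `Stop()` should not block or time out
  - `s.exitChan <- struct{}{}`
- tcp/kcp mode: after calling `Stop()`, the listening goroutine reaches the `case <-s.exitChan:` branch and closes the listener normally

If you like, I can also go on to evaluate how Stop could actually make ListenAndServe exit in websocket mode (gracefully shutting down the http server). Even though the current implementation no longer blocks, the http server in `ListenWebsocketConn()` will not stop automatically because of exitChan (it needs `http.Server{}` plus `Shutdown`/`Close` for a graceful stop).

User: So, can I merge this PR now? Or could you help me continue with the reply?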
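Assistant: Yes, it can be merged. As far as this PR's change itself goes (removing `s.exitChan <- struct{}{}`), it fixes `Stop()` potentially blocking forever in websocket-only mode, does not break TCP/KCP mode, and carries low merge risk.

However, I suggest making the team's expectations explicit before or after merging: it only solves "Stop must not hang"; it does not mean that Stop will gracefully shut down the websocket server (currently `ListenWebsocketConn()` uses `http.ListenAndServe(...)`, which has no `Shutdown`/`Close` handle, so Stop cannot stop the HTTP service either; it merely no longer blocks on exitChan).

Suggested reply to the author in the PR (can be pasted directly)
- With no `<-exitChan` receiver, `exitChan <- struct{}{}` blocks; broadcasting the exit signal with `close(exitChan)` is more appropriate.
- `ListenAndServe`: fully stopping the server requires follow-up work (introduce `http.Server` and call `Shutdown`/`Close` in Stop).

You can reply like this: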
...
This pull request was created from Copilot chat.