Fix socket/fd leak in streaming endpoints (#2766)#3412
Conversation
|
Pushed an update: added |
7533234 to
3e5cfca
Compare
|
Scoped this PR down to just the leak fix. I removed the optional stream-collector that was bundled in earlier — it added a background thread and a new public API that turn a surgical fix into a design discussion, and the |
The streaming helpers (logs, events, attach, stats and raw exec output) hand back a generator that reads from the daemon socket. If you stop reading before the end - break out of the loop, hit an exception, or just drop the generator - the response was never closed, so the socket and its fd leaked. _get_raw_response_socket also keeps a reference to the response on the socket, so the garbage collector doesn't reliably clean it up. Wrap the read loops in try/finally so the response is closed when the generator is closed or garbage collected. Socket setup still happens on the first read, so nothing else about these calls changes. Closes docker#2766 Signed-off-by: lakshyaraj <raolakshyaraj@gmail.com>
Document that streams now close themselves when you stop reading, and add benchmarks/stream_leak.py - a small reproduction that needs no daemon: it opens a few hundred streams, reads one line from each, walks away, and counts the sockets left open. Before the fix every one leaks; after it, none do. Signed-off-by: lakshyaraj <raolakshyaraj@gmail.com>
3e5cfca to
c2fe936
Compare
Closes #2766.
What's wrong
container.logs(stream=True),events(),attach(),stats()and the rawexec streams all return a generator that reads from a socket to the daemon. If
you stop reading before the stream ends - you
break, an exception is raised,or the generator just goes out of scope - the response never gets closed, so the
socket and its fd stay open.
_get_raw_response_socket()also stores a referenceto the response on the socket, so the garbage collector can't always clean it up
on its own.
_read_from_socket(stream=True)even said so in a comment: "thecaller is responsible for closing the response."
If you have a long-running process that tails logs and bails out early, those
fds add up until you run out. That's the "ResourceWarning: unclosed
<socket.socket>" people kept hitting on the issue.
The change
Wrap the read loop in each streaming generator in
try/finally: response.close(). When you stop iterating, Python throwsGeneratorExitintothe generator (on
.close()and on garbage collection), sofinallyis wherethe socket gets released. Socket setup still happens on the first read like
before, so the calls behave exactly the same otherwise.
Only three helpers in
docker/api/client.pychanged:_stream_raw_result,_multiplexed_response_stream_helper, and_read_from_socket.Proof it leaked
benchmarks/stream_leak.pyreproduces it without a daemon. It serves an endlesschunked response locally, opens 200 streams, reads one chunk from each, stops,
and counts how many sockets are still open (and double-checks with psutil):
Tests and docs
tests/unit/stream_leak_test.pycovers early break, an exception in theconsumer, and an explicit
.close(). The tests use fakes, so they don't leanon any
requests/urllib3 internals.ruffis clean and the unit tests pass locally. I don't have a daemon set uphere, so the integration tests are up to CI.
No new dependencies, and nothing changes for existing callers except that
streams now get closed when you stop reading them.