[async_wrap] fix fatal error during destroy() calls by trevnorris · Pull Request #9753 · nodejs/node

trevnorris · 2016-11-23T05:41:48Z

Checklist

make -j8 test (UNIX), or vcbuild test nosign (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines

Affected core subsystem(s)

async_wrap

Description of change

Calling JS during GC is a no-no. So intead create a queue of all ids
that need to have their destroy() callback called and call them later.

Removed checking destroy() in test-async-wrap-uid because destroy() can
be called after the 'exit' callback.

Missing a reliable test to reproduce the issue that caused the
FATAL_ERROR.

Fixes: #8216
Fixes: #9465

R=@bnoordhuis

Fishrock123 · 2016-11-23T15:12:17Z

CI: https://ci.nodejs.org/job/node-test-pull-request/4968/

Fishrock123 · 2016-11-23T15:22:44Z

test/parallel/test-async-wrap-uid.js

Does this PR remove destroy() then or...?

no. it just makes triggering destroy() before exit a pain. and haven't figured out a good way to test it yet.

addaleax · 2016-11-23T17:17:04Z

src/env.h

nit: size_t

whoops. thanks.

addaleax · 2016-11-23T17:29:01Z

src/env-inl.h

nit: Could you explicitly initialize .length = 0 here, too? (edit: gcc doesn’t seem to like this anyway?)

cool. i'll do it the pre c++11 way.

addaleax · 2016-11-23T17:29:27Z

src/env-inl.h

Doesn’t this set id twice, once in the new instance and once at the end of the prev one?

you're right. fixed.

addaleax · 2016-11-23T17:32:46Z

src/env-inl.h

This should probably also set destroy_ids_current_->length = 0;?

ah yup. fixed.

addaleax · 2016-11-23T17:35:02Z

src/env.h

teeny-tiny nit: Could you make this the last member of the struct so that length, prev and the first few entries all fit into a single cache line?

ooh. nice. done.

addaleax · 2016-11-23T17:35:33Z

src/env.h

Naming this get_… makes this sound a bit like a getter without side effects. How about pop_destroy_id() or something like that? (And if you like, correspondingly push_destroy_id?)

good idea. done.

trevnorris · 2016-11-23T22:59:55Z

Fixed things addressed in first review.

CI: https://ci.nodejs.org/job/node-test-pull-request/4976/

addaleax

LGTM

bnoordhuis · 2016-11-24T10:47:10Z

src/async-wrap.cc

There should be a HandleScope and Context::Scope here, the env->async_hooks_destroy_function() call below leaks a handle now.

For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle. Since that doesn't seem to be the case I'll add it. Though I'll still leave in the HandleScope in the loop so each function call can be cleaned up.

About the first note, should we be using a SealHandleScope in all these locations so we can detect any leaked handles? Currently it's only used around uv_run, but that doesn't seem to catch leaked handles from an asynchronous call.

Here's my diff to double check the scenario:

diff --git a/src/async-wrap.cc b/src/async-wrap.cc index a06e163..fa91e1a 100644 --- a/src/async-wrap.cc +++ b/src/async-wrap.cc @@ -185,6 +185,8 @@ void AsyncWrap::Initialize(Local<Object> target, void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) { Environment* env = Environment::from_destroy_ids_idle_handle(handle); + v8::SealHandleScope seal(env->isolate()); + v8::Context::Scope context_scope(env->context()); // This doesn't leak a handle. Local<Function> fn = env->async_hooks_destroy_function(); int64_t current_id;

This never aborts. To double check I added an Object::New(env->isolate()) right after the Context::Scope() which did cause it to abort. So I don't believe this is leaking a handle. But if I am to hoist the TryCatch out of the loop it would be necessary to do anyway.

On the note of always using a SealHandleScope on the return of an async call. Guess that would only be useful if there was a centralized way to add the wrapper around uv_run every time it's about to call the callback. You know if there's a way to wrap wrap all callbacks from uv_run in a SealHandleScope? (update: see comment below about this being incorrect)

So, I tried this and it surprisingly didn't abort:

diff --git a/src/async-wrap.cc b/src/async-wrap.cc index a06e163..3e57ee8 100644 --- a/src/async-wrap.cc +++ b/src/async-wrap.cc @@ -185,6 +185,8 @@ void AsyncWrap::Initialize(Local<Object> target, void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) { Environment* env = Environment::from_destroy_ids_idle_handle(handle); + v8::SealHandleScope seal(env->isolate()); + v8::Context::Scope context_scope(env->context()); // This doesn't leak a handle. Local<Function> fn = env->async_hooks_destroy_function(); int64_t current_id; @@ -194,11 +196,12 @@ void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) { if (fn.IsEmpty()) return env->erase_destroy_ids(); + TryCatch try_catch(env->isolate()); + while (env->destroy_ids_length() > 0) { current_id = env->pop_destroy_id(); HandleScope scope(env->isolate()); Local<Value> argv = Number::New(env->isolate(), current_id); - TryCatch try_catch(env->isolate()); MaybeLocal<Value> ret = fn->Call( env->context(), Undefined(env->isolate()), 1, &argv);

Which surprises me a bit, but leads me to believe it isn't leaking a handle not wrapping the TryCatch in a HandleScope. Guess since it's using RAII it doesn't need the HandleScope to clean up?

Forget everything I said about SealHandleScope not catching handles from an async call. My test case was incorrect. It does catch all leaked handles.

For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle.

Oh, you're right. That's an awfully subtle implementation detail though, I'd just create the HandleScope. You'll need the Context::Scope anyway.

You're right. Something that subtle could easily lead to future pain. Will add.

On the side, I'm surprised that both Context::Scope and TryCatch don't leak a handle. Think the Context::Scope would still work properly without the HandleScope?

Context::Scope doesn't strictly need a HandleScope because it doesn't really do anything with handles, it just changes the active context. (The context itself is a handle, of course.)

Something similar applies to TryCatch, it merely changes the active landing pad for JS exceptions.

bnoordhuis · 2016-11-24T10:49:26Z

src/async-wrap.cc

You could lift the TryCatch out of the loop, that would be (marginally) more efficient.

Ah yup. Will do.

bnoordhuis · 2016-11-24T10:50:05Z

src/async-wrap.cc

Is there a reason the declaration is done outside the loop?

bnoordhuis · 2016-11-24T10:50:41Z

src/async-wrap.cc

Cache env->context() in a handle outside the loop, it creates a new handle.

Ah thanks. Thought it was a strong persistent conversion, which I believe doesn't create a new handle.

You're right, never mind.

bnoordhuis · 2016-11-24T10:55:13Z

src/env.h

Is there a reason you don't use a std::vector<int64_t>?

If I judge the implementation complexity between using and not using the STL to be comparable then I usually choose to not use the STL. Though the thought of using std::vector doesn't bother me. Can switch it if that's your preference.

It's simpler than hand-rolling a vector-like class. If you're worried about excessive copying, you could simply vec.reserve(512) at startup.

bnoordhuis · 2016-11-25T15:31:27Z

src/async-wrap.cc

For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle.

Oh, you're right. That's an awfully subtle implementation detail though, I'd just create the HandleScope. You'll need the Context::Scope anyway.

bnoordhuis · 2016-11-25T15:32:04Z

src/async-wrap.cc

You're right, never mind.

bnoordhuis · 2016-11-25T15:34:39Z

src/env.h

It's simpler than hand-rolling a vector-like class. If you're worried about excessive copying, you could simply vec.reserve(512) at startup.

danscales · 2016-11-29T01:27:13Z

@trevnorris We have now seen this bug (same assert in execution.cc) on node v4.5.0, possibly because of our use of weak pointers via the 'weak' module (https://github.com/TooTallNate/node-weak ). Do you think your fix will backport well to node v4.x? Also, just to confirm, your change is not removing weak functionality, I think, so weak pointer functionality from the 'weak' module would still work, at least in node v6 -- is that correct?

trevnorris · 2016-11-29T06:14:30Z

@danscales That issue will be with the module node-weak itself. It'll need a PR to fix when it runs the callback. This fix will be backported in some form to all LTS versions that are affected by this issue.

your change is not removing weak functionality

No. It's simply delaying when the callback runs.

trevnorris · 2016-11-29T07:02:25Z

@bnoordhuis comments should be addressed in commit REVIEW finish fixes for second review round

trevnorris · 2016-11-29T07:58:28Z

CI: https://ci.nodejs.org/job/node-test-commit/6219/

bnoordhuis

LGTM with some final comments.

bnoordhuis · 2016-12-01T21:38:22Z

src/async-wrap.cc

Can you at least add a SealHandleScope? That said, a HandleScop isn't that expensive to create and it's more robust. Debugging a crash or a leak six months from now because something changed would be highly annoying.

I misread the following stack:

FATAL ERROR: v8::HandleScope::CreateHandle() Cannot create a handle without a HandleScope 1: node::Abort() [./node_g] 2: 0x21986e3 [./node_g] 3: v8::Utils::ReportApiFailure(char const*, char const*) [./node_g] 4: v8::Utils::ApiCheck(bool, char const*, char const*) [./node_g] 5: v8::internal::HandleScope::Extend(v8::internal::Isolate*) [./node_g] 6: v8::internal::HandleScope::CreateHandle(v8::internal::Isolate*, v8::internal::Object*) [./node_g] 7: v8::internal::HandleScope::GetHandle(v8::internal::Isolate*, v8::internal::Object*) [./node_g] 8: v8::internal::HandleBase::HandleBase(v8::internal::Object*, v8::internal::Isolate*) [./node_g] 9: v8::internal::Handle<v8::internal::JSFunction>::Handle(v8::internal::JSFunction*, v8::internal::Isolate*) [./node_g] 10: v8::internal::Isolate::object_function() [./node_g] 11: v8::Object::New(v8::Isolate*) [./node_g] 12: node::AsyncWrap::DestroyIdsCb(uv_idle_s*) [./node_g]

Which is generated by the following diff when running test-async-wrap-check-providers.js:

diff --git a/src/async-wrap.cc b/src/async-wrap.cc index 3887179..0bdde8d 100644 --- a/src/async-wrap.cc +++ b/src/async-wrap.cc @@ -190,6 +190,7 @@ void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) { // will catch this can cause the process to abort. v8::Context::Scope context_scope(env->context()); Local<Function> fn = env->async_hooks_destroy_function(); + Local<Object> test = Object::New(env->isolate()); uv_idle_stop(handle);

Now I'm confused as to why the SealHandleScope would be needed since I get the same stack even if I add the SealHandleScope before it. Thoughts?

Anyway, I'll add the HandleScope. Hadn't yet b/c I haven't been able to write a test that would cause the script to crash w/o it. If you have an example of one that would be excellent.

There's a SealHandleScope at the bottom of the stack (node.cc creates one before calling uv_run()) but I figure it's better to create one here explicitly than rely on a detail from elsewhere.

bnoordhuis · 2016-12-01T21:39:01Z

src/async-wrap.cc

Move move this to the top of the function? It looks a little incongruous here.

sure. just habit of placing it after the variable declarations.

bnoordhuis · 2016-12-01T21:41:11Z

src/async-wrap.cc

trevnorris · 2016-12-01T23:21:49Z

@bnoordhuis fixed those three comments.

CI: https://ci.nodejs.org/job/node-test-pull-request/5087/

The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

This is how it's done everywhere else in core. Make it follow suit. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: nodejs#9753 Fixes: nodejs#8216 Fixes: nodejs#9465 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

trevnorris · 2016-12-01T23:51:18Z

Thanks. Landed in 517e3a6, cf5f4b8 and b49b496.

trevnorris · 2016-12-01T23:55:10Z

This'll need to be backported to v7, and possibly v6? @nodejs/lts how far back should land? since it's related to other asyncwrap changes coming though perhaps i'll just backport to v4 to make life easier.

The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: #9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

This is how it's done everywhere else in core. Make it follow suit. PR-URL: #9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: #9753 Fixes: #8216 Fixes: #9465 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

This is how it's done everywhere else in core. Make it follow suit. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: nodejs#9753 Fixes: nodejs#8216 Fixes: nodejs#9465 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

danscales · 2016-12-09T19:19:12Z

@trevnorris Do you have a suggestion on how to fix node-weak to deal with the same problem? As I mentioned, we get a similar problem to what you fixed in node v6 (and less often in node v4) when we use node-weak extensively. Our application needs weak pointers, but we can get away without actually invoking a javascript callback when the weak object is garbage collected, if needed. So, I tried just disabling the code in weakref.cc/TargetCallback that invokes the javascript callback for weak pointers, but that did not help. So, I'm not sure if there is a much deeper problem in using SetWeak() and so there is no simple fix for weakref.cc. Thanks for any help or suggestions. Seems like it would be good to fix node-weak, since it is actually in the node test suite (test/gc/node_modules/weak)

trevnorris · 2016-12-09T21:22:58Z

@danscales I'd suggest that node-weak also implement a similar implementation to d479826 using a uv_idle_t.

MylesBorins · 2016-12-21T19:35:15Z

@trevnorris if you could manually backport that would be rad!

trevnorris · 2016-12-21T23:11:50Z

@thealphanerd Will do, and hot damn! found a bug with this that's fixed by #10400 :P So will wait for that to land before backporting this.

MylesBorins · 2017-01-23T01:56:49Z

@trevnorris #10400 has landed on v6.x

MylesBorins · 2017-03-07T00:24:30Z

@trevnorris can you backport this?

MylesBorins · 2017-05-08T14:41:01Z

ping @trevnorris

addaleax · 2017-05-09T14:59:14Z

@MylesBorins I’m confused, it looks like this has already landed on v6.x-staging as 59d8255...f3b0cf5?

MylesBorins · 2017-05-09T15:22:03Z

Perhaps this was mislabelled. My audit brought up commits, but you are correct that they are there.
¯\_(ツ)_/¯

trevnorris added the async_wrap label Nov 23, 2016

nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Nov 23, 2016

trevnorris force-pushed the fix-destroy-gc-abort branch from b5fafdd to 49dc16e Compare November 23, 2016 05:42

trevnorris mentioned this pull request Nov 23, 2016

src: don't call into VM from AsyncWrap destructor #9467

Closed

trevnorris force-pushed the fix-destroy-gc-abort branch from 49dc16e to 59c2e01 Compare November 23, 2016 11:12

Fishrock123 reviewed Nov 23, 2016

View reviewed changes

addaleax reviewed Nov 23, 2016

View reviewed changes

trevnorris mentioned this pull request Nov 24, 2016

Garbage Collection Crash #8216

Closed

addaleax approved these changes Nov 24, 2016

View reviewed changes

bnoordhuis requested changes Nov 24, 2016

View reviewed changes

bnoordhuis requested changes Nov 25, 2016

View reviewed changes

trevnorris force-pushed the fix-destroy-gc-abort branch from 96f10b0 to 7fc2efa Compare November 29, 2016 07:55

bnoordhuis approved these changes Dec 1, 2016

View reviewed changes

trevnorris force-pushed the fix-destroy-gc-abort branch from 7fc2efa to dd1601e Compare December 1, 2016 23:21

bnoordhuis approved these changes Dec 1, 2016

View reviewed changes

trevnorris added 3 commits December 1, 2016 16:48

async_wrap: mode constructor/destructor to .cc

517e3a6

The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

async_wrap: make Initialize a static class member

cf5f4b8

This is how it's done everywhere else in core. Make it follow suit. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>

trevnorris force-pushed the fix-destroy-gc-abort branch from dd1601e to b49b496 Compare December 1, 2016 23:50

trevnorris merged commit b49b496 into nodejs:master Dec 1, 2016

trevnorris added lts-watch-v6.x labels Dec 1, 2016

trevnorris added the lts-watch-v4.x label Dec 1, 2016

trevnorris deleted the fix-destroy-gc-abort branch December 2, 2016 19:41

Fishrock123 mentioned this pull request Dec 5, 2016

v7.2.1 proposal #10127

Merged

2 tasks

This was referenced Apr 11, 2017

src: don't call uv_run() after 'exit' event #12344

Closed

process: flaky behavior of 'exit' event handler #12322

Closed

MylesBorins added backported-to-v6.x and removed lts-watch-v4.x labels May 9, 2017

Uh oh!

Conversation

trevnorris commented Nov 23, 2016

Checklist

Affected core subsystem(s)

Description of change

Uh oh!

Fishrock123 commented Nov 23, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

addaleax Nov 23, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trevnorris commented Nov 23, 2016

Uh oh!

addaleax left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

trevnorris Nov 25, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

addaleax Nov 23, 2016 •

edited

Loading

trevnorris Nov 25, 2016 •

edited

Loading