[async_wrap] fix fatal error during destroy() calls#9753
[async_wrap] fix fatal error during destroy() calls#9753trevnorris merged 3 commits intonodejs:masterfrom
Conversation
b5fafdd to
49dc16e
Compare
49dc16e to
59c2e01
Compare
test/parallel/test-async-wrap-uid.js
Outdated
There was a problem hiding this comment.
Does this PR remove destroy() then or...?
There was a problem hiding this comment.
no. it just makes triggering destroy() before exit a pain. and haven't figured out a good way to test it yet.
src/env.h
Outdated
src/env-inl.h
Outdated
There was a problem hiding this comment.
nit: Could you explicitly initialize .length = 0 here, too? (edit: gcc doesn’t seem to like this anyway?)
There was a problem hiding this comment.
cool. i'll do it the pre c++11 way.
src/env-inl.h
Outdated
There was a problem hiding this comment.
Doesn’t this set id twice, once in the new instance and once at the end of the prev one?
There was a problem hiding this comment.
you're right. fixed.
src/env-inl.h
Outdated
There was a problem hiding this comment.
This should probably also set destroy_ids_current_->length = 0;?
src/env.h
Outdated
There was a problem hiding this comment.
teeny-tiny nit: Could you make this the last member of the struct so that length, prev and the first few entries all fit into a single cache line?
There was a problem hiding this comment.
ooh. nice. done.
src/env.h
Outdated
There was a problem hiding this comment.
Naming this get_… makes this sound a bit like a getter without side effects. How about pop_destroy_id() or something like that? (And if you like, correspondingly push_destroy_id?)
There was a problem hiding this comment.
good idea. done.
|
Fixed things addressed in first review. |
src/async-wrap.cc
Outdated
There was a problem hiding this comment.
There should be a HandleScope and Context::Scope here, the env->async_hooks_destroy_function() call below leaks a handle now.
There was a problem hiding this comment.
For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle. Since that doesn't seem to be the case I'll add it. Though I'll still leave in the HandleScope in the loop so each function call can be cleaned up.
About the first note, should we be using a SealHandleScope in all these locations so we can detect any leaked handles? Currently it's only used around uv_run, but that doesn't seem to catch leaked handles from an asynchronous call.
There was a problem hiding this comment.
Here's my diff to double check the scenario:
diff --git a/src/async-wrap.cc b/src/async-wrap.cc
index a06e163..fa91e1a 100644
--- a/src/async-wrap.cc
+++ b/src/async-wrap.cc
@@ -185,6 +185,8 @@ void AsyncWrap::Initialize(Local<Object> target,
void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
Environment* env = Environment::from_destroy_ids_idle_handle(handle);
+ v8::SealHandleScope seal(env->isolate());
+ v8::Context::Scope context_scope(env->context());
// This doesn't leak a handle.
Local<Function> fn = env->async_hooks_destroy_function();
int64_t current_id;This never aborts. To double check I added an Object::New(env->isolate()) right after the Context::Scope() which did cause it to abort. So I don't believe this is leaking a handle. But if I am to hoist the TryCatch out of the loop it would be necessary to do anyway.
On the note of always using a (update: see comment below about this being incorrect)SealHandleScope on the return of an async call. Guess that would only be useful if there was a centralized way to add the wrapper around uv_run every time it's about to call the callback. You know if there's a way to wrap wrap all callbacks from uv_run in a SealHandleScope?
There was a problem hiding this comment.
So, I tried this and it surprisingly didn't abort:
diff --git a/src/async-wrap.cc b/src/async-wrap.cc
index a06e163..3e57ee8 100644
--- a/src/async-wrap.cc
+++ b/src/async-wrap.cc
@@ -185,6 +185,8 @@ void AsyncWrap::Initialize(Local<Object> target,
void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
Environment* env = Environment::from_destroy_ids_idle_handle(handle);
+ v8::SealHandleScope seal(env->isolate());
+ v8::Context::Scope context_scope(env->context());
// This doesn't leak a handle.
Local<Function> fn = env->async_hooks_destroy_function();
int64_t current_id;
@@ -194,11 +196,12 @@ void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
if (fn.IsEmpty())
return env->erase_destroy_ids();
+ TryCatch try_catch(env->isolate());
+
while (env->destroy_ids_length() > 0) {
current_id = env->pop_destroy_id();
HandleScope scope(env->isolate());
Local<Value> argv = Number::New(env->isolate(), current_id);
- TryCatch try_catch(env->isolate());
MaybeLocal<Value> ret = fn->Call(
env->context(), Undefined(env->isolate()), 1, &argv);
Which surprises me a bit, but leads me to believe it isn't leaking a handle not wrapping the TryCatch in a HandleScope. Guess since it's using RAII it doesn't need the HandleScope to clean up?
There was a problem hiding this comment.
Forget everything I said about SealHandleScope not catching handles from an async call. My test case was incorrect. It does catch all leaked handles.
There was a problem hiding this comment.
For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle.
Oh, you're right. That's an awfully subtle implementation detail though, I'd just create the HandleScope. You'll need the Context::Scope anyway.
There was a problem hiding this comment.
You're right. Something that subtle could easily lead to future pain. Will add.
On the side, I'm surprised that both Context::Scope and TryCatch don't leak a handle. Think the Context::Scope would still work properly without the HandleScope?
There was a problem hiding this comment.
Context::Scope doesn't strictly need a HandleScope because it doesn't really do anything with handles, it just changes the active context. (The context itself is a handle, of course.)
Something similar applies to TryCatch, it merely changes the active landing pad for JS exceptions.
src/async-wrap.cc
Outdated
There was a problem hiding this comment.
You could lift the TryCatch out of the loop, that would be (marginally) more efficient.
There was a problem hiding this comment.
Ah yup. Will do.
src/async-wrap.cc
Outdated
There was a problem hiding this comment.
Is there a reason the declaration is done outside the loop?
src/async-wrap.cc
Outdated
There was a problem hiding this comment.
Cache env->context() in a handle outside the loop, it creates a new handle.
There was a problem hiding this comment.
Ah thanks. Thought it was a strong persistent conversion, which I believe doesn't create a new handle.
src/env.h
Outdated
There was a problem hiding this comment.
Is there a reason you don't use a std::vector<int64_t>?
There was a problem hiding this comment.
If I judge the implementation complexity between using and not using the STL to be comparable then I usually choose to not use the STL. Though the thought of using std::vector doesn't bother me. Can switch it if that's your preference.
There was a problem hiding this comment.
It's simpler than hand-rolling a vector-like class. If you're worried about excessive copying, you could simply vec.reserve(512) at startup.
src/async-wrap.cc
Outdated
There was a problem hiding this comment.
For some reason I thought that was a strong persistent conversion, which I thought wouldn't leak a handle.
Oh, you're right. That's an awfully subtle implementation detail though, I'd just create the HandleScope. You'll need the Context::Scope anyway.
src/async-wrap.cc
Outdated
src/env.h
Outdated
There was a problem hiding this comment.
It's simpler than hand-rolling a vector-like class. If you're worried about excessive copying, you could simply vec.reserve(512) at startup.
|
@trevnorris We have now seen this bug (same assert in execution.cc) on node v4.5.0, possibly because of our use of weak pointers via the 'weak' module (https://github.com/TooTallNate/node-weak ). Do you think your fix will backport well to node v4.x? Also, just to confirm, your change is not removing weak functionality, I think, so weak pointer functionality from the 'weak' module would still work, at least in node v6 -- is that correct? |
|
@danscales That issue will be with the module
No. It's simply delaying when the callback runs. |
|
@bnoordhuis comments should be addressed in commit |
96f10b0 to
7fc2efa
Compare
bnoordhuis
left a comment
There was a problem hiding this comment.
LGTM with some final comments.
src/async-wrap.cc
Outdated
There was a problem hiding this comment.
Can you at least add a SealHandleScope? That said, a HandleScop isn't that expensive to create and it's more robust. Debugging a crash or a leak six months from now because something changed would be highly annoying.
There was a problem hiding this comment.
I misread the following stack:
FATAL ERROR: v8::HandleScope::CreateHandle() Cannot create a handle without a HandleScope
1: node::Abort() [./node_g]
2: 0x21986e3 [./node_g]
3: v8::Utils::ReportApiFailure(char const*, char const*) [./node_g]
4: v8::Utils::ApiCheck(bool, char const*, char const*) [./node_g]
5: v8::internal::HandleScope::Extend(v8::internal::Isolate*) [./node_g]
6: v8::internal::HandleScope::CreateHandle(v8::internal::Isolate*, v8::internal::Object*) [./node_g]
7: v8::internal::HandleScope::GetHandle(v8::internal::Isolate*, v8::internal::Object*) [./node_g]
8: v8::internal::HandleBase::HandleBase(v8::internal::Object*, v8::internal::Isolate*) [./node_g]
9: v8::internal::Handle<v8::internal::JSFunction>::Handle(v8::internal::JSFunction*, v8::internal::Isolate*) [./node_g]
10: v8::internal::Isolate::object_function() [./node_g]
11: v8::Object::New(v8::Isolate*) [./node_g]
12: node::AsyncWrap::DestroyIdsCb(uv_idle_s*) [./node_g]
Which is generated by the following diff when running test-async-wrap-check-providers.js:
diff --git a/src/async-wrap.cc b/src/async-wrap.cc
index 3887179..0bdde8d 100644
--- a/src/async-wrap.cc
+++ b/src/async-wrap.cc
@@ -190,6 +190,7 @@ void AsyncWrap::DestroyIdsCb(uv_idle_t* handle) {
// will catch this can cause the process to abort.
v8::Context::Scope context_scope(env->context());
Local<Function> fn = env->async_hooks_destroy_function();
+ Local<Object> test = Object::New(env->isolate());
uv_idle_stop(handle);
Now I'm confused as to why the SealHandleScope would be needed since I get the same stack even if I add the SealHandleScope before it. Thoughts?
Anyway, I'll add the HandleScope. Hadn't yet b/c I haven't been able to write a test that would cause the script to crash w/o it. If you have an example of one that would be excellent.
There was a problem hiding this comment.
There's a SealHandleScope at the bottom of the stack (node.cc creates one before calling uv_run()) but I figure it's better to create one here explicitly than rely on a detail from elsewhere.
src/async-wrap.cc
Outdated
There was a problem hiding this comment.
Move move this to the top of the function? It looks a little incongruous here.
There was a problem hiding this comment.
sure. just habit of placing it after the variable declarations.
src/async-wrap.cc
Outdated
7fc2efa to
dd1601e
Compare
|
@bnoordhuis fixed those three comments. |
The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
This is how it's done everywhere else in core. Make it follow suit. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: nodejs#9753 Fixes: nodejs#8216 Fixes: nodejs#9465 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
dd1601e to
b49b496
Compare
|
This'll need to be backported to v7, and possibly v6? @nodejs/lts how far back should land? since it's related to other asyncwrap changes coming though perhaps i'll just backport to v4 to make life easier. |
The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: #9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
This is how it's done everywhere else in core. Make it follow suit. PR-URL: #9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: #9753 Fixes: #8216 Fixes: #9465 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
The constructor and destructor shouldn't have been placed in the -inl.h file from the beginning. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
This is how it's done everywhere else in core. Make it follow suit. PR-URL: nodejs#9753 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Calling JS during GC is a no-no. So intead create a queue of all ids that need to have their destroy() callback called and call them later. Removed checking destroy() in test-async-wrap-uid because destroy() can be called after the 'exit' callback. Missing a reliable test to reproduce the issue that caused the FATAL_ERROR. PR-URL: nodejs#9753 Fixes: nodejs#8216 Fixes: nodejs#9465 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
|
@trevnorris Do you have a suggestion on how to fix node-weak to deal with the same problem? As I mentioned, we get a similar problem to what you fixed in node v6 (and less often in node v4) when we use node-weak extensively. Our application needs weak pointers, but we can get away without actually invoking a javascript callback when the weak object is garbage collected, if needed. So, I tried just disabling the code in weakref.cc/TargetCallback that invokes the javascript callback for weak pointers, but that did not help. So, I'm not sure if there is a much deeper problem in using SetWeak() and so there is no simple fix for weakref.cc. Thanks for any help or suggestions. Seems like it would be good to fix node-weak, since it is actually in the node test suite (test/gc/node_modules/weak) |
|
@danscales I'd suggest that node-weak also implement a similar implementation to d479826 using a |
|
@trevnorris if you could manually backport that would be rad! |
|
@thealphanerd Will do, and hot damn! found a bug with this that's fixed by #10400 :P So will wait for that to land before backporting this. |
|
@trevnorris #10400 has landed on v6.x |
|
@trevnorris can you backport this? |
|
ping @trevnorris |
|
@MylesBorins I’m confused, it looks like this has already landed on v6.x-staging as 59d8255...f3b0cf5? |
|
Perhaps this was mislabelled. My audit brought up commits, but you are correct that they are there. |
Checklist
make -j8 test(UNIX), orvcbuild test nosign(Windows) passesAffected core subsystem(s)
async_wrap
Description of change
Calling JS during GC is a no-no. So intead create a queue of all ids
that need to have their destroy() callback called and call them later.
Removed checking destroy() in test-async-wrap-uid because destroy() can
be called after the 'exit' callback.
Missing a reliable test to reproduce the issue that caused the
FATAL_ERROR.
Fixes: #8216
Fixes: #9465
R=@bnoordhuis