Wrapping a model in static unique_ptr results in seg fault when destructing #131

@ihowell

I'm creating a small library for use from Fortran that loads a model, keeps it in memory, and runs it many times. I currently store the model in a static unique pointer, and everything works fine until the model is destroyed when the program exits. Even if I call model.reset() on the pointer early, I still get the error shown below:

ASAN:DEADLYSIGNAL
=================================================================
==13168==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fbd5435ee04 bp 0x7ffef355b480 sp 0x7ffef355b440 T0)
==13168==The signal is caused by a READ memory access.
==13168==Hint: address points to the zero page.
    #0 0x7fbd5435ee03 in TF_DeleteSession (/usr/local/lib/libtensorflow.so.2+0xe17e03)
    #1 0x7fbd7ae24eda in cppflow::model::model(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(TF_Session*)#1}::operator()(TF_Session*) const /home/ihowell/Projects/nasa/fortran_api/./libs/cppflow/include/cppflow/model.h:55
    #2 0x7fbd7ae396c4 in std::_Sp_counted_deleter<TF_Session*, cppflow::model::model(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(TF_Session*)#1}, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/7/bits/shared_ptr_base.h:470
    #3 0x7fbd7ae2bdec in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/7/bits/shared_ptr_base.h:154
    #4 0x7fbd7ae28659 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/7/bits/shared_ptr_base.h:684
    #5 0x7fbd7ae24e93 in std::__shared_ptr<TF_Session, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/7/bits/shared_ptr_base.h:1123
    #6 0x7fbd7ae24eaf in std::shared_ptr<TF_Session>::~shared_ptr() /usr/include/c++/7/bits/shared_ptr.h:93
    #7 0x7fbd7ae3a243 in cppflow::model::~model() /home/ihowell/Projects/nasa/fortran_api/./libs/cppflow/include/cppflow/model.h:30
    #8 0x7fbd7ae3a26f in void __gnu_cxx::new_allocator<cppflow::model>::destroy<cppflow::model>(cppflow::model*) /usr/include/c++/7/ext/new_allocator.h:140
    #9 0x7fbd7ae3a10e in void std::allocator_traits<std::allocator<cppflow::model> >::destroy<cppflow::model>(std::allocator<cppflow::model>&, cppflow::model*) /usr/include/c++/7/bits/alloc_traits.h:487
    #10 0x7fbd7ae39426 in std::_Sp_counted_ptr_inplace<cppflow::model, std::allocator<cppflow::model>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/7/bits/shared_ptr_base.h:535
    #11 0x7fbd7ae2bdec in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/7/bits/shared_ptr_base.h:154
    #12 0x7fbd7ae28659 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/7/bits/shared_ptr_base.h:684
    #13 0x7fbd7ae278c9 in std::__shared_ptr<cppflow::model, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/7/bits/shared_ptr_base.h:1123
    #14 0x7fbd7ae27921 in std::shared_ptr<cppflow::model>::~shared_ptr() /usr/include/c++/7/bits/shared_ptr.h:93
    #15 0x7fbd7a4a36c4 in __cxa_finalize (/lib/x86_64-linux-gnu/libc.so.6+0x436c4)
    #16 0x7fbd7ae21032  (/home/ihowell/Projects/nasa/fortran_api/build/libexample.so+0x2f032)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/local/lib/libtensorflow.so.2+0xe17e03) in TF_DeleteSession
==13168==ABORTING
(env) ihowell@lake:~/Projects/nasa/fortran_api/examples/basic_usage/build$ ./basic 
2021-06-16 11:43:56.150025: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-16 11:43:56.467677: I tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: ../model
2021-06-16 11:43:56.470476: I tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2021-06-16 11:43:56.470516: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: ../model
2021-06-16 11:43:56.470591: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-16 11:43:56.471691: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-16 11:43:56.535063: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2021-06-16 11:43:56.535117: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: lake
2021-06-16 11:43:56.535133: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: lake
2021-06-16 11:43:56.535185: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.56.0
2021-06-16 11:43:56.535231: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.56.0
2021-06-16 11:43:56.535245: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.56.0
2021-06-16 11:43:56.567350: I tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2021-06-16 11:43:56.569261: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3499910000 Hz
2021-06-16 11:43:56.623513: I tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: ../model
2021-06-16 11:43:56.636943: I tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 169274 microseconds.
(tensor: shape=[10 5], dtype=TF_FLOAT, data=
[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 ...
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]])
(tensor: shape=[10 1], dtype=TF_FLOAT, data=
[[0.952202678]
 [0.952202678]
 [0.952202678]
 ...
 [0.952202678]
 [0.952202678]
 [0.952202678]])
ASAN:DEADLYSIGNAL
=================================================================
==13447==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f02be04be04 bp 0x7ffe0a51d040 sp 0x7ffe0a51d000 T0)
==13447==The signal is caused by a READ memory access.
==13447==Hint: address points to the zero page.
    #0 0x7f02be04be03 in TF_DeleteSession (/usr/local/lib/libtensorflow.so.2+0xe17e03)
    #1 0x7f02e4b12006 in cppflow::model::model(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(TF_Session*)#1}::operator()(TF_Session*) const /home/ihowell/Projects/nasa/fortran_api/./libs/cppflow/include/cppflow/model.h:55
    #2 0x7f02e4b26946 in std::_Sp_counted_deleter<TF_Session*, cppflow::model::model(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda(TF_Session*)#1}, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/7/bits/shared_ptr_base.h:470
    #3 0x7f02e4b1906e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/7/bits/shared_ptr_base.h:154
    #4 0x7f02e4b157d1 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/7/bits/shared_ptr_base.h:684
    #5 0x7f02e4b11fbf in std::__shared_ptr<TF_Session, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/7/bits/shared_ptr_base.h:1123
    #6 0x7f02e4b11fdb in std::shared_ptr<TF_Session>::~shared_ptr() /usr/include/c++/7/bits/shared_ptr.h:93
    #7 0x7f02e4b274c5 in cppflow::model::~model() /home/ihowell/Projects/nasa/fortran_api/./libs/cppflow/include/cppflow/model.h:30
    #8 0x7f02e4b274f1 in void __gnu_cxx::new_allocator<cppflow::model>::destroy<cppflow::model>(cppflow::model*) /usr/include/c++/7/ext/new_allocator.h:140
    #9 0x7f02e4b27390 in void std::allocator_traits<std::allocator<cppflow::model> >::destroy<cppflow::model>(std::allocator<cppflow::model>&, cppflow::model*) /usr/include/c++/7/bits/alloc_traits.h:487
    #10 0x7f02e4b266a8 in std::_Sp_counted_ptr_inplace<cppflow::model, std::allocator<cppflow::model>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/7/bits/shared_ptr_base.h:535
    #11 0x7f02e4b1906e in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /usr/include/c++/7/bits/shared_ptr_base.h:154
    #12 0x7f02e4b157d1 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /usr/include/c++/7/bits/shared_ptr_base.h:684
    #13 0x7f02e4b149f5 in std::__shared_ptr<cppflow::model, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /usr/include/c++/7/bits/shared_ptr_base.h:1123
    #14 0x7f02e4b14a99 in std::shared_ptr<cppflow::model>::~shared_ptr() /usr/include/c++/7/bits/shared_ptr.h:93
    #15 0x7f02e41906c4 in __cxa_finalize (/lib/x86_64-linux-gnu/libc.so.6+0x436c4)
    #16 0x7f02e4b0e152  (/home/ihowell/Projects/nasa/fortran_api/build/libexample.so+0x2f152)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/usr/local/lib/libtensorflow.so.2+0xe17e03) in TF_DeleteSession
==13447==ABORTING
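
For readers who want to picture the setup described at the top of this issue, here is a minimal sketch of a library that holds a model in a static unique pointer behind a C interface callable from Fortran. It is not the code from this report and it does not reproduce the crash. `FakeModel` is a hypothetical stand-in for `cppflow::model` so the sketch builds without TensorFlow, and the names `example_load`, `example_is_loaded`, and `example_unload` are assumptions made for illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for cppflow::model, so the sketch builds without TensorFlow.
class FakeModel {
public:
    explicit FakeModel(std::string path) : path_(std::move(path)) { ++live_count; }
    ~FakeModel() { --live_count; }
    FakeModel(const FakeModel&) = delete;
    FakeModel& operator=(const FakeModel&) = delete;

    const std::string& path() const { return path_; }

    static int live_count;  // number of FakeModel objects currently alive

private:
    std::string path_;
};
int FakeModel::live_count = 0;

namespace {
// Owned by the library. If example_unload() is never called, the destructor
// runs during static destruction at program exit.
std::unique_ptr<FakeModel> g_model;
}  // namespace

extern "C" {

// Loads (or reloads) the model. Returns 0 on success, nonzero on failure.
int example_load(const char* path) {
    if (path == nullptr) return 1;
    try {
        g_model.reset(new FakeModel(path));
    } catch (...) {
        return 2;  // do not let C++ exceptions cross the C boundary
    }
    return 0;
}

// Returns 1 if a model is currently held, 0 otherwise.
int example_is_loaded() { return g_model ? 1 : 0; }

// Destroys the model explicitly instead of waiting for program exit.
void example_unload() { g_model.reset(); }

}  // extern "C"
```

With this layout the caller can choose when the model is destroyed by calling `example_unload()` before the program ends. As noted above, an early `reset()` still produced the same error in this report, so the sketch shows only the lifetime pattern, not a fix.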
