Skip to content

[fix](orc) be crash while read orc with mismatch types#30738

Merged
lide-reed merged 1 commit intoapache:branch-1.2-ltsfrom
cambyzju:fix-0202
Feb 2, 2024
Merged

[fix](orc) be crash while read orc with mismatch types#30738
lide-reed merged 1 commit intoapache:branch-1.2-ltsfrom
cambyzju:fix-0202

Conversation

@cambyzju
Copy link
Copy Markdown
Contributor

@cambyzju cambyzju commented Feb 2, 2024

Proposed changes

While read orc data through multi-catalog, if column type in meta is string, but the real data type is INT or Timestamp or others, it may make be crash:

(gdb) bt
#0  0x00007f73c30c2387 in raise () from /lib64/libc.so.6
#1  0x00007f73c30c3a78 in abort () from /lib64/libc.so.6
#2  0x000056552fad78b9 in ?? ()
#3  0x000056552faccecd in google::LogMessage::Fail() ()
#4  0x000056552facf409 in google::LogMessage::SendToLog() ()
#5  0x000056552facca36 in google::LogMessage::Flush() ()
#6  0x000056552facfa79 in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x000056552b76b115 in doris::vectorized::ColumnString::check_chars_length (element_number=<optimized out>, total_length=<optimized out>, 
    this=<optimized out>) at /data/doris-1.x/be/src/vec/columns/column_string.h:66
#8  doris::vectorized::ColumnString::insert_many_strings (this=<optimized out>, strings=<optimized out>, num=4064)
    at /data/doris-1.x/be/src/vec/columns/column_string.h:270
#9  0x000056552f66548c in doris::vectorized::OrcReader::_decode_string_column (this=this@entry=0x7f726d3b6a00, col_name=..., data_column=..., 
    type_kind=@0x7f67b47bc0e0: orc::TIMESTAMP, cvb=cvb@entry=0x7f6d98aadd60, num_values=4064)
    at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:649
#10 0x000056552f670300 in doris::vectorized::OrcReader::_orc_column_to_doris_column (this=this@entry=0x7f726d3b6a00, col_name=..., doris_column=..., 
    data_type=..., orc_column_type=0x7f6d96907440, cvb=0x7f6d98aadd60, num_values=4064) at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:738
#11 0x000056552f672d50 in doris::vectorized::OrcReader::get_next_block (this=0x7f726d3b6a00, block=0x7f6d7fe59b20, read_rows=0x7f67b47bc318, 
    eof=<optimized out>) at /data/doris-1.x/be/src/vec/exec/format/orc/vorc_reader.cpp:788
#12 0x000056552f633ad8 in doris::vectorized::VFileScanner::_get_block_impl (this=0x7f6f051b3a00, state=<optimized out>, block=0x7f6d7fe59b20, 
    eof=0x7f67b47bc539) at /data/doris-1.x/be/src/vec/exec/scan/vfile_scanner.cpp:155
#13 0x000056552f5ff329 in doris::vectorized::VScanner::get_block (this=this@entry=0x7f6f051b3a00, state=state@entry=0x7f6d70a61900, 
    block=block@entry=0x7f6d7fe59b20, eof=eof@entry=0x7f67b47bc539) at /data/doris-1.x/be/src/vec/exec/scan/vscanner.cpp:54
#14 0x000056552f5fc682 in doris::vectorized::ScannerScheduler::_scanner_scan (this=<optimized out>, scheduler=<optimized out>, ctx=0x7f6f007bdc00, 
    scanner=0x7f6f051b3a00) at /data/doris-1.x/be/src/vec/exec/scan/scanner_scheduler.cpp:247
#15 0x000056552a903b15 in std::function<void ()>::operator()() const (this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#16 doris::FunctionRunnable::run (this=<optimized out>) at /data/doris-1.x/be/src/util/threadpool.cpp:46
#17 doris::ThreadPool::dispatch_thread (this=0x7f739ad6b180) at /data/doris-1.x/be/src/util/threadpool.cpp:535
#18 0x000056552a8f8f6f in std::function<void ()>::operator()() const (this=0x7f7267d7da38) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#19 doris::Thread::supervise_thread (arg=0x7f7267d7da20) at /data/doris-1.x/be/src/util/thread.cpp:454
#20 0x00007f73c2e77ea5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f73c318a9fd in ioperm () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()

Here we expect orc::StringVectorBatch*, but the real type is orc::TimestampVectorBatch*:
image

In the code, we already try to check the cast result:

    auto* data = down_cast<orc::StringVectorBatch*>(cvb);
    if (data == nullptr) {
        return Status::InternalError("Wrong data type for colum '{}'", col_name);
    }

Unfortunately down_cast equals to static_cast, because assert do not work in release build type.

template <typename To, typename From> // use like this: down_cast<T*>(foo);
inline To down_cast(From* f) {        // so we only accept pointers
    // Ensures that To is a sub-type of From *.  This test is here only
    // for compile-time type checking, and has no overhead in an
    // optimized build at run-time, as it will be optimized away
    // completely.

    // TODO(user): This should use COMPILE_ASSERT.
    if (false) {
        ::implicit_cast<From*, To>(NULL);
    }

    // uses RTTI in dbg and fastbuild. asserts are disabled in opt builds.
    assert(f == NULL || dynamic_cast<To>(f) != NULL);
    return static_cast<To>(f);
}

While we build release version, we define NDEBUG to turn off assert

# For CMAKE_BUILD_TYPE=Release
#   -O3: Enable all compiler optimizations
#   -DNDEBUG: Turn off dchecks/asserts/debug only code.
set(CXX_FLAGS_RELEASE "${CXX_GCC_FLAGS} -O3 -DNDEBUG")

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link
Copy Markdown

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Copy link
Copy Markdown
Contributor

@lide-reed lide-reed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lide-reed lide-reed merged commit d9539d4 into apache:branch-1.2-lts Feb 2, 2024
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 2, 2024
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 2, 2024

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 2, 2024

PR approved by anyone and no changes requested.

@cambyzju
Copy link
Copy Markdown
Contributor Author

cambyzju commented Feb 4, 2024

#23068 for master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. area/vectorization reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants