Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #739 +/- ##
===========================================
- Coverage 62.09% 61.87% -0.23%
===========================================
Files 700 704 +4
Lines 40142 41218 +1076
Branches 5650 5908 +258
===========================================
+ Hits 24926 25503 +577
- Misses 14522 14885 +363
- Partials 694 830 +136
 */
int add_timestamp(uint32_t row_index, int64_t timestamp);

int set_timestamps(const int64_t* timestamps, uint32_t count);
There was a problem hiding this comment.
Add a comment to explain how cur_row_size is updated after calling this.
cpp/src/common/tablet.cc
Outdated
int Tablet::set_column_values(uint32_t schema_index, const void* data,
                              const uint8_t* null_bitmap, uint32_t count) {
There was a problem hiding this comment.
Could we add an overload
Tablet::set_column_values(uint32_t schema_index, const std::string* data, const uint32_t data_len,
const uint8_t* null_bitmap, uint32_t count)
to support STRING/TEXT/BLOB?
It could also take char** instead of std::string*, or a std::vector instead of data_len.
There was a problem hiding this comment.
We could add a bulk API like set_column_strings, but it wouldn't provide a meaningful performance gain over the current per-row add_value approach. Each string still needs to be individually allocated and copied into the PageArena via dup_from, so the internal implementation would just be a loop doing the same work. The bulk interface would only make the call site slightly cleaner, not faster.
There was a problem hiding this comment.
Yes, a cleaner call site is exactly my intention.
Having to use two sets of interfaces at the same time makes the API much harder to understand.
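For illustration, here is a minimal sketch of the point both sides agree on: a bulk string setter is a loop over the per-row path, so it tidies the call site without reducing the copying. `StringColumn` is a simplified stand-in and not the real Tablet class; the signature of `set_column_strings` and the `valid` vector are assumptions, and plain `std::string` copies stand in for the PageArena `dup_from` copy.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for a tablet string column (hypothetical, not Tablet).
struct StringColumn {
    std::vector<std::string> values;
    std::vector<bool> nulls;  // true = row is null

    explicit StringColumn(uint32_t rows) : values(rows), nulls(rows, true) {}

    // Per-row path: copies one string into the column's own storage.
    void add_value(uint32_t row, const std::string& v) {
        values[row] = v;
        nulls[row] = false;
    }

    // Bulk path: the same per-row copy wrapped in a loop.
    // valid[i] == false leaves row i marked as null.
    int set_column_strings(const std::string* data,
                           const std::vector<bool>& valid, uint32_t count) {
        if (count > values.size() || valid.size() < count) return -1;
        for (uint32_t i = 0; i < count; ++i) {
            if (valid[i]) add_value(i, data[i]);
        }
        return 0;
    }
};
```

With this shape, callers can use one bulk-style interface for every column type, at the same per-string cost as calling `add_value` row by row.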
cpp/src/common/tablet.cc
Outdated
// Convert Arrow bitmap (1=valid, 0=null) to TsFile bitmap (1=null,
// 0=valid) by inverting and writing directly.
char* tsfile_bm = bitmaps_[schema_index].get_bitmap();
uint32_t full_bytes = count / 8;
for (uint32_t i = 0; i < full_bytes; i++) {
    tsfile_bm[i] = ~static_cast<char>(null_bitmap[i]);
}
There was a problem hiding this comment.
Rename null_bitmap to nonnull_bitmap? In the Arrow bitmap passed in here, a set bit means the value is valid, not null.
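To make the polarity behind the naming question concrete, here is a minimal standalone sketch of the inversion. The helper name `invert_validity_bitmap` is hypothetical, and it assumes the output bitmap uses the same LSB-first bit order as Arrow. It also masks the padding bits in the last partial byte, which the hunk above does not show.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: turns an Arrow validity bitmap (1 = valid) into a
// null bitmap (1 = null) covering `count` rows.
std::vector<uint8_t> invert_validity_bitmap(const uint8_t* validity,
                                            uint32_t count) {
    const uint32_t full_bytes = count / 8;
    const uint32_t tail_bits = count % 8;
    std::vector<uint8_t> nulls(full_bytes + (tail_bits ? 1 : 0), 0);

    // Whole bytes: a plain bitwise inversion flips the meaning of every row.
    for (uint32_t i = 0; i < full_bytes; ++i) {
        nulls[i] = static_cast<uint8_t>(~validity[i]);
    }

    // Partial last byte: invert, then clear the padding bits past `count`
    // so they are not reported as nulls.
    if (tail_bits != 0) {
        const uint8_t mask = static_cast<uint8_t>((1u << tail_bits) - 1u);
        nulls[full_bytes] =
            static_cast<uint8_t>(~validity[full_bytes] & mask);
    }
    return nulls;
}
```

A parameter named `validity` or `nonnull_bitmap` makes it obvious at the call site that the input must be inverted before it is stored.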
cpp/src/cwrapper/arrow_c.cc
Outdated
if (bm.test(i)) {
    // null row: write zero placeholder in Arrow buffer
    std::memset(static_cast<char*>(data_buffer) + i * type_size,
                0, type_size);
There was a problem hiding this comment.
Is it possible to just skip to the next row?
for (uint32_t i = 0; i < column_count; ++i) {
    children_arrays[i] = nullptr;
}

for (uint32_t i = 0; i < column_count; ++i) {
    children_arrays[i] = static_cast<ArrowArray*>(
        common::mem_alloc(sizeof(ArrowArray), common::MOD_TSBLOCK));
    if (children_arrays[i] == nullptr) {
There was a problem hiding this comment.
Is the first initialization necessary?
There was a problem hiding this comment.
The cleanup routine after an error checks whether each pointer is null before freeing it. Without initialization, these pointers could contain garbage values, leading to undefined behavior.
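A minimal sketch of the pattern described in this reply, using `malloc`/`free` and a stand-in `Node` type instead of `ArrowArray` and `common::mem_alloc`. The names `alloc_children` and `release_children` and the `fail_at` parameter are hypothetical; `fail_at` exists only to simulate an allocation failure.

```cpp
#include <cstdlib>

struct Node { int value; };  // stand-in for ArrowArray

// Frees every non-null entry, then the pointer array itself. This is only
// safe on a partially filled array if unused slots were set to nullptr.
void release_children(Node** children, unsigned count) {
    if (children == nullptr) return;
    for (unsigned i = 0; i < count; ++i) {
        if (children[i] != nullptr) std::free(children[i]);
    }
    std::free(children);
}

// Returns nullptr on failure. `fail_at` simulates a failed allocation at
// that index (demonstration only).
Node** alloc_children(unsigned count, int fail_at = -1) {
    Node** children =
        static_cast<Node**>(std::malloc(sizeof(Node*) * count));
    if (children == nullptr) return nullptr;

    // First pass: give every slot a known value before anything can fail.
    for (unsigned i = 0; i < count; ++i) children[i] = nullptr;

    // Second pass: allocate; on failure the cleanup sees only valid
    // pointers or nullptr, never garbage.
    for (unsigned i = 0; i < count; ++i) {
        children[i] = (static_cast<int>(i) == fail_at)
                          ? nullptr
                          : static_cast<Node*>(std::malloc(sizeof(Node)));
        if (children[i] == nullptr) {
            release_children(children, count);
            return nullptr;
        }
    }
    return children;
}
```

Without the first pass, a failure at index 1 would make the cleanup read uninitialized pointers at indices 2 and above.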
cpp/src/cwrapper/arrow_c.cc
Outdated
out_schema->format = schema_data->format_strings->at(0).c_str();
out_schema->name = schema_data->name_strings->at(0).c_str();
There was a problem hiding this comment.
Why is the format string a vector?
if (time_col_index < 0 || time_col_index >= n_cols)
    return common::E_INVALID_ARG;
There was a problem hiding this comment.
If reg_schema already specifies the time column, we could use it here.
There was a problem hiding this comment.
The time_col_index should be set by the caller.
cpp/src/cwrapper/tsfile_cwrapper.h
Outdated
// Write Arrow C Data Interface batch into a table (Arrow -> Tablet -> write).
// time_col_index: index of the time column in the Arrow struct.
//   >= 0: use the specified column as the time column.
//   < 0: auto-detect by Arrow format "tsn:" (TIMESTAMP type).

data: pyarrow.RecordBatch or pyarrow.Table
time_col_index: index of the time column in the Arrow schema.
    >= 0: use the specified column as the time column.
    < 0: auto-detect by Arrow timestamp type (default).
There was a problem hiding this comment.
Where is the related logic?
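For reference, the behavior these comments document could look roughly like the sketch below. `resolve_time_column` is a hypothetical helper, not code from this PR. It assumes the child format strings are already collected into a vector, and it matches only the "tsn:" prefix named in the header comment.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the index of the time column, or -1 if the
// index is out of range or no column can be auto-detected.
int resolve_time_column(const std::vector<std::string>& formats,
                        int time_col_index) {
    const int n_cols = static_cast<int>(formats.size());

    // >= 0: the caller picked the column; only validate the range.
    if (time_col_index >= 0) {
        return time_col_index < n_cols ? time_col_index : -1;
    }

    // < 0: auto-detect the first column whose Arrow format string starts
    // with "tsn:" (nanosecond timestamp).
    for (int i = 0; i < n_cols; ++i) {
        if (formats[i].compare(0, 4, "tsn:") == 0) return i;
    }
    return -1;
}
```

A caller would map a -1 result to an error code such as `common::E_INVALID_ARG` instead of rejecting every negative index up front.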

No description provided.