[opt](s3) auto retry when meeting 429 error#35396
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 41184 ms |
TPC-DS: Total hot run time: 169215 ms |
ClickBench: Total hot run time: 30.55 s |
|
TeamCity be ut coverage result: |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
LGTM |
| &s3_bytes_read_total); | ||
| // Although we can get QPS from s3_bytes_per_read, but s3_bytes_per_read only | ||
| // record successfull request, and s3_get_request_qps will record all request. | ||
| bvar::PerSecond<bvar::Adder<uint64_t>> s3_get_request_qps("s3_file_reader", "s3_get_request", |
There was a problem hiding this comment.
s3_bvar::s3_get_latency can get qps/P99 latency altogether. It seems no need to add this bvar
There was a problem hiding this comment.
s3_get_latency can only save "success" request.
s3_get_request_qps saves both "success" and "failed request"
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
TPC-H: Total hot run time: 39925 ms |
TPC-DS: Total hot run time: 175217 ms |
ClickBench: Total hot run time: 31.11 s |
| while (retry_count <= max_retries) { | ||
| s3_file_reader_read_counter << 1; | ||
| // clang-format off | ||
| auto resp = client->get_object( { .bucket = _bucket, .key = _key, }, |
There was a problem hiding this comment.
| auto resp = client->get_object( { .bucket = _bucket, .key = _key, }, | |
| { | |
| SCOPED_BVAR_LATENCY(s3_bvar::s3_get_latency); | |
| auto resp = client->get_object( { .bucket = _bucket, .key = _key, }, | |
| } |
|
PR approved by at least one committer and no changes requested. |
Sometime the s3 sdk will return error like: ``` QpsLimitExceeded Unable to parse ExceptionName: QpsLimitExceeded Message: Please reduce your request rate. code=429 type=100, request_id=66516C288EC49 ``` We should slowdown the request rate by sleeping for a while. - Add 2 new BE config - `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms` When meet s3 429 error, the "get" request will sleep `s3_read_base_wait_time_ms (*1, *2, *3, *4)` ms get try again. The max sleep time is s3_read_max_wait_time_ms and the max retry time is max_s3_client_retry - Add more metrics for s3 file reader - `s3_file_reader_too_many_request`: counter of 429 error. - `s3_file_reader_s3_get_request`: the QPS of s3 get request. - `TotalGetRequest`: Get request counter in profile - `TooManyRequestErr`: 429 error counter in profile - `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile - `TotalBytesRead`: Total bytes read from s3 in profile 
Proposed changes
Sometime the s3 sdk will return error like:
We should slowdown the request rate by sleeping for a while.
Add 2 new BE config
s3_read_base_wait_time_msands3_read_max_wait_time_msWhen meet s3 429 error, the "get" request will
sleep
s3_read_base_wait_time_ms (*1, *2, *3, *4)ms get try again.The max sleep time is s3_read_max_wait_time_ms
and the max retry time is max_s3_client_retry
Add more metrics for s3 file reader
s3_file_reader_too_many_request: counter of 429 error.s3_file_reader_s3_get_request: the QPS of s3 get request.TotalGetRequest: Get request counter in profileTooManyRequestErr: 429 error counter in profileTooManyRequestSleepTime: Sum of sleep time after 429 error in profileTotalBytesRead: Total bytes read from s3 in profile