[opt](brpc) check and remove unavailable brpc stubs#43212
[opt](brpc) check and remove unavailable brpc stubs#43212yiguolei merged 2 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
clang-tidy review says "All clean, LGTM! 👍" |
| void FragmentMgr::register_brpc_with_query(const TNetworkAddress& network_address, | ||
| std::shared_ptr<PBackendService_Stub> brpc_stub, | ||
| TUniqueId query_id) { | ||
| std::lock_guard<std::mutex> lock(_lock); |
There was a problem hiding this comment.
the lock will very very busy, we could save it in query context and collect it during cancel worker
c7ce8e0 to
02163a0
Compare
| c->clear_finished_tasks(); | ||
| } | ||
|
|
||
| std::unordered_map<TNetworkAddress, BrpcItem> brpc_stub_with_queries; |
There was a problem hiding this comment.
不要这么写,会出现2个rpc 链接的host 一样,但是brpc的对象不一样的问题。
可以使用brpc stub的指针做key,然后把network address 放到BrpcItem 这里面,供check的时候打log 使用。
| if (cntl.Failed()) { | ||
| error_message = cntl.ErrorText(); | ||
| LOG(WARNING) << "brpc stub: " << network_address << " check failed: " << error_message; | ||
| } else if (response.has_status() && response.has_hello() && response.hello() == message && |
There was a problem hiding this comment.
Do not need check response value.
If has response, it means we could connect dest succesfully, then the connection is OK.
| request.set_hello(message); | ||
| PHandShakeResponse response; | ||
| brpc::Controller cntl; | ||
| cntl.set_timeout_ms(500); |
There was a problem hiding this comment.
500 * (failed_count + 1) * 2
| private: | ||
| struct BrpcItem { | ||
| std::shared_ptr<PBackendService_Stub> brpc_stub; | ||
| std::map<std::string, std::weak_ptr<QueryContext>> queries; |
There was a problem hiding this comment.
why it is a map? a vector is not ok?
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
run buildall |
|
TeamCity be ut coverage result: |
|
run buildall |
|
TeamCity be ut coverage result: |
|
run buildall |
|
run buildall |
|
TeamCity be ut coverage result: |
|
run buildall |
| @@ -912,9 +915,18 @@ void FragmentMgr::cancel_worker() { | |||
| LOG_WARNING("Query {} is timeout", print_id(it->first)); | |||
| queries_timeout.push_back(it->first); | |||
| ++it; | |||
|
PR approved by at least one committer and no changes requested. |
|
run buildall |
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Check List (For Committer)
Test
Behavior changed:
Does this need documentation?
Release note
None
Check List (For Reviewer who merge this PR)