Support gRPC communication with SMG (Shepherd Model Gateway) workers #3946
Minor Change Support grpc communication with smg router
@Bihan, could you please share a bit more context behind the PR:
Yes, the main benefit is that vLLM gRPC workers are now supported with the SMG router in dstack PD services. Other benefits come from gRPC mode, which applies to both SGLang and vLLM workers.
As far as I understand, gRPC should be chosen over HTTP-based workers. For vLLM, there is no option but to use gRPC.
With an SGLang worker: the worker uses gRPC when it is launched with the option
With vLLM gRPC Worker: With SGLang gRPC worker: Note:
if result["status"] == "ready":
    return result
return await _get_grpc_worker_payload(
    job_model, worker_url=grpc_worker_url, runtime_type=runtime_type
)
(nit) What if the worker responded successfully over HTTP, but returned a status other than ready? Is this a valid case? If so, I assume we shouldn't try gRPC, because we already know that the worker responds over HTTP?
Not a valid SGLang case. SGLang fails at startup and never serves with a non-ready status. See
try:
    result = await _get_http_worker_payload(job_model, worker_url=http_worker_url)
except RemoteProtocolError as e:
    logger.debug(
        "HTTP server_info probe failed for %s (trying gRPC): %r",
        http_worker_url,
        e,
    )
    result: _WorkerPayloadResult = {"status": "not_ready", "payload": None}
if result["status"] == "ready":
    return result
return await _get_grpc_worker_payload(
    job_model, worker_url=grpc_worker_url, runtime_type=runtime_type
)
(nit) This can open and close the same SSH tunnel twice (once in _get_http_worker_payload and once in _get_grpc_worker_payload), which is an expensive operation.
Good catch!
Could I handle it in a separate PR?
Plan:
- Extract shared tunnel setup into something like get_service_replica_tunnel(job) (yields a UDS path).
- Have the HTTP/gRPC clients use that helper.
- Open one tunnel and run HTTP then gRPC over the same UDS.
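The plan above could look roughly like the sketch below. It is only an illustration under stated assumptions: get_service_replica_tunnel is the name proposed in this comment, while probe_http, probe_grpc, the tunnel_opens counter, and the placeholder socket path are hypothetical stand-ins, not dstack code. No real SSH tunnel or network call is made.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Optional, TypedDict


class WorkerPayloadResult(TypedDict):
    status: str
    payload: Optional[dict]


tunnel_opens = 0  # counts tunnel setups so the example can show there is only one


@asynccontextmanager
async def get_service_replica_tunnel(job: str) -> AsyncIterator[str]:
    """Hypothetical helper: open one tunnel for `job` and yield its UDS path."""
    global tunnel_opens
    tunnel_opens += 1
    # Placeholder path; a real helper would forward the service port to this socket.
    uds_path = f"/tmp/{job}.sock"
    try:
        yield uds_path
    finally:
        pass  # a real helper would close the SSH tunnel here


async def probe_http(uds_path: str) -> WorkerPayloadResult:
    # Stand-in for the HTTP /server_info probe; pretends the worker has no HTTP server.
    return {"status": "not_ready", "payload": None}


async def probe_grpc(uds_path: str) -> WorkerPayloadResult:
    # Stand-in for the gRPC GetServerInfo probe; pretends the worker answers.
    return {"status": "ready", "payload": {"connection_mode": "grpc"}}


async def get_worker_payload(job: str) -> WorkerPayloadResult:
    # Both probes share the tunnel opened by the single `async with` block.
    async with get_service_replica_tunnel(job) as uds_path:
        result = await probe_http(uds_path)
        if result["status"] == "ready":
            return result
        return await probe_grpc(uds_path)


result = asyncio.run(get_worker_payload("replica-0"))
```

Here the HTTP probe and the gRPC fallback run inside one context manager, so the tunnel is set up once per worker instead of once per protocol.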
Extends router worker sync to discover and register gRPC SMG workers (vLLM and SGLang PD), in addition to the existing HTTP SGLang path.
gRPC client: Adds job_replica_grpc_client.py, which follows the same SSH tunnel pattern as the HTTP replica client but opens a gRPC channel over a Unix domain socket (unix://) forwarded to the worker's service port.
Worker registration: To register a worker with SMG, dstack needs runtime_type (vLLM / SGLang) and connection_mode (HTTP / gRPC). Rather than adding new service configuration fields, dstack discovers these by probing workers. Discovery runs in two stages:
First sync (the router's worker list is empty): dstack does not yet know connection_mode or runtime_type. It probes each worker replica with HTTP /server_info and/or gRPC GetServerInfo, trying the SGLang gRPC stub and then the vLLM gRPC stub until one responds. Registered workers include connection_mode, runtime_type, and PD fields (kv_role / disaggregation_mode, and the bootstrap port for SGLang prefill).
Later syncs: dstack reads connection_mode and runtime_type from the router's GET /workers list and reuses them, so protocol and runtime guessing is not repeated. When connection_mode is grpc, HTTP probes are skipped.
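As a rough illustration of the two stages, the sketch below picks which probes to attempt for one worker. It is not the code in this PR: probes_to_try, FIRST_SYNC_PROBES, the lowercase string values, the per-worker lookup, and the example URL are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# (connection_mode, runtime_type); the string values are assumptions for this sketch.
WorkerInfo = Tuple[str, str]

# Probe order when nothing is known yet: HTTP first (the existing SGLang path),
# then the SGLang gRPC stub, then the vLLM gRPC stub.
FIRST_SYNC_PROBES: List[WorkerInfo] = [
    ("http", "sglang"),
    ("grpc", "sglang"),
    ("grpc", "vllm"),
]


def probes_to_try(worker_url: str, router_workers: Dict[str, WorkerInfo]) -> List[WorkerInfo]:
    """Return the (connection_mode, runtime_type) probes to attempt for one worker."""
    known: Optional[WorkerInfo] = router_workers.get(worker_url)
    if known is None:
        # First sync: nothing is known, so try every combination in order.
        return list(FIRST_SYNC_PROBES)
    # Later syncs: reuse what the router reports; a gRPC worker gets no HTTP probe.
    return [known]


url = "grpc://10.0.0.1:8000"  # hypothetical worker URL
first_sync = probes_to_try(url, {})
later_sync = probes_to_try(url, {url: ("grpc", "vllm")})
```

With an empty router list, first_sync holds all three probes in order; once the router reports the worker as a vLLM gRPC worker, later_sync holds only that single gRPC probe.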