Release 0.20.24 · dstackai/dstack

Dev environments

Zed

dstack now supports Zed as a dev environment IDE:

type: dev-environment
ide: zed
resources:
  gpu: L4

Once the dev environment is up, the CLI prints a zed:// link that opens the remote project in Zed over SSH. Zed requires no plugins, so nothing has to be pre-installed on the server: the Zed server is installed automatically on first connect.

$ dstack apply
...
Submit a new run? [y/n]: y
 NAME                     BACKEND                  GPU                     PRICE       STATUS      SUBMITTED
 fast-fly-1               aws (us-east-2)          gpu=L4:24GB:1           $0.1838     running     16:36
                                                                           (spot)

fast-fly-1 provisioning completed (running)
pip install ipykernel...

To open in Zed, use link below:

  zed://ssh/fast-fly-1/dstack/run

To connect via SSH, use: `ssh fast-fly-1`

To exit, press Ctrl+C.
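
The printed link can be taken apart with the standard library, which is handy when scripting around the CLI output. This is a minimal sketch: `parse_zed_link` is a hypothetical helper, and it assumes that the first path segment is the SSH host alias (the run name) and that the remainder is the remote directory, which is inferred only from the example link above.

```python
from urllib.parse import urlparse


def parse_zed_link(link: str) -> tuple[str, str]:
    """Split a zed://ssh/<host>/<path> link into (host, remote_path).

    Hypothetical helper; the host/path layout is an assumption based on
    the single example link shown in the CLI output.
    """
    parsed = urlparse(link)
    if parsed.scheme != "zed" or parsed.netloc != "ssh":
        raise ValueError(f"not a zed://ssh link: {link!r}")
    # The path looks like "/<host>/<remote path>"; split off the first segment.
    host, _, rest = parsed.path.lstrip("/").partition("/")
    if not host:
        raise ValueError(f"missing host in link: {link!r}")
    return host, "/" + rest


print(parse_zed_link("zed://ssh/fast-fly-1/dstack/run"))
```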

Services

Replica groups

The spot_policy and reservation properties can now be specified at the replica group level. This allows distributing replicas across reserved and spot capacity, e.g., running baseline replicas on a reservation while autoscaling overflow replicas on spot instances:

type: service
image: my-image
port: 80

replicas:
  - name: baseline
    reservation: my-reservation
    count: 1

  - name: overflow
    spot_policy: auto
    count: 0..3
    scaling:
      metric: rps
      target: 1
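
To see the capacity envelope that a configuration like the one above yields, the following Python sketch sums the per-group `count` values. It assumes `count` is either a single integer or a `min..max` range, the two forms used in the example. The `parse_count` and `replica_bounds` helpers are hypothetical and not part of dstack.

```python
from typing import Union


def parse_count(count: Union[int, str]) -> tuple[int, int]:
    """Return (min, max) for a count such as 1 or "0..3".

    Hypothetical helper: only the two forms used in the example are handled.
    """
    if isinstance(count, int):
        return count, count
    low, sep, high = str(count).partition("..")
    if not sep:
        value = int(low)
        return value, value
    return int(low), int(high)


def replica_bounds(groups: list[dict]) -> tuple[int, int]:
    """Sum the per-group minimum and maximum replica counts."""
    bounds = [parse_count(group["count"]) for group in groups]
    return sum(lo for lo, _ in bounds), sum(hi for _, hi in bounds)


# Mirrors the two replica groups from the service configuration above.
groups = [
    {"name": "baseline", "reservation": "my-reservation", "count": 1},
    {"name": "overflow", "spot_policy": "auto", "count": "0..3"},
]
print(replica_bounds(groups))  # (1, 4)
```

Under these assumptions the service always keeps one replica on the reservation and can grow to four replicas in total, with the three extra ones coming from the spot-backed group.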

Shepherd Model Gateway

Services using Shepherd Model Gateway now support gRPC communication with both vLLM and SGLang workers. Previously, only the SGLang runtime with the HTTP connection mode was supported.

Below is an example service configuration running vLLM gRPC workers:

type: service
name: prefill-decode

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    image: python:3.12-slim
    commands:
      - pip install smg
      - |
          smg launch \
            --pd-disaggregation \
            --model-path $MODEL_ID \
            --enable-igw \
            --host 0.0.0.0 \
            --port 8000 \
            --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4

  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
          python3 -m vllm.entrypoints.grpc_server \
            --model $MODEL_ID \
            --host 0.0.0.0 \
            --port 8000 \
            --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer"}'
    resources:
      gpu: H200

  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
          python3 -m vllm.entrypoints.grpc_server \
            --model $MODEL_ID \
            --host 0.0.0.0 \
            --port 8000 \
            --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer"}'
    resources:
      gpu: H200

port: 8000

dstack automatically detects each worker's runtime (vLLM or SGLang) and connection mode (HTTP or gRPC) by probing it. With gRPC, the SMG router tokenizes requests once and routes on tokens instead of raw text, reducing duplicate work and making cache_aware routing more effective.
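
Why routing on tokens helps can be illustrated with a toy prefix-matching router. This is a simplified sketch, not SMG's actual algorithm: the `shared_prefix_len` and `pick_worker` helpers, the worker names, and the token IDs are all made up for illustration. It assumes the router remembers the last token sequence sent to each worker and prefers the worker whose cached sequence shares the longest prefix with the new request.

```python
def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Count how many leading token IDs two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(request: list[int], cache: dict[str, list[int]]) -> str:
    """Pick the worker whose cached tokens overlap most with the request."""
    return max(cache, key=lambda w: shared_prefix_len(request, cache[w]))


# Illustrative state: token IDs each worker has most recently processed.
cache = {
    "worker-a": [101, 7, 7, 42, 9],
    "worker-b": [101, 7, 3, 18],
}
request = [101, 7, 7, 42, 55]
print(pick_worker(request, cache))  # worker-a
```

Because the comparison runs on the same token IDs the workers consume, the router's notion of a shared prefix lines up with what a worker can actually reuse from its cache, and the request does not need to be tokenized a second time.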

JarvisLabs

The jarvislabs backend now supports offers with RTXPRO6000 GPUs.

Azure

`subnet_ids`

Similar to vpc_ids, the azure backend now lets you select the specific subnets attached to dstack VMs via the new subnet_ids property, which maps each region to a subnet in the <resource-group>/<vnet>/<subnet> format:

projects:
  - name: main
    backends:
      - type: azure
        subscription_id: ...
        tenant_id: ...
        creds:
          type: default
        regions: [westeurope]
        subnet_ids:
          westeurope: my-resource-group/my-vnet/my-subnet

This is useful when the VNet contains subnets that dstack shouldn't pick automatically, e.g. subnets delegated to other Azure services.
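
A quick way to sanity-check `subnet_ids` values before putting them into the server config is to split them into their three parts. The sketch below is only an illustration of the `<resource-group>/<vnet>/<subnet>` format: `SubnetRef` and `parse_subnet_id` are hypothetical names, and this is not how dstack itself validates the setting.

```python
from typing import NamedTuple


class SubnetRef(NamedTuple):
    resource_group: str
    vnet: str
    subnet: str


def parse_subnet_id(value: str) -> SubnetRef:
    """Split "<resource-group>/<vnet>/<subnet>" into its three parts."""
    parts = value.split("/")
    # Require exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <resource-group>/<vnet>/<subnet>, got {value!r}"
        )
    return SubnetRef(*parts)


# Same mapping as in the backend configuration above.
subnet_ids = {"westeurope": "my-resource-group/my-vnet/my-subnet"}
for region, value in subnet_ids.items():
    print(region, parse_subnet_id(value))
```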

What's changed

Fix zero scaled services assigned to wrong fleets by @r4victor in #3939
Set runner/shim default compiled versions to latest by @r4victor in #3941
Implement SSH connection pool for runner instances by @r4victor in #3936
[chore]: Move format_backend() to common utils by @jvstme in #3942
Drop non-linux runner builds and local backend by @r4victor in #3944
Support Zed as dev-environment IDE by @r4victor in #3947
Fix dropping ssh connections to non-provisioned terminating instances by @r4victor in #3948
Replica group spot_policy and reservation by @jvstme in #3932
Fix jpd.hostname AssertionError on container stop by @r4victor in #3951
Add NVIDIA Dynamo blog post by @peterschmidt85 in #3949
Support gRPC communication with SMG (Shepherd Model Gateway) workers by @Bihan in #3946
Allow configuring subnet_ids in Azure settings by @jvstme in #3955
[JarvisLabs] Support RTX PRO 6000; update gpuhunt dependency by @peterschmidt85 in #3943

Full changelog: 0.20.23...0.20.24
