0.20.24

@peterschmidt85 released this 11 Jun 13:55
cd0e93c

Dev environments

Zed

dstack now supports Zed as a dev environment IDE:

type: dev-environment
ide: zed
resources:
  gpu: L4

Once the dev environment is up, the CLI prints a zed:// link that opens the remote project in Zed over SSH. Zed doesn't require any plugins, so nothing needs to be pre-installed: the Zed server is installed automatically on first connect.

$ dstack apply
...
Submit a new run? [y/n]: y
 NAME                     BACKEND                  GPU                     PRICE       STATUS      SUBMITTED
 fast-fly-1               aws (us-east-2)          gpu=L4:24GB:1           $0.1838     running     16:36
                                                                           (spot)

fast-fly-1 provisioning completed (running)
pip install ipykernel...

To open in Zed, use link below:

  zed://ssh/fast-fly-1/dstack/run

To connect via SSH, use: `ssh fast-fly-1`

To exit, press Ctrl+C.
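
For scripting around the printed link, the sketch below splits it into an SSH host and a remote path. It assumes the link has the shape zed://ssh/<host>/<path>, which matches the example output above but is not a documented contract; parse_zed_link is a hypothetical helper, not part of the dstack CLI.

```python
from urllib.parse import urlparse


def parse_zed_link(link: str) -> tuple[str, str]:
    """Split a zed://ssh/<host>/<path> link into (host, remote_path).

    Assumption: the first path segment is the SSH host alias and the
    remainder is the remote project directory.
    """
    parsed = urlparse(link)
    if parsed.scheme != "zed" or parsed.netloc != "ssh":
        raise ValueError(f"not a zed SSH link: {link!r}")
    # parsed.path looks like "/fast-fly-1/dstack/run"
    host, _, rest = parsed.path.lstrip("/").partition("/")
    if not host:
        raise ValueError(f"missing host in link: {link!r}")
    return host, "/" + rest


host, remote_path = parse_zed_link("zed://ssh/fast-fly-1/dstack/run")
print(host, remote_path)
```

Under that assumption, the host part is the same alias used in `ssh fast-fly-1`.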

Services

Replica groups

The spot_policy and reservation properties can now be specified at the replica group level. This allows distributing replicas across reserved and spot capacity, e.g., running baseline replicas on a reservation while autoscaling overflow replicas on spot instances:

type: service
image: my-image
port: 80

replicas:
  - name: baseline
    reservation: my-reservation
    count: 1

  - name: overflow
    spot_policy: auto
    count: 0..3
    scaling:
      metric: rps
      target: 1
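
To see what the configuration above implies for capacity, the following sketch works out the combined replica bounds across both groups. It only does arithmetic on the count values shown in the example; parse_count is a hypothetical helper and says nothing about how dstack itself parses or schedules replicas.

```python
def parse_count(count) -> tuple[int, int]:
    """Return (min, max) for a fixed count like 1 or a range like "0..3"."""
    if isinstance(count, int):
        return count, count
    low, sep, high = str(count).partition("..")
    if not sep:
        # A plain number given as a string, e.g. "2"
        return int(low), int(low)
    return int(low), int(high)


# Count values copied from the replica groups in the example above
groups = {"baseline": 1, "overflow": "0..3"}
bounds = {name: parse_count(count) for name, count in groups.items()}

total_min = sum(low for low, _ in bounds.values())
total_max = sum(high for _, high in bounds.values())
print(total_min, total_max)  # 1 4
```

In this example the service always keeps the one reserved baseline replica and can grow to at most four replicas, with up to three of them on spot capacity.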

Shepherd Model Gateway

Services using Shepherd Model Gateway now support gRPC communication with both vLLM and SGLang workers. Previously, only the SGLang runtime with the HTTP connection mode was supported.

Below is an example service configuration running vLLM gRPC workers:

type: service
name: prefill-decode

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    image: python:3.12-slim
    commands:
      - pip install smg
      - |
          smg launch \
            --pd-disaggregation \
            --model-path $MODEL_ID \
            --enable-igw \
            --host 0.0.0.0 \
            --port 8000 \
            --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4

  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
          python3 -m vllm.entrypoints.grpc_server \
            --model $MODEL_ID \
            --host 0.0.0.0 \
            --port 8000 \
            --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_producer"}'
    resources:
      gpu: H200

  - count: 1
    image: vllm/vllm-openai:latest
    commands:
      - pip install -U "vllm[grpc]"
      - |
          python3 -m vllm.entrypoints.grpc_server \
            --model $MODEL_ID \
            --host 0.0.0.0 \
            --port 8000 \
            --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_consumer"}'
    resources:
      gpu: H200

port: 8000

dstack automatically detects each worker's runtime (vLLM or SGLang) and connection mode (HTTP or gRPC) by probing it. With gRPC, the SMG router tokenizes requests once and routes on tokens instead of raw text, reducing duplicate work and making cache_aware routing more effective.
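
As a rough illustration of why routing on tokens helps cache-aware policies, the toy sketch below picks the worker whose previously seen token sequences share the longest prefix with an incoming request. This is not SMG's implementation; the function names, worker names, and token IDs are all made up for the example.

```python
def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Count how many leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(request_tokens: list[int], seen: dict[str, list[list[int]]]) -> tuple[str, int]:
    """Return (worker, prefix_len) for the best prefix match.

    `seen` maps a worker name to token sequences it has already served,
    standing in for that worker's prefix cache. Assumes `seen` is non-empty.
    """
    best_worker, best_len = "", -1
    for worker, sequences in seen.items():
        longest = max((shared_prefix_len(request_tokens, s) for s in sequences), default=0)
        if longest > best_len:
            best_worker, best_len = worker, longest
    return best_worker, best_len


# Hypothetical token IDs; a real router would get these from the tokenizer
seen = {"worker-a": [[1, 2, 3, 4]], "worker-b": [[1, 2, 9]]}
print(pick_worker([1, 2, 3, 7], seen))  # ('worker-a', 3)
```

Because the match is made on token IDs, the router can reuse the tokens it already computed rather than having each worker tokenize the same text again.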

JarvisLabs

The jarvislabs backend now supports offers with RTXPRO6000 GPUs.

Azure

subnet_ids

Similar to vpc_ids, the azure backend now lets you select the specific subnets attached to dstack VMs via the new subnet_ids property, which maps regions to subnets in the <resource-group>/<vnet>/<subnet> format:

projects:
  - name: main
    backends:
      - type: azure
        subscription_id: ...
        tenant_id: ...
        creds:
          type: default
        regions: [westeurope]
        subnet_ids:
          westeurope: my-resource-group/my-vnet/my-subnet

This is useful when the VNet contains subnets that dstack shouldn't pick automatically, e.g. subnets delegated to other Azure services.
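
Before putting a value into subnet_ids, it can help to check that it has the expected three-part shape. The sketch below is a minimal validator for the <resource-group>/<vnet>/<subnet> format described above; SubnetRef and parse_subnet_id are hypothetical names, and dstack's own validation may differ.

```python
from typing import NamedTuple


class SubnetRef(NamedTuple):
    resource_group: str
    vnet: str
    subnet: str


def parse_subnet_id(value: str) -> SubnetRef:
    """Split "<resource-group>/<vnet>/<subnet>" into its three parts."""
    parts = value.split("/")
    # Exactly three non-empty segments are expected
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <resource-group>/<vnet>/<subnet>, got {value!r}"
        )
    return SubnetRef(*parts)


ref = parse_subnet_id("my-resource-group/my-vnet/my-subnet")
print(ref.resource_group, ref.vnet, ref.subnet)
```

A value such as my-vnet/my-subnet, which omits the resource group, raises ValueError instead of being silently accepted.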

What's changed

  • Fix zero scaled services assigned to wrong fleets by @r4victor in #3939
  • Set runner/shim default compiled versions to latest by @r4victor in #3941
  • Implement SSH connection pool for runner instances by @r4victor in #3936
  • [chore]: Move format_backend() to common utils by @jvstme in #3942
  • Drop non-linux runner builds and local backend by @r4victor in #3944
  • Support Zed as dev-environment IDE by @r4victor in #3947
  • Fix dropping ssh connections to non-provisioned terminating instances by @r4victor in #3948
  • Replica group spot_policy and reservation by @jvstme in #3932
  • Fix jpd.hostname AssertionError on container stop by @r4victor in #3951
  • Add NVIDIA Dynamo blog post by @peterschmidt85 in #3949
  • Support gRPC communication with SMG (Shepherd Model Gateway) workers by @Bihan in #3946
  • Allow configuring subnet_ids in Azure settings by @jvstme in #3955
  • [JarvisLabs] Support RTX PRO 6000; update gpuhunt dependency by @peterschmidt85 in #3943

Full changelog: 0.20.23...0.20.24