[Bug]: Scalable backend-dependent service assigned to SSH fleet and never provisioned #3934

@jvstme

Description

Steps to reproduce

  1. Create an elastic fleet and an SSH fleet.
    $ dstack fleet
     NAME           NODES  GPU  SPOT  BACKEND       PRICE  STATUS  CREATED      
     default        0..    -    auto  *             -      active  2 months ago 
     on-prem        1      -    -     ssh           -      active  1 hour ago   
        instance=0         -    -     ssh (remote)  -      idle    1 hour ago
  2. Apply this configuration.
    type: service
    name: test-service
    image: nginx
    port: 80
    retry: true
    
    backends: [aws]
    
    replicas: 0..1
    scaling:
      metric: rps
      target: 1
  3. Send a request to the service to trigger scaling from 0 to 1 replica.
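
Any HTTP client works for step 3. Below is a minimal Python sketch, assuming the service is reachable at a URL you already know and, if auth is enabled, accepts a bearer token. `SERVICE_URL`, `TOKEN`, `build_request`, and `send_once` are placeholders introduced for illustration and are not taken from this report.

```python
import urllib.error
import urllib.request

# Placeholders (assumptions): substitute your own service URL and token.
SERVICE_URL = "https://test-service.example.com/"
TOKEN = "<token>"


def build_request(url, token=None):
    """Build a GET request, adding a bearer token header if one is given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(url, headers=headers, method="GET")


def send_once(url, token=None, timeout=30.0):
    """Send one request and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(build_request(url, token), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The request may fail with an error status while no replica is running;
        # it should still count towards the rps metric that drives scaling.
        return err.code
```

Calling `send_once(SERVICE_URL, TOKEN)` once and then watching `dstack event --within-run test-service` should be enough to observe the behaviour described below.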

Actual behaviour

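The service is assigned to the SSH fleet (`on-prem`) despite `backends: [aws]` and is never provisioned. Each job submission fails with FAILED_TO_START_DUE_TO_NO_CAPACITY (Fleet is at capacity), and the run returns to PENDING.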

$ dstack event --within-run test-service
[2026-06-05 11:06:50] [👤admin] [run test-service] Run submitted. Status: SUBMITTED
[2026-06-05 11:06:51] [run test-service, gateway main/my-gateway] Service registered in gateway
[2026-06-05 11:06:51] [run test-service] Run status changed SUBMITTED -> PENDING
[2026-06-05 11:07:07] [job test-service-0-0] Job created on new submission. Status: SUBMITTED
[2026-06-05 11:07:07] [run test-service] Run status changed PENDING -> SUBMITTED
[2026-06-05 11:07:11] [job test-service-0-0] Job status changed SUBMITTED -> TERMINATING. Termination reason: FAILED_TO_START_DUE_TO_NO_CAPACITY (Fleet is at capacity)
[2026-06-05 11:07:14] [run test-service] Run status changed SUBMITTED -> PENDING
[2026-06-05 11:07:18] [job test-service-0-0] Job status changed TERMINATING -> FAILED
[2026-06-05 11:07:58] [job test-service-0-0] Job created on new submission. Status: SUBMITTED
[2026-06-05 11:07:58] [run test-service] Run status changed PENDING -> SUBMITTED
[2026-06-05 11:07:58] [job test-service-0-0] Job status changed SUBMITTED -> TERMINATING. Termination reason: FAILED_TO_START_DUE_TO_NO_CAPACITY (Fleet is at capacity)
[2026-06-05 11:08:03] [job test-service-0-0] Job status changed TERMINATING -> FAILED
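
The retry loop in the events above can be summarized with a small script. This is a sketch that assumes the event line format shown in this issue; `summarize` is a helper introduced here for illustration and is not part of dstack.

```python
import re
from collections import Counter

# Job events copied from the output above.
EVENTS = """\
[2026-06-05 11:07:07] [job test-service-0-0] Job created on new submission. Status: SUBMITTED
[2026-06-05 11:07:11] [job test-service-0-0] Job status changed SUBMITTED -> TERMINATING. Termination reason: FAILED_TO_START_DUE_TO_NO_CAPACITY (Fleet is at capacity)
[2026-06-05 11:07:18] [job test-service-0-0] Job status changed TERMINATING -> FAILED
[2026-06-05 11:07:58] [job test-service-0-0] Job created on new submission. Status: SUBMITTED
[2026-06-05 11:07:58] [job test-service-0-0] Job status changed SUBMITTED -> TERMINATING. Termination reason: FAILED_TO_START_DUE_TO_NO_CAPACITY (Fleet is at capacity)
[2026-06-05 11:08:03] [job test-service-0-0] Job status changed TERMINATING -> FAILED
"""

REASON_RE = re.compile(r"Termination reason: (\w+)")


def summarize(text):
    """Return (number of job submissions, Counter of termination reasons)."""
    lines = text.splitlines()
    submissions = sum("Job created on new submission" in line for line in lines)
    reasons = Counter(REASON_RE.findall(text))
    return submissions, reasons


submissions, reasons = summarize(EVENTS)
print(submissions, dict(reasons))
```

Both submissions end with the same termination reason, which matches a job that is repeatedly placed on a fleet that cannot run it.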

Expected behaviour

The service is assigned to the elastic fleet and provisioned.
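
The expected selection can be stated as a toy model. This is not dstack's scheduler code: `Fleet`, `eligible_fleets`, and the reading of `*` as "any cloud backend" are assumptions made only to spell out the expectation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fleet:
    name: str
    backend: str  # "*" is assumed to mean any cloud backend; "ssh" is an SSH fleet


def eligible_fleets(fleets, backends):
    """Return names of fleets that can satisfy the run's `backends` restriction."""
    if not backends:
        # No restriction in the run configuration: every fleet is a candidate.
        return [f.name for f in fleets]
    return [f.name for f in fleets if f.backend == "*" or f.backend in backends]


# The two fleets from the reproduction steps.
fleets = [Fleet("default", "*"), Fleet("on-prem", "ssh")]
print(eligible_fleets(fleets, ["aws"]))
```

Under this model, a run with `backends: [aws]` should only consider the elastic `default` fleet, so `on-prem` should never be chosen.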

dstack version

0.20.23 (cannot reproduce on 0.20.22)

Labels: bug