Steps to reproduce
1. Create a cloud fleet with nodes: 1, or an SSH fleet with 1 node.
2. Start a service with 1 replica on this fleet.
3. Scale the service to 2 replicas.
4. Stop the run.
Actual behaviour
- After step 3, the second replica is stuck in the submitted status.
- After step 4, the run is stuck in the terminating status, and no jobs are being stopped.
NAME BACKEND GPU PRICE STATUS SUBMITTED
test-service - - terminating 10 mins ago
group=0 replica=0 aws (us-east-2) - $0.0006 (spot) running 10 mins ago
replica=1 - - submitted 8 mins ago
Expected behaviour
- After step 3, the second replica fails with FAILED_TO_START_DUE_TO_NO_CAPACITY.
- After step 4, the run stops.
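For illustration, the expected assignment outcome for step 3 can be sketched as a small Python model. This is a hypothetical sketch, not dstack's code: the Fleet, Job, and assign_job names, the status strings, and the simple free-node counting are all assumptions. Only the fleet name, the job names, and the FAILED_TO_START_DUE_TO_NO_CAPACITY reason come from this report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TerminationReason(Enum):
    # Hypothetical stand-in: only the member name is taken from the report.
    FAILED_TO_START_DUE_TO_NO_CAPACITY = "failed_to_start_due_to_no_capacity"


@dataclass
class Fleet:
    name: str
    max_nodes: int  # assumption: nodes: 1 (cloud) or a single SSH host
    busy_nodes: int = 0


@dataclass
class Job:
    name: str
    status: str = "submitted"
    termination_reason: Optional[TerminationReason] = None


def assign_job(job: Job, fleet: Fleet) -> Job:
    """Assign a submitted job to a fleet, failing it if no node is free."""
    if fleet.busy_nodes < fleet.max_nodes:
        fleet.busy_nodes += 1
        job.status = "provisioning"
    else:
        # Expected: fail the job instead of logging
        # "fleet ... is full, retrying assignment" and leaving it submitted.
        job.status = "failed"
        job.termination_reason = TerminationReason.FAILED_TO_START_DUE_TO_NO_CAPACITY
    return job


# Usage: replica 0 takes the only node, replica 1 should fail immediately.
fleet = Fleet("test-fleet", max_nodes=1)
replica0 = assign_job(Job("test-service-0-0"), fleet)
replica1 = assign_job(Job("test-service-0-1"), fleet)
print(replica0.status, replica1.status)  # provisioning failed
```

Because the second replica ends in a terminal status in this model, stopping the run in step 4 would only need to stop replica 0.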
dstack version
0.20.19
Server logs
[22:43:13] DEBUG dstack._internal.server.background.pipeline_tasks.base:357 Processing jobs item 875865df-e75b-429c-80fc-7c9306ec487c
DEBUG dstack._internal.server.background.pipeline_tasks.jobs_submitted:337 job(875865)test-service-0-1: assignment has started
DEBUG dstack._internal.server.background.pipeline_tasks.jobs_submitted:591 job(875865)test-service-0-1: fleet test-fleet is full, retrying assignment
DEBUG dstack._internal.server.background.pipeline_tasks.base:364 Processed jobs item 875865df-e75b-429c-80fc-7c9306ec487c in 0.040
[22:43:15] DEBUG dstack._internal.server.background.pipeline_tasks.base:357 Processing runs item c0dc9825-8d9c-41fe-9b0f-2636ef128292
DEBUG dstack._internal.server.background.pipeline_tasks.runs:797 Failed to lock run c0dc9825-8d9c-41fe-9b0f-2636ef128292 jobs. The run will be processed later.
DEBUG dstack._internal.server.background.pipeline_tasks.base:364 Processed runs item c0dc9825-8d9c-41fe-9b0f-2636ef128292 in 0.018
Additional information
Introduced in 0.20.19