Improve host re-imaging failure handling to prevent host pool capacity loss

Problem:
When a session host re-imaging task fails, the affected session hosts are removed from the host pool. This results in an unexpected drop in capacity. The hosts are not added back to the host pool, and autoscale does not provision replacement hosts to meet the defined base host pool configuration.

Description:
Session host re-imaging failures currently lead to the removal of session hosts without triggering compensatory scaling actions. This disrupts availability and requires manual intervention to restore capacity.

Vision:
Update the failure handling logic so that when a session host re-imaging attempt (or any other task that requires the session host to be unavailable) fails, the base host pool capacity is honoured. Nerdio's auto-scaling mechanism would then detect the capacity shortfall and either provision replacement hosts or restore the original hosts to the host pool. This would maintain compliance with base host pool settings and improve automation reliability.
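
To illustrate the requested behavior, here is a minimal sketch of the shortfall check described above. It is not Nerdio's actual logic: `HostPoolState`, `capacity_shortfall`, and `plan_recovery` are hypothetical names, and the preference for restoring original hosts before provisioning new ones is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class HostPoolState:
    """Hypothetical snapshot of a host pool (not a Nerdio data structure)."""
    base_capacity: int   # hosts the pool is configured to keep at minimum
    healthy_hosts: int   # hosts currently in the pool and usable
    failed_reimage: int  # hosts removed after a failed re-image task


def capacity_shortfall(state: HostPoolState) -> int:
    """Hosts missing relative to base capacity (never negative)."""
    return max(0, state.base_capacity - state.healthy_hosts)


def plan_recovery(state: HostPoolState) -> dict:
    """Cover the shortfall by restoring failed hosts first, then provisioning.

    Assumption: restoring an original host is preferred over creating a new one.
    """
    shortfall = capacity_shortfall(state)
    restore = min(shortfall, state.failed_reimage)
    provision = shortfall - restore
    return {"restore": restore, "provision": provision}
```

For example, a pool with a base capacity of 10, 7 healthy hosts, and 2 hosts removed after failed re-imaging would plan to restore 2 hosts and provision 1 replacement.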


Comments (2)

Carl Long
Thank you for submitting your feature request—we truly value input from our community.

Next steps:
     • We will review your request and update its status as it progresses through our evaluation process.
     • If any clarification is needed, we'll follow up with you directly in the comments.

We also encourage the community to influence our decision through comments, votes, and feedback.
Raul Morales

Hi Christian, thank you for the submission!
We've discussed this behavior in the past, and we believe the “Burst beyond base capacity” functionality would help in scenarios where session hosts fail during the provisioning task and transition to an “Unhealthy” state.
Allowing session hosts to be created to replace “Unhealthy” VMs could be troublesome: if not closely monitored, it could create a large number of session hosts to compensate and increase costs. Using the burst functionality is a good middle ground.
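
The cost concern can be sketched as a cap on how many replacements are ever created. This is only an illustration of the idea, not how the burst feature is implemented: `replacements_allowed`, `burst_limit`, and `burst_in_use` are hypothetical names, and treating burst capacity as the replacement budget is an assumption.

```python
def replacements_allowed(unhealthy_hosts: int, burst_limit: int, burst_in_use: int) -> int:
    """Replacement hosts that may be created for unhealthy VMs.

    Assumption: replacements draw from a fixed burst budget, so the number
    of extra hosts (and their cost) is bounded however many VMs fail.
    """
    remaining_budget = max(0, burst_limit - burst_in_use)
    return min(unhealthy_hosts, remaining_budget)
```

With 8 unhealthy VMs, a burst limit of 3, and 1 burst host already in use, at most 2 replacements would be created instead of 8.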

Related to your other post, combining the above functionality with the following advanced app settings would allow more flexibility when handling session host provisioning issues.
For Auto-scale tasks:
AutoScale:CleanupAttempts
AutoScale:RestartAttempts

For manual provisioning tasks:
Provision:MaxCleanupAttempts
Provision:MaxRestartAttempts
