We recently experienced an issue where the Entra disjoin step failed on several VMs that were being reimaged. As a result, the reimaged hosts could not be joined to Entra, so the reimaging did not complete, and subsequent Add Host steps failed for the same reason. However, these tasks technically end as "Completed" with "Mode: Cleanup", so they are not counted as failures and don't trigger alerts.
As a result, I have two suggestions:
1. A process that ends in Cleanup mode should not be regarded as successful. Either it should count as a failure (at least for the purpose of triggering an alert), or it should be a separate result type that can be configured to trigger an alert.
2. After a reimage or Add Host task ends in such a state, the system should (or should have the option to) use a new host name for the next attempt. This could mean building a list of skipped host names that need admin attention (and that can be removed from the list once the problem is corrected), or blocking the name for a set time interval before trying it again.
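
To make the two suggestions more concrete, here is a rough Python sketch of the behavior I have in mind. It is only an illustration and says nothing about how the product is actually implemented: the function `should_alert`, the class `SkippedHostNames`, the "Default" mode value, and the host names are all hypothetical. Only the "Completed" status and the "Cleanup" mode come from what I see in the task results.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional


def should_alert(status: str, mode: str) -> bool:
    """Suggestion 1: a task that ends in Cleanup mode is alert-worthy,
    even if its status is "Completed"."""
    return status != "Completed" or mode == "Cleanup"


@dataclass
class SkippedHostNames:
    """Suggestion 2: host names to avoid after a reimage/Add Host cleanup.

    cooldown=None means a name stays skipped until an admin clears it;
    otherwise the name becomes usable again after the cooldown elapses.
    """
    cooldown: Optional[timedelta] = None
    _skipped: Dict[str, datetime] = field(default_factory=dict, init=False)

    def mark_failed(self, name: str, now: datetime) -> None:
        # Record when the name was last involved in a cleanup.
        self._skipped[name] = now

    def clear(self, name: str) -> None:
        # Admin action: the underlying problem has been corrected.
        self._skipped.pop(name, None)

    def is_available(self, name: str, now: datetime) -> bool:
        skipped_at = self._skipped.get(name)
        if skipped_at is None:
            return True
        if self.cooldown is None:
            return False
        return now - skipped_at >= self.cooldown

    def next_name(self, candidates: Iterable[str], now: datetime) -> Optional[str]:
        # Pick the first candidate name that is not currently skipped.
        for name in candidates:
            if self.is_available(name, now):
                return name
        return None
```

With a one-hour cooldown, a name marked as failed would be passed over for the next attempt and picked up again once the hour has elapsed. With no cooldown, it would stay on the list until an admin calls `clear`.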