Enable Version-Specific GPU Driver Installation

Current Functionality Location:
Host Pools > Manage Hosts (button) > Properties > VM Deployment > Install GPU Drivers on supported VM sizes.

Requested Enhancement: Introduce a feature that lets users specify which GPU driver version to install when the driver extension is detected. This would help manage compatibility issues. For instance, users could avoid GRID Driver version 17.x, which is incompatible with NVv3 (NVIDIA Tesla M60).

Rationale:
Prevent Compatibility Issues: Ensures that the installed GPU driver version is compatible with the VM size, avoiding known problems such as GRID Driver version 17.x on NVv3.
Increase Operational Efficiency: Saves time, since users no longer need to apply the correct driver version manually or install it from the command line.
Maintain Consistency: Guarantees consistent driver installations across VMs, avoiding discrepancies that can lead to operational issues.

Example Use Case: When installing GPU drivers on NVv3 virtual machines, users encountered compatibility issues with GRID Driver version 17.x. The proposed feature would let them specify and install a supported driver version instead.

Proposed Solution: If the driver extension is found, allow users to input a specific driver version to be installed. This can be done by adding a "Specify Driver Version" field in the VM Deployment settings under Install GPU Drivers.

For reference, the equivalent Azure CLI command is:

az vm extension set --resource-group <rg-name> --vm-name <vm-name> --name NvidiaGpuDriverWindows --publisher Microsoft.HpcCompute --settings "{'driverVersion':'538.46'}"
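As a rough sketch of what the proposed "Specify Driver Version" field could pass through, the Python snippet below assembles the same `az vm extension set` call shown above and adds the `--settings` payload only when a version is entered; otherwise the extension keeps its default behavior. The helper name `build_gpu_driver_extension_cmd` and the sample resource names are hypothetical, and the `driverVersion` key is taken from the example command rather than verified against every extension release.

```python
import json
import shlex
from typing import List, Optional


def build_gpu_driver_extension_cmd(
    resource_group: str,
    vm_name: str,
    driver_version: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build the argv for `az vm extension set` for the
    Windows NVIDIA GPU driver extension, optionally pinning a driver version."""
    cmd = [
        "az", "vm", "extension", "set",
        "--resource-group", resource_group,
        "--vm-name", vm_name,
        "--name", "NvidiaGpuDriverWindows",
        "--publisher", "Microsoft.HpcCompute",
    ]
    if driver_version:
        # json.dumps emits strict double-quoted JSON; as a single argv element
        # it needs no shell-specific quoting. The "driverVersion" key is
        # assumed from the example command in this request.
        cmd += ["--settings", json.dumps({"driverVersion": driver_version})]
    return cmd


if __name__ == "__main__":
    # Sample (hypothetical) resource group and VM names.
    args = build_gpu_driver_extension_cmd("my-rg", "my-nvv3-vm", "538.46")
    # Print a copy-pasteable POSIX shell command; to execute it instead,
    # pass `args` to subprocess.run(args, check=True).
    print(shlex.join(args))
```

Building an argument list rather than a single string sidesteps the quoting differences between shells (the single-quoted JSON in the original command, for example, behaves differently in bash and PowerShell).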

Enabling this feature would streamline driver installation and ensure consistency across our deployments.
