Nerdio Advisor: Logical Overview of VM Right Sizing for Personal Desktops
This article provides an in-depth explanation of the primary data used by the VM Right Sizing service. This data is critical for creating recommendations based on performance counters from a Log Analytics workspace. We also explore the various parameters, benchmarks, rule settings, and algorithms that drive the VM Right Sizing service.
Primary Data Overview
The VM Right Sizing service relies on primary data to create recommendations for optimizing the performance and resource utilization of virtual machines (VMs). This data is collected from various sources, including performance counters, benchmarks, and rule settings.
The following performance counters are used in the VM Right Sizing service:
Processor Information/% Processor Time
Memory/Available MBytes
To ensure the service works effectively, it is essential to configure your Log Analytics workspace for the host pool and enable these performance counters with intervals no greater than 10 minutes (1 minute is recommended). Additionally, the Log Analytics agent should be running on the target VMs.
The VM Right Sizing service calculates primary indicators from this data and displays them in a detailed usage report on the ADVISOR/Recommendations page.
The VM Right Sizing service calculates several percentiles to assess the performance of VMs. These percentiles are based on the data collected:
The 90th percentile of Processor Information/% Processor Time (as a percentage of maximum load).
The 90th percentile of Processor Information/% Processor Time on the intervals where the number of active sessions is greater than zero.
The 90th percentile of the percentage of used memory, calculated from Memory/Available MBytes.
The 90th percentile of the percentage of used memory on the intervals where the number of active sessions is greater than zero.
These percentiles provide valuable insights into the VM's performance and resource utilization.
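The four percentiles above can be illustrated with a minimal Python sketch. The sample layout, the `TOTAL_MB` value, and the nearest-rank percentile method are assumptions made for illustration; the article does not state which percentile method the service uses.

```python
import math

def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def used_memory_percent(available_mb, total_mb):
    """Convert a Memory/Available MBytes reading into percent of memory used."""
    return (1 - available_mb / total_mb) * 100

# Hypothetical samples: (cpu_percent, available_mb, active_sessions)
samples = [
    (10.0, 12000, 0),
    (35.0, 9000, 1),
    (80.0, 4000, 2),
    (55.0, 6000, 1),
    (5.0, 14000, 0),
]
TOTAL_MB = 16000  # assumed total memory of the VM

# Samples taken while at least one session was active
active = [s for s in samples if s[2] > 0]

cpu_all = percentile_nearest_rank([s[0] for s in samples], 90)
cpu_active = percentile_nearest_rank([s[0] for s in active], 90)
ram_all = percentile_nearest_rank(
    [used_memory_percent(s[1], TOTAL_MB) for s in samples], 90)
ram_active = percentile_nearest_rank(
    [used_memory_percent(s[1], TOTAL_MB) for s in active], 90)
```

With real data, the full and active-session percentiles differ whenever the VM is busy or idle outside of user sessions.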
To ensure reliable recommendations, the VM should be powered on for a minimum duration of 72 hours. This duration does not need to be continuous; it can accumulate over time (for example, 7 days with 10 hours per day, plus an additional 2 hours). If the data exists for less than 72 hours but more than 24 hours, recommendations are shown with a warning.
Data Analysis Period
The VM Right Sizing service analyzes data for the last 30 days. However, if a VM changes its size during this period, the analysis period may be shorter, because the data prior to the resizing is not taken into account.
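As a rough illustration of how a resize shortens the analysis period, the following Python sketch picks the start of the analysis window. The function and parameter names are hypothetical, not the service's actual implementation.

```python
from datetime import datetime, timedelta

ANALYSIS_DAYS = 30  # analysis period stated in this article

def analysis_window_start(now, last_resize=None):
    """Return the earliest timestamp whose data is taken into account."""
    earliest = now - timedelta(days=ANALYSIS_DAYS)
    # Data recorded before the most recent resize is ignored.
    if last_resize is not None and last_resize > earliest:
        return last_resize
    return earliest

now = datetime(2024, 5, 31)
print(analysis_window_start(now))                        # no resize in the period
print(analysis_window_start(now, datetime(2024, 5, 20))) # resized 11 days ago
```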
Some of the parameters related to the VM Right Sizing service can be adjusted via the application service settings. These settings allow for fine-tuning the behavior of the service to match specific requirements. The following are the available settings:
The minimum statistics period required to display VM statistics to the user. Default: 1 (hour)
The minimum statistics period required to show recommended VM sizes to the user. Default: 24 (hours)
The minimum statistics period required to show recommendations without a warning. Default: 72 (hours)
This specifies the baseline percentile used for calculations. Default: 90 (percentile)
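One way to picture how these thresholds interact is the Python sketch below. The setting names, the state labels, and the handling of values that fall exactly on a threshold are assumptions; only the default values come from this article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RightSizingSettings:
    # Hypothetical field names; defaults taken from the list above.
    min_hours_for_statistics: float = 1
    min_hours_for_recommendations: float = 24
    min_hours_without_warning: float = 72
    baseline_percentile: int = 90

def report_state(statistics_hours, settings=RightSizingSettings()):
    """Map the length of the statistics period to what the user would see."""
    if statistics_hours < settings.min_hours_for_statistics:
        return "hidden"
    if statistics_hours < settings.min_hours_for_recommendations:
        return "statistics_only"
    if statistics_hours < settings.min_hours_without_warning:
        return "recommendations_with_warning"
    return "recommendations"

# 7 days at 10 hours per day plus 2 more hours reaches the 72-hour minimum.
print(report_state(7 * 10 + 2))
```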
To enable CPU performance and memory amount comparisons, the VM Right Sizing service periodically retrieves benchmark data from GitHub. These benchmarks are used to assess the performance of different VM sizes. The service maintains an in-memory cache of this benchmark data.
When multiple benchmarks are available for one VM size, the service uses the average value.
If there are no Windows benchmarks, the Linux benchmark value is used with a factor of 0.967 (the median ratio across all benchmarks that have results for both operating systems).
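The averaging and fallback rules above can be sketched in Python as follows. The input layout (a dictionary with `windows` and `linux` lists) is hypothetical, and averaging multiple Linux values before applying the factor is an assumption.

```python
from statistics import mean

LINUX_TO_WINDOWS_FACTOR = 0.967  # factor stated in this article

def cpu_benchmark(scores):
    """Return one benchmark value for a VM size, or None if no data exists.

    scores: dict with optional 'windows' and 'linux' lists of benchmark values.
    """
    windows = scores.get("windows") or []
    linux = scores.get("linux") or []
    if windows:
        # Several benchmarks for one size: use the average.
        return mean(windows)
    if linux:
        # No Windows benchmarks: scale the Linux value instead.
        return mean(linux) * LINUX_TO_WINDOWS_FACTOR
    return None

print(cpu_benchmark({"windows": [100, 110]}))  # average of Windows values
print(cpu_benchmark({"linux": [1000]}))        # Linux fallback
```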
Rules that include the Right Sizing analyzer have specific settings that influence the behavior of the service. These settings include the triggers for CPU and RAM usage, as well as the "Active Sessions Only" flag.
CPU Decrease Trigger
This indicates excessive CPU performance, and it is used to calculate the scale factor. Default: 60%
CPU Increase Trigger
This indicates insufficient CPU performance and is also used to calculate the scale factor. Default: 80%
RAM Decrease Trigger
This indicates excessive RAM amount and is used to calculate the scale factor. Default: 60%
RAM Increase Trigger
This indicates insufficient RAM amount and is also used to calculate the scale factor. Default: 80%
Active Sessions Only Flag
This specifies whether primary data for analysis should consider full usage (when disabled) or active sessions only (when enabled). Default: Enabled
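The Python sketch below shows one plausible reading of how these rule settings combine. It assumes that usage below the decrease trigger signals excess capacity and usage above the increase trigger signals a shortage; the names and the treatment of values exactly on a trigger are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triggers:
    decrease: float = 60.0  # default Decrease Trigger, percent
    increase: float = 80.0  # default Increase Trigger, percent

def pick_usage(p_all, p_active, active_sessions_only=True):
    """Choose the percentile to analyze according to the Active Sessions Only flag."""
    return p_active if active_sessions_only else p_all

def classify(usage_percent, triggers=Triggers()):
    """Return 'decrease', 'increase', or 'optimal' for a CPU or RAM usage value."""
    if usage_percent < triggers.decrease:
        return "decrease"
    if usage_percent > triggers.increase:
        return "increase"
    return "optimal"

usage = pick_usage(p_all=35.0, p_active=85.0, active_sessions_only=True)
print(classify(usage))
```

In this example the flag changes the outcome: the full-usage percentile would suggest a decrease, while the active-session percentile suggests an increase.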
Ideal Scale Factor
When the VM Right Sizing service identifies a need to increase or decrease CPU or RAM resources, it calculates an ideal scale factor. The formula for calculating the scale factor is based on the usage and trigger values.
For an increase, the formula is: `factor = usage / Increase Trigger`
For a decrease, the formula is: `factor = usage / Decrease Trigger`
For optimal usage, the factor is set to 1.
The ideal scale factor cannot be more than 3 or less than 0.3. If it falls outside this range, it is adjusted to fit within it.
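A minimal Python sketch of the division and the range limit follows. The function takes the trigger as a parameter, so the caller supplies whichever trigger the formulas above call for; passing `None` stands for optimal usage. The names are hypothetical.

```python
MIN_FACTOR = 0.3  # lower bound stated in this article
MAX_FACTOR = 3.0  # upper bound stated in this article

def ideal_scale_factor(usage_percent, trigger_percent=None):
    """Return usage / trigger limited to [0.3, 3], or 1 for optimal usage."""
    if trigger_percent is None:
        return 1.0
    raw = usage_percent / trigger_percent
    # Adjust out-of-range values so they fit within the allowed range.
    return min(MAX_FACTOR, max(MIN_FACTOR, raw))

print(ideal_scale_factor(90, 60))   # within range, returned unchanged
print(ideal_scale_factor(10, 80))   # below range, raised to the minimum
print(ideal_scale_factor(100, 20))  # above range, lowered to the maximum
```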
Using the ideal scale factors and benchmark data, the VM Right Sizing service calculates the ideal CPU performance and RAM capacity for each VM size. This information is used to make recommendations for resizing VMs.
In some cases, the VM Right Sizing service employs fuzzy-like logic to compare CPU performance, RAM capacity, and prices. This approach considers values with small differences as equal, using a noticeability ratio:
CPU Performance Ratio: 1.10
RAM Capacity Ratio: 1.05
Price Ratio: 1.02
This allows the service to treat values with minor differences as equivalent.
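As an illustration, a ratio-based comparison could look like the Python sketch below. It assumes two values count as equal when the larger is at most `ratio` times the smaller; the exact rule used by the service is not described in this article.

```python
CPU_RATIO = 1.10
RAM_RATIO = 1.05
PRICE_RATIO = 1.02

def fuzzy_compare(a, b, ratio):
    """Return 0 if a and b are within the noticeability ratio, else -1 or 1."""
    if a <= 0 or b <= 0:
        raise ValueError("values must be positive")
    lo, hi = min(a, b), max(a, b)
    if hi <= lo * ratio:
        return 0  # difference is too small to be noticeable
    return -1 if a < b else 1

print(fuzzy_compare(100, 108, CPU_RATIO))     # treated as equal
print(fuzzy_compare(100, 120, CPU_RATIO))     # first value is noticeably lower
print(fuzzy_compare(0.50, 0.505, PRICE_RATIO))  # treated as equal
```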
The VM Right Sizing service iterates through all available VM sizes for the same region and subscription to determine the optimal size. Several rejection criteria are applied, and a size that meets any of them is not recommended. These include:
GPU availability mismatches
Specific size families (A and B-size families are excluded from recommendation)
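The two criteria listed above can be sketched as a simple filter in Python. The dictionary fields (`name`, `family`, `has_gpu`) are hypothetical, and the service may apply further criteria that are not listed here.

```python
EXCLUDED_FAMILIES = {"A", "B"}  # families excluded from recommendation

def is_rejected(candidate, current):
    """Return True if the candidate size must not be recommended."""
    # GPU availability must match the current VM size.
    if candidate["has_gpu"] != current["has_gpu"]:
        return True
    # A and B-size families are never recommended.
    if candidate["family"] in EXCLUDED_FAMILIES:
        return True
    return False

current = {"name": "Standard_D4s_v5", "family": "D", "has_gpu": False}
candidates = [
    {"name": "Standard_D2s_v5", "family": "D", "has_gpu": False},
    {"name": "Standard_B2ms", "family": "B", "has_gpu": False},
    {"name": "Standard_NV6", "family": "N", "has_gpu": True},
]
accepted = [c["name"] for c in candidates if not is_rejected(c, current)]
print(accepted)
```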
Sizes that are not rejected are then sorted. The sorting algorithm depends on the desired targets for CPU and RAM resources.
For CPU Decrease or Increase: Simple sorting by price ascending.
For Optimal RAM: Smart sorting, which considers both CPU and RAM needs for coverage.
For RAM Increase: Smart sorting, which calculates a compound coverage score.
The final recommendation is typically based on the sorted list of sizes, and the first three sizes are saved to the database and displayed in the user interface.
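The simple sorting case and the selection of the first three sizes can be sketched in Python as follows. The field names and prices are made up for illustration, and the smart sorting variants are not shown because this article does not describe their scoring in detail.

```python
def top_recommendations(sizes, limit=3):
    """Sort candidate sizes by price ascending and keep the first `limit`."""
    ordered = sorted(sizes, key=lambda s: s["hourly_price"])
    return ordered[:limit]

# Hypothetical candidates that already passed the rejection criteria.
sizes = [
    {"name": "size-c", "hourly_price": 0.40},
    {"name": "size-a", "hourly_price": 0.10},
    {"name": "size-d", "hourly_price": 0.80},
    {"name": "size-b", "hourly_price": 0.20},
]
recommended = [s["name"] for s in top_recommendations(sizes)]
print(recommended)
```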