January 16, 2017 Leave a comment
The ASH charts in OEM are great utilities for getting a quick summary of your system’s activity. However, these results can be misleading because of how the data is represented on screen. First, ASH is data is collected by sampling so it’s not a complete picture of everything that runs. Another thing to consider is that the charting in OEM doesn’t plot every ASH data point. Instead, it will average them across time slices. Within Top Activity and the ASH Analytics summary charts these points are then connected by curves or straight lines which then further dilutes the results.
Some example snapshots will help illustrate these issues.
First, note the large spike around 1:30am on the 16th. This spike was largely comprised of RMAN backups and is a significant increase in overall activity on the server with approximately 9 active sessions at its peak and a sustained activity level of 8 for most of that period.
Next, let’s look at that same database using ASH Analytics and note how that spike is drawn as a pyramid of activity. While the slope of the sides is fairly steep, it’s still significantly more gradual than that illustrated by the Top Activity chart. The peak activity is still approximately 9 active sessions at its highest but it’s harder to determine when and where it tapers off because the charting simply draws straight lines between time slices.
But, ASH Analytics offers a zoom window feature and using that we can highlight the 1am-2am hour and we get a different picture that more closely reflects the story told in the Top Activity chart. Note the sharp increase at 1:30 as see in the Top Activity. Also, note the higher peaks approaching and exceeding 12 active sessions whereas each of the previous charts indicated a peak of 9. The last curiosity is when the activity declines it is more gradual than the Top Activity but steeper than the Analytics overall chart.
The charts above demonstrate some ambiguities in using any one visualization. In those examples though, the data was mostly consistent in magnitude, but differing on rate of change due to resolution of the time slices.
Another potential problem with the averaging is losing accuracy by dropping information. For instance, in the first chart above, note the brief IO spike around 9:30am with a peak of 6 active sessions. If you look on the ASH Analytics summary chart it has averaged the curve down to approximately 2 active sessions. If we now go to the ASH Analytics page and zoom in to only the 9am-10am hour, we see that spike was in fact much larger at 24! This is 4 to 12 times our previous values and more importantly, running at twice the number of available processors. It was a brief surge and the system recovered fine but if you were looking for potential trouble areas of resource contention, the first two charts could be misleading.
I definitely don’t want to discourage readers from using OEM’s ASH tools; but I also don’t want to suggest you need to zoom in on every single time range in order to get the most accurate picture. Instead I want readers to be aware of the limitations inherent in data averaging and if you do have reason to inspect activity at a narrow time range, then by all means zoom in with ASH Analytics to get the best picture. If you need larger scale summary views, consider querying the ASH data yourself to find extreme values that may have been hidden by the averaging.