Server Group Technical Support
April 23, 2017
Technotes (FAQs)
Detection of Excessive Interrupt Disablement
Number: T1000678 Visibility: Public/Basic (Internet) Published: February 22, 2012
Segment: Operating System Product: AIX Component: Support information
Author: Kathy Nichols Task: Troubleshoot Topic: Problem Resolution
Platform(s): AIX Version(s): 5.3
Historical path: isg1pTechnote1489
URL: http://www.ibm.com/support/docview.wss?uid=isg3T1000678
Question
Detection of Excessive Interrupt Disablement
Cause
Answer
Functional description
Example error log
Controlling disablement detection
Detection threshold
Error disposition
Limiting error logging
Exemption
AIX 5L Version 5.3 ML3 introduces a new feature which can detect a period of excessive
interrupt disablement on a CPU, and create an error log record to report it. This allows you to
know if privileged code running on a system is unduly (and silently) impacting performance.
It also helps to identify and improve such offending code paths before the problems manifest
in ways that have proven very difficult to diagnose in the past.
Functional description
The feature uses a kernel profiling approach to detect disabled code that runs for too long. The basic
idea is to take advantage of the regularly scheduled clock "ticks" that generally occur every
10 milliseconds, using them to approximately measure continuously disabled stretches of
CPU time individually on each logical processor in the configuration.
NOTE: This is a statistical sampling approach, so resolution is limited to avoid excessive false
positives.
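The sampling idea can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the AIX implementation: the class name CpuSampler, the threshold_ticks parameter, and the rule of logging once per disabled stretch are all hypothetical. Each sample stands for one clock tick on one logical processor and records whether the interrupted context was running disabled at that moment.

```python
from dataclasses import dataclass

TICK_MS = 10  # nominal clock tick period mentioned in the text


@dataclass
class CpuSampler:
    """Per-logical-processor state for the sketch (hypothetical names)."""
    threshold_ticks: int      # assumed threshold, counted in ticks
    disabled_ticks: int = 0   # consecutive ticks that found the CPU disabled
    reported: bool = False    # True once the current stretch has been logged

    def on_tick(self, interrupts_disabled: bool) -> bool:
        """Process one tick; return True if a detection would be logged."""
        if not interrupts_disabled:
            # An enabled sample ends the current stretch.
            self.disabled_ticks = 0
            self.reported = False
            return False
        self.disabled_ticks += 1
        if self.disabled_ticks >= self.threshold_ticks and not self.reported:
            self.reported = True
            return True
        return False


def run(samples, threshold_ticks):
    """Return the tick indices at which a detection would be logged."""
    cpu = CpuSampler(threshold_ticks)
    return [i for i, disabled in enumerate(samples) if cpu.on_tick(disabled)]


if __name__ == "__main__":
    samples = [False, True, True, True, True, False, True]
    for tick in run(samples, threshold_ticks=3):
        print(f"detection at tick {tick} (about {tick * TICK_MS} ms)")
```

Because the state is only observed at tick boundaries, a stretch that was briefly enabled between two ticks looks continuous to the sampler. That is the resolution limit the note above refers to.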
This approach will alert you to partially disabled code sequences by logging one or more hits
within the offending code. It will alert you to fully disabled code sequences by logging the
i_enable that terminates them. In the special case of timer request block (trb) callouts, the
possible detection is triggered by controlling the disablement state within the clock routine,
which invokes registered trb handlers in succession.
NOTE: See the tstart kernel service for more information.
The primary detail data logged is a stack trace for the interrupted context. This will reveal
one point from the offending code path and the call sequence that got you there. For
example, a heavy user of bzero will be easily identified even though bzero may have
received the interrupt. Due to the sampling implementation, it is likely that the same
excessively and partially disabled code will log a different IAR and traceback each time it is
detected. Judgement is required to determine if two such detections are representative of
the same underlying problem or not.
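One informal aid to that judgement is to compare the outer frames of two logged tracebacks: if both share the same callers and differ only in the innermost frames, they may well point at the same code path. The helper below is a minimal sketch of that comparison. It assumes each traceback has already been extracted from the error log into a list of function names, innermost frame first. The function common_callers and all frame names other than bzero are hypothetical.

```python
def common_callers(trace_a, trace_b):
    """Return the outermost frames shared by two tracebacks.

    Each traceback is a list of function names, innermost frame first.
    The result is in the same order.
    """
    shared = []
    # Walk both stacks from the outermost caller inward.
    for frame_a, frame_b in zip(reversed(trace_a), reversed(trace_b)):
        if frame_a != frame_b:
            break
        shared.append(frame_a)
    return list(reversed(shared))


if __name__ == "__main__":
    # Hypothetical tracebacks from two separate detections.
    first = ["bzero", "init_buffers", "drv_ioctl", "syscall"]
    second = ["copy_block", "init_buffers", "drv_ioctl", "syscall"]
    print(common_callers(first, second))
```

A long shared tail suggests, but does not prove, that the two records describe one problem. The final call still rests with the person reading the logs.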
To get function names displayed in the traceback, it is necessary that the subject code be
built with the proper traceback tables. AIX kernel code and extensions are compiled with
-qtbtable=full to ensure this.
The most recent Lightweight Memory Trace (LMT) entries have been copied to the error log