A well-known and documented flaw in the Intel VT-d Interrupt Remapping engine affects early revisions of the Intel 5500/5520 chipsets.
The problem was documented by Intel in a September 2011 update to the 5500/5520 chipset specifications.
Many hardware vendors have since chosen to disable this feature on affected systems via BIOS updates.
In my work as a technical support engineer I have come across many different problems caused by faulty Interrupt Remapping, with varying symptoms.
Depending on which device is affected, a system may lose network connectivity, access to storage devices, or experience a panic.
This can result in unmanageable hosts, unresponsive or failed virtual machines, system hangs or unexpected reboots.
On Linux hosts, Interrupt Remapping can be disabled by booting the system with
On VMware ESX/ESXi hosts, the same result can be achieved by setting the iovDisableIR kernel parameter to
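To confirm that a Linux host was actually booted with Interrupt Remapping disabled, the running kernel's command line can be inspected. The following is a minimal sketch, not a definitive check. It assumes the boot option in question is `intremap=off`, the form listed in the kernel's parameter documentation; the exact value is not preserved in the text above, so treat it as an assumption. The function name `intremap_disabled` is hypothetical.

```python
def intremap_disabled(cmdline: str) -> bool:
    """Return True if a kernel command line asks for interrupt remapping off.

    Assumption: the option is spelled intremap=off and may carry several
    comma-separated values (for example intremap=off,nosid). If the option
    appears more than once, the last on/off value seen is used.
    """
    disabled = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "intremap" or not sep:
            continue  # not the option we are looking for
        for option in value.split(","):
            if option == "off":
                disabled = True
            elif option == "on":
                disabled = False
    return disabled
```

On a live system the input would typically be the contents of `/proc/cmdline`, for example `intremap_disabled(open("/proc/cmdline").read())`. This only shows what was requested at boot; it does not prove what the BIOS or the kernel ultimately did with the feature.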
Over the course of the past two years, a few hardware and software vendors I have worked with have published articles describing this issue.
Some of these are relatively new, others have been recently updated and improved.
I have referenced them below, in alphabetical order.
Disable Interrupt Remapping UCS for UC Applications
Intel 55×0 Chipset Errata – Interrupt Remapping Issue
HBAs and other PCI devices may stop responding in VMware ESX or ESXi 4 – IBM Servers
Why do I see “kernel: do_IRQ: X.Y No irq handler for vector (irq -1)” messages on systems with Intel 5500 and 5520 chipsets?
Faulty Intel chipsets cause problems with interrupt remapping
vHBAs and other PCI devices may stop responding in ESXi 5.x and ESXi/ESX 4.1 when using Interrupt Remapping (1030265)
Additional note on Linux:
Earlier this year, a patch was introduced in the Linux kernel to warn system administrators that their system is affected by this problem.
To my knowledge, this patch is included in recent kernel updates for the openSUSE, SLES, Fedora and RHEL Linux distributions.
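On kernels that do not carry this warning, the "do_IRQ: X.Y No irq handler for vector (irq -1)" message quoted in one of the article titles above is a useful symptom to search for. The sketch below counts occurrences of that message in a set of log lines. It assumes that X and Y are the CPU number and the interrupt vector, which matches my reading of the kernel's format string but is not stated in the articles themselves. The name `count_unhandled_vectors` is hypothetical.

```python
import re
from collections import Counter

# Matches the symptom message; the "(irq -1)" suffix is deliberately not
# required, since its presence varies between kernel versions (assumption).
PATTERN = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def count_unhandled_vectors(log_lines):
    """Count 'No irq handler for vector' messages per (cpu, vector) pair.

    Assumption: the two numbers in 'X.Y' are the CPU and the vector.
    """
    hits = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            cpu, vector = int(match.group(1)), int(match.group(2))
            hits[(cpu, vector)] += 1
    return hits
```

The input can be any iterable of strings, such as an open log file or the saved output of `dmesg`. A handful of hits is not proof of the chipset flaw on its own, but repeated messages on a 5500/5520 system are a good reason to check the vendor articles listed above.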