Monday, March 29, 2010

The Devil in the Gigabit Network

Nov 2, 2010: late edit.

For a good while now, there has been a problem at my work's Accounting department: use of QuickBooks or "large" files (say, Excel files bigger than a few MB) would intermittently coincide with a temporary disconnect from the domain controller or network shares. This persisted despite trying to weed out sources of EMI and trying different switches, different cords, jumbo frames, no jumbo frames, Windows Server 2003 or 2008, etc. As far as I could tell, smaller operations involving Internet use, non-domain, and non-Gigabit activities are not impacted. We are still waiting to see if we can track down some known EMI sources that may be interfering, but meanwhile I believe I found some things that help us and may help a lot of other people...

Interrupt Coalescing / Adaptive Interrupt Moderation
Intel has it. VIA has it. Broadcom has it. Realtek has it. Regardless of the name, the concept is this: as an adapter receives a lot of Ethernet frames, it can tax the CPU by raising an interrupt for each one. This feature avoids that by bundling multiple frames into a single interrupt. From what I can tell, latency will probably be higher when it works properly, but more data will pass through. However, it seems that, at least at the consumer / business level, it does more harm than good, and the O/S is better at managing this traffic (especially Windows Vista / 2008 / 7, given their default use of adaptive TCP windowing).
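To make the trade-off concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not how any particular driver or adapter implements the feature: the function name `coalesce`, the "fire after N frames or after a timeout" rule, and the 12 microsecond frame spacing (roughly one 1500-byte frame at Gigabit line rate) are all made up for illustration.

```python
def coalesce(arrival_times_us, max_frames, timeout_us):
    """Toy model of interrupt coalescing (hypothetical, for illustration only).

    The adapter buffers frames and raises one interrupt when either
    max_frames are waiting or timeout_us has passed since the first
    buffered frame arrived. Returns (interrupt_count, avg_wait_us), where
    avg_wait_us is the average time a frame sat in the buffer.
    """
    interrupts = 0
    total_wait = 0.0
    batch = []  # arrival times of frames waiting for an interrupt

    for t in arrival_times_us:
        # If the timer on the pending batch ran out before this frame
        # arrived, the interrupt already fired at the timer deadline.
        if batch and t >= batch[0] + timeout_us:
            fire = batch[0] + timeout_us
            total_wait += sum(fire - a for a in batch)
            interrupts += 1
            batch = []
        batch.append(t)
        # Frame-count threshold reached: fire right away.
        if len(batch) >= max_frames:
            total_wait += sum(t - a for a in batch)
            interrupts += 1
            batch = []

    # Whatever is left over is flushed when its timer expires.
    if batch:
        fire = batch[0] + timeout_us
        total_wait += sum(fire - a for a in batch)
        interrupts += 1

    n = len(arrival_times_us)
    return interrupts, (total_wait / n if n else 0.0)


# 1000 frames arriving back to back, 12 microseconds apart (assumed spacing).
arrivals = [i * 12 for i in range(1000)]
print(coalesce(arrivals, max_frames=1, timeout_us=100))  # (1000, 0.0)
print(coalesce(arrivals, max_frames=8, timeout_us=100))  # (125, 42.0)
```

With coalescing effectively off (one frame per interrupt) the CPU takes 1000 interrupts and no frame waits. Bundling 8 frames cuts that to 125 interrupts, but each frame now sits in the buffer for 42 microseconds on average in this model, which is the latency cost mentioned above.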

I turned it off in the adapter settings on the problem machines: I can get 40-50% network utilization on a 200MB file, and early results with QB & Excel are promising (it's worked longer than a half-hour without a hiccup).
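For a sense of scale, those utilization figures can be turned into transfer times with some quick arithmetic. This is a back-of-the-envelope sketch that assumes a 1 Gbit/s link, treats 1 MB as 1,000,000 bytes, and ignores Ethernet/IP/TCP framing overhead; the function name is my own.

```python
LINK_BITS_PER_SEC = 1_000_000_000  # Gigabit Ethernet line rate


def transfer_seconds(file_mb, utilization):
    """Seconds to move file_mb megabytes at a given fraction of line rate.

    Assumes 1 MB = 1,000,000 bytes and ignores protocol overhead.
    """
    bytes_per_sec = LINK_BITS_PER_SEC / 8 * utilization
    return file_mb * 1_000_000 / bytes_per_sec


for utilization in (0.40, 0.50):
    secs = transfer_seconds(200, utilization)
    print(f"{utilization:.0%} utilization: about {secs:.1f} s for 200 MB")
```

Under those assumptions, 40-50% utilization works out to roughly 50 to 62.5 MB per second, so the 200MB file moves in about 3 to 4 seconds.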

Offloading (IPv4 + TCP + UDP + Checksum + Large Receive)

1. Turn off IPv4 & IPv6 protocol offloading on non-server-grade adapters.
2. TCP & UDP checksum offloading do not seem to hurt things.
3. I've also been turning off large receive offloading, which is supposed to buffer incoming packets before they are handed off for processing.
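For context on item 2, the work that checksum offloading moves from the CPU to the adapter is the standard 16-bit ones' complement Internet checksum (described in RFC 1071) that IPv4, TCP, and UDP all use. Below is a minimal Python sketch of that calculation over a raw byte string. It leaves out the pseudo-header that TCP and UDP add, and the function name is my own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte

    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word

    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    return ~total & 0xFFFF


# Example bytes taken from RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(sample)))  # 0x220d
```

A receiver runs the same sum over the data plus the transmitted checksum and expects zero. With offloading turned off, the O/S does this for every packet on the CPU instead of trusting the adapter to do it.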

Conclusion

Interrupt moderation may be fine for an Internet-facing server, but when big files and traffic are regularly generated, it may be confusing the heck out of the adapters in question. Funny enough, it seems the Killer NIC we were using on the server (while getting the Intel NICs figured out) was decently stable for a reason: it does no interrupt moderation, and offloading is done by its 400 MHz PowerPC chip. However, it has no jumbo frame support, and its driver can get BSOD-unstable on a 2008 setup.

References
*"Interrupt Coalescence (also called Interrupt Moderation or Interrupt Blanking)"
*"How Does Interrupt Coalescing Affect Low Latency High-Performance Messaging?"
*TCP Wikipedia article
*Forum post on a gamer's encounter with Interrupt Moderation
