AMD Opteron, Athlon Dual-Core CPU TSC problem

Posted: 18-Jan-2008 15:35

To all system administrators: dual-core AMD Opteron and Athlon CPUs are prone to TSC (time stamp counter) skew, where the per-core counters drift apart. This only rarely causes complete system failures; the symptoms are more along the lines of unexpected behaviour and intermittent faults. When you see a ping time of -21ms you know something is wrong! It's mostly not a problem if you run any modern Linux distro, because the kernel uses the more modern ACPI HPET (High Precision Event Timer). Bad luck for Windows admins though: you'll need Windows Server 2008 to properly avoid the issue. For Server 2k3 the fix is to add /usepmtimer to boot.ini, but this carries a performance penalty and reduces the scalability of the system.
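If you want to check what a Linux box is actually using, here is a rough Python sketch. It assumes the kernel exposes the usual sysfs clocksource files under /sys/devices/system/clocksource/clocksource0 (older kernels may not). The function names read_clocksource and backward_steps are my own, made up for illustration, and the second one just scans a list of timestamps you have collected yourself (e.g. from logs) for the "time went backwards" symptom behind things like a negative ping.

```python
from pathlib import Path

# Assumed sysfs location; present on Linux kernels with the clocksource framework.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = (base / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or a kernel without these files.
        return None, []
    return current, available


def backward_steps(samples):
    """Return (index, delta) for every sample earlier than the one before it."""
    return [
        (i, samples[i] - samples[i - 1])
        for i in range(1, len(samples))
        if samples[i] < samples[i - 1]
    ]


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current clocksource:", current)
    print("available:", ", ".join(available))
    if current == "tsc":
        print("warning: running on TSC; check for skew on affected dual-core CPUs")
```

On an affected machine that reports tsc, the available list tells you which alternatives (hpet, acpi_pm) the kernel is willing to switch to.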

Recently at work some pretty major systems (i.e. Windows 2k3 Oracle servers) failed because of this issue, but had those systems been Linux Oracle servers I can guarantee there would have been no problem. Understandably I'm really unimpressed with Microsoft and Windows - how hard can it be to patch the OS to use ACPI HPET properly? Or even to release a patch that uses the PM timer when an affected CPU is detected? Linux kernel devs have known about this since 2005 and fixed it very promptly! What on earth is stopping Microsoft from fixing this problem?

To be fair, this is not entirely Microsoft's fault, but again to be fair, Linux has had a built-in workaround since 2006 while Microsoft's workaround is manual. Please leave a comment if you've had to deal with this problem. There is much google mojo here, so please offer solutions and describe symptoms to help others.

Stuart MacIntosh
New Zealand