Untrustworthy Numbers: a Troubleshooting Lesson

Recently I ran into a weird issue while running some comparative graphics benchmarks, one that illustrates why you shouldn’t always trust the numbers.

The scenario involves graphics cards, a CPU cooler, and why one system’s quirks may affect your attitude about a completely different system. This little puzzle also illustrates how we tend to trust numbers thrown up onto a screen by a benchmark without always thinking through the implications.

I needed a GTX 980 for some comparative benchmarking. It so happens I have two of them, in two different PCs which see use mostly during our Friday night LAN parties. The test system runs a Core i7-6700K. I’d just wrapped up a set of benchmarks using an Nvidia GeForce Titan X, so I pulled that out and grabbed one of the two GTX 980s. Things started getting weird from there.

I noticed immediately that the test system’s noise level increased substantially. I thought it a bit odd, but the GTX 980 came out of a system that ran somewhat louder than the test system anyway, so I ignored the extra fan noise. I initially believed the GTX 980’s fan simply ran a little louder than expected.

Mistake number one. Just because one system behaves a little differently doesn’t mean that behavior transfers to another system when you move one component.

I fired up 3DMark and ran three separate benchmarks: Fire Strike, Fire Strike Extreme, and Fire Strike Ultra (which runs at 4K UHD resolution). The benchmark numbers looked lower than the Titan X — nearly 40% lower. I dutifully recorded those numbers, then went to the next test.

Mistake number two. Unthinkingly trusting the numbers.

Next came Ubisoft’s shiny new third-person shooter, The Division. The last big update to the game added a pretty spiffy built-in benchmark, so I ran all four tests in my suite: 2560 x 1440 at the high preset and again with every possible setting maxed out, then 3840 x 2160, also at high and max. The benchmark built into The Division gives you a little more data than just frame rates. One key pair of parameters is CPU and GPU utilization, expressed as percentages.

I noticed CPU utilization hit 80-plus percent while GPU utilization sat at around 50%. The Titan X had run the same test at a much lower CPU utilization (close to 50%), whereas its GPU usage came in as high as 97%. After repeating the benchmark several times, I noticed that frame rates varied oddly from run to run, while CPU utilization stayed unusually high and GPU usage low.
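That pattern (frame rates that wander between runs, plus a GPU sitting half idle while the CPU works hard) is the kind of thing a few lines of script can flag before the numbers get written down. Here is a minimal sketch in Python. The function names and the thresholds (5% run-to-run variation, 85% minimum GPU utilization) are assumptions picked for illustration, not values from 3DMark or The Division, and the sample numbers are made up.

```python
from statistics import mean, stdev


def run_to_run_variation(frame_rates):
    """Coefficient of variation (stdev / mean) across repeated runs."""
    if len(frame_rates) < 2:
        raise ValueError("need at least two runs to measure variation")
    return stdev(frame_rates) / mean(frame_rates)


def looks_suspicious(frame_rates, cpu_util, gpu_util,
                     max_variation=0.05, min_gpu_util=85.0):
    """Return a list of reasons to distrust a set of benchmark runs.

    The thresholds are illustrative assumptions; tune them per test.
    """
    reasons = []
    if run_to_run_variation(frame_rates) > max_variation:
        reasons.append("frame rates vary too much between runs")
    if gpu_util < min_gpu_util and cpu_util > gpu_util:
        reasons.append("GPU underused while CPU is busier: possible CPU bottleneck")
    return reasons


# Made-up numbers for illustration only, not the results from this article.
print(looks_suspicious([41.0, 33.5, 47.2], cpu_util=82.0, gpu_util=50.0))
print(looks_suspicious([60.1, 59.8, 60.4], cpu_util=50.0, gpu_util=97.0))
```

A non-empty list doesn’t say what is wrong, only that something deserves a look before the result goes into a chart.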

Thinking this particular GTX 980 might be running hot, I swapped in my second GTX 980. The same thing occurred: loud fan noise, inconsistent benchmark results, and low GPU utilization.

This time, I’d been running with the case side open, so I shined a flashlight inside the case and noticed this.

[Photo: Bad Connection]

See that loose fan connector? It’s not really a fan connector. It’s the power connector for the pump that’s part of the Corsair sealed liquid cooler, which meant the coolant pump hadn’t been active. I probably yanked it out by accident when removing the Titan X graphics card. The loud fan noise didn’t come from the graphics card. The Corsair radiator fan had ramped up to high speed to try to keep the dribble of coolant cool. Normal convection inside the cooler still circulated the coolant, but much more slowly than with the pump enabled.

In other words, the CPU had been going into overheat protection, running at a much slower clock rate.
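Throttling like this shows up directly in CPU clock speeds, if you think to look. The sketch below assumes you have already logged clock readings (in MHz) with a monitoring tool while the benchmark was running. The function name, the 90% tolerance, and the sample readings are all hypothetical. The 4000 MHz figure is the Core i7-6700K’s base clock.

```python
def throttled_fraction(clock_samples_mhz, base_clock_mhz, tolerance=0.9):
    """Fraction of samples that fell below tolerance * base clock.

    Only meaningful for samples taken under load, since an idle CPU
    drops its clock on purpose to save power.
    """
    if not clock_samples_mhz:
        raise ValueError("no clock samples")
    floor = base_clock_mhz * tolerance
    low = sum(1 for mhz in clock_samples_mhz if mhz < floor)
    return low / len(clock_samples_mhz)


# Hypothetical readings logged during a benchmark run.
samples = [4000, 4000, 800, 1200]
print(throttled_fraction(samples, base_clock_mhz=4000))
```

If a meaningful share of samples under load sits well below base clock, suspect cooling before you suspect the graphics card.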

There aren’t enough facepalms to adequately describe what I felt at that moment.

Plugging the coolant pump power connector back in solved all the problems. Noise levels returned to nearly inaudible at idle. Rerunning the benchmarks resulted in more consistent and more accurate numbers. The world seemed sane again.

So remember: just because a number pops up onto a screen from an application doesn’t mean you should trust it, particularly when other system behaviors seem even a little out of whack. Don’t assume that your experience with one system means another, completely different system, will behave similarly. The glory of the PC platform means every system will likely behave a little differently.

Thus endeth today’s lesson.

 
