Everyone is biased.
Everyone is biased, but it can be very hard to acknowledge those biases. That’s why it’s an incredible advantage in life if you can look past the vanity filters and special effects that your superego places in front of your mind’s eye and really see the truth about yourself.
When it comes to computer problems, I am biased towards suspecting software over hardware. It’s not hard to understand why. I am a programmer. I’ve spent an untold number of hours having a compiler or an OS tell me that I’ve made a mistake. I know how easy it can be for software to do things that neither the author nor Ada Lovelace ever intended. Unfortunately, that bias sometimes delays my ability to solve problems.
About a year ago, my computer started acting odd and occasionally blue screening while I worked with large files or a large number of files. The start of the behavior wasn’t easily correlated with any new program installations or other software updates. I spent hours updating and rolling back drivers, disabling devices and even taking memory dumps for use with my debugger before finally deciding to re-install Vista x64. I remember having the issue arise while friends were over and getting the annoying “Vista sucks, get a Mac!” response from someone. I admit to feeling that maybe they were right after all. It was frustrating.
I back up my PC regularly and store setup files for older programs, so a re-install only costs me a few hours. Just as I was rebooting the PC to start the new install, I decided to run memtest, a utility which stress tests computer memory and can identify problems with RAM. I did this on a lark, without thinking much about the fact that I had bought two new sticks of RAM for the computer a few months before the problems started. I guess I just assumed that they couldn’t be the underlying cause.
Well, memtest identified a lot of errors, as you’ve probably guessed. I tested the PC without the two new sticks and it worked perfectly. After a little research into my order history on the Newegg site, I realized that I had in fact bought incompatible memory. I can’t explain how it happened, really. I try to be careful and use due diligence when deciding on what type of hardware to buy for my PC. I guess I just screwed up. It never occurred to me that the memory could have been the problem, because it had worked perfectly for a few months. I still don’t understand exactly how that is possible. I ordered new memory and never had a recurrence of the problem.
Fast forward to about a month ago, when I started noticing that my CPU usage would sometimes spike to 100% while watching internet videos. Now, I freely admit that I work my computer like an unloved pack mule. I usually have tens of Firefox windows open, while working on music and playing chess against the computer. But there was no way I should be getting CPU spikes like that. After a while, it occurred to me that my computer just wasn’t snappy in general. I started checking Windows Update history again and updated all of my device drivers. Nothing helped. My computer would be running along smoothly and then for a minute or so the CPU usage would just go wild. Then, just like that, it would drop down to more normal levels.
I observed the pattern over and over again using a bunch of Flash videos and Resource Monitor and finally fingered “System Interrupts” as the problem. I did some web research and found a bunch of message board and blog posts suggesting that the problem could be related to DPC calls (ignore that part if you’re not a programmer) caused by a device driver. I had had some issues with the Nvidia drivers that my motherboard and video card use during the early Vista days, so I was all too eager to add them to the head of the suspects list. I’m definitely not going to use Nvidia hardware for the next computer I build.
The XPerf tool that comes with the Windows 7 SDK indicated that a large number of the calls were coming from the USB Controller. I looked to the ceiling and yelled “NViiiiiiiidiiiiiiiiiaaaaaaaah!” in my best Schwarzenegger voice. Then, I started updating drivers and looking around for fixes or workarounds. I found a few KB articles concerning Nvidia USB drivers and considered whether my symptoms might be related.
At some point, I noticed that the CPU spiking also happened when using Ableton Live. Could it have been the drivers for my external soundcard instead? I’ve had some problems with the drivers for my Presonus Firebox. I visited the Presonus site and updated the driver to the (finally released) Windows 7 drivers, but still no luck.
I stumbled upon one message board post mentioning that the issue might be related to an overheating CPU, but I didn’t bother to do anything more than skim it. I was sure that wasn’t the problem, but I’m not sure exactly why I felt that way. Maybe it was ego, because I’d built the box? Maybe it was excess pride in the (count them) three case fans I’d used in the case? Could it have been unresolved anger towards Nvidia or Presonus for the problems I’d had in the past? Maybe it was partially all of those things, but most especially it was my bias towards seeing things as software rather than hardware problems. In any case, I didn’t even consider that as a possible cause.
At some point, I got sick of having to open a bunch of Flash videos or Ableton Live in order to test possible fixes to the problem. So, I decided to use one of the CPU benchmarks included in Sisoft Sandra instead. Again, this was only to make the testing process easier, as I was sure the problem had something to do with bad drivers. The results of the processor arithmetic benchmark shocked me. My Core 2 Quad Q6600 CPU was scoring around the Pentium IV range, at something like 1/5 of the performance that would be expected. You can imagine that I wasn’t happy. The problem wasn’t that system interrupts were taking up too much CPU time; it was that the CPU wasn’t working at nearly the rate that it should have been.
I wasn’t looking forward to the possibility of having to spend $300 on a new CPU. Now that I think about it, the fact that hardware problems often cost money to fix might be another factor in my tendency not to look in that direction when problems arise.
Eventually I thought back to that message board post about the CPU temperature and I checked using Core Temp. The CPU temperature was hitting 60 degrees Celsius at idle and spiked to 87 degrees under load. The Intel thermal specification for the chip is 62.2 degrees, so you can imagine how hot my guy was getting. My first thought was to remove and re-install the CPU cooler, but I couldn’t find the thermal grease tube I had saved from the initial installation. After searching for a bit, I resigned myself to having to make a trip to a computer supply store. In the meantime, I dusted the (count them) three case fans and the CPU cooler and re-tested the CPU temperature. It had dropped back to 45 or so degrees at idle and maxed out at 67 degrees under load.
I went crazy opening Flash videos and the computer didn’t blink. Ableton Live running while watching Flash videos? No problem. I tortured poor Lord Vader* and the CPU didn’t break a sweat. Problem solved. With five minutes of work with a can of compressed air. Seriously.
Anyway, that’s about twelve hundred words just to remind myself that I am biased. I always assume that computer problems are software related and not hardware related. In the future, I need to think like an engineer and run basic hardware tests before I head down the driver rabbit hole.
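For what it's worth, here is the kind of quick sanity check I have in mind, as a rough Python sketch. It times a fixed arithmetic workload and compares the score against a baseline recorded while the machine was healthy. The function names, the workload and the 50% threshold are all my own assumptions for illustration; this is not how Sisoft Sandra measures anything, and the scores are only comparable to other runs of the same script on the same machine.

```python
import time


def arithmetic_score(iterations: int = 2_000_000) -> float:
    """Return loop iterations per second for a fixed floating-point workload.

    Hypothetical helper: the number is only meaningful relative to
    earlier runs of this same function on the same machine.
    """
    start = time.perf_counter()
    total = 0.0
    for i in range(1, iterations + 1):
        total += (i * 3.0) / (i + 1.0)
    # Guard against a zero interval on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return iterations / elapsed


def looks_throttled(score: float, baseline: float, threshold: float = 0.5) -> bool:
    """True when the score is below threshold * baseline.

    The 0.5 default is an arbitrary assumption, not a vendor figure.
    """
    return score < baseline * threshold


if __name__ == "__main__":
    score = arithmetic_score()
    print(f"{score:,.0f} iterations/sec")
    # Compare against a score you saved when the machine was healthy, e.g.:
    # print(looks_throttled(score, baseline=saved_score))
```

A score at something like 1/5 of the saved baseline, which is what I saw, would be flagged right away and would have pointed me at the hardware before I ever touched a driver.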
*Yes, that’s my computer’s name.