Upgrading (2001)

Building It

The office of our home has had several computers for quite some time. But in the first few months of 2001, my wife and I started talking more seriously about acquiring a dedicated server. This might seem like overkill for a family with three computers already, but given how heavily we use them, it really isn't. What's more, the very fact that we have three computers networked together creates a surprising amount of impetus for adding a server.

Allow me to explain. Since I did the most printing, the printer was hanging off my machine and was shared by the other two computers. Applying Murphy's law to the situation should make it obvious that the only times my wife ever really needed to print something would be precisely those when my computer was in use for something else. I can't tell you how many on-line video games of mine were disrupted or lost altogether as the Windows print spooler stole CPU cycles in the background from my games.

Similarly, the entire house shares a single email account. I had configured Eudora Pro on all machines to access its data files on my machine—again because I make the most use of email. Of course, this meant that my wife was frustrated quite frequently when she'd be in the middle of corresponding and my machine would crash. Fortunately, Eudora Pro is pretty robust at handling such things, but with the number of cutting-edge games I play, as well as some of my other computer-stressing activities (e.g., developing software), this was a large annoyance for my wife.

Further, my company uses Symantec's TalkWorks Pro to manage all incoming telephone traffic, switching intelligently between a voice answering machine, data calls and receiving/sending facsimile transmissions as needed, and this ate up a fair amount of my machine's system resources. It also suffered from the problems mentioned previously with printing and email. I could list any number of other issues (e.g., having a single SCSI tape backup unit for the entire office, having one SCSI scanner on the same machine, etc.), but the point is hopefully made: we needed (or at least really wanted) a server.

What we didn't need or want, however, were all the hassles that come with trying to integrate Windows NT/2K into our existing network. Although Windows NT is surely more robust than Windows 9x, it's also more difficult to deal with in a networking environment. It also doesn't work very well with some of the applications I wanted to run on a server. Windows 2K would likely have handled the situation a bit better than Windows NT, but I could not realistically use it because of a known problem in the TCP/IP protocol stack that causes latency spikes under various conditions for computers connecting to an Unreal Tournament server I intended to provide for my clan, Steel Maelstrom.

After doing some poking around and some research (I decided I wasn't going to get shorted as I seemed to with my last upgrade), I settled on a strategy. I chose to buy some new components for my existing computer, then cannibalize various parts from around the house and shove them into a new case for our server. In the process, I'd get a CPU speed boost from 933 MHz to 1.2 GHz along with PC-133 CAS2 memory, both of which should help out. I also figured I would try to get a motherboard that would let me use 4x AGP in case it became an important factor in future graphics-related endeavors, and pick up a cheap 60 GB EIDE hard drive for the server. Hey, what can I say? I wanted to rip our entire CD collection to MP3 format before some RIAA-backed stormtroopers kicked in my door and confiscated my MP3 encoding software! I settled upon an Athlon 1.2 GHz Thunderbird CPU with the AMD-recommended cooling fan, a high-quality memory stick from Corsair, a Maxtor hard drive and an ABit KT7A motherboard after reading an incredible amount of praise for that component.

I upgraded my system in less than half an hour (I'm really pretty good at this stuff by now with all the machines I've built over the years) and put the server together in roughly the same amount of time. I fired up the server, and it worked just fine. I'd already taken the liberty of copying the Windows 98 installation files onto the new hard drive, so it was a snap getting Windows installed and running on the server. Sure, the server had an older video card (a Diamond Fire GL 1000 Pro) and neither sound nor monitor, but who cares? It's a server. I figured I should be able to leave it "headless" for the time being, accessing it with Symantec's PC Anywhere as the need arose.

When I turned on my new and improved main desktop machine, however, things weren't so rosy. Windows started and did a bunch of hardware detection and driver installation. After trundling for a while, the machine told me it was ready to reboot, after which it did the whole detection/installation thing once more before requesting a second reboot. Still, this was going far more smoothly than I expected, and I was pleasantly surprised to find Windows 98 doing its job. Naturally, the third time the system rebooted—third time is the charm, after all—I was greeted with the blue screen of death (BSOD) rather than the logon prompt.

Diagnosing It

Phase 1: RAID, VIA and the SB-Live!

I should have known the initially promising results were too good to be true. At this point I wasn't sure what to think, so I rebooted and started Windows in its safe mode to see what the problem was. While device manager reported no conflicts, I did notice a troubling icon next to the on-board RAID controller. I was actually kind of ticked off at the vendor when I saw it, as I specifically requested the non-RAID variant of the board at time of purchase. Apparently, the vendor thought he was doing me a favor by silently "upgrading" me to an ABit KT7A-RAID motherboard because he had none of the non-RAID variety in stock—contrary, of course, to what his web site indicated (sigh). I suppose that's progress; i.e., e-commerce makes it possible for vendors to screw a customer more quickly and easily than ever before.

Seeing this was the problem, I dug about in the documentation and came across a driver disk. After installing the proper drivers for the RAID controller, I rebooted the system again and was pleased to see that the BSOD had gone away. "How about that," I thought to myself, "I had a simple problem that required a simple and obvious solution. How refreshing!" After entering my password, Windows 98 continued to load, and I was presented with my desktop. I checked device manager and was glad to see that everything seemed to be functioning. I figured my work was done.

I sure am a naive idiot sometimes. After about five to ten minutes of use, the system locked hard. I wasn't doing anything extraordinary. In fact, I was just reading a few email messages. It was about 9:30 a.m. local time, and I hadn't yet had a chance to check my email since I began working on the hardware upgrade process some ninety minutes earlier. Heck, Windows itself locks up from time to time, right? I figured it was no big deal, so I powered the system down completely and restarted the computer. Again I could get into Windows, but the system locked up after a few minutes of use.

At this point, I figured I had some kind of problem. I wasn't sure just what as of yet, but I figured two hard freezes probably weren't a coincidence. Despite having installed the CPU cooler correctly and the temperatures seeming to be in range, I was a bit paranoid in light of things I'd read about the additional heat generated by Athlon chips. Thus, I opened up the case and rigged a temporary external fan to blow more air across the motherboard to see if heat was the problem. It wasn't. The system gave the same lockup during a third attempt at doing something useful. I stowed the external fan and left the case open for the time being. Foreshadowing: sign of quality films and literature everywhere.

Upon further thought, the only commonality I had noticed among the three lockups thus far was that a sound had been playing in each case. While reading mail, Eudora Pro had played its incoming-messages sound just prior to the first freeze. Just prior to the second freeze, ICQ had beeped at me to let me know someone was now on-line. Just prior to the third freeze, Microsoft Word had made its default error sound when I tried to move the cursor to the end of the line when it was already there.

After recognizing the common element, I had to wonder: maybe the problem was with my sound card? When it comes to sound cards, I've been more than a bit conservative after having problems in years past with "non-standard" (i.e., non-Creative Labs) cards. I couldn't imagine that my Creative Labs SB-Live! X-Gamer sound card could be the problem, but I figured I had to start somewhere. The sound card was my only suspect, so it made the most sense to start there, despite my doubts.

Thus, I went right for the problem's throat: I powered the machine down, pulled the SB-Live! card out, and restarted. I was genuinely surprised when the computer entered Windows and let me work for longer than half an hour. I expected a freeze with each keystroke, but the freeze never came. Since this was much longer than the system had operated previously, I took it as confirmation that the sound card was in fact somehow involved in the problem.

The most obvious possibility was that the sound card was conflicting somehow with some other device in the system. I checked notes that I had taken prior to pulling the card, and I noticed that it had been sharing an IRQ with the onboard RAID controller. I decided that while it really shouldn't be a problem—after all, sharing interrupts is supposed to be a well-supported feature these days—I would reinstall the sound card and disable the RAID controller for my next attempt. Thus, I fussed about in the BIOS until I found a way to disable the RAID controller, plugged the sound card back into the machine, and restarted it.

Sure enough, I got into Windows and worked for a while. It seemed like I had fixed my problem until twenty minutes later my machine froze again. This time, at least, Windows had the courtesy to give me the BSOD. Despite the fact that I had disabled the RAID controller in the BIOS, however, the BSOD was showing an address in the software driver for the RAID controller. At this point I was pretty confused and decided to get help. Naturally, ABit wasn't interested in helping, and I knew from previous experience that VIA (the motherboard's chipset manufacturer) wouldn't be interested either. As an aside, motherboard and chipset vendors remind me a lot of televangelists; i.e., they're perfectly willing to sell you a bill of goods and take your money, but are neither interested in nor capable of answering any tough questions.

Attention religious persons: My prior remark was not intended as a condemnation of religion in general, only of one specific breed of televangelist. Where the persons to whom I seek to refer are concerned, I can't help but think that the extension of the very term 'televangelist' is a proper subset of that for the terms 'huckster', 'liar', 'cheat', 'con-man', etc.

At any rate, I turned to the web. At times like that I thank God for the Internet. Fortunately I was able to find Paul's Unofficial ABit KT7 FAQ, which is now hosted at the official VIA site. I guess VIA does care enough about supporting their customers to at least host the work of others; they're just not willing to do the work themselves. Unfortunately I was a bit taken aback after reading Paul's FAQ. Understand that the motherboard I had just bought was praised lavishly by virtually every review I read. And in every hardware enthusiast forum I checked, I got the same story: the ABit KT7A was the board to buy. Despite all the praise, however, Paul's FAQ had rather lengthy sections detailing common problems with AGP settings, the BIOS, booting/restarting/shutting down, connectors and jumper settings, drivers, graphics cards, heat issues, the RAID controller, seemingly random instability, USB support, mice, power supplies and a disappointing set of problems with various sound cards—most notably, of course, the SB-Live!

In short, the board everybody had recommended without qualification seemed like anathema to a stable system if Paul's FAQ was any guide. Great. Just great. Anyone who has read my last upgrade saga will recognize a pattern, namely, my making every effort to buy the most stable and powerful motherboard available only to find a number of known problems after the fact. I really don't know what I could have done differently. I did a lot of research, but I never came across the huge list of problems in Paul's FAQ.

Anyway, I was basically stuck with the board. I mean, I could have shipped it back to the vendor, but I'd already been promising my wife that I would get the server assembled, and I frankly hadn't a clue what other board to buy without doing even more copious research. So, with Paul's FAQ in hand, I dug in and started trying to fix all the problems. Many hours later I finally hit upon a combination of things that seemed helpful. To be more specific, I had disabled the RAID controller, plugged the SB-Live! card into the only slot on the motherboard in which it didn't cause the BSOD (PCI slot four for all who are interested), adjusted the memory timings after discovering that the "normal" settings were actually far more aggressive than the "turbo" settings (so much for picking meaningful labels), manually reserved the IRQs for the SB-Live! card (both IRQ 11 and IRQ 5 for SB16 emulation), installed the latest versions of the VIA chipset drivers, and made a number of other supposed stability-enhancing tweaks in the BIOS.

By about 4:00 p.m., I had a system that seemed to be stable. At least, it ran for forty minutes before it froze again (sigh). I suppose I should have looked on the bright side, as that was at least four times as long as it had lasted during my very first attempts. At this point, however, I was tired of fussing with it, so I set it aside and had an early dinner while writing up my notes on the experience thus far. After dinner, I returned to the problem and was greeted with a welcome surprise: the system was stable.

I used my computer the rest of the evening without incident. I was able to play games, use my productivity software and do pretty much everything I wanted to do. I didn't know what the last problem had been, but I figured that I must have licked it somehow. Maybe the last freeze was just the result of all my fiddling about. I assembled the case at the end of the day and figured I was finally done. I went to bed confident that I had finished the upgrade process and had lost only a whole day's worth of time. That's really pretty good given the sorry state of contemporary computer hardware/software.

Phase 2: Heat

Naturally, I was still being a naive idiot. My initial experiences the following morning lent greater weight to my conviction that the problems were behind me, as the machine performed flawlessly. I used it all morning to work on my dissertation, answer email messages, write and debug software, keep my calendar, and even play a few video games at lunch. Unfortunately, shortly after lunch the machine froze and froze hard.

I shrugged this off pretty easily, though, as I'd been running Windows 98 now for almost eight hours. With good reason, Microsoft's operating systems are not known for stability, so I rebooted without giving it a second thought. Unfortunately, the system ran for only a few minutes before freezing again. After the third encore of this performance, I was ready to hang my head in my hands and cry. Well, almost; I'm really inclined more toward violence than sorrow in light of my character.

After a few brief moments of hallucinating myself on horseback taking a gladius to the necks of my foes, I snapped out of that pleasant delusion and started trying again to diagnose the problem. I checked all the settings, pulled the sound card again, pulled the network card and even booted Windows in safe mode before it occurred to me that today was a relatively warm day. I was lying on the floor with sweat rolling down my face, struggling to get cards re-seated during one of my many diagnostic procedures when it dawned on me: it was hot in the office. That got me to thinking. After all, heat and computers really don't mix all that well. What's more, the very last lockup I had experienced the previous day (around 4:00 p.m.) was at the very hottest point of the day. Hmm...

I re-rigged the fan that I still had in the office from the day before and voilà! The system would now run just about anything other than my CPU-intensive video games without locking. Yup, I had a heat problem. Fortunately, my case had been designed with just this kind of situation in mind. That is, it had an alcove for a second chassis fan built in from the outset. Sadly, there was some confusion over precisely what kind of fan to use. About this time I was wishing I'd shelled out the money for one of those super-high-tech CPU cooling fans by which overclockers swear. I just didn't think it would be a problem; after all, I bought the AMD-recommended cooler.

Whenever I run into such uncertainty regarding components, though, I don't despair. I go to Fry's Electronics. I took the dimensions of the alcove to Fry's, and within a matter of minutes I was on my way home... with the wrong fan. Naturally, I got the wrong size. To his credit, the fellow who "helped" me had no way to know that a 90 mm fan would be in a bag marked as an 82 mm fan (sigh). After another trip to Fry's (I brought my tape measure this time), I was home again and installing the second chassis fan.

This time it worked. Even though the temperature was nearly 100 degrees Fahrenheit in my office (it gets kind of warm out here near Los Angeles), the computer continued to run. The CPU temperature was only a few degrees short of the 40 degree Celsius hysteresis point, but it wasn't hitting it, regardless of how hard I stressed the system with the cover on. I figured I had finally licked the problems and could consider getting a fancy CPU cooler in the future if the need arose. For the moment, though, I was all set!

Phase 3: USB Hell

Yet again I'm sad to say that I was still being a naive idiot. I was set only until later that very day when I plugged my Epson Photo PC 3000Z digital camera into a USB port to download some pictures to my hard drive. By now it should be easy to predict what happened. Yes, that's right: my system locked hard instantly. By this point I was done being charitable; i.e., I no longer assumed this was just a one-time thing. My assumption proved correct over the next few tests, as merely connecting that USB device screwed the system completely and instantly every time.

After yet more reading I discovered that VIA is to blame for the USB problem. Apparently, it is a widely known fact—that remains entirely unmentioned by anyone but customers it seems—that VIA's USB support is absolutely pathetic. After installing the latest version of VIA's USB filter driver the symptoms did change. That is, rather than locking both hard and instantly, the system would detect the camera first, and then lock up completely. (sigh) That's progress of a sort I suppose.

To cut this short, I never got the Epson camera to work, no matter what I did. In fairness to VIA and ABit, I had problems with the Epson camera with the previous motherboard as well (an ASUS P3V4X), though nothing as severe. With the P3V4X the camera would be detected, but only if I first uninstalled the camera completely. That is, I could use the camera once and only once before having to completely uninstall and reinstall the camera's drivers and software.

Naturally, in keeping with the general tone of technical support these days, Epson blamed VIA/ASUS after claiming that their camera was not to blame. Since neither VIA nor ASUS could be bothered even to respond to my inquiries, I can only suspect they would blame Epson. What I couldn't help finding amusing was that it didn't matter to Epson that all five of my other USB devices worked just fine without a single incident. Nah, it must be a problem with my motherboard. Yeah, good thinking guys. That's the reasonable conclusion. Epson's support technicians clearly need enemas.

Conclusions

Well, since the end of phase three, I have discovered several other minor problems with the ABit motherboard as well. The most annoying of these problems was the complete inability to enable DMA access to my DVD drive for speedy ripping of my CD collection into MP3 format. This was never a problem with my ASUS P3V4X motherboard. With the ABit KT7A-RAID motherboard, however, whenever DMA access was enabled the system would freeze hard after only a few seconds of encoding with any of the programs I tried.

As these problems continued to mount, I made a decision to fix my problems with the ABit board once and for all. And unlike so many of my little tales of technological terror, this one has a happy ending. I found a way to solve not just one, or even a few, of my problems. I found a way to solve all of my problems with the ABit board. To be more specific, I removed the ABit board from my case and replaced it with an ASUS A7V133. Now everything works just as it should, no fussing required.

The difference between the two motherboards is like night and day. Whereas with the ABit KT7A-RAID in place my system would lock, crash or otherwise head south at least twice a day, and feature all kinds of wonderful compatibility problems when it was actually working, my system almost never crashes anymore with the ASUS A7V133 board in place. I installed the ASUS board, and it just worked the first time with every peripheral I've tried (even the Epson camera) and with every task I've attempted. I was a bit leery of ASUS after my experience with their P3V4X, but they have surely regained my trust with the A7V133.

After replacing my motherboard yet again, I was in a bit of a pickle regarding what to do with the ABit board. Upon posting to a Usenet newsgroup, I immediately had offers from people who wanted to pay me roughly 80% of what I'd paid for the board. But I just wasn't sanguine about dumping a potential nightmare into an unsuspecting person's lap. I was going to throw it out until I remembered that a good friend was looking to upgrade his system. Even after completely disclosing my problems to him he was still interested, so I sold it to him for a mere $20 plus shipping.

Yeah, I lost money on the ABit board (around $100), but my conscience was clear. Sadly, my friend had the same kinds of problems I had, but by saving so much on the motherboard he was able to get a much nicer upgrade than he would otherwise have been able to afford. He and I both knew that if he just couldn't make it work well enough, he could always sell it at a profit on eBay. And I knew my friend was technically competent enough not to be screwed outright as might have been the case with other parties.

So, what lessons can I draw from this? Well, I can think of a couple. First, it really doesn't save any money to buy motherboards cheaply over the Internet. If anything goes wrong, the vendor will take the board back, but he will likely want an ugly restocking fee. In the future (insofar as I'm able), I plan to buy motherboards solely from local stores like Fry's Electronics, which has a liberal return policy. Second, and most importantly, reviews and recommendations from hardware forums mean absolutely nothing! This is twice now that I've basically been burned by following the supposedly well-informed advice of reviewers, when those guys clearly do not run a motherboard in a complete system to report comprehensively on its quirks. I sure hope I'm done upgrading for a while, but when I upgrade next you can bet I won't be buying ABit, and I won't be trusting reviewers regarding what to buy.

04/15/2001
