Emergent Problems?

1. Overview

I've bemoaned the contemporary state of customer support elsewhere in my writings, and even though I hate to sound like a broken record, it still sucks. I've had so many bad experiences over the years with hardware and software that I'm literally stunned whenever something actually works. It's practically unheard of in computing. There's always some driver problem, some chipset incompatibility, some bug in the software, some bug in the operating system (OS), etc. What moves me to write today, however, is the extreme concentration of awful experiences I've had in the very recent past, none of which make any sense.

At least in the past I could typically trace my problem back to something. I would eventually find the errant driver. I would eventually find documentation on the chipset incompatibility. I would eventually track the bugs to some more or less clearly defined, reproducible set of causal factors. These days, however, it's becoming pointless even to try to solve such problems rationally. What should be entirely deterministic machines seem to be morphing into fundamentally nondeterministic entities. That's not to conclude immediately, of course, that at bottom they are no longer deterministic machines; rather, it is to suggest that perhaps they have become so complex, both in terms of hardware and software, that certain nondeterministic properties are beginning to emerge.

For those unfamiliar with the notion of emergent properties, consider the case of water. At so-called standard temperature and pressure (STP), water is in its liquid form, and, more to the point, it's wet. Wetness is a particularly interesting property, as are many such properties at the macro level, because it seems to be emergent. That is, if one examines the components of water (viz., hydrogen and oxygen) at STP, one won't find anything that resembles wetness. Yet when those components come together in a particular way, suddenly wetness emerges. In effect, water is more than the sum of its parts because certain properties it possesses seem to be emergent and unexplained by the properties of its constituents.

What I'm beginning to wonder about computers along these lines is whether the hardware and software in use have become so complex that various emergent properties might now be exemplified. And further, I wonder whether such emergent properties might not be strictly deterministic as the underlying hardware and software are supposed to be. I don't know how seriously to take these issues, but I'll give a few examples below as to why I'm now considering them at greater length.

2. Senseless Problems, Senseless "Solutions"

2.1. Linksys

In the last few weeks, it sure seems to me like everything that can possibly go wrong around my house has done exactly that. For no apparent reason, a literal plethora of devices have simply stopped working. The most trivial case would be the flapper valve in the toilet in our half bathroom. That's a small thing, of course; it broke, and I replaced it. The replacement has stopped working correctly, however, less than a week later. It is evincing an annoying tendency to pop loose from its mounting entirely for no apparent reason. I've cobbled it in place with a paperclip for the moment, until I can find the time to investigate. Our VCR, which is admittedly around ten years old, also decided to start eating, rather than playing, tapes. This too isn't a big thing. I didn't know what was wrong with it, but after a quick trip to Victor's TV/VCR Repair shop, it again works... for now.

Anyway, those are problems in the analog world, and in all honesty I've typically found them more intimidating than problems in the digital world. That's probably because I've been a computer geek virtually my entire life. After I went nuts over the first TRS-80 computer I ever saw up close at my school, my parents bought me a book about the BASIC programming language for my tenth birthday. Long before my eleventh birthday, I had written thousands of lines of code, despite having only a few minutes of access to that computer each week. It was almost like I had found my calling. I told my parents as a ten-year-old boy that I was going to do computer stuff when I grew up.

The big lesson I learned early in life was that problems with hardware/software were problems in the digital world, and, unlike the analog world, they always had a rational cause and solution. That's a very seductive message for a young boy. No, scratch that. It's a very seductive message for anyone who is committed to rationality; i.e., it's very comforting to know that any problem that arises in the digital world must have some well-defined (if obscure) cause. Solving such problems is always a simple matter of detective work, because cause/effect relationships are so much clearer than they are in the analog world.

It seems of late, however, that the digital world is growing more and more confused. Starting a few weeks ago, my wife's computer began exhibiting all kinds of odd, seemingly unrelated behaviors. For example, she would start the machine, and it might lock hard on startup for no apparent reason. She might start it, and it would run for a few minutes, then give a blue-screen of death (BSoD) error in any number of different VxDs. Alternately, she might start it and run it for a while, only to have Internet Explorer freeze the system completely in the middle of downloading a web page. The symptoms were as varied as they were unpredictable.

I was just about ready to junk her machine completely and build something new when I hit upon the real cause of the problem, namely, the Linksys wireless PC card that was providing her access to our network. I've written elsewhere about the nightmares of wireless networking, but once I got past the initial mess and ditched my expectations that things work at anything approaching their advertised throughput levels, the whole wireless thing has worked out pretty well. At least, it did until this episode. Sure enough, if I pulled the Linksys wireless PC card from her machine and used it in my laptop instead, then my laptop would exhibit the same bizarre panoply of symptoms.

For the record, the diagnostics showed that the card was just fine. The latest version of the software drivers seemed perfectly functional. And most importantly, nothing on her machine had changed. I don't install patches and updates with much frequency anymore because they often cause more problems than they solve. As such, I wait to update all machines at the same time, and I make a note of it in my journal so that I know exactly when things last changed. There really was neither rhyme nor reason to this problem. The card seemed simply to have gone mad.

To their credit, Linksys offered to replace the unit free of charge, despite the fact that I wasn't able to track the problem and contact them until after the unit had passed its one-year warranty period. That's a wonderful customer-service decision on their part, and you can bet I'll be buying more Linksys products as a result of it. You just don't see that kind of commitment to customer satisfaction very often these days in the hardware/software industries, and I'm positively rabid about rewarding it with further business.

2.1.1. RMA Hell

In the not-so-good column, however, the mechanics of the Linksys RMA process also suck, plain and simple. The first fellow I contacted said I would receive an email with instructions on how to proceed. After several days and no message, I called back. The second fellow was actually able to send the email message—thanks, Luis!—and I intended to fax the completed form back to him yet that day. Naturally, this was my opportunity to discover two things about my fax software, Symantec TalkWorks Pro: (1) it will not work under Windows XP, and (2) the entire TalkWorks product line has been discontinued. To their credit, Symantec provided a relatively inexpensive upgrade path to the latest version of WinFax Pro v10.02 instead, so I went that route.

Still, my ordeal wasn't over. This was also my opportunity to discover that my almost-six-year-old scanner, a Microtek ScanMaker E6, would no longer work with WinFax Pro. I fussed with drivers and all kinds of things. The problem seemed to be that the E6 scanner would work just fine with Windows XP's Windows Image Acquisition (WIA) system, but WinFax was somehow stubbornly insisting on using the TWAIN system instead, which apparently had some kind of conflict. Given that the scanner was a bit long in the tooth, I decided not to waste any more time on it. Instead, I replaced it with a Hewlett-Packard ScanJet 5400c. And while I'm on the subject, that scanner is the only piece of hardware I've bought in recent memory that just works. I was literally up and using it within seconds of plugging it in, and I've had nothing but success with it since. That's the one small joy amidst all this hardware sorrow, but HP deserves big kudos for it nevertheless.

Ok, so I was finally able to fax my stuff to Linksys. When a week had gone by and I hadn't heard anything, I called. The woman with whom I spoke told me that she couldn't find my information. She apologized, asked me a few questions over the phone, and assured me that she would look into it and call me back within the hour. After a couple more days without any contact, I called again. This time, the man to whom I spoke said that he couldn't figure out why the previous contact couldn't find the data. According to him, the replacement had shipped out ten days ago via UPS. Unfortunately, though, it seemed that a tracking number hadn't been entered in the system, so he couldn't give me that (sigh).

After a few more days, I called back again only to be told that, in fact, it hadn't been shipped out the first time. That was, apparently, some kind of clerical error. I was assured that it would ship out that very day, however, so I should soon have it. By this point, it would be charitable in the extreme to say that I merely doubted the information I was being given. And in fact, it did turn out to be wrong as I received two different replacement units that very afternoon. Somehow, Linksys had processed my RMA twice amidst all the confusion.

Because I really didn't want my credit card to be charged, I called them back. The woman with whom I spoke told me that she could find no record of two units having been sent. And according to her best data, I shouldn't have received anything yet, as a replacement just shipped out that day. Since then, I've been half expecting to find a third unit on my doorstep, but it hasn't happened yet. She assured me that she would get to the bottom of this and send me out an airbill to return the spare unit, but that was almost a week ago now, and I've heard nothing else.

Should it really be this difficult to deal with a simple RMA? Why is it that Linksys can't seem to keep even the most basic records straight? It's not even that their left hand doesn't know what the right hand is doing. It's more like they don't have hands at all; i.e., things just move around by random fiat without any human intelligence or action involved. Each one of these calls has been on my nickel, of course, and I get to sit on hold for at least fifteen to twenty minutes each time. At this point, I'm not going to call them back again. I've expended as much effort as I'm going to expend to return the additional unit. If they charge my credit card, I'll revoke the charges and dash off a letter explaining the depths of their incompetence.

2.1.2. Configuration Hell

Having said all that about the RMA process itself, the trouble still wasn't over. Bear in mind that with all of my prior wireless networking woes, I'm an absolute pro at installing and configuring these things. I know exactly the steps I need to take to get a Linksys wireless PC Card up and running. For those faced with the task, they are as follows:

  1. Uninstall other network adapters from the system.
  2. Insert the Linksys wireless PC card.
  3. When prompted, install the correct driver from the CD-ROM.
  4. Do not let it reboot your system when finished.
  5. Open up the network applet in control panel.
  6. Configure the card settings properly. Be sure to configure the channel, mode of operation (i.e., ad-hoc or infrastructure), SSID of the wireless access point (WAP) if needed, encryption settings, and so forth.
  7. Configure the network protocol(s) properly. In my case, that means specifying the right gateway, IP address, etc. for the TCP/IP binding.
  8. Reboot the system.

By following that simple procedure, I've been able to make every Linksys wireless PC card work immediately. Or at least, I've been able to do that with every card except this one. When I performed those steps with the new unit, I got nothing. The configuration utility, which ships with the card, showed that a strong connection had been made to the WAP, but my laptop wasn't on the network. I couldn't even ping any of the other computers on my network.
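The distinction here, between a card that is associated with the WAP and a machine that is actually on the network, is the sort of thing a one-packet ping of each host settles quickly. The following Python sketch only illustrates that check. The function names are made up for the example, it assumes the system `ping` command is on the path, and it treats an exit status of 0 as "reachable", which some versions of `ping` do not guarantee.

```python
import platform
import subprocess


def ping_command(host, system=None):
    """Build a one-packet ping command for the given OS name."""
    system = system or platform.system()
    # Windows' ping uses -n for the packet count; Unix-like systems use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def reachable(host):
    """Return True if a single ping to host exits with status 0."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # No ping binary, or no answer within the timeout.
        return False
    return result.returncode == 0


def survey(hosts):
    """Map each host to whether it answered a ping."""
    return {host: reachable(host) for host in hosts}
```

Calling `survey(["192.168.1.250"])` with the WAP's address, plus the addresses of the other machines, would show at a glance whether anything at all is answering.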

When half an hour or so of further fiddling failed to change the situation, I called Linksys support. After several hours of further twiddling while on the phone with the technician, the problem simply got up and walked out of the room. Seriously. Nothing we did made any difference. The last action taken before the problem went away was that I reconfigured the WAP to use an IP address normally in use by one of the four machines on my network.

For the record, that didn't seem to fix the problem. I even rebooted a couple of times to make sure. It was while the technician was off searching his database that Windows Explorer suddenly opened four windows, one for each of my drive connections, with no prompting whatsoever. All of a sudden, it was working. To confirm my belief that the IP address change had nothing to do with it, I re-configured the WAP to 192.168.1.250, which is the faux address it had been using for over a year previously, and everything still worked.

In other words, the problem that made absolutely no sense at all simply vanished for no reason at all. None of the steps we took accomplished anything. None of the changes we made accomplished anything. While I was sitting and staring at the screen, doing nothing but listening to the awful muzak of the Linksys phone system, the network just started working on that machine. Out of the blue. Ex nihilo. It kind of makes one wonder: for how long will it continue to work?

2.2. OnStream

Since the whole wireless thing wasn't the only problem I was currently fighting, I gave up. I thanked the technician, hung up, and called OnStream technical support next. You see, a few years ago I purchased an SC-30 cartridge drive from OnStream after reading a bunch of really positive reviews, all of which said essentially that OnStream was the future of backup hardware/software. Naturally, the company went under roughly a month after I shelled out my money (sigh). That's just my typical luck. Still, I installed the hardware on our house server, which was then running Windows 98, and I figured I would simply have to stick with that operating system (OS) until the next time I bought a new backup solution.

Fortunately, however, something happened to bring OnStream back from the dead, and, better still, they actually released updates of their software and drivers for Windows XP. That enabled me to migrate my server to Windows XP, and, after some initial fussing, the OnStream drive and its Echo software kept backing up my systems night after night. It behaved exactly like you would expect backup software to behave. I scheduled automated full backups twice a week, and I had it back up all modified files every other night while we slept. Everything worked just fine.
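The schedule just described is simple enough to state as code. What follows is only a sketch of the schedule itself, not anything the Echo software exposes. The function names are hypothetical, the choice of Wednesday and Sunday for the full backups is an assumption (the text says only "twice a week"), and "every other night" is read as "every remaining night".

```python
import datetime

# Assumed days for the twice-weekly full backup (Monday=0 ... Sunday=6).
# The specific days are hypothetical placeholders.
FULL_BACKUP_WEEKDAYS = {2, 6}  # Wednesday and Sunday


def backup_kind(day):
    """Return "full" or "modified" for the backup run on the given date."""
    if day.weekday() in FULL_BACKUP_WEEKDAYS:
        return "full"
    return "modified"


def week_plan(start):
    """List (date, kind) pairs for the seven nights beginning at start."""
    plan = []
    for offset in range(7):
        day = start + datetime.timedelta(days=offset)
        plan.append((day, backup_kind(day)))
    return plan
```

Any seven consecutive nights contain exactly two full backups under this rule, whichever day the week is counted from.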

That is, it did until a week or so ago. All of a sudden, again completely out of the blue, the Echo software started crashing at the outset of every backup. I would get up in the morning, find that my server was nonresponsive, and discover that it was so because the Echo software was causing a serious error that the OS simply had to report to Microsoft before it would function otherwise. You have to love a server that has a so-called "serious error", which is so non-serious that it can continue running without a hitch! Further, despite the server running just fine, Windows XP causes a complete cessation of all file-sharing, printer-sharing, etc. until after the annoying report procedure has been completed. Hey, good thinking, Microsoft!

Again, I have no explanation whatsoever for why the Echo software was suddenly failing. It had been plugging away for a couple of years without so much as a peep. Nothing on that server had changed. I just don't update things very often anymore on my desktop machine, and I don't ever update things on the server more than once every few months. The associated down time just isn't worth the minimal (sometimes nonexistent) "improvements" the patches bring. Out of the clearest of blue skies, Echo simply up and decided it didn't want to run anymore.

Well, I figured I was running under Windows, and, as I've long since learned, everything goes to crap eventually under Windows. So, I uninstalled the Echo software, uninstalled the SC-30 cartridge drive from device manager, and restarted the system. Windows XP greeted me with the message that it had found new hardware, and it asked me if it could install a driver. Since the Echo installation manual clearly specifies that you should not let XP install a driver, I told it no. That seemed to go just fine. I then installed the Echo software, which correctly detected the cartridge drive and let me select that as the backup device I wanted to use. It even let me assign my favorite drive letter to it, 'T' for 'tape'. The installation finished without a problem and told me to reboot the machine.

After the reboot, however, Windows XP was still griping about wanting to install a driver. That was a bit unexpected, as the Echo software seemed to have installed its own. Thus, I didn't let XP do its thing. The Echo software started a moment later and asked me to select which backup device I wanted to use. Unfortunately, the list of options it provided me was empty. This was puzzling indeed. I went to its "Device Options" screen, and the software told me that it detected an SC-30 drive, but it also said the drive was already in use by another driver. Given that I hadn't let Windows XP install any drivers, I couldn't help but wonder: what other driver was using it?

Still, Echo has the ability to wrest control of the drive from all other programs, assigning it a drive letter in the process. I used that option, and after the requisite reboot—are computers ever going to be able to do this kind of crap without rebooting constantly?—Windows XP seemed to be much happier, or much more ignorant depending upon your point of view. That is, XP was no longer complaining about wanting to install a driver. And sure enough, the Echo software showed that the SC-30 was now available for use as drive 'T'. There were only two problems with this: (1) there was no drive 'T' showing in My Computer, and (2) the Echo-supplied list of backup devices was still empty.

After doing everything I could think of to fix the problem over the next hour, I called OnStream support. Almost three hours later, the technician gave up and admitted he couldn't help me with the problem. He promised, however, to escalate the support incident to a higher-level engineer, who would likely call me back the next day. Given the frequent and thoroughly vacuous promises made by Linksys over the previous weeks, I wasn't holding my breath.

Needless to say, I was quite surprised when a higher-level engineer actually called me back the next day. It truly saddens me the degree to which the mere fulfillment of such a simple promise is exciting; it's just so rare that it's something worth crowing about. After a few hours on the phone, the higher-level engineer was about ready to escalate my incident further (and I must say I'm really quite curious who would have gotten involved next) when he thought of one more thing to try, namely, updating the firmware in the device.

For the record, my drive was running firmware version v1.08. The latest version available was v1.09. The only changes made between those two versions should have nothing to do with my problem. In the support technician's own words, we were "grasping at straws". And it bears remembering also that the drive worked just fine with that firmware in place when first installed under Windows XP, and it continued to work just fine for months. In short, the firmware update should have been meaningless. Nevertheless, it was a box on the technician's form that he wanted checked before kicking me higher up the support ladder.

Naturally, the firmware-update utility wouldn't work at all. It complained about needing the latest version of ASPI installed, so we headed down that bunny trail for a while because, of course, the Adaptec web site was completely non-responsive. That's wholly predictable, really, insofar as that web site and only that web site features the latest official ASPI drivers for my interface, an Adaptec 19160 Ultra-SCSI adapter. While my advocate at OnStream searched the company intranet, I visited a bunch of other sites for possible mirrors. Sure enough, I found the v4.71 ASPI drivers from Adaptec on a site devoted to the illegal copying of DVDs. Gee, I wonder if I'll get thrown in the pokey for having downloaded an otherwise freely available piece of software from such a site? I guess we'll have to wait and see.

The ASPI installation went well. The firmware update was successful. Rebooting the system changed nothing, however, as we both expected. One of two things continued always to be the case: (1) Echo couldn't use the drive because it was already in use, or (2) Echo thought it owned the drive but couldn't use it. At this point, the support engineer decided to escalate the incident further, and he started filling out the forms on his end. While he was doing that, I decided to reconfigure the server for daily use (we had shut down programs that normally launched at startup and so forth). Upon rebooting, I noticed that I had missed re-enabling one service, namely, the GameVoice server that my clan, Steel Maelstrom, often uses; thus, I re-enabled that service and rebooted a second time.

Presumably, you can imagine my surprise when the Echo software came up and asked me which drive I would like to use. Ok, that in itself wasn't surprising, as it had done this literally dozens of times before over the last few hours, but what was surprising was that my SC-30 drive was actually in the list! Before the whole thing could turn into a pumpkin, I selected the drive and clicked the OK button. All of a sudden, again completely out of the clearest of blue skies, the drive that wouldn't work at all worked as if there were never anything wrong.

Since then I've been able to reassign the drive letter, reboot the system, run backups, etc., and it just works. The OnStream support technician and I never found anything wrong. We never found an explanation for why it wasn't working. None of the changes we made seemed to fix anything. Worse, the last two changes we made (viz., installing ASPI and updating the firmware) couldn't explain the now successful operation. In short, the problem that made no sense whatsoever just up and vanished for no reason whatsoever. The technician didn't know what to think, and neither do I, though I'm grateful for his non-help fixing the non-problem. Thanks, Thomas!

2.3. Gateway

So, after wasting more than a full day's worth of my time on technical support, two utterly senseless problems simply went away for no apparent reason. That's frustrating, of course, but the worst is yet to come. Later that evening, which was last night, my wife tried to use my Gateway Solo 9550 laptop. Despite the fact that the battery was fully charged, a mere fifteen minutes later it was warning her to connect to a power supply immediately. She had encountered the very same problem with that laptop roughly two weeks prior, and I had recalibrated the batteries at that time in the hope of solving the problem.

Since it obviously wasn't solved, however, I figured I would run the recalibration process again while we slept. I'm heading out of town this coming Friday, and I really wanted to take the laptop with me while traveling. As such, I figured I needed to try to get the problem at least diagnosed quickly so that I could buy a new battery if needed. I went to bed without remembering to start the procedure, but I did manage to remember before I fell asleep. Thus, I dragged myself out of bed, wandered off to the office, and pressed the power button on my laptop.

The screen came to life, telling me to press F2 to access setup, so I pressed F2. The laptop responded with a sound I've never heard before. It was some kind of mid-pitched cross between a squeal and a chirp. At the conclusion of this odd sound, the laptop went completely dead. It no longer responded to the power switch. Swapping batteries made no difference. Hitting the hardware reset switch hidden on the bottom of the unit made no difference. Nothing made any difference. My less-than-one-year-old laptop had managed to turn itself into an inert lump in response to pressing F2.

This morning I returned from the nearest Gateway Country store, which is where the on-line Gateway support representative suggested I go, and I must say that I now have a suggestion for Gateway regarding precisely where they can go. I've just been told that they don't fix anything locally. My laptop is headed back to Houston, TX. The minimum turnaround time for any repairs is three weeks. It's still under warranty, of course, so it likely won't cost me anything, but that was my primary machine for working on my doctoral dissertation. I'm indescribably glad that I'm as paranoid as I am; the last thing I did before shutting the machine down that evening was to run a batch file I've written that copies all of my work to the house server.
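A batch file of the sort mentioned above amounts to "copy anything new or changed to the server". The original batch file isn't reproduced here, so the following Python sketch is a stand-in under stated assumptions: the function name `mirror_newer` and both directory arguments are hypothetical, and "changed" is judged by modification time alone.

```python
import shutil
from pathlib import Path


def mirror_newer(source_dir, dest_dir):
    """Copy files from source_dir to dest_dir when missing or newer.

    Returns the list of relative paths that were copied.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        # Copy if the destination is absent or older than the source.
        if not dst.exists() or src.stat().st_mtime > dst.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the timestamp
            copied.append(rel)
    return copied
```

Run as the last step before shutting down, with the work folder as the source and a share on the house server as the destination, a second run immediately afterward should find nothing left to copy.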

So, just as completely senseless problems erupted with a Linksys wireless PC card and an OnStream cartridge drive, so too a laptop inexplicably squealed/chirped and died with a single keypress. I wish I could say this is the first problem I've had with Gateway, but it isn't. The order for that laptop was placed originally on 06/09/01. A few weeks later a laptop shipped to me, only to be lost by UPS. A few weeks after that, Gateway re-submitted the order on my behalf. A few weeks later another laptop shipped to me, only to be lost by UPS. Eventually, I did get a Solo 9500 laptop from Gateway on 08/21/01, almost three months after my initial order. Naturally, the machine died completely a week later. Some weeks after that, I got the Solo 9550 I've been using since. Now, less than a year later, that one is dead too.

NB: I will never buy another laptop from Gateway as long as I live.

2.4. Direct TV

Even our Direct TV service figures into this bizarre tale of woe. When I awoke this morning, which was pretty early so that I could take my laptop to the nearest Gateway Country store, I turned on the television to catch the morning news. Rather than seeing the news, however, what I saw was a series of flickering, horizontal white and black lines. It had a good beat, but I couldn't dance to it. Switching channels made no difference. The recently-repaired VCR could play back its tapes successfully, so I knew the TV was fine.

I called Direct TV customer service, and the representative with whom I spoke helped me diagnose the problem. She walked me through the process of checking my cables, checking the card, checking the decoder box, etc. The final conclusion was that the decoder box, for some completely mysterious reason, had simply given up the ghost during the night.

For the record, the system last worked fine roughly five hours before I got up. The Direct TV decoder unit is plugged into the same surge-suppressing power strip as the television and VCR, but in that brief interim during which I slept, it had somehow died alone. Naturally, there is no known reason for its death, no troubleshooting procedures employed have made any sense, and in light of everything else I've been through of late, I simply can't find it within myself to give a damn. As such, I requested a new unit be sent.

Of course, this turned out to be premature. During the troubleshooting process, I reset the decoder unit at least four times. It was even unplugged completely at least twice. None of these actions changed anything. Roughly ten minutes ago, I was going to sit down and watch a tape in the VCR while I ate lunch. Imagine my surprise when I was greeted by Fox News on my television. Yes, you guessed it: my Direct TV box now works for no good reason, just as it stopped working for no good reason. And, true to form, nothing changed in the meantime. The unit literally wasn't disturbed in any way since I had last fussed with it trying to troubleshoot the problem.

3. Conclusion

So what does it all mean? I grew up building stuff out of resistors, capacitors, induction coils, and so forth. I grew up in the analog world, and I still live in it in terms of my biological life. All around me are processes such that, while it is presumed they have some explanation, they remain so mysterious as to seem almost like magic. That's the chaos of the analog world.

The digital world is supposed to be different. The transistor is supposed to be able to function as a binary switch. Integrated circuit chips are supposed to have well-defined behaviors, and large-scale chips are no different from the small in this regard. Signals on some given pin cause internal changes that result in signals appearing on other pins, all of which supposedly obey completely deterministic laws.

Software is allegedly no different insofar as software can make hardware do only what it's capable of doing in the first place. Software can shove values into registers, fictitious holding areas that don't actually hold anything, but which (more often than not) have a particular voltage refreshed somewhere deep in the bowels of the silicon beast lots of times every second. Contemporary software can operate only in a strictly deterministic way with logical control statements, flow control statements, and so forth.

The result should be that the digital world is deterministic. But is it? Rebooting a computer should restore it to a known state. So why, out of the blue, will rebooting multiple times sometimes change the system's macro-level behavior? Why, out of the blue, did my network just start working with no apparent prompting? For God's sake, I wasn't even touching the keyboard! I was watching the screen, glumly wondering how much more of my time the problem would devour. Why, out of the blue, was the Echo software suddenly able to use the drive it previously couldn't? Why, out of the blue, did my laptop reduce itself to a useless lump of inert stuff when I pressed the F2 key? How did my Direct TV decoder recover miraculously between the n-th restart and the (n + 1)-th restart?

After so many of these sorts of things, one cannot help but wonder: is there really even an answer anymore? And if so, can it be known in principle? In practice? Have we built systems so complex now that nondeterministic properties (and thus nondeterministic problems) are emerging? Can the sum of a series of deterministic interactions yield higher-order nondeterministic results? Or is it simply that the initial assumption is flawed; i.e., is it simply that the so-called digital world isn't really digital at all? Perhaps there is more analog slop to the digital than any engineer dares to let himself believe. I'm inclined toward that answer, but going that route requires me to distrust technology at a very fundamental level, and that's pretty hard to do given the way I've lived for the last few decades.
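One well-known way in which strictly deterministic rules can still look unpredictable is sensitivity to initial conditions. The sketch below is offered purely as an illustration of the question, not as a diagnosis of any of the failures described above. It uses the logistic map, a textbook example that appears nowhere in my own troubleshooting, and the function names are made up for the example. The same starting value always yields the same sequence, yet a starting value that differs by one part in a trillion ends up somewhere else entirely.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the list of visited values."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


def max_gap(a, b):
    """Largest absolute difference between two equally long orbits."""
    return max(abs(x - y) for x, y in zip(a, b))


first = logistic_orbit(0.3, 100)
again = logistic_orbit(0.3, 100)            # identical starting value
nudged = logistic_orbit(0.3 + 1e-12, 100)   # starting value off by 1e-12

print(first == again)                      # same input, same orbit
print(max_gap(first[:10], nudged[:10]))    # early on the orbits still agree closely
print(max_gap(first, nudged))              # later they bear no resemblance
```

If something like this is at work in real machines, then "nothing changed" may only ever mean "nothing I could measure changed".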

I don't know the answers to these questions. Hell, I'm not even sure I've got the questions right. But I am absolutely certain of this much: technology still sucks badly. And unfortunately, it seems only to be getting worse all the time.

08/03/2002
