VOIP – AT&T Not Ready for Primetime?

I used to work for Pacific Bell, one of the “local” phone companies after the big AT&T “Ma Bell” break-up in the mid-80’s. It was a great place to work, and the customer was ALWAYS the #1 priority. Your phone, their #1 service, had to work, and it had to work all the time. Most of the other “baby bells” had the same philosophy: “always up” service. AT&T, the mother ship at the time, also believed in non-stop phone service.

Strangely, now that most of AT&T is back together, and even bigger than it was before the break-up, they still have not returned to that place where AT&T and Pacific Bell were in the 70’s, 80’s and 90’s. Let’s call it the “Land of FIVE NINES” for lack of a better name.

When I worked on services and devices at Pacific Bell, everything had to be designed and tested to meet FIVE NINES, and this meant 99.999% up time (see, 5 “9”s), and nothing less was acceptable. The systems we designed had backups and automatic roll-over systems, and some of them had backups on the backups.

Now let’s be clear here: I was designing digital TV products, not telephony, but the rules were the same, less than 0.001% downtime. That is a little over 5 minutes a YEAR total downtime. And in some systems, that was per customer, so two customers down for a few minutes each was a bad thing. Our television cable system had power systems that could keep the system running for over 2 weeks in most places without main power. This was serious: Pac Bell systems did not fail, and if they did, you got them back up fast.
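To put numbers on the “nines”, here is a quick back-of-the-envelope sketch in Python. The function name is my own, and it assumes a plain 365-day year; it simply turns an uptime percentage into a yearly downtime budget. Three nines allows roughly 8.8 hours a year, four nines about 53 minutes, and five nines a little over 5 minutes.

```python
# Convert an availability target into a yearly downtime budget.
# Assumption: a 365-day year (8,760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def downtime_minutes_per_year(availability_percent):
    """Minutes of allowed downtime per year for a given uptime percentage."""
    down_fraction = 1 - availability_percent / 100
    return down_fraction * HOURS_PER_YEAR * 60


targets = [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]
for label, pct in targets:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.1f} minutes of downtime a year")
```

Each extra nine cuts the allowance by a factor of ten, which is why the jump from “pretty reliable” to FIVE NINES takes backups on the backups.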

But today, AT&T’s U-Verse system dropped VOIP home phone service in 22 states at the same time. OK, U-Verse is not big yet, but it still has 1.15 million customers across the US, so this means that a good many of them went down, and they all went down for 4 hours and 15 minutes.

Callers were told that the number they called was not in service. People at home on the service got no dial tone. And this all took place for over 4 hours in 22 states. Sorry, but this is not a FIVE NINES service.
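How far off is it? A rough sketch, assuming this 4 hour 15 minute outage were the ONLY downtime in a 365-day year (the variable names are mine, not anything from AT&T):

```python
# Compare one 4 h 15 min outage against a five-nines yearly budget.
# Assumption: a 365-day year and no other downtime during that year.
MINUTES_PER_YEAR = 365 * 24 * 60

outage_minutes = 4 * 60 + 15  # the outage length reported for U-Verse Voice

# Best availability the year could still reach after this single outage.
best_case_availability = 100 * (1 - outage_minutes / MINUTES_PER_YEAR)

# Yearly downtime allowed at 99.999% uptime.
five_nines_budget = MINUTES_PER_YEAR * (1 - 0.99999)

# How many years of five-nines allowance this one outage used up.
years_of_budget = outage_minutes / five_nines_budget

print(f"Best possible availability this year: {best_case_availability:.3f}%")
print(f"Five-nines budget: {five_nines_budget:.2f} minutes a year")
print(f"Years of budget spent in one morning: {years_of_budget:.1f}")
```

By that math the best the year can finish at is about 99.95%, barely three nines, and the one outage burned through more than 48 years of a five-nines allowance.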

But what really gets me is the word from service personnel at AT&T: “Support personnel are telling customers that a server crash brought down U-verse Voice in AT&T’s entire 22-state local phone service area.”

The key word for me is “a”, meaning ONE SERVER brought down the entire 22-state phone service. Really? One server? Where were the backups that AT&T used to be famous for? No roll-over servers, no backup service center, no contingency plan for a busted server?

AT&T spokeswoman Mari Melguizo said the outage started at about 10:30 a.m., and service was restored to most subscribers at 2:45 p.m. She said the extent of the outage was unknown.

The “extent of the outage was unknown”!!!! Really? Down for 4 hours, and they have no idea to what extent this service was disrupted? Were they all out to lunch or on coffee break at the time? Damn, when my video server supplying pay per view movies went offline once at 2:30 in the morning, I was called to get it back up immediately. Just how many people were watching pay per view at 2:30 AM did not matter. The backup failed, and I was out of bed and in the network center in 20 minutes, and service was restored in less than 30 minutes.

Obviously, this is not AT&T’s work ethic today!

Now, a quick digression: I do not own an iPhone. Yes, I worked at Apple, and yes, I own many Macs and an iPod Touch, and maybe even an iPad one day, but NO to the iPhone. But I want one. What stops me is AT&T! Their level of service on their wireless system in large cities (like where I live) is awful. My friends’ phones simply drop calls, even when they have 5 bars. Sometimes they do not receive incoming calls. Why? Because AT&T wireless service cannot keep up with all the demands put on it by the data phones (like iPhone and Android) for voice, data, and SMS messaging, and the only solution is to drop calls to recover bandwidth. Now they have had 2+ years to fix this, and yet most of my friends with iPhones swear at AT&T at least once a week.

What happened to 99.999%? Cell phones used to be reliable when you had coverage. Sure, we all lost signal and dropped a call, but not when we had 5 bars of signal. And why does it seem like AT&T, Sprint, Verizon, and the rest do not really care? And now people’s “landline” phones drop off too, and all caused by one single server?

Personally, I find this sad and outrageous at the same time. U-Verse costs almost $100 a month or more, and you would think they could afford a backup. At the very least, could they strive for a better design that allows for failure, and not take down an entire system of 22 states because of one server?
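Designing for failure does not have to be exotic. Here is a toy sketch of the roll-over idea, nothing more: the server names, the `ServerDown` exception, and the `route_call` function are all made up for illustration, and I have no idea what AT&T’s actual architecture looks like.

```python
# Toy roll-over sketch: a call fails only if EVERY server is down.
# All names here are hypothetical; this is not AT&T's real design.

class ServerDown(Exception):
    """Raised by a (pretend) server that cannot take the call."""


def make_server(name, healthy):
    """Build a fake call handler that either works or is crashed."""
    def handle(call):
        if not healthy:
            raise ServerDown(name)
        return f"{call} connected via {name}"
    return handle


def route_call(call, servers):
    """Try each server in order, rolling over to the next on failure."""
    for server in servers:
        try:
            return server(call)
        except ServerDown:
            continue  # this one is busted, try the next
    raise ServerDown("all servers down")


primary = make_server("primary", healthy=False)  # the crashed server
backup = make_server("backup", healthy=True)
print(route_call("call-1", [primary, backup]))
```

With even one healthy backup in the list, the crashed primary is a non-event for the caller. Real telephony systems are far more involved than this, but the principle is the same one we lived by: no single box should be able to take everyone down.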

SIGH. Quality, service, and reliability are a dying set these days.
