Tens Of Thousands Report AT&T Service Outage; Company 'Working Urgently' To Restore Service

 

Perhaps some people dug their Garmin GPS out of the bottom desk drawer today if they couldn't use Google Maps on their smartphone.

https://www.usatoday.com/story/tech/2024/02/22/att-outage-se...

Phone Service Outage

I'm wondering if Iran sent the Magnetic Pulse Generator Thing-a-ma-bob to knock it out

Cyberattack on US Drug Stores Too

A cyberattack has disrupted US pharmacies nationwide as well.

No Evidence

There is no evidence yet that the outage was caused by a cyberattack. The investigation is ongoing, however.

Power outages reported

Supposedly it was mylar balloons that landed on high-tension wires and took out the grid in Rochester, NY.

--
Nuvi 2460LMT.

We did that on purpose

pwohlrab wrote:

Supposedly it was mylar balloons that landed on high-tension wires and took out the grid in Rochester, NY.

In Desert Storm we had a special warhead on cruise missiles designed to mess up power systems by dispensing strips of mildly conductive material where they would drape over power lines.

Not saying this was done on purpose--just saying that conductive stuff hitting (non-insulated) high voltage power lines can actually cause trouble.

--
personal GPS user since 1992

Nope, it was caused by UFOs.

Nope, it was caused by UFOs.

--
I never get lost, but I do explore new territory every now and then.

Technology, like humans, isn't perfect.....

Stuff happens... Things go wrong. Updates that work in the lab have a glitch when they go live nationwide. Not to mention solar flares and, of course, sabotage. Have a plan B and go on with your day, because you know what? It's gonna happen again and again and again.

Mylar Balloons...

archae86 wrote:
pwohlrab wrote:

Supposedly it was mylar balloons that landed on high-tension wires and took out the grid in Rochester, NY.

In Desert Storm we had a special warhead on cruise missiles designed to mess up power systems by dispensing strips of mildly conductive material where they would drape over power lines.

Not saying this was done on purpose--just saying that conductive stuff hitting (non-insulated) high voltage power lines can actually cause trouble.

We heard a huge BOOM one night last year. It scared the heck out of our granddaughter, who was spending the night. We lost all power for hours. It was caused by a celebratory mylar balloon that made a transformer blow. Those things are idiotic.

--
GPSMAP 76CSx - nüvi 760 - nüvi 200 - GPSMAP 78S

Power outages reported

AT&T is reporting “A temporary network disruption that affected AT&T customers in the U.S. Thursday was caused by a software update, the company said.”

One thing. Apparently individuals getting ready to fly somewhere with their boarding pass on a phone caused some issues at the airport, as each individual had to be manually verified before they could board. This is one of the reasons I always have a printed boarding pass.

--
John from PA

Story...

John from PA wrote:

AT&T is reporting “A temporary network disruption that affected AT&T customers in the U.S. Thursday was caused by a software update, the company said.”

One thing. Apparently individuals getting ready to fly somewhere with their boarding pass on a phone caused some issues at the airport, as each individual had to be manually verified before they could board. This is one of the reasons I always have a printed boarding pass.

Back in the day I worked with midsized IBM mainframes and spent many a weekend upgrading systems. Upgrading systems was a big deal. Months of planning and testing were done, and typically we had Friday night to Sunday morning to do the install and testing. During the testing phase the biggest bugaboo was trying to simulate transaction VOLUME. If we ended up having a problem, it was not unusual for it to be transaction-VOLUME related.
The customer would test everything in their power to make sure the cutover went smoothly, but inevitably transaction VOLUME rose up to bite them on the ass. In the test phase, what ran error free at 500 transactions per second sometimes had issues at 9 a.m. Monday morning, when hundreds of users all signed on and all of a sudden, instead of 500 transactions per second, there were 1,500 or 2,000 coming across, and problems would be uncovered. AT&T made changes somewhere and voilà! the country went dark. Happens.
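
A toy queueing sketch of why that happens; the 0.4 ms service time and the single-server assumption are mine, purely for illustration, and only the 500 and 2,000 per second figures come from the story above:

# Toy M/M/1 sketch: load that is fine at 500 tps can fall over at 2,000 tps.
# The service time below is a made-up illustration value, not an IBM or AT&T figure.

SERVICE_TIME_S = 0.0004      # assumed: 0.4 ms of work per transaction
                             # => the box saturates at 1 / 0.0004 = 2,500 tps

def avg_response_time(arrival_rate_tps, service_time_s=SERVICE_TIME_S):
    """Mean response time for a single-server queue (M/M/1 approximation)."""
    utilization = arrival_rate_tps * service_time_s
    if utilization >= 1.0:
        return float("inf")  # demand exceeds capacity: queues grow without bound
    return service_time_s / (1.0 - utilization)

for tps in (500, 1500, 2000, 2400):
    print(f"{tps:>5} tps -> utilization {tps * SERVICE_TIME_S:4.0%}, "
          f"avg response {avg_response_time(tps) * 1000:6.2f} ms")

Response time roughly quadruples between 500 and 2,000 tps and keeps climbing fast after that, which is why everything looks clean in test and then breaks Monday morning.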

Phil

--
"No misfortune is so bad that whining about it won't make it worse."

I'm nowhere near as sympathetic as you

plunder wrote:
John from PA wrote:

AT&T is reporting “A temporary network disruption that affected AT&T customers in the U.S. Thursday was caused by a software update, the company said.”

One thing. Apparently individuals getting ready to fly somewhere with their boarding pass on a phone caused some issues at the airport, as each individual had to be manually verified before they could board. This is one of the reasons I always have a printed boarding pass.

Back in the day I worked with midsized IBM mainframes and spent many a weekend upgrading systems. Upgrading systems was a big deal. Months of planning and testing were done, and typically we had Friday night to Sunday morning to do the install and testing. During the testing phase the biggest bugaboo was trying to simulate transaction VOLUME. If we ended up having a problem, it was not unusual for it to be transaction-VOLUME related.
The customer would test everything in their power to make sure the cutover went smoothly, but inevitably transaction VOLUME rose up to bite them on the ass. In the test phase, what ran error free at 500 transactions per second sometimes had issues at 9 a.m. Monday morning, when hundreds of users all signed on and all of a sudden, instead of 500 transactions per second, there were 1,500 or 2,000 coming across, and problems would be uncovered. AT&T made changes somewhere and voilà! the country went dark. Happens.

Phil

I'm nowhere near as sympathetic as you are. Your mainframe parallel is fitting. Following your parallel:

Cutting a model of the running system has been easy for many years. Harder was choosing a suitable period to model that also happened to be a healthy one. Forecasting workloads requires dealing with managers who don't give a damn about your job and regard their plans as proprietary. They also won't have any idea of what a transaction is, so you'll have to translate business-speak to geek-speak. Hopefully you will have negotiated service-level objectives with them long before that, so you have objective and agreed-upon pass/fail criteria. Then you have to run a sensitivity analysis, simulating (maybe literally or figuratively) combinations of workloads to find breaking points. Then you can change the model's configuration to the new proposed one, see what happens, and fix it. BTW, IBM's capacity/performance data became high enough quality by the mid-'80s that cutting a model became mechanistic.
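
The sensitivity-analysis step might be sketched something like this; every number here (the 50 ms objective, the workload classes, the per-transaction costs) is invented purely for illustration, not from any real model:

# Sketch of a sensitivity analysis: grow each workload class until the
# service-level objective (SLO) breaks. All numbers are invented.

SLO_MS = 50.0                                             # assumed pass/fail criterion
BASE_WORKLOAD_TPS = {"online": 300.0, "batch": 150.0}     # hypothetical workload classes
SERVICE_TIME_MS = {"online": 0.8, "batch": 1.5}           # hypothetical cost per transaction

def response_time_ms(workload_tps):
    """Single shared-server approximation: response time of the slowest class."""
    # utilization contributed by every class on the shared resource
    u = sum(tps * SERVICE_TIME_MS[c] / 1000.0 for c, tps in workload_tps.items())
    if u >= 1.0:
        return float("inf")
    return max(SERVICE_TIME_MS.values()) / (1.0 - u)

def breaking_point(cls, step=0.05):
    """Scale one class upward until the SLO fails; return the growth multiple."""
    factor = 1.0
    while True:
        scaled = dict(BASE_WORKLOAD_TPS)
        scaled[cls] *= factor
        if response_time_ms(scaled) > SLO_MS:
            return factor
        factor += step

for cls in BASE_WORKLOAD_TPS:
    print(f"{cls}: SLO of {SLO_MS} ms breaks at about {breaking_point(cls):.2f}x today's volume")

The point of the exercise is the agreed pass/fail criterion: once the SLO is pinned down, "how much growth can we absorb" becomes a mechanical question instead of an argument.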

The network folks can do all this too. They do have some interesting problems. Perhaps this will date me. I can't remember its correct name, but cell systems maintain a "database in the air," always recirculating through all the nodes that build up connections. That database maintains the then-current relationship between your phone (and every other phone) and the local cell site you are connected to, so that you can be found to receive a call. (You can tell from the fuzziness of my writing this paragraph that I've only been distantly connected to this stuff.)
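
What's being half-remembered here sounds like the location registers (HLR/VLR) that cellular networks use to map each phone to the cell site currently serving it. A toy sketch of the idea; the names and structure are mine and purely illustrative, not any carrier's design:

# Toy illustration of a location register: the network keeps a constantly
# refreshed mapping of phone -> current cell site so an incoming call can be routed.

location_register: dict[str, str] = {}   # phone number -> cell site currently serving it

def register(phone: str, cell_site: str) -> None:
    """Called whenever a phone attaches to (or is handed off to) a cell site."""
    location_register[phone] = cell_site

def route_incoming_call(phone: str) -> str:
    """Find the cell site that should page the phone for an incoming call."""
    site = location_register.get(phone)
    if site is None:
        return f"{phone}: not registered anywhere -> send to voicemail"
    return f"{phone}: page via {site}"

register("+1-555-0100", "cell-site-42")
register("+1-555-0100", "cell-site-43")    # handoff while moving
print(route_incoming_call("+1-555-0100"))  # -> page via cell-site-43
print(route_incoming_call("+1-555-0199"))  # -> not registered

When that registration/routing data gets corrupted or unreachable, phones are still powered up but calls can't find them, which is roughly what a nationwide "can't connect" outage looks like from the outside.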

I don’t want to suggest that this is easy. It’s called work.

I’m in a third world

Country right now and many things surprise me.

Walking at the mall, a young man approached me and said I have an opportunity to get in on the ground floor of a new condo. The price converted to USD is over $400k.

Everything here costs the same or more than in PA.

To get a SIM card (not paying Verizon $10/day), my wife was asked for her US passport, and her picture and DL were scanned. Hope no fraud.

One thing I will say? Staying connected is challenging, to say the least. Even free wifi requires a local cell # to receive a text code to join.

We are spoiled with our “gigabit” home internet, although I only have 100/100.

Btw, now I realize why it's so darn easy for people to vacation in the USA. I used to think, how can you afford it, you're from xyz? I mean, where I'm at people earn as little as $10 USD/day, yet condos are more than a small house in PA. And they spread the cost over five years.

My point is we probably provide best-in-class service and can actually restore it. Sadly, my FiOS was down over 24 hours last year, a first.

I've had AT&T fiber since

I've had AT&T fiber since March of 2018. It seems in more recent times periodic outages have increased. The first 3 or so years of service, practically none. If there were any, they were mostly during maintenance windows between 12 and 4 am.

In the last year we've had at least 3 or 4 outages, some lasting hours.

The most recent was the evening of the same day AT&T's cell network went to hell. Around 1700, it died for about 10 minutes. I was in the middle of completing something online, so the outage was immediately noticed. The red alarm light was on on the ONT.

AT&T offers $5 account credit

AT&T offers $5 account credit to customers affected by nationwide cellular outage

I heard the problem was at

I heard the problem was at the command post that monitors all that stuff. Someone fell asleep and his head hit the keyboard and pushed all the wrong buttons. lol.

Good News.....

I just heard AT&T will reimburse customers for the outage.

--
RKF (Brookeville, MD) Garmin Nuvi 660, 360 & Street Pilot

Reimburse

I guess 5 dollars is better than nothing. They could give everyone one month of free service.

--
johnm405 660 & MSS&T

Mean time between failures

zx1100e1 wrote:

I've had AT&T fiber since March of 2018. It seems in more recent times periodic outages have increased. The first 3 or so years of service, practically none. If there were any, they were mostly during maintenance windows between 12 and 4 am.

In the last year we've had at least 3 or 4 outages, some lasting hours.

The most recent was the evening of the same day AT&T's cell network went to hell. Around 1700, it died for about 10 minutes. I was in the middle of completing something online, so the outage was immediately noticed. The red alarm light was on on the ONT.

Drawing on your experience with your AT&T connections and converting it to my IBM mainframe experience is rather interesting. First of all, I've been retired from mainframe work since 2005, so I'm using very old metrics here. IBM (as does every other computer provider, I expect) has a metric called "mean time between failures," which is an estimate of how often an IBM mainframe user back in 2005 might expect his mainframe to have a problem. Now I'm just bragging here, but way back in 2005 the MTBF for an IBM mainframe was SEVEN YEARS. Pretty good reliability, I'd say.
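
For a rough feel of what a seven-year MTBF means in availability terms, here's a back-of-the-envelope conversion; the four-hour repair time is an assumption I'm adding, not an IBM figure:

# Rough conversion of MTBF to availability.
# The 7-year MTBF comes from the post; the 4-hour repair time is assumed.

HOURS_PER_YEAR = 8766          # average, including leap years
mtbf_hours = 7 * HOURS_PER_YEAR
mttr_hours = 4                 # assumed mean time to repair

availability = mtbf_hours / (mtbf_hours + mttr_hours)
print(f"MTBF {mtbf_hours:,} h, MTTR {mttr_hours} h -> availability {availability:.4%}")
# roughly 99.99%: on the order of half an hour of downtime per year on average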

Phil

--
"No misfortune is so bad that whining about it won't make it worse."

two hours in forty years

plunder wrote:

Now I'm just bragging here, but way back in 2005 the MTBF for an IBM mainframe was SEVEN YEARS. Pretty good reliability, I'd say.

Phil

I was a co-op student working on switching systems at Bell Telephone Laboratories in the late 1960s. I became familiar with the standards that they had at the time for central office switch reliability. It allowed for two hours of downtime per 40 years of service.

The astonishing thing is that they met that standard with the completely nonelectronic central office switches using step-by-step, tracker, and cross-bar type switches.

They had to put in quite a bit of redundancy and exercise other careful measures to achieve that standard when they switched to electronic central switches.
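
Expressed in today's "nines" terms, that budget works out like this (plain arithmetic, using nothing beyond the two-hours-per-40-years figure above):

# Two hours of downtime allowed per 40 years of service, expressed as availability.

HOURS_PER_YEAR = 8766                      # average, including leap years
allowed_downtime_h = 2
service_period_h = 40 * HOURS_PER_YEAR

availability = 1 - allowed_downtime_h / service_period_h
print(f"availability = {availability:.5%}")                               # ~99.99943%
print(f"downtime budget = {allowed_downtime_h * 60 / 40:.1f} min/year")   # 3 minutes a year

That is a tighter budget than the "five nines" figure quoted elsewhere in this thread, which allows a bit over 5 minutes a year.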

--
personal GPS user since 1992

no one would even try to build a reliable handset

Since I was a forecaster for computer systems at a telco, I was asked to provide insight into the budding cellphone industry and whether we should get into that business. Obviously (to me) I had no idea what I was doing. What really hung me up and made me advise against entering the business was my belief that no one would even try to build a handset reliable enough to be measured against Ma Bell's standard, and that would put a severe limit on the growth of cell systems. HA!

OTOH, the Bellcos and Bell Labs were cost-plus operations with no incentives for cost savings. An amusing (maybe only to me) example of their disregard for costs was that their office scissors were coveted outside the office. To demonstrate their effectiveness, people would use them to cut coins without ill effect (to the scissors).

Very True

archae86 wrote:
plunder wrote:

Phil

I was a co-op student working on switching systems at Bell Telephone Laboratories in the late 1960s. I became familiar with the standards that they had at the time for central office switch reliability. It allowed for two hours of downtime per 40 years of service.

The astonishing thing is that they met that standard with the completely nonelectronic central office switches using step-by-step, tracker, and cross-bar type switches.

They had to put in quite a bit of redundancy and exercise other careful measures to achieve that standard when they switched to electronic central switches.

I was a Bell engineer for almost 40 years, and this is very true. The reason is, the public utility commissions in many states were merciless about any service outages. They allowed the strictly controlled service rates to be high enough to support this degree of quality.

My how times have changed!

was on vacation

in a 3rd world country.

The internet connectivity was sketchy. We stayed in a condo and early on, like day 3, the internet went out. It wasn't that bad; it was like 20 Mbps down, 22 Mbps up when I tested it. Handoff was via DSL/phone line. I'd seen this when I used to set up offices in Toronto: backbone fiber, handoff to a DSL router. Yep, no more internet for the duration of the stay.

Free public wifi? Required a local tel # to receive a text, so not anonymous. I was not gonna pay $10/day to Verizon (those clowns got sued and raised prices to cover the class-action payout).

Where we were staying, I got solicited multiple times to buy a condo on the ground floor... when I converted the cost back to USD? Starting at $400k. C'mon now.

I'm getting to my point--there's the disparity. You live in a 3rd world country, in a condo that is $400k USD, but your internet is very unreliable. Sorta the point of this thread: it goes down, people "scramble," and if it's more than, say, 2 hours, it's a crisis! lol

My FiOS has been more unreliable the last 5 years, with at least two 24-hour outages, whereas in the prior 13 years there were almost none at all.

Again, remember the phones: 99.999% uptime. Don't see that anymore.

In my former

bdhsfz6 wrote:
archae86 wrote:
plunder wrote:

Phil

I was a co-op student working on switching systems at Bell Telephone Laboratories in the late 1960s. I became familiar with the standards that they had at the time for central office switch reliability. It allowed for two hours of downtime per 40 years of service.

The astonishing thing is that they met that standard with the completely nonelectronic central office switches using step-by-step, tracker, and cross-bar type switches.

They had to put in quite a bit of redundancy and exercise other careful measures to achieve that standard when they switched to electronic central switches.

I was a Bell engineer for almost 40 years, and this is very true. The reason is, the public utility commissions in many states were merciless about any service outages. They allowed the strictly controlled service rates to be high enough to support this degree of quality.

My how times have changed!

life 1996-2015 was AT&T/Lucent/Avaya. That entire timeframe my expertise was the Definity G3R. It took me to two off-site trainings and one conference every single year, except 2009. Talk about having such a specific expertise.

Imagine one day I woke up and found that my services would no longer be required. But my employer let me keep my salary and transition into something new. In 2015 I was designing and installing the network infrastructure in warehouse/distribution centers. I've since moved on again, 2019 to present. Pretty amazing; there's an over/under on whether I'll ever be laid off in my career, cuz I'm certainly in the sunset years by now!

Our enterprise phone systems were 99.999% uptime; we were bonused on that number. That's about 5 minutes per year. A single non-planned outage blew it.
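
A quick check of that five-nines figure (plain arithmetic, nothing assumed beyond the 99.999% target):

# 99.999% ("five nines") uptime expressed as a yearly downtime budget.

MINUTES_PER_YEAR = 365.25 * 24 * 60
downtime_budget = (1 - 0.99999) * MINUTES_PER_YEAR
print(f"{downtime_budget:.2f} minutes of unplanned downtime per year")   # ~5.26 minutes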

I no longer have such a tiny window to work with. But today, if something is down (including teammates fat-fingering something, as I'm the sr.), we could have 200 people doing nothing, and on top of that, forklifts and tractors unable to load and dispatch.

So connectivity still reigns supreme!

No Zero

While office switches had astonishingly high standards for not going down, which were actually met, Bell allowed considerable error rates for individual actions. I think the allowed rate for some single-call connection errors was 0.1%.

Later in life I ran into some of the modern propaganda-based regimes, such as the "Zero Defects" religion. I much preferred the Bell (of that time) quantitative approach. Know your error rates at the component level. Do the system design to engineer your resulting service quality to reach your stated goals. Measure the results and adjust to see what you actually get.
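
A tiny illustration of that quantitative approach; the stage names and rates below are invented, and only the 0.1% per-action figure echoes the previous paragraph:

# Combining per-component error rates into an end-to-end service estimate.
# Stage names are hypothetical; 0.001 echoes the 0.1% single-action error rate above.

stage_error_rates = {
    "dialing/registration": 0.001,
    "trunk selection":      0.001,
    "far-end termination":  0.001,
}

p_success = 1.0
for stage, err in stage_error_rates.items():
    p_success *= (1.0 - err)    # each stage must succeed for the call to complete

print(f"end-to-end call success: {p_success:.4%}")    # ~99.70%
print(f"end-to-end failure rate: {1 - p_success:.4%}")

Knowing the per-stage numbers is what lets you engineer toward a stated goal instead of declaring "zero defects" and hoping.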

Perfection does not exist. Better to do disciplined work than issue propaganda.

--
personal GPS user since 1992

In the last 8 years my cell

In the last 8 years my cell phone usage has become practically nil.

Working from home, I'm tethered to the PC during working hours. Calls in/out use VoIP with a piece of software running on the PC.

Google Voice has been used for my own number going back to... probably 2010. I run a PBX in the house, which also allows connectivity using the Google Voice number. Effectively, a different piece of software on the computer connects to the PBX, allowing personal calls in/out. During this time, the cell phone is in airplane mode.

In fact, for a smartphone with a 6" display, the battery easily lasts 7+ days with my use. About the only time it comes out of airplane mode is if I'm out and need to make a call. If someone needs to reach me at that time, they can leave a message. That is, inbound calls can wait.

Yes, that may be an inconvenience to some; too bad. I am not accessible every minute of the day. If it's an emergency, call 911.