BA systems down globally


oyster

12,639 posts

249 months

Wednesday 31st May 2017
dmsims said:
Report in the press (speculative) that heat may have played a part

http://www.dailymail.co.uk/news/article-4556640/Di...

from that article "Because of the high temperatures last year, staff were having to hose the top of the building down to keep it cool."

seriously ?
From experience, I know that in recent years, another of the most famous and biggest British companies was doing the exact same thing at its primary data centre.

Puggit

48,526 posts

249 months

Wednesday 31st May 2017
Coincidence that Virgin are publicising that they are hiring? https://twitter.com/VirginAtlantic/status/86983760...

speedyman

1,526 posts

235 months

Wednesday 31st May 2017
gavsdavs said:
speedyman said:
........ I still don't get why the dr site didn't kick in though.........
Y'know - If only it were that simple.

Proper DR is complex and some key decisions have to be taken, especially around replication: you need to start copying stuff back the other way, which /can/ mean things like your DR storage arrays start overwriting your prod arrays (if you automate things).

Generally quite a few manual levers need to get pulled to get things up and running, never mind running full DR (where the DR site becomes the primary site for storage replication)
Correct. Even if the supplies fail, the big storage arrays will have data in battery-backed cache, and this needs to be managed. This is the bit we don't know: how they managed the situation. That would probably involve the offshore team; my guess is they were all in headless chicken mode.

gavsdavs

1,203 posts

127 months

Wednesday 31st May 2017
oyster said:
dmsims said:
Report in the press (speculative) that heat may have played a part

http://www.dailymail.co.uk/news/article-4556640/Di...

from that article "Because of the high temperatures last year, staff were having to hose the top of the building down to keep it cool."

seriously ?
From experience, I know that in recent years, another of the most famous and biggest British companies was doing the exact same thing at its primary data centre.
This doesn't add up. Pouring water on the roof has no effect on the temperature in the room. Why didn't they start turning stuff off ? (I suppose not everyone knows what's really business critical and what isn't)

dmsims

6,559 posts

268 months

Wednesday 31st May 2017
How does Cruz even know if someone's (in your best Texan drawl) blowing smoke up his ass?

He clearly does not understand what's going on and all for £800K

loafer123

15,461 posts

216 months

Wednesday 31st May 2017
gavsdavs said:
This doesn't add up. Pouring water on the roof has no effect on the temperature in the room. Why didn't they start turning stuff off ? (I suppose not everyone knows what's really business critical and what isn't)
I'm no expert, but I find the functionality of my PC deteriorates when I turn it off.



Joke!

andy43

9,762 posts

255 months

Wednesday 31st May 2017
gavsdavs said:
oyster said:
dmsims said:
Report in the press (speculative) that heat may have played a part

http://www.dailymail.co.uk/news/article-4556640/Di...

from that article "Because of the high temperatures last year, staff were having to hose the top of the building down to keep it cool."

seriously ?
From experience, I know that in recent years, another of the most famous and biggest British companies was doing the exact same thing at its primary data centre.
This doesn't add up. Pouring water on the roof has no effect on the temperature in the room. Why didn't they start turning stuff off ? (I suppose not everyone knows what's really business critical and what isn't)
Was there a hosepipe ban in force? That could be it...

dmsims

6,559 posts

268 months

Wednesday 31st May 2017
oyster said:
dmsims said:
Report in the press (speculative) that heat may have played a part

http://www.dailymail.co.uk/news/article-4556640/Di...

from that article "Because of the high temperatures last year, staff were having to hose the top of the building down to keep it cool."

seriously ?
From experience, I know that in recent years, another of the most famous and biggest British companies was doing the exact same thing at its primary data centre.
Max temp at Heathrow in 2016 was 24.7 C - hardly high


Puggit

48,526 posts

249 months

Wednesday 31st May 2017
dmsims said:
Max temp at Heathrow in 2016 was 24.7 C - hardly high
And not correct either - there were multiple days above 30 degrees at Heathrow in July.

http://www.standard.co.uk/news/uk/london-weather-h...

dmsims

6,559 posts

268 months

Wednesday 31st May 2017
Puggit said:
dmsims said:
Max temp at Heathrow in 2016 was 24.7 C - hardly high
And not correct either - there were multiple days above 30 degrees at Heathrow in July.

http://www.standard.co.uk/news/uk/london-weather-h...
Blimey! I got that figure from Heathrow weather station data at the Met office - 2016 ranked 21st since 1948

stu67

815 posts

189 months

Wednesday 31st May 2017
Not a hose on the building envelope as such, but rather on the AC chillers, which were obviously working at 110%. I've known it to happen too, but why "critical" IT systems are buzzing away in a rubbish building, heaven knows.
Interestingly I was in at the weekend doing our annual "black building" shutdown and when some of my IT colleagues were moaning about working I pointed them to the BA scenario. It's damn useful given the turnover in staff (especially IT) that we have a body of knowledge on how to bring kit back up with a minimum of disruption even if we are just switching it over to a DR site.

Puggit

48,526 posts

249 months

Wednesday 31st May 2017
stu67 said:
Not a hose on the building envelope as such, but rather on the AC chillers, which were obviously working at 110%. I've known it to happen too, but why "critical" IT systems are buzzing away in a rubbish building, heaven knows.
Interestingly I was in at the weekend doing our annual "black building" shutdown and when some of my IT colleagues were moaning about working I pointed them to the BA scenario. It's damn useful given the turnover in staff (especially IT) that we have a body of knowledge on how to bring kit back up with a minimum of disruption even if we are just switching it over to a DR site.
I've dealt with many organisations who run 2 DCs, Active/Passive and are flipping between them on a regular basis. Expensive, but keeps everything checked.

Vaud

50,759 posts

156 months

Wednesday 31st May 2017
dmsims said:
Blimey! I got that figure from Heathrow weather station data at the Met office - 2016 ranked 21st since 1948
You need to read your stats. ;)

The data consists of:
Mean daily maximum temperature (tmax)
Mean daily minimum temperature (tmin)
Days of air frost (af)
Total rainfall (rain)
Total sunshine duration (sun)

Mean, not max! So the mean max temp through the month of July. Not the highest recorded temperature that month.

I think I have that right, been a while for me on stats :)
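
To illustrate the difference, here is a minimal Python sketch. The daily readings are made up for the example and are NOT real Met Office or Heathrow data; the variable names are just for illustration.

```python
from statistics import mean

# Hypothetical daily maximum temperatures (deg C) for part of a month.
# These numbers are invented for illustration, not real Heathrow readings.
daily_max_c = [22.1, 23.4, 21.8, 24.0, 26.5, 31.2, 33.0, 25.1, 22.9, 21.5]

# What a "mean daily maximum" (tmax) column reports: the average of the daily highs.
tmax_monthly_mean = mean(daily_max_c)

# What people usually mean by "max temp": the single hottest reading.
hottest_day = max(daily_max_c)

print(f"mean daily max: {tmax_monthly_mean:.1f} C")
print(f"hottest day:    {hottest_day:.1f} C")
```

A few hot days barely move the monthly mean, which is how a month can contain 30+ degree days while the tmax figure stays in the mid-20s.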

dmsims

6,559 posts

268 months

Wednesday 31st May 2017
Vaud said:
You need to read your stats. ;)

The data consists of:
Mean daily maximum temperature (tmax)
Mean daily minimum temperature (tmin)
Days of air frost (af)
Total rainfall (rain)
Total sunshine duration (sun)

Mean, not max! So the mean max temp through the month of July. Not the highest recorded temperature that month.

I think I have that right, been a while for me on stats :)
Thanks - every day is......

dmsims

6,559 posts

268 months

Wednesday 31st May 2017
OK 19th July 2006 35.8C at Heathrow - so they have had it hotter before

Yipper

5,964 posts

91 months

Wednesday 31st May 2017
babatunde said:
couple of things,
I wrote my Masters Thesis (1997) on how outsourcing a company's core competency is a shortcut to failure and how for many companies IT is their core competency. Outsource your cleaners if you must but outsourcing IT in a logistics company is as stupid as outsourcing R&D


Spent many a year designing, building and working in Server rooms and NO I repeat NO proper server room has a single UPS.

When building racks, the redundant power supplies on each individual server will be plugged into different power sources, which are attached to separate UPSs; anything less than this is unacceptable. Redundancy is designed into servers, and the whole server room environment is designed around redundancy.

SO either it was true amateur hour or they are lying.
Outsourcing of hardware or software has always been a controversial topic since the 1920s when it first emerged. Top managers love outsourcing because it pushes responsibility and workload onto some other sucker and makes them feel all powerful calling the shots. It is the real underlying reason why outsourcing is so popular. You get a bonus or promotion for chopping -20% off short-term costs, somebody else does all the hard labour when it goes right, and somebody else gets shouted at or blamed when it goes wrong. Winner winner, chicken dinner.

And, yes, the airline is not telling the truth. It was not a hot day, and no Heathrow or London power suppliers reported a power surge that day.

stu67

815 posts

189 months

Wednesday 31st May 2017
Puggit said:
stu67 said:
Not a hose on the building envelope as such, but rather on the AC chillers, which were obviously working at 110%. I've known it to happen too, but why "critical" IT systems are buzzing away in a rubbish building, heaven knows.
Interestingly I was in at the weekend doing our annual "black building" shutdown and when some of my IT colleagues were moaning about working I pointed them to the BA scenario. It's damn useful given the turnover in staff (especially IT) that we have a body of knowledge on how to bring kit back up with a minimum of disruption even if we are just switching it over to a DR site.
I've dealt with many organisations who run 2 DCs, Active/Passive and are flipping between them on a regular basis. Expensive, but keeps everything checked.
Very much this, don't get me wrong Sh** happens and to be honest it couldn't happen at a worse time being a hot bank holiday. It's what you do after the fact that counts. They obviously didn't have the skills / infrastructure available to get things back up and running quick enough but mostly from a PR perspective it was a disaster (the PR department were probably all down the beach)

Vaud

50,759 posts

156 months

Wednesday 31st May 2017
Yipper said:
Outsourcing of hardware or software has always been a controversial topic since the 1920s when it first emerged. Top managers love outsourcing because it pushes responsibility and workload onto some other sucker and makes them feel all powerful calling the shots.
Yes and no. Outsourcing makes sense when it is not economically viable / you don't have the critical mass / investment budget etc to perform in-house.

It makes no sense, for example, to build your own computers from parts in a modern business. You outsource and buy a fully ready, supported product from Dell.

If you have one airconditioning unit, it makes no sense to have a dedicated air conditioning repair man on site for the fractional time they are used - you outsource it.

Ditto many IT services - cloud is attractive because of the sheer scale of investment they can bring (AWS/Azure) and PAYG pricing (opex intensity vs capital intensity). Owning your own data center doesn't make a lot of sense in many use cases these days, outside of a few exceptions. You may keep the application knowledge in house, but you outsource the "hassle of the asset"...

Small businesses outsource all the time - it's how many new companies (e.g. Uber) have become as big as they have, as fast as they have - no capital intensity and keeping to their core (platform code + brand identity + engagement) and outsource everything else.

Laplace

1,090 posts

183 months

Wednesday 31st May 2017
ruggedscotty said:
Very good - but......

There would probably be more than one UPS being used and very likely dual string at least. You would never have to put a UPS into bypass; that is a last resort, and even if you had, you would have the other string available. This smacks of something more major, like why would you resort to putting your super critical IT infrastructure into bypass? The system is usually configured so that you can take one UPS out for maintenance without any effect on the other units or the IT load. Even then you have it set up so that you can have another UPS unit fail and still run with no issue.

A good few years ago I was involved in a UPS program of works that involved installing a new UPS unit. The issue we had was that the new unit didn't talk to the old units, and we had to upgrade the old ones to the new software. Trouble was that the old units couldn't be upgraded live, so they had to be taken out of service. In a live working environment.

That was a nightmare to organise, getting permission from the business and sorting it out so that we could do it with minimum impact to the business. So much was involved behind the scenes. We decided to rely on the other string, so we had to survey the whole site and ensure that there was dual string capability on every item, that it was true dual string, and that both supplies had not been fed in error from the one source. We then had to prove that under all scenarios we were able to cope. The plan was to place one string onto bypass and carry out the work. During this process we were running the affected string off raw mains, so we decided that we would have that on generator. When we made that decision we had switching programs thought out and in place so that we ran as best as we could. We even did pre-checks to ensure that, if we had our IT on generators and we had a power cut, the other string going to generator would not affect the load characteristics or disrupt the power quality and knock on to the IT.

Absolute nightmare.

All I can say is something terrible must have happened with BA.
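
The dual-string survey described above boils down to checking that each item's two supplies trace back to different sources. A rough Python sketch of that check follows; the equipment names, feed labels, and the function name are all hypothetical and only there for illustration.

```python
# Hypothetical survey data: each item maps to the upstream source of its A and B supplies.
# None means the item has no second supply at all (single-corded).
feeds = {
    "rack01-server1": ("UPS-A", "UPS-B"),
    "rack01-server2": ("UPS-A", "UPS-A"),  # both supplies fed from one source in error
    "rack02-switch1": ("UPS-B", None),     # single-corded item
}

def not_true_dual_string(feeds):
    """Return items that would lose power if one string were taken out of service."""
    at_risk = []
    for item, (supply_a, supply_b) in sorted(feeds.items()):
        missing_supply = supply_a is None or supply_b is None
        same_source = supply_a == supply_b
        if missing_supply or same_source:
            at_risk.append(item)
    return at_risk

print(not_true_dual_string(feeds))
```

Anything this returns has to be re-fed or accepted as a known risk before a string is put onto bypass.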
I hear you and I'm not really suggesting their systems are being held up by a single UPS with zero redundancy. I was just trying to illustrate where human error could come in while switching UPSs, this stands regardless of the site topology, there are just too many scenarios to list so I went with the simplest.

I've spoken with ex-colleagues across four of the major UPS firms, none of which are taking responsibility, and they've all mentioned switching error as a candidate.

If cooling was an issue, exacerbated by poorly maintained UPSs already running hot with internal fan failures and/or high load, then the UPS on reaching its high temp trip point would attempt a bypass transfer. For this to be completed successfully the bypass would have to be within tolerance at that moment in time, otherwise the UPS would inhibit the asynchronous transfer and the load would potentially be lost.
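
As a rough illustration of that decision, here is a Python sketch. Every threshold, name, and return string is an assumption made up for the example; real UPS firmware and its limits will differ by vendor and model.

```python
from dataclasses import dataclass

@dataclass
class BypassSupply:
    voltage_v: float
    frequency_hz: float

# Hypothetical limits, chosen only for illustration.
NOMINAL_V = 230.0
NOMINAL_HZ = 50.0
VOLTAGE_TOLERANCE_V = 23.0    # assumed +/- window around nominal voltage
FREQUENCY_TOLERANCE_HZ = 1.0  # assumed +/- window around nominal frequency
TEMP_TRIP_C = 70.0            # assumed high-temperature trip point

def bypass_in_tolerance(bypass: BypassSupply) -> bool:
    """True if the bypass supply is close enough to nominal to take the load."""
    voltage_ok = abs(bypass.voltage_v - NOMINAL_V) <= VOLTAGE_TOLERANCE_V
    frequency_ok = abs(bypass.frequency_hz - NOMINAL_HZ) <= FREQUENCY_TOLERANCE_HZ
    return voltage_ok and frequency_ok

def on_temperature_reading(temp_c: float, bypass: BypassSupply) -> str:
    """Decide what the UPS does for a given internal temperature reading."""
    if temp_c < TEMP_TRIP_C:
        return "stay_on_inverter"
    if bypass_in_tolerance(bypass):
        return "transfer_to_bypass"
    # Trip point reached but bypass is out of tolerance: transfer is inhibited.
    return "transfer_inhibited_load_at_risk"

print(on_temperature_reading(75.0, BypassSupply(voltage_v=200.0, frequency_hz=50.0)))
```

The last branch is the nasty one: the UPS needs to get off the inverter, has nowhere safe to go, and the load is exposed.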

We really have nothing to go on but it will come out within the industry, eventually.

dmsims

6,559 posts

268 months

Wednesday 31st May 2017
Laplace said:
We really have nothing to go on but it will come out within the industry, eventually.
You think ?

My money is on Cruz stuffing their mouths with gold so he is not found out