BA systems down globally
Discussion
dmsims said:
Report in the press (speculative) that heat may have played a part
http://www.dailymail.co.uk/news/article-4556640/Di...
from that article "Because of the high temperatures last year, staff were having to hose the top of the building down to keep it cool."
seriously ?
From experience, I know that in recent years, another of the most famous and biggest British companies was doing the exact same thing at its primary data centre.
Coincidence that Virgin are publicising that they are hiring? https://twitter.com/VirginAtlantic/status/86983760...
gavsdavs said:
speedyman said:
........ I still don't get why the dr site didn't kick in though.........
Y'know - if only it were that simple. Proper DR is complex and some key decisions have to be taken, especially around replication: you need to start copying data back the other way, which can mean your DR storage arrays start overwriting your production arrays (if you automate things).
Generally quite a few manual levers need to be pulled to get things up and running, never mind running full DR (where the DR site becomes the primary site for storage replication)
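To illustrate why failover isn't a single automatic switch, here's a minimal sketch of that decision. The `Site` class and `fail_over` function are hypothetical names for illustration only; the point is that promoting the DR site and reversing replication are deliberately separate, manual steps.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Site:
    name: str
    role: str                      # "primary" or "dr"
    replicating_to: Optional[str]  # site this one copies data to, if any

def fail_over(prod: Site, dr: Site, prod_known_dead: bool) -> str:
    """Promote the DR site, but refuse to reverse replication automatically:
    if the primary is actually still alive, reversing it would let the DR
    arrays start overwriting the production arrays."""
    if not prod_known_dead:
        return "ABORT: confirm the primary is down before promoting DR"
    dr.role, prod.role = "primary", "dr"
    dr.replicating_to = None  # reversing replication stays a manual lever
    return ("DR promoted; replication back to the old primary must be "
            "re-established manually once it is verified safe")

prod = Site("prod-dc", "primary", replicating_to="dr-dc")
dr = Site("dr-dc", "dr", replicating_to=None)
print(fail_over(prod, dr, prod_known_dead=False))  # refuses to act
print(fail_over(prod, dr, prod_known_dead=True))
```

The guard on `prod_known_dead` is the "manual lever": someone has to confirm the primary really is gone before the promotion runs.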
oyster said:
From experience, I know that in recent years, another of the most famous and biggest British companies was doing the exact same thing at its primary data centre.
gavsdavs said:
This doesn't add up. Pouring water on the roof has no effect on the temperature in the room. Why didn't they start turning stuff off ? (I suppose not everyone knows what's really business critical and what isn't)
I'm no expert, but I find the functionality of my PC deteriorates when I turn it off. Joke!
Puggit said:
dmsims said:
Max temp at Heathrow in 2016 was 24.7 C - hardly high
And not correct either - there were multiple days above 30 degrees at Heathrow in July. http://www.standard.co.uk/news/uk/london-weather-h...
Not got a hose on the building envelope as such, rather the AC chillers, which were obviously working at 110%. I've known it to happen also, but why "critical" IT systems are buzzing away in a rubbish building, heaven knows.
Interestingly I was in at the weekend doing our annual "black building" shutdown and when some of my IT colleagues were moaning about working I pointed them to the BA scenario. It's damn useful given the turnover in staff (especially IT) that we have a body of knowledge on how to bring kit back up with a minimum of disruption even if we are just switching it over to a DR site.
I've dealt with many organisations who run 2 DCs, Active/Passive, and are flipping between them on a regular basis. Expensive, but it keeps everything checked.
dmsims said:
Blimey! I got that figure from Heathrow weather station data at the Met Office - 2016 ranked 21st since 1948
You need to read your stats. The data consists of:
Mean daily maximum temperature (tmax)
Mean daily minimum temperature (tmin)
Days of air frost (af)
Total rainfall (rain)
Total sunshine duration (sun)
Mean, not max! So the mean max temp through the month of July. Not the highest recorded temperature that month.
I think I have that right, been a while for me on stats
Vaud said:
Mean, not max! So the mean max temp through the month of July. Not the highest recorded temperature that month.
Thanks - every day is......
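The mean-vs-max distinction is easy to check numerically. The daily figures below are made up for illustration, not real Met Office data: a month can have several days above 30 °C while the mean daily maximum still comes out near 24.7 °C.

```python
# Made-up daily maxima for a July-like month (NOT real Met Office data):
# a handful of 30 C+ days can coexist with a mild-sounding mean daily max.
daily_max = [22.1, 23.0, 24.5, 21.8, 25.0, 30.4, 31.2, 33.5,
             24.0, 22.5, 23.8, 21.0, 22.9, 24.1, 23.3, 22.7]

mean_daily_max = sum(daily_max) / len(daily_max)   # the "tmax"-style figure
absolute_max = max(daily_max)                      # the hottest single day
days_over_30 = sum(1 for t in daily_max if t > 30)

print(f"mean daily max: {mean_daily_max:.1f} C")   # 24.7 C - sounds mild
print(f"absolute max:   {absolute_max:.1f} C")     # 33.5 C - not mild at all
print(f"days over 30 C: {days_over_30}")           # 3
```

So a "max temp of 24.7 C" read off a mean-daily-maximum column says nothing about how hot the hottest day actually was.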
babatunde said:
Couple of things:
I wrote my Master's thesis (1997) on how outsourcing a company's core competency is a shortcut to failure, and how for many companies IT is their core competency. Outsource your cleaners if you must, but outsourcing IT in a logistics company is as stupid as outsourcing R&D.
I spent many a year designing, building and working in server rooms, and NO, I repeat NO, proper server room has a single UPS.
When building racks, the redundant power supplies on each individual server will be plugged into different power sources, which are attached to separate UPSes; anything less than this is unacceptable. Redundancy is designed into servers, and the whole server-room environment is designed around redundancy.
So either it was true amateur hour or they are lying.
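The dual-feed rule described above can be sanity-checked mechanically. A minimal sketch, with made-up server and feed names, flagging any server whose two PSUs end up on the same UPS string:

```python
# Hypothetical inventory: which mains feed each server's two PSUs are cabled
# to, and which UPS string backs each feed (all names are made up).
racks = {
    "web-01": {"psu_a": "feed-A", "psu_b": "feed-B"},
    "app-01": {"psu_a": "feed-B", "psu_b": "feed-A"},
    "db-01":  {"psu_a": "feed-A", "psu_b": "feed-A"},  # mis-cabled!
}
feed_to_ups = {"feed-A": "ups-1", "feed-B": "ups-2"}

def mis_cabled(racks, feed_to_ups):
    """Flag servers whose redundant PSUs end up on the same UPS string."""
    bad = []
    for server, psus in racks.items():
        ups_strings = {feed_to_ups[feed] for feed in psus.values()}
        if len(ups_strings) < 2:
            bad.append(server)
    return bad

print(mis_cabled(racks, feed_to_ups))  # ['db-01']
```

An audit like this catches the "true dual string in name only" case, where both supplies have quietly been fed from the one source.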
Outsourcing of hardware or software has been a controversial topic since the 1920s, when it first emerged. Top managers love outsourcing because it pushes responsibility and workload onto some other sucker and makes them feel all-powerful calling the shots. That is the real underlying reason why outsourcing is so popular. You get a bonus or promotion for chopping 20% off short-term costs, somebody else does all the hard labour when it goes right, and somebody else gets shouted at or blamed when it goes wrong. Winner winner, chicken dinner.
And, yes, the airline is not telling the truth. It was not a hot day, and no Heathrow or London power supplier reported a power surge that day.
Yipper said:
Outsourcing of hardware or software has always been a controversial topic since the 1920s when it first emerged. Top managers love outsourcing because it pushes responsibility and workload onto some other sucker and makes them feel all powerful calling the shots.
Yes and no. Outsourcing makes sense when it is not economically viable to perform in-house - you don't have the critical mass, the investment budget, etc. It makes no sense, for example, to build your own computers from parts in a modern business. You outsource and buy a fully ready, supported product from Dell.
If you have one air-conditioning unit, it makes no sense to have a dedicated air-conditioning repair man on site for the fraction of the time he is used - you outsource it.
Ditto many IT services - cloud is attractive because of the sheer scale of investment the providers can bring (AWS/Azure) and PAYG pricing (opex intensity vs capital intensity). Owning your own data centre doesn't make a lot of sense in most use cases these days. You may keep the application knowledge in house, but you outsource the "hassle of the asset"...
Small businesses outsource all the time - it's how many new companies (e.g. Uber) have become as big as they have, as fast as they have: no capital intensity, keeping to their core (platform code + brand identity + engagement) and outsourcing everything else.
ruggedscotty said:
Very good - but......
There would probably be more than one UPS in use, and very likely dual string at least. You would never have to put a UPS into bypass; that is a last resort, and even then you would have the other string available. This smacks of something more major - why would you resort to putting your super-critical IT infrastructure into bypass? The system is usually configured so that you can take one UPS out for maintenance without any effect on the other units or the IT load. Even then, you have it set up so that another UPS unit can fail and you still run with no issue.
A good few years ago I was involved in a UPS programme of works that involved installing a new UPS unit. The issue we had was that the new unit wouldn't talk to the old units, and we had to upgrade the old ones to the new software. Trouble was, the old units couldn't be upgraded live, so they had to be taken out of service. In a live working environment.
That was a nightmare to organise: getting permission from the business and sorting it out so that we could do it with minimum impact. So much was involved behind the scenes. We decided to rely on the other string, so we had to survey the whole site and ensure that there was dual-string capability on every item, that it was true dual string, and that both supplies had not been fed in error from the one source. We then had to prove that under all scenarios we were able to cope. The plan was to place one string into bypass and carry out the work; during this process the affected string would be running off raw mains, so we decided to have that on generator. Once we made that decision we had switching programmes thought out and in place so that we ran as best we could. We even did pre-checks to ensure that if we had our IT on generators and we had a power cut, the other string going to generator would not affect the load characteristics or disrupt the power quality and knock on to the IT.
Absolute nightmare.
All I can say is something terrible must have happened with BA.
I hear you, and I'm not really suggesting their systems are being held up by a single UPS with zero redundancy. I was just trying to illustrate where human error could come in while switching UPSes; this stands regardless of the site topology. There are just too many scenarios to list, so I went with the simplest.
I've spoken with ex-colleagues across four of the major UPS firms, none of whom are taking responsibility, and they've all mentioned switching error as a candidate.
If cooling was an issue, exacerbated by poorly maintained UPSes already running hot with internal fan failures and/or high load, then on reaching its high-temperature trip point the UPS would attempt a bypass transfer. For this to complete successfully, the bypass would have to be within tolerance at that moment; otherwise the UPS would inhibit the asynchronous transfer and the load would potentially be lost.
We really have nothing to go on but it will come out within the industry, eventually.
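The bypass-tolerance condition described above can be sketched as a simple check. The nominal values and tolerance windows below are illustrative assumptions, not any particular UPS vendor's specification:

```python
def bypass_transfer_ok(bypass_v, bypass_hz,
                       nominal_v=230.0, nominal_hz=50.0,
                       v_tol=0.10, hz_tol=0.02):
    """A no-break transfer to bypass is only safe while the bypass supply is
    within voltage and frequency tolerance; outside the window the UPS
    inhibits the transfer rather than risk an out-of-sync switch."""
    v_ok = abs(bypass_v - nominal_v) <= nominal_v * v_tol
    hz_ok = abs(bypass_hz - nominal_hz) <= nominal_hz * hz_tol
    return v_ok and hz_ok

print(bypass_transfer_ok(231.0, 50.0))  # True  - within tolerance
print(bypass_transfer_ok(195.0, 50.0))  # False - voltage outside the window
print(bypass_transfer_ok(230.0, 48.0))  # False - frequency outside the window
```

That last case is the scenario sketched above: the UPS wants to shed to bypass on over-temperature, the bypass happens to be out of window at that instant, the transfer is inhibited, and the load drops.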