NatWest/RBS glitch

Author
Discussion

honest_delboy

1,519 posts

201 months

Thursday 28th June 2012
quotequote all
I kind of meant that if they were being that granular about details on changes, surely someone would've noticed they were applying this upgrade to both the primary and secondary nodes at the same time.
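
(If you've never seen one: a rolling upgrade, in shell terms, looks something like the sketch below. The host names, scripts and paths are all hypothetical, not RBS's actual setup; the point is simply that the second node is never touched until the first has been upgraded and proven healthy.)

# Hypothetical rolling-upgrade guard: upgrade one node at a time and
# stop dead if the freshly upgraded node fails its health check.
for node in primary secondary; do
  ssh "$node" sudo /opt/scheduler/upgrade.sh || exit 1
  ok=""
  for try in 1 2 3 4 5 6 7 8 9 10; do
    if ssh "$node" /opt/scheduler/healthcheck.sh; then ok=yes; break; fi
    sleep 30
  done
  [ "$ok" ] || { echo "$node unhealthy after upgrade, stopping" >&2; exit 1; }
done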

joe_90

4,206 posts

232 months

Thursday 28th June 2012
quotequote all
ExFiF said:
joe_90 said:
nail.. head.
The best thing is they just say yes.. or go quiet and do/say nothing.
Actually I'd go further and say that they say everything is OK, or go quiet even when pressed repeatedly. When the situation is about to go into meltdown and they are threatened with their intransigence / failure to reply being escalated up the management tree, then their reply pings back within seconds. This says to me that they did know the answer but were just refusing to answer.

Of course if you do escalate it and their boss is of the same mindset...
I think this is a cultural thing. The Chinese are very bad at this too, never wanting to admit they don't understand something (when you're training them) or to ask questions.

onyx39

11,133 posts

151 months

Thursday 28th June 2012
quotequote all
Sexual Chocolate said:
Nothing to do with change control. It's pretty strict here.

The upgrade was successful, it just had performance issues. By Monday they were about a day behind. On Tuesday when they backed it out, someone formatted the messaging queue. Not sure what they thought was going to happen or if this was done in error. Could have been as simple as the UNIX/Linux equivalent of running sudo rm -rf * without checking where you were in the system.

The change was implemented by UK-based staff; the recovery, on the other hand, wasn't.


Edited by Sexual Chocolate on Thursday 28th June 10:34
so we broke it, and the Indians fixed it??
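
For anyone wondering how a back-out can "format" a queue by accident: the classic shape is a destructive wipe run from the wrong working directory. A minimal guard, purely illustrative (the path is made up, not RBS's actual layout):

# Refuse to run a destructive wipe unless we are standing in the one
# directory it was written for.
EXPECTED=/var/spool/mq/staging    # hypothetical queue staging directory
if [ "$(pwd)" != "$EXPECTED" ]; then
  echo "Refusing to wipe: in $(pwd), expected $EXPECTED" >&2
  exit 1
fi
rm -rf -- *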

Sexual Chocolate

1,583 posts

145 months

Thursday 28th June 2012
quotequote all
onyx39 said:
so we broke it, and the Indians fixed it??
Change raised by UK staff, issue discovered by UK-based staff (Ops, I think). The Indian support team didn't have a clue how to fix it. In the end UK staff resolved it.

Chim

7,259 posts

178 months

Thursday 28th June 2012
quotequote all
onyx39 said:
Sexual Chocolate said:
Nothing to do with change control. It's pretty strict here.

The upgrade was successful, it just had performance issues. By Monday they were about a day behind. On Tuesday when they backed it out, someone formatted the messaging queue. Not sure what they thought was going to happen or if this was done in error. Could have been as simple as the UNIX/Linux equivalent of running sudo rm -rf * without checking where you were in the system.

The change was implemented by UK-based staff; the recovery, on the other hand, wasn't.


Edited by Sexual Chocolate on Thursday 28th June 10:34
so we broke it, and the Indians fixed it??
Are you hard of reading? He clearly states that the upgrade was carried out at the weekend. The upgrade was successful, but following it they were seeing performance issues under load. The decision was then taken to back out the change on Tuesday, and this was done offshore. During this back-out some idiot managed to delete half the batch and they then had to restore it from tape.

For some reason that I am not sure of, they had to go back to the Friday back-up, which then led to the chaos: all accounts would be brought back to the Friday position and all transactions since that point would have to be manually restored.
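
In other words, the recovery pattern being described is a point-in-time restore followed by a replay of everything since. As a very rough sketch, with entirely hypothetical command names (the real thing on a mainframe batch estate is vastly more involved):

# Bring accounts back to the Friday position, then re-apply every
# transaction logged since then; anything that will not replay cleanly
# gets flagged for the manual fix-up described above.
restore_accounts --snapshot friday_eod
transaction_log --since friday_eod | while read -r txn; do
  apply_transaction "$txn" || echo "needs manual intervention: $txn" >&2
done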

honest_delboy

1,519 posts

201 months

Thursday 28th June 2012
quotequote all
Ahhhhh, I seeeeeee: so after the upgrade, tests were run and passed OK, and the 2nd node was then upgraded too. Only under load did the problems arise.

Is this right?

Chim

7,259 posts

178 months

Thursday 28th June 2012
quotequote all
honest_delboy said:
Ahhhhh, I seeeeeee: so after the upgrade, tests were run and passed OK, and the 2nd node was then upgraded too. Only under load did the problems arise.

Is this right?
From what I can gather, yes. The only mystery to me is why they had to restore from Friday.

honest_delboy

1,519 posts

201 months

Thursday 28th June 2012
quotequote all
Chim said:
From what I can gather, yes. The only mystery to me is why they had to restore from Friday.
With the scheduler offline for the upgrade..... would the scheduler have scheduled its own backups?

Sexual Chocolate

1,583 posts

145 months

Thursday 28th June 2012
quotequote all
Not sure if they did restore from Friday; there is no mention of it. But then again I don't know mainframe stuff, so maybe they did.

Reading through it all, the change requestor mentions it's a simple task, and hence that's why they got that window. It was probably implemented by prod support, which as far as I know is offshore, but they are still RBS employees I think.

Edited by Sexual Chocolate on Thursday 28th June 11:19

onyx39

11,133 posts

151 months

Thursday 28th June 2012
quotequote all
Chim said:
onyx39 said:
Sexual Chocolate said:
Nothing to do with change control. It's pretty strict here.

The upgrade was successful, it just had performance issues. By Monday they were about a day behind. On Tuesday when they backed it out, someone formatted the messaging queue. Not sure what they thought was going to happen or if this was done in error. Could have been as simple as the UNIX/Linux equivalent of running sudo rm -rf * without checking where you were in the system.

The change was implemented by UK-based staff; the recovery, on the other hand, wasn't.


Edited by Sexual Chocolate on Thursday 28th June 10:34
so we broke it, and the Indians fixed it??
Are you hard of reading? He clearly states that the upgrade was carried out at the weekend. The upgrade was successful, but following it they were seeing performance issues under load. The decision was then taken to back out the change on Tuesday, and this was done offshore. During this back-out some idiot managed to delete half the batch and they then had to restore it from tape.

For some reason that I am not sure of, they had to go back to the Friday back-up, which then led to the chaos: all accounts would be brought back to the Friday position and all transactions since that point would have to be manually restored.
No. I can read perfectly well, thank you.

Chim

7,259 posts

178 months

Thursday 28th June 2012
quotequote all
honest_delboy said:
Chim said:
From what I can gather, yes. The only mystery to me is why they had to restore from Friday.
With the scheduler offline for the upgrade..... would the scheduler have scheduled its own backups?
It should not have made a difference: the change would have had to leave enough time for the back-up to run. It may have been delayed until the change was finished, and they would have been in STIPP; the back-up would have run as normal after this. It may be that because they were backing up the upgraded system, they could not restore from those back-ups once the system had been regressed, and so they had to restore a back-up of the regressed system (namely the Friday prior to the change). This would make sense.
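
That last theory is essentially a version mismatch on restore: a back-up written by the upgraded software is unusable once the software has been regressed, so the newest back-up you can trust is the last one taken before the change, i.e. Friday's. A toy illustration of the check involved, all command names hypothetical:

# Only restore a back-up written by the same software version the
# system is currently running; after the back-out, only pre-upgrade
# (Friday) back-ups pass this test.
RUNNING=$(scheduler --version)               # version running post-regression
TAKEN=$(backup_meta --field version "$1")    # version stamped into the back-up
if [ "$RUNNING" != "$TAKEN" ]; then
  echo "Back-up $1 written on $TAKEN but system runs $RUNNING; skipping" >&2
  exit 1
fi
restore_from_tape "$1"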

ExFiF

44,252 posts

252 months

Thursday 28th June 2012
quotequote all
joe_90 said:
ExFiF said:
joe_90 said:
nail.. head.
The best thing is they just say yes.. or go quiet and do/say nothing.
Actually I'd go further and say that they say everything is OK, or go quiet even when pressed repeatedly. When the situation is about to go into meltdown and they are threatened with their intransigence / failure to reply being escalated up the management tree, then their reply pings back within seconds. This says to me that they did know the answer but were just refusing to answer.

Of course if you do escalate it and their boss is of the same mindset...
I think this is a cultural thing. The Chinese are very bad at this too, never wanting to admit they don't understand something (when you're training them) or to ask questions.
I find the Chinese somewhat different: those from the mainland seem to be quite arrogant and deceitful, but maybe I've just been unlucky.

But I agree it is a cultural thing. Also, having said what I wrote earlier, some Indian colleagues who have come to Europe have been some of the most committed, hard-working, never-say-die colleagues I have ever had the privilege to work with.

Du1point8

21,613 posts

193 months

Thursday 28th June 2012
quotequote all
Sexual Chocolate said:
onyx39 said:
so we broke it, and the Indians fixed it??
Change raised by UK staff, issue discovered by UK-based staff (Ops, I think). The Indian support team didn't have a clue how to fix it. In the end UK staff resolved it.
Sounds about right, as change release staff are just that... they release the software based upon testing done by the development team and their testing team... faked results are not going to be picked up by a release team at all... they just release it.

Du1point8

21,613 posts

193 months

Thursday 28th June 2012
quotequote all
ExFiF said:
joe_90 said:
ExFiF said:
joe_90 said:
nail.. head.
The best thing is they just say yes.. or go quiet and do/say nothing.
Actually I'd go further and say that they say everything is OK, or go quiet even when pressed repeatedly. When the situation is about to go into meltdown and they are threatened with their intransigence / failure to reply being escalated up the management tree, then their reply pings back within seconds. This says to me that they did know the answer but were just refusing to answer.

Of course if you do escalate it and their boss is of the same mindset...
I think this is a cultural thing. The Chinese are very bad at this too, never wanting to admit they don't understand something (when you're training them) or to ask questions.
I find the Chinese somewhat different: those from the mainland seem to be quite arrogant and deceitful, but maybe I've just been unlucky.

But I agree it is a cultural thing. Also, having said what I wrote earlier, some Indian colleagues who have come to Europe have been some of the most committed, hard-working, never-say-die colleagues I have ever had the privilege to work with.
That's because those that can move to Europe and London are good; those that can't usually stay in India for a reason. Very few good people stay in India unless it's for personal reasons. Speaking from my experience of my Indian friends and their opinions, as well as my own: some of them are embarrassed by some of their colleagues back home.

Ozzie Osmond

21,189 posts

247 months

Thursday 28th June 2012
quotequote all
Bank?

Chaos?

..... of course it's nothing to do with them. Nothing whatsoever.

Du1point8

21,613 posts

193 months

Thursday 28th June 2012
quotequote all
Chim said:
honest_delboy said:
Ahhhhh, I seeeeeee: so after the upgrade, tests were run and passed OK, and the 2nd node was then upgraded too. Only under load did the problems arise.

Is this right?
From what I can gather, yes. The only mystery to me is why they had to restore from Friday.
One more from me:

From my own experience, this looks like the following: end-of-day (EOD) running in a bank is limited to Monday to Friday, and hence upgrades are attempted at weekends, so you only see the full load on the system on the Monday night, when all hell breaks loose... It also suggests their UAT system is not doing EOD testing with a full load of prod data and is running way under normal tolerances.

So when it breaks on Monday, they would need to restore from Friday's EOD, then run Monday's EOD and so on to catch up. That seems logical: when I designed the EOD systems for the Citi and HSBC equities departments, that's the standard drill we would run on UAT, forcing several EOD runs back to back to prove the system could catch up. We needed to do it for real once when a server overheated one night (but it didn't affect day-to-day business)...
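
As a bare-bones sketch of that catch-up drill (job names hypothetical; the point is that each missed EOD runs in sequence and is checked before the next, never skipped):

# Restore the last good close, then force each missed EOD through in
# order so every day's postings land on top of the previous day's close.
restore_position --as-of friday
for day in monday tuesday wednesday; do
  run_eod --business-date "$day" || exit 1
  verify_balances --business-date "$day" || exit 1
done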

I wish more people would learn what stress testing is and bleeding well test for it.

Carfiend

3,186 posts

210 months

Thursday 28th June 2012
quotequote all
Stress testing, load testing, boundary testing, whatever they want to call it, it should be done. However, when I have asked for identical hardware, network, software and data setups for Test/UAT systems, only to have them rejected due to cost, all I can do is shrug and point out that I asked for proper testing and they wouldn't sign it off. Amazingly, the request is quickly signed off after the first mess.

joe_90

4,206 posts

232 months

Thursday 28th June 2012
quotequote all
Once (years back) I was helping upgrade a bank's business process production system.
So we took a copy of all the data (John Smith'd it in the database, i.e. anonymised it) and the guys in charge of the system, after getting lots of sign-offs, brought the data to us.

We had rented a Sun server (a small one; installed Oracle etc.) and copied all the data onto our system, thus replicating theirs, bar the fact that they were running on E10000s with huge disk arrays.

The upgrade (we did it three times) took 8 hours each time. Fine, it all worked, and it should have taken less on the super-powerful production system. This gave them confidence that our part of the upgrades (multiple systems etc.) would indeed work.

After 18 hours they were getting a little panicky, as Monday was now starting to get closer... 21 hours in, it finished... we never worked out where the bottleneck was...
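
One cheap habit that at least narrows that sort of thing down is timestamping every phase of the upgrade script, so that when a run balloons from 8 to 21 hours the log shows exactly which step grew. A sketch, with hypothetical step names:

# Log the wall-clock duration of each upgrade phase.
for step in export_schema migrate_data rebuild_indexes verify; do
  start=$(date +%s)
  "./steps/$step.sh" || exit 1
  echo "$step took $(( $(date +%s) - start ))s" | tee -a upgrade_timings.log
done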

Art0ir

9,402 posts

171 months

Thursday 5th July 2012
quotequote all
Checked my online banking this morning; the last transaction recorded was the 28th... I had money paid into my account on the 29th, so I was hopeful it would turn up tonight or perhaps tomorrow morning... So I go to check again and get this:

Ulsterbank said:
Service Temporarily Unavailable

We regret this service is temporarily unavailable and thank you for your patience.
As we continue to work hard to clear the backlog caused by our current technical issues, we have suspended our online application service until we return to normal service. We will continue to provide daily updates on our progress on our website. For the latest information please visit Help
Brilliant. So I now have to visit an ATM periodically to find out if my account is even close to being up to date.

TallbutBuxomly

12,254 posts

217 months

Thursday 5th July 2012
quotequote all
Art0ir said:
Brilliant. So I now have to visit an ATM periodically to find out if my account is even close to being up to date.
It will be at least another two weeks before any semblance of actual normality returns.