johnr: Prod is not 100% the same as Pre prod. It's very, very, very close.
Plans are always in place for rollback, and this is why changes are deployed when most people are asleep.
John, I do think we have to give Salt and his team credit for their rollback plans. One hour of disruption before a rollback, for something that I'm sure got any amount of pre-testing, is not bad in my book.
Clearly the pre prod system wasn't close enough on this occasion.
I don't know if you picked up the point of my last post.
'Close' isn't good enough for a fault-free production system. Fault-free requires 1-to-1 replication, and even that is sometimes not enough. Sometimes you need even more replication and resource in the test system than in the production system, so that you can not only run a test but also monitor what's going on.
However, is what I'm suggesting realistic for a New Zealand network? Are New Zealanders willing to pay what it would cost to have fully replicated test systems?
My 200k phones in a warehouse was just a silly suggestion, but even the resource to set up 5,000 phones in such a way that you can do real-world testing is really expensive, and that cost has to be passed back to consumers.
My observation is that consumers simply aren't willing to pay for that resource on any of the New Zealand networks. I could be wrong.
In the past I've asked providers why they don't give users a heads up that a system test is going to be performed 'tonight'.
But even that suggestion isn't realistic, is it? If you sent out such a text, how many people would hit the call center to object or ask if they can be excluded etc? Who pays for those calls to be answered? Again, consumers don't want to pay.
I wonder if there should be a regulated test window, so that each provider is required to do its testing at the same time each week and those times are published. The networks' windows would be staggered, so people wanting continuity of service could avoid outages.
The reason I propose regulation is to drive consumer awareness and ensure there is a fair and level playing field for all providers.
I also wonder if there should be a regulated opt-in alerts system. Personally I'd be inclined to subscribe and then shut my phone off during the event window. Others may choose to have more than one provider and switch providers for that time frame. Some just won't care and will go back to sleep if their phone goes a bit bonkers for an hour.
If this was done, then when someone such as SteveOn gets upset, we could at least point them to the outage notices and ask what ownership they took. At present Steve isn't really empowered to do anything.