GiffGaff's mobile network fell over last week, downed by a water pipe that burst and bathed its supplier's kit in the wet stuff. Forget snakes on a plane -- water pipes in a data centre are far more terrifying.
The outage began at 10.10am and lasted until just after 6pm -- affecting calls, texts, data and even GG's ability to fully update its own community forum pages. The only services that weren't affected were inbound texts and emergency calls, according to CEO Mike Fairman, who spoke to me in an exclusive interview yesterday. Anyone trying to port numbers to GiffGaff was also put on hold.
"There was a water leak in a data centre -- one of our third-party suppliers -- which initially took out the power supply to some of our servers but it also actually damaged some of the servers as well," explains Fairman. "And although there was resiliency built into the service, in that there were two sets of servers and load balancing between them, because the power supply was interrupted and the network gear was interrupted then the whole thing went down."
GiffGaff is a small fish in the UK's mobile network pond. It's not one of the big five operators, who all own their own networks; rather, it's what's known as a mobile virtual network operator -- or MVNO -- because it piggybacks on one of the big boys' networks, namely O2's.
GiffGaff does own and run some of its own kit, though, which is why O2's network wasn't taken down during the unscheduled shower. In fact, O2 owns GiffGaff, albeit GG is run as a separate company. GiffGaff's parent even rolled up its sleeves and helped apply a temporary fix during Friday's outage, Fairman reveals.
There's no shortage of MVNOs in the UK -- Tesco and Virgin Mobile are the best known -- but GiffGaff likes to make out it's special because it does things a bit differently. It says it's able to offer cheap mobile deals, for example, because it keeps its own costs down by recruiting customers to run support services in exchange for a discount. But with Friday's day-long outage, GiffGaffers might be forgiven for thinking the company also cuts corners on its network infrastructure. So does it?
Well, yes. Fairman tells me its service could have been proofed against Friday's failure -- if it had doubled up on a particularly expensive network component, which it chose not to do to save costs.
Discussing the component that failed, he says: "That particular part of the system is a single physical location which, when you think about redundancy there's a spectrum of redundancy that you can build in to a system.
"Obviously one of the things that we could do is to double up on that particular network component but that is a very, very expensive network component -- and one would argue that the contingency that we have currently, which is failover with free calls and free data, would be a better and more cost-effective option than actually doubling up on that component 100 per cent."
The 'failover' Fairman cites is the temporary fix put in place on Friday afternoon to get some services up and running. In practice, it meant that GiffGaffers who could make calls or get online at that point were getting free calls and free data for the rest of the outage, and even the following day, because the company wasn't checking their balances before routing calls.
"The thing that actually went wrong was not the network. It was the component that holds the balance and is the thing that the network checks if the balance is there every time a call is made," adds Fairman. "That obviously had the impact of not allowing the network to route calls. So what we did was put a bypass in on the network side of things that didn't ask the billing engine about the calls."
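To make the mechanism concrete, here's a minimal Python sketch of the difference between normal call routing and the bypass Fairman describes. It's purely illustrative: `BillingEngine`, `route_call` and the `bypass` flag are hypothetical names, not anything GiffGaff has published, and the sketch assumes the normal path refuses a call whenever the balance check can't be answered.

```python
from dataclasses import dataclass


class BillingUnavailable(Exception):
    """Raised when the (hypothetical) billing engine can't be reached."""


@dataclass
class BillingEngine:
    # Hypothetical store of subscriber balances, in pence.
    balances: dict[str, int]
    online: bool = True

    def has_credit(self, subscriber: str) -> bool:
        if not self.online:
            raise BillingUnavailable(subscriber)
        return self.balances.get(subscriber, 0) > 0


def route_call(billing: BillingEngine, subscriber: str, bypass: bool = False) -> bool:
    """Return True if the call should be connected (illustrative only)."""
    if bypass:
        # Bypass mode: never ask the billing engine, so every call goes
        # through and nobody is charged.
        return True
    try:
        # Normal mode: the network checks the balance on every call.
        return billing.has_credit(subscriber)
    except BillingUnavailable:
        # With no answer from billing, the call is refused.
        return False
```

With the billing engine offline, `route_call(billing, "alice")` returns `False` while `route_call(billing, "alice", bypass=True)` returns `True`, which is why calls could go through again but without anyone's balance being touched.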
Should be faster
Having applied this sticking plaster to its network, GiffGaff was able to get service back up and running by Friday afternoon, and has since fully restored normal service by replacing the damaged kit. Fairman reckons GiffGaff could use the same bypass in a future outage -- indeed, he prefers the bypass route to doubling up on that "very, very expensive" bit of network kit whose failure caused all this bother in the first place.
The bypass workaround isn't perfect, though: it would inevitably mean some loss of service for GiffGaffers while the company twiddles the knobs. But Fairman reckons GiffGaff could get the bypass in place faster next time. "That time delay would not be anywhere near as long in future though if it was to happen again," he says.
Does GiffGaff do full disaster recovery testing? Not exactly, admits Fairman. "We have contingency planning and disaster recovery testing is kind of difficult to do, in as much as you do need to have that dual site resilience in place to test it, so if you haven't got that you can't test it," he says. "What we do do is load test and have contingency plans for components in the system."
After Friday's watery debacle, Fairman stresses that the company is reviewing its network resilience and infrastructure, and is bringing forward planned infrastructure investments totalling millions of pounds. But isn't it fair to say GiffGaff has been able to offer cut-price mobile deals by making some compromises on the quality and resilience of its network kit?
"I don't think so," says Fairman. "We do use the O2 network which is very resilient and capable. What we've done with the business is to put the resilience and disaster recovery in line with the scale of our business and the maturity of our business."
However, he goes on to describe GiffGaff's service commitment to its users as a spectrum, making the case that if you're a customer of a start-up (which he says GiffGaff is, despite it being owned by the mobile behemoth O2 -- in turn owned by global comms giant Telefónica) you shouldn't expect the highest standards in town.
"Please don't go"
"Clearly when you're a start-up you're very, very small and you can't afford to do much of that -- as you grow and you get more customers on board, it's a sensible thing to do to build more resiliency in. So this year we already have a plan to invest in our infrastructure significantly -- it's millions and millions of pounds of IT investment and so that's already part of our plan, we've already brought forward some of that spend," he says.
"And as a result of what happened on Friday we're going to be reviewing those plans again to see if there are other things that we can bring forward so as we grow and scale the business then our commitment and obligation to our customers also grows and we'll put plans in place to do with redundancy that are appropriate."
What does Fairman say to GiffGaffers who aren't convinced by the company's after-the-fact network investment plans and have decided to leave because they just don't reckon it's up to snuff?
"I'd say please don't go. And reassure them that we do really care about our network and the reliability of it -- and we are taking steps to make sure it doesn't happen again," he says. "It's very, very upsetting for us for this to happen and we will be making every attempt we can to make sure this doesn't happen again so hopefully they'll see that as a genuine effort on our part to make things better."
In the spirit of making things better, GiffGaff has taken up a suggestion from some of its members: instead of compensating them for Friday's loss of service, it will donate £10,000 to a charity of their choice.
"There's quite a lot of debate going on at the moment on the forum about it -- there are a few people still asking for compensation and there are other people saying, well, if you do the maths, [what most people spend equates to] 33p a day -- we were out eight hours, that's 11 pence.
"Would you want to be compensated 11p, which would obviously cost us more to implement than the actual 11p, or would you rather GiffGaff actually focused on making things better (by investing more in its network)? And it's a nice gesture for a corporation to be making a charitable donation."
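The forum maths Fairman quotes is a simple pro-rata calculation. As a quick check, assuming the 33p figure is an average daily spend and the outage ran for eight of the day's 24 hours:

```python
# Average daily spend in pence, as quoted from GiffGaff's forum.
daily_spend_pence = 33
# Approximate length of Friday's outage, in hours.
outage_hours = 8

# Pro-rata refund: the fraction of the day lost, times the daily spend.
refund_pence = daily_spend_pence * outage_hours / 24
print(f"{refund_pence:.0f}p")  # prints "11p"
```

Eight hours is a third of a day, so a third of 33p gives the 11p figure.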
Are you reassured about the resilience of GiffGaff's network? Or will you be taking your digits elsewhere? Let us know in the comments below or over on our Facebook page.