When our computer engineers tried to find out what had happened and how, they faced big challenges, the tech giant writes in a new update on Tuesday.
Millions of users around the world despaired when Facebook, Instagram, and WhatsApp were down for more than six hours on Monday night.
Since then, speculation about what actually happened has flourished, even though the company gave an explanation shortly after it had brought all three platforms back online.
Later, Facebook explained that the downtime was due to changes in the configuration of the backbone routers that coordinate network traffic between data centers.
– The error had major consequences for the way our data centers communicate, causing a total outage, the company wrote.
But what does this really mean?
On Tuesday night, more than a day after the error occurred, Facebook revealed more details about what happened. It did so in a post on its Facebook Engineering blog.
– Now that our platform is working as usual, I thought it was worth sharing a little more detail about what happened and why, and what we can learn from it, writes Santosh Janardhan.
Lost access to its “data centers”
In the post, Facebook writes that the backbone routers were knocked out by a fault in the system in which the routers store data.
The backbone connects all of Facebook’s computing facilities, which are housed in so-called “data centers” and linked by fiber-optic cables that run for miles.
When you open one of the company’s applications and it needs to load data, for example the messages in your inbox, that data travels to your mobile phone from the nearest data center.
In the data center, the data is processed and sent to your phone.
Traffic between these facilities is handled by routers, which read incoming and outgoing data.
Routers need to be regularly updated and maintained, for example when a fiber cable has to be repaired, capacity needs to be increased, or software needs to be updated. Facebook then takes parts of the backbone offline for maintenance.
This is what was being done on Monday, and it is where things went wrong. During routine maintenance work, the entire backbone was accidentally disconnected.
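To make the difference between planned maintenance and a full disconnect concrete, here is a minimal Python sketch. It is not Facebook's tooling: the data center names, the link list, and the functions `is_connected` and `safe_to_take_offline` are all hypothetical, invented for illustration. It treats the backbone as a small graph and checks whether the data centers can still reach one another once a set of links is taken offline.

```python
from collections import deque

# Hypothetical toy backbone: three data centers joined by three links.
DATA_CENTERS = ["dc_a", "dc_b", "dc_c"]
BACKBONE_LINKS = [("dc_a", "dc_b"), ("dc_b", "dc_c"), ("dc_a", "dc_c")]


def is_connected(nodes, links):
    """Return True if every node can reach every other node over `links`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Breadth-first search from an arbitrary starting node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == nodes


def safe_to_take_offline(nodes, links, offline):
    """Return True if the backbone stays connected without the `offline` links."""
    # Links are undirected, so compare them as unordered pairs.
    offline_set = {frozenset(link) for link in offline}
    remaining = [link for link in links if frozenset(link) not in offline_set]
    return is_connected(nodes, remaining)
```

In this toy model, taking the single link between `dc_a` and `dc_b` offline is harmless because traffic can still go via `dc_c`, while taking every link offline at once cuts all the data centers off from each other, which is the kind of situation the article describes.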
– It happened very fast
The incident led to all of Facebook being completely disconnected from all of its data centers and the internet.
– All this happened very fast. When our computer engineers tried to find out what had happened and how, they faced big challenges, Janardhan writes in the post.
Facebook also writes that there is still no indication that user data has been leaked as a result of the downtime.
What was special about Facebook’s long downtime this time was that the problems affected not only users of the applications but also the company’s own employees, TechCrunch writes.
They could not get into the offices and were unable to do any work because the downtime also affected internal systems. Facebook also confirmed this in a separate statement.