Facebook can’t be down, can it? Yet Facebook and its associated services, WhatsApp and Instagram, were in fact all down. Their Domain Name System (DNS) records stopped resolving, and their infrastructure IP addresses were unreachable for about six hours starting on 4 October 2021 (UTC), a window that stretched into 5 October in much of Asia. It was as if someone had “pulled the cables” from all of their data centers at once and disconnected them from the Internet.
The Internet is a network of networks, and it’s bound together by BGP, the Border Gateway Protocol. BGP is the protocol Autonomous Systems (AS) use to exchange routing information on the Internet. An Autonomous System is an individual network with a unified internal routing policy. It can originate prefixes (announce the blocks of IP addresses it owns) as well as transit prefixes (pass on routes it has learned from other networks). BGP is how a network advertises its presence to the other networks that form the Internet. Without BGP, Internet routers wouldn’t know where to send traffic, and the Internet would stop working.
During the outage, Facebook, Instagram, and WhatsApp were not advertising their presence, so ISPs and other networks couldn’t find Facebook’s network, and it became unavailable. Because Facebook had stopped announcing BGP routes for the prefixes containing its DNS servers, everyone else’s DNS resolvers had no way to reach its nameservers. Consequently, 1.1.1.1 (Cloudflare), 8.8.8.8 (Google), and other major public DNS resolvers started issuing (and caching) SERVFAIL responses.
Because Facebook and its sites are so big, and users and apps kept retrying failed lookups, DNS resolvers worldwide were handling 30x more queries than usual, potentially causing latency and timeout issues for other platforms.
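To make the chain from “route withdrawn” to “SERVFAIL” concrete, here is a minimal Python sketch. It is a toy model, not real BGP: one router’s table maps each prefix to a single AS path, lookups use longest-prefix match, and a hypothetical resolve() step returns SERVFAIL when there is no route to the nameserver. The prefix 198.51.100.0/24 (a documentation range) and AS numbers 64500/64501 (documentation ASNs) are hypothetical stand-ins, not Facebook’s actual prefixes or AS; BGP sessions, best-path selection, and routing policy are all omitted.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy route table for one router. Real BGP keeps several paths per prefix
    and runs best-path selection; here each prefix maps to exactly one AS path."""
    rib: dict = field(default_factory=dict)  # ip_network -> AS path (tuple of ASNs)

    def announce(self, prefix: str, as_path: tuple) -> None:
        """Install a route; the last ASN in as_path is the origin AS, earlier ones are transit."""
        self.rib[ipaddress.ip_network(prefix)] = as_path

    def withdraw(self, prefix: str) -> None:
        """Remove a route, as a BGP withdrawal would."""
        self.rib.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, ip: str):
        """Longest-prefix match: return (prefix, as_path), or None if no route exists."""
        addr = ipaddress.ip_address(ip)
        matches = [net for net in self.rib if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return best, self.rib[best]


def resolve(router: Router, nameserver_ip: str, name: str) -> str:
    """Hypothetical resolver step: the authoritative nameserver can only be
    queried if the router has a route to it; otherwise the answer is SERVFAIL."""
    if router.lookup(nameserver_ip) is None:
        return "SERVFAIL"
    return f"NOERROR: {name} answered by {nameserver_ip}"


router = Router()
# Hypothetical: origin AS 64500 announces its DNS prefix, reached via transit AS 64501.
router.announce("198.51.100.0/24", (64501, 64500))
print(resolve(router, "198.51.100.53", "www.example.com"))  # NOERROR: www.example.com answered by 198.51.100.53

router.withdraw("198.51.100.0/24")  # the origin stops announcing the prefix
print(resolve(router, "198.51.100.53", "www.example.com"))  # SERVFAIL
```

Nothing about the nameserver itself changed between the two calls; only the route to it disappeared, which is why resolvers everywhere failed at once.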
This event is a gentle reminder that the Internet is a very complex, interdependent web of millions of networks and protocols working together.
Facebook should consider a centralized network management system (NMS). Having all of its associated platforms go down at once can create havoc. Software architecture and redundancy should be given top priority, so that all of the associated platforms cannot go down at the same time.
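The redundancy argument can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a description of Facebook’s actual infrastructure. It assumes a hypothetical probability p that a failure domain is down at a given moment, and it assumes that separate failure domains fail independently, which real systems only approximate. If all three platforms share one failure domain, a single failure takes all of them down (probability p). With three independent domains, all three are down together only with probability p³.

```python
def p_all_down_shared(p: float) -> float:
    """All services depend on one shared failure domain (e.g. one backbone):
    whenever it fails, every service fails with it."""
    return p


def p_all_down_independent(p: float, n: int) -> float:
    """Each of n services has its own failure domain, assumed (hypothetically)
    to fail independently of the others with the same probability p."""
    return p ** n


p = 0.001     # hypothetical probability that a failure domain is down at a given moment
services = 3  # Facebook, Instagram, WhatsApp
print(f"shared domain:       {p_all_down_shared(p):.0e}")
print(f"independent domains: {p_all_down_independent(p, services):.0e}")
```

In practice, shared dependencies such as DNS, configuration tooling, or a common backbone reintroduce correlation between the domains, so the real gain is smaller than the independent-failure figure suggests.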