When you look under the hood at the architecture of the cloud leaders, two things stand out immediately:
- The few figures they disclose make you dizzy (yes, 1 peta = 1,000 tera…)
- Very little technical information is disclosed
As it happens, Google and Azure recently let some information out about their networks 🙂
Facebook is further ahead and shares more.
About Azure, here are slides from the master, Mark Russinovich (click the image to get the PowerPoint, 29 slides):
The most important news: they use FPGAs (programmable logic chips) to handle the network layer at 40 GbE (or to mine bitcoins, who knows!).
The last time I got valuable information was at TechEd 2012, again from Mark. The video is still online:
- About 10 people to administer around 100,000 servers!
- A demo of one of the Azure admin interfaces,
- A graphical view of racks with their VMs,
- A demo of the platform's self-healing,
- An explanation of the leap day bug (29 February), down to the line of source code that broke everything.
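To make the leap day bug concrete, here is a minimal sketch of that class of bug. It is not the line shown in the talk: it assumes, as Microsoft's public post-mortem described, that a validity date was computed by simply adding one to the year. The function names are made up for illustration.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # Assumed form of the bug: bump the year and keep month/day as-is.
    # On 29 February this produces a date that does not exist.
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    # One possible fix: fall back to 28 February when the target year
    # has no 29 February.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    print(safe_one_year_later(leap_day))  # 2013-02-28
    try:
        naive_one_year_later(leap_day)
    except ValueError as exc:
        print("naive version failed:", exc)
```

The naive version works 1,460 days out of 1,461, which is exactly why this kind of bug survives testing.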
About Google: it looks like they index everything except their own architecture content (with a killer robots.txt 😉)
TechCrunch collected interesting information, again on their network management (SDN):
In 2009, they published a tour of one of their datacenters, showing servers housed inside a container:
Facebook is chattier, maybe it's in their DNA…:
A Facebook server (old model I guess):
The back really is empty:
For Office 365, most of the information is about Exchange:
- No backups,
- 3 replicas + 1 lagged copy at 8 days (the lag drops to 0 in case of issues),
- JBOD storage (1 database per hard drive),
- Optimizations through Exchange 2016.
Office 365 is independent of Azure, although some services already use it, and I guess a merge is coming.
About Amazon:
- They use Xen for virtualization,
- They also build their own servers,
- Everything is based on open source.
These players all have hyper-scale needs, but also hyper-scale resources to meet them:
- Complete control of the entire chain: datacenter, network, servers, OS, hypervisor, applications, load balancers, SQL engines,
- Development in the low layers: SDN, FPGA…
- Access to the source code (and the people to work on it): Windows for Microsoft, Linux for Google, Facebook and Amazon,
- They use open source (OpenFlow, memcached, Hadoop…) but extend it,
- When they see fit, they can spend huge amounts of money on a topic (like Google with its SDN).
All this makes them strongly independent of other vendors and of those vendors' potential buyouts. Other investments pay off through sheer volume (e.g. saving $50 per server, × 300,000 servers).
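To put the volume argument in numbers, here is a tiny sketch using only the two figures quoted above ($50 per server, 300,000 servers). The function name is made up for illustration.

```python
def fleet_saving(per_server_usd: int, servers: int) -> int:
    # A small per-server gain multiplied by the size of the fleet.
    return per_server_usd * servers


if __name__ == "__main__":
    total = fleet_saving(50, 300_000)
    print(f"${total:,}")  # $15,000,000
```

At that scale, an optimization nobody else would bother with pays for a dedicated engineering team.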
Where the rest of us generally stop, satisfied (geo-cluster, DRP, load balancing), is only the bare minimum for them. Once that is done, a whole new road opens up with load.