A Globalized Cloud For A Globalized World

October 15, 2018


The web of today is truly global. It spans ever more of the planet’s surface, bringing an ever-greater portion of its population into the global communications fold. Building for such a globalized internet requires technical architectures and development mindsets that haven’t historically been taught in computer science courses and where physics concepts like the speed of light suddenly play a very real role in user experience. Like everything else in the online world, the public cloud is increasingly making geography transparent to developers, making it easy to launch global applications at production scale.

When I launched my first web startup, Mosaic had only recently been released, the “web” was still in its infancy, most content was in English and few companies were focused on building globally synchronized distributed applications. Today even the smallest web startup must consider how to handle an increasingly globalized online user base that may access its services from anywhere in the world. Gone are the days when companies could simply host their site in the US and tell users on the other side of the world that they will just have to accept a website or app that is as slow in 2018 as a dialup modem was in 1996.

No matter how finely tuned and optimized a modern web application is, the basic physics problem of the speed of light constrains how fast a user on the other side of the world can access it. A user in Tokyo accessing a website hosted in Virginia will have an unacceptably slow user experience and likely abandon using the application, even if the server itself is responding within a thousandth of a second to each request.

As Google points out in its latency tutorial, under theoretically perfect conditions a user in Frankfurt, Germany accessing a website hosted in Google’s Iowa data center would experience a 75 millisecond roundtrip latency. In reality, fiber optic cable doesn’t run in a perfectly straight line, and given the networking equipment along the way, a more realistic theoretically optimal estimate is 112.5 milliseconds, or just over a tenth of a second for each request, even before any server latency in handling the request is factored in. Even the distance between the East and West coasts of the United States is enough to create unacceptable latency in many applications.
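
To make the arithmetic concrete, the short Python sketch below reproduces those figures from first principles. The coordinates, the two-thirds-of-c fiber speed and the 1.5x routing overhead are illustrative rule-of-thumb assumptions rather than numbers taken from Google’s tutorial.

    # Rough estimate of the theoretical round-trip latency between Frankfurt
    # and Google's Iowa data center, assuming light travels through fiber at
    # roughly two thirds of its speed in a vacuum.
    from math import radians, sin, cos, atan2, sqrt

    EARTH_RADIUS_KM = 6371.0
    FIBER_SPEED_KM_S = 200_000.0   # ~2/3 of c, a common rule of thumb
    ROUTE_OVERHEAD = 1.5           # fiber never runs in a straight line

    def great_circle_km(lat1, lon1, lat2, lon2):
        """Haversine distance between two points on the Earth's surface."""
        phi1, phi2 = radians(lat1), radians(lat2)
        dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
        a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
        return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))

    # Approximate coordinates: Frankfurt, Germany and Council Bluffs, Iowa.
    distance_km = great_circle_km(50.11, 8.68, 41.26, -95.86)    # ~7,500 km

    ideal_rtt_ms = 2 * distance_km / FIBER_SPEED_KM_S * 1000     # ~75 ms
    realistic_rtt_ms = ideal_rtt_ms * ROUTE_OVERHEAD             # ~112 ms

    print(f"straight-line round trip: {ideal_rtt_ms:.0f} ms")
    print(f"with routing overhead:    {realistic_rtt_ms:.0f} ms")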

To address this, companies have long built data centers across the world in each region where they have a large customer base. Yet the cost of building a fleet of data centers spanning multiple countries has historically placed this approach out of reach of all but the largest enterprises.

Enter the public cloud, with Amazon, Azure, Google and their peers all rapidly building data centers in strategic locations worldwide to allow companies to deploy their applications as close as possible to their global customer base. Connecting all of these data centers are massive private fiber optic networks that provide the absolute minimum latency and highest bandwidth technically feasible today.

For example, Google’s private fiber network allows data to transit between any of its worldwide data centers without ever leaving Google’s own fiber cables. This means that a web crawler running in Google’s Mumbai data center that fetches a page from a website in Texas will see that HTTP request travel from India all the way to the US along Google’s fiber, exiting at the Google Point of Presence (PoP) physically closest to the server hosting the Texas website. The response is then funneled back in through that PoP and transits Google’s fiber all the way back to Mumbai. Similarly, a user in Finland accessing a website hosted in Google’s Los Angeles data center will see their request complete nearly its entire transit over Google’s private fiber network. Making this even more remarkable is that the price per gigabyte to ship data from Virginia to London is the same as the price to ship it from Virginia to Tokyo, meaning bandwidth costs to connect and synchronize global applications are relatively uniform, dramatically reducing the expense of connecting applications across multiple countries.

In my own web and high-performance computing background, stretching back more than 23 years to the early days of the modern web, most of the computing systems I worked with were largely geographically centralized. From traditional supercomputers with 100,000 processors to massive clusters, the underlying hardware was typically concentrated in a single data center or a handful of geographically proximate data centers.

As my work has grown increasingly globalized over the years, I’ve had to focus ever more on developing globally distributed applications that span many data centers across many countries, tightly synchronized yet highly fault tolerant.

When I launched my first virtual machine in a public cloud data center outside of the United States, I was expecting the process to be an extensive and technically complex ordeal, based on past experiences in the pre-cloud era of building multi-country applications. Instead, spinning up a virtual machine in São Paulo or Mumbai or Tokyo or anywhere else in the world in today’s public cloud is no different from launching one in Ohio. One simply selects “Mumbai” instead of “Ohio” as the data center and absolutely everything else is unchanged. Even the monthly cost of running the virtual machine is almost the same.
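
To illustrate just how little changes, the sketch below uses Google’s Compute Engine API through its Python client library. The project ID, zone names, instance names and disk image are illustrative placeholders rather than anything from a real deployment, but the point stands: only the zone string differs between the two launches.

    # A minimal sketch of launching the same VM in two regions; everything
    # except the zone string is identical.
    import googleapiclient.discovery

    def launch_vm(project: str, zone: str, name: str):
        compute = googleapiclient.discovery.build("compute", "v1")
        config = {
            "name": name,
            "machineType": f"zones/{zone}/machineTypes/n1-standard-1",
            "disks": [{
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": "projects/debian-cloud/global/images/family/debian-11",
                },
            }],
            "networkInterfaces": [{
                "network": "global/networks/default",
                "accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
            }],
        }
        return compute.instances().insert(project=project, zone=zone, body=config).execute()

    # Identical code, identical configuration; only the zone differs.
    launch_vm("my-project", "us-east4-a", "web-frontend-us")      # United States
    launch_vm("my-project", "asia-south1-a", "web-frontend-in")   # Mumbai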

Most importantly, the modern public cloud providers offer an absolutely identical development and operational environment across their global data centers. You can simply copy your virtual machine from Ohio over to Tokyo and it will spin up instantly and see exactly the same environment with exactly the same resources, the VM entirely unaware that it is now a continent away. In the case of my open data GDELT Project, the underlying computing resources span 16 data centers in 12 countries and the code running in each data center in each country is absolutely identical. All of the code reads and writes to the same cloud storage buckets, uses the same infrastructure resources and the same machine learning APIs, and synchronizes globally without a single line of code being changed. In fact, the only way to tell that a given virtual machine is located in Tokyo rather than South Carolina is the speed-of-light latency that makes interactive terminal input slightly slower when working on the one 7,000 miles away.
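
To show what that looks like in practice, the short Python sketch below, assuming the google-cloud-storage client library, writes and reads a shared bucket. The bucket and object names are invented for illustration, but the same lines run unchanged on a VM in any region.

    # A minimal sketch of region-agnostic code: read and write one global
    # cloud storage bucket from wherever the VM happens to be running.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-shared-pipeline-bucket")  # hypothetical name

    # Whether this runs on a VM in Tokyo or in South Carolina, the code is
    # byte-for-byte identical: the client library addresses the bucket
    # globally and the VM never needs to know where it is located.
    bucket.blob("pipeline/latest-results.json").upload_from_string('{"status": "ok"}')
    data = bucket.blob("pipeline/latest-results.json").download_as_bytes()
    print(data)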

This geographic transparency of the modern cloud makes it possible to build “living” applications that are globally synchronized but expand and contract in each region of the world as customer bases grow or shift. A small startup can launch with a few virtual machines in a single US data center and rest assured that its site is accessible globally with the best latency and bandwidth technically possible today. As that startup’s business expands to a new region of the world, all it takes is a few mouse clicks to launch a mirror image of its site in a new data center as close as possible to those new customers, without a single line of code changing. In fact, the cloud’s strong global networking means that the startup can even keep all of its backend computation in the US and deploy only latency-sensitive customer-facing components to remote data centers.

Putting this all together, the modern cloud is increasingly making geography transparent to developers, allowing them to deploy globalized applications without changing a single line of code.


Source: forbes.com

Author: Kalev Leetaru
