
LB setup

Backup LB

  • used when the Primary LB is down
  • DNS IP change time > time to move the static IP from the Primary LB to the Backup LB, so the static IP is reassigned instead of updating DNS

LB as a service/High scale apps

  • 1 B requests/sec

What is a load balancer?

According to Wikipedia, load balancing refers to the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle.

As you can see from the diagram, you can place a load balancer at any layer of your architecture, including the client, backend, and database layers.
  • “Elastic” — load balancers can be run as an array of servers that are auto-scaled when needed and released when no longer needed
  • Monitors the health of targets
  • Integrates with SSL
  • Steps for a load balancer —
    > Picks an app server
    > Forwards the request to the app server
    > Receives the response from the app server
    > Forwards the response to the client
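The four steps above can be sketched as a toy in-process load balancer. This is purely illustrative: the "app servers" here are plain Python callables standing in for real backends, and the round-robin picker is an assumption (the doc covers other scheduling algorithms later).

```python
import itertools

class ToyLoadBalancer:
    def __init__(self, servers):
        self._picker = itertools.cycle(servers)  # simple round-robin rotation

    def handle(self, request):
        server = next(self._picker)   # 1. pick an app server
        response = server(request)    # 2./3. forward the request, receive the response
        return response               # 4. forward the response to the client

# hypothetical app servers
app1 = lambda req: "app1 handled " + req
app2 = lambda req: "app2 handled " + req

lb = ToyLoadBalancer([app1, app2])
print(lb.handle("GET /"))   # app1 handled GET /
print(lb.handle("GET /"))   # app2 handled GET /
```

Each call cycles to the next backend, which is the simplest possible scheduling policy.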

Why load balancer?

  • Elastic scaling of servers can be done behind a load balancer
  • Security — all the machines behind the LB can be in a private network. Servers are disconnected from the Internet, and communication happens via the LB (i.e. LB to application server)
  • Encrypting and decrypting requests
  • Securing the request via HTTPS — the LB layer is HTTPS and the application layer is HTTP
  • Improves the distribution of workloads across multiple computers.
  • Multiple IP addresses can be associated with a single domain name (e.g. one domain mapped to IP1, IP2, IP3). Whenever a client resolves the domain via DNS, IPs are handed out based on a scheduling algorithm
  • Performance: there are physical bounds on how much one computer can do. Millions of users can only be supported on multiple machines, not one
  • Availability: machine failure happens. We should avoid single points of failure whenever possible.
  • Economy: it's often cheaper to buy 5 small commodity machines than 1 big machine.
  • Redundancy: multiple machines of the same capacity mean that if 1 fails, the others are still up, while with vertical scaling (if the 1 machine fails) the system is down.
  • Handles more users than a single server can.
  • Examples: Nginx, Amazon ELB

How does the load balancer work?

  • Define IP/DNS name: For each programme, task, or website, administrators designate a single IP address and/or DNS name to which all requests will be sent. The load balancing server is identified by its IP address or DNS name.
  • Configure backend servers to LB: The administrator will next enter the IP addresses of all the actual servers that will be sharing the workload for a certain application or task into the load balancing server. The load balancer is the sole way to reach this pool of available servers.
  • Deploy: finally, your load balancer must be set up, either as a proxy, which sits between your app servers and your users all over the world and accepts all traffic, or as a gateway, which assigns a user to a server once and then ignores the interaction.
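A minimal sketch of the setup above as plain data: one front-facing address plus a pool of backend IPs. Every name and address here is made up for illustration.

```python
# Hypothetical LB configuration mirroring the three steps above.
lb_config = {
    # 1. the single IP / DNS name all requests are sent to
    "frontend": {"dns_name": "app.example.internal", "ip": "203.0.113.10"},
    # 2. the pool of real servers sharing the workload; reachable only via the LB
    "backends": ["10.0.0.11", "10.0.0.12", "10.0.0.13"],
    # 3. deployment mode: "proxy" (mediates all traffic) or "gateway" (assigns once)
    "mode": "proxy",
}

print(lb_config["frontend"]["dns_name"], "->", len(lb_config["backends"]), "backends")
```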

Scheduling Algorithms at Load Balancers

  1. ROUND ROBIN — Send requests one after the other.
  2. WEIGHTED ROUND ROBIN — weights are assigned to servers based on their compute & traffic-handling capacity.
    → And then based on the server weights, traffic is routed to them using the Round Robin algorithm.
    → With this approach, more traffic is converged to machines that can handle a higher traffic load thus making efficient use of the resources.
    → This approach is pretty useful when the service is deployed in different data centres having different compute capacities. More traffic can be directed to the larger data centres containing more machines.
  3. LEAST CONNECTIONS — Sends requests to the server with the lowest number of active connections.
    → There are two approaches to implementing this –
    → In the first, it is assumed that all requests consume an equal amount of server resources, and traffic is routed to the machine with the least open connections based on this assumption. In this scenario, the machine with the least open connections might already be processing requests that demand most of its CPU power, so routing more traffic to it wouldn't be such a good idea.
    → In the other approach, the CPU utilization & the request processing time of the chosen machine are also taken into account before routing the traffic to it. Machines with less request processing time, CPU utilization & simultaneously having the least open connections are the right candidates to process the future client requests.
    The least connections approach comes in handy when the server holds long-lived connections, for instance persistent connections in a gaming application
  4. LEAST TIME — Sends requests to the server selected by a formula that combines the fastest response time and fewest active connections.
  5. HASH — Distributes requests based on a key you define, such as the client IP address or the request URL.
    → Hashing the source IP ensures that the request of a client with a certain IP will always be routed to the same server.
    → This facilitates a better user experience as the server has already processed the initial client requests and holds the client’s data in its local memory. There is no need for it to fetch the client session data from the session memory of the cluster & then process the request. This reduces latency.
    Hashing the client IP also enables the client to re-establish the connection with the same server, that was processing its request, in case the connection drops.
    Hashing a URL ensures that the request with that URL always hits a certain cache that already has data on it. This is to ensure that there is no cache miss.
    → This also averts the need for duplicating data in every cache and is thus a more efficient way to implement caching.
  6. IP HASH: (HTTP only) — Distributes requests based on the first three octets of the client IP address.
  7. RANDOM WITH 2 CHOICES — Picks two servers at random and sends the request to the one that is selected by then applying the Least Connections algorithm.
  8. DNS Round Robin Load Balancing? — Yes it's different from Round robin.
    → Mapping multiple servers (IP-1, IP-2, IP-3) to the same hostname, so that when users visit, multiple servers are available to handle their requests.
    → Now client Requests will get distributed across a group of server machines.
    → DNS-based load balancing is a specific type of load balancing that uses the DNS to distribute traffic across several servers. It does this by providing different IP addresses in response to DNS queries. Load balancers can use various methods or rules for choosing which IP address to share in response to a DNS query.
  9. Subdomain DNS Delegation with Round Robin:
    → When the main domain has multiple subdomains, each subdomain can have its own nameservers. When a request comes in for a resource inside a subdomain, the primary nameserver delegates it to that subdomain's nameservers. This improves response times, as the DNS protocol automatically looks for the fastest links.
  10. Geographic — redistributes application traffic across data centres in different locations for maximum efficiency and security.
    →While local load balancing happens within a single data centre, geographic load balancing uses multiple data centres in many locations.
    → DNS based, Redirect?
    → Check below for more detail
  11. Client-side random load balancing:
    → Deliver a list of server IPs to the client, and then have the client randomly select the IP from the list on each connection. This essentially relies on all clients generating similar loads.
  12. Server-side load balancers:
    → The load balancer binds and listens on a port and forwards the request to the backend servers; whichever responds first serves the request.
    → It has security benefits: it hides the structure of the internal network and prevents attacks on the kernel's network stack or unrelated services running on other ports.
    → Example rule: if a request comes from an IP address in the range (x to y), forward the packet to backend server-1; if it comes from the range (a to c), forward it to backend server-2; and so on.
  13. LAYER 5 AWARE:
    → These load balancers are aware of layer-5 protocols (e.g. HTTP, in the five-layer model). They can look into the HTTP header and decide what to do with the packet, i.e. whether to send it to server-1 or server-2.

Types of Load Balancing

    Traffic is divided using IP addresses or port numbers.
    → When client A talks to server B, the request (L4) carries both IP addresses and ports as tuples. For example,
    ✅ the client's side looks like [client IP, 10023 (a random port above 1024)] and
    ✅ the server side looks like [server IP, 80]. The same tuple exists on the other side, where the LB connects to the actual server, but there the LB plays the client's role.
    ✅ With ports in mind, the LB keeps a mapping between the connections it receives from clients and the connections it establishes to servers. Ports allow many mappings: the same client may have more than one connection to many servers via the LB.
    ✅ The above was about L4; L7 works in a similar way, just with more processing happening on the load balancer itself.
    → If load-balancer #1 fails, traffic gets routed via load-balancer #2. Usually, load balancers are deployed in high-availability pairs, which may also replicate session-persistence data.
    → Load balancers know which machines are: available, healthy (health-check packets are sent, similar to ping requests, and servers reply with their state), and not overloaded.
    → Load balancing can also imply a division of traffic between physical interfaces at layer 1 (per-packet division) or layer 2 (the data-link layer, with a protocol like shortest path bridging).
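The tuple bookkeeping described above can be sketched as a table keyed on the client-side (IP, port) pair; all addresses are illustrative.

```python
# Sketch of the L4 mapping above: the LB keys its table on (IP, port) tuples,
# so one client can hold many concurrent connections through the same LB.
conn_table = {}

def register(client_ip, client_port, server_ip, server_port):
    client_side = (client_ip, client_port)   # e.g. [client IP, random port > 1024]
    server_side = (server_ip, server_port)   # e.g. [server IP, 80]
    conn_table[client_side] = server_side

register("198.51.100.9", 10023, "10.0.0.1", 80)
register("198.51.100.9", 10024, "10.0.0.2", 80)   # same client, second connection

print(conn_table[("198.51.100.9", 10023)])   # ('10.0.0.1', 80)
```

Because the client port differs per connection, the two entries coexist: this is why ports allow "many many mappings" for a single client IP.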

Broad types

  1. Hardware LB
    → routing the request
  2. Software LB
    → L4
    → L7

L4 vs L7

# Layer 3 LB / NLB(Network LB) / VPN LB

  • LB decision is made based on IP Address

# Layer 4 LB

  • Operates on OSI Layer 4 — Transport Layer
  • Has access to TCP or UDP, IP, Port
  • They are mostly network address translators (NATs) that share the load across the different servers the traffic is translated to.
  • The only thing we get from the incoming request is the source_ip + port and based on these we route the requests
  • computation is less so it will be faster
  • exposing only minimal information so it’s more secure than layer 7 LB.
  • Routes traffic based on protocol and port of incoming traffic
  • SSL passthrough by default (AWS now has a feature to terminate SSL)
  • Session persistence can be achieved at the IP-address level
  • No termination of TCP connections
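Since an L4 LB only sees protocol, port, and source IP, its routing table can be modeled as a lookup on those fields. The pools and rules below are hypothetical.

```python
# Sketch of L4 routing: decisions use only (protocol, port), never the
# request body or HTTP headers. Backend pools here are made up.
POOLS = {
    ("tcp", 443): ["10.0.1.1", "10.0.1.2"],   # HTTPS pool (SSL passthrough)
    ("tcp", 80):  ["10.0.2.1"],               # HTTP pool
    ("udp", 53):  ["10.0.3.1"],               # DNS pool
}

def pick_pool(protocol, port):
    # unknown (protocol, port) combinations get no pool
    return POOLS.get((protocol, port), [])

print(pick_pool("tcp", 443))   # ['10.0.1.1', '10.0.1.2']
```

The cheapness of this lookup is exactly why L4 balancing is faster than L7: no buffering or parsing of the application payload is needed.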

Two modes of L4 LB:

1. DSR Mode(Direct Server Return Mode)

DSR is an implementation of asymmetric network load distribution in load-balanced systems, meaning that the request and response traffic use a different network path.
The use of different network paths helps avoid extra hops and reduces latency, which not only speeds up the response time between the client and the service but also removes some extra load from the load balancer.

The TCP connection is established directly between the client and the backend. The load-balancer sees only the requests and just changes the destination MAC address of the packets. The backends answer directly to the client using the service IP (VIP) configured on a loopback interface (this is why the src is VIP for response). Note that the requests pass through the load balancer while the responses not. LB won’t be the bandwidth bottleneck.

  • Problem:
    In the default setup, the load balancer passes incoming packets to the appropriate server with negligible modification. The server responds to the load balancer with the required data, which is then relayed to the client by the load balancer.
    Major drawback: if incoming requests are small (2 MB) and responses big (200 MB), the response has to pass through the load balancer. Since the LB might already be seeing large traffic, there is a risk it becomes a bottleneck.
  • Solution = DSR.
    As a request reaches the backend server via the LB, the backend answers directly to the client, without passing back through the LB.
  • In DSR mode, the load-balancer routes packets to the backends without changing anything in it but the destination MAC address.
  • The backends process the requests and answer directly to the clients, without passing through the load-balancer.
  • Advantages:
    → 1. Very fast
    → 2. Load-balancer network bandwidth is not a bottleneck anymore
  • Disadvantages:
    → 1. No intelligence at the LB
    → 2. No layer-7 advanced features are available
  • When to use this architecture?
    → where response time matters
    → where no intelligence is required
    → when the output capacity of the load-balancer could be the bottleneck


2. NAT Mode (Network Address Translation Mode)

  • The clients connect to the service VIP. The load balancer chooses a server in the pool and then forwards packets to it by changing the destination IP address (DNAT); the LB becomes the default gateway for the real servers, and the source IP stays the client's IP. All traffic passes through the load balancer, so output bandwidth is limited by the load balancer's output capacity. Only one connection is established.

# Layer 7 LB / Application LB (Also called Reverse Proxy)

  • Operates on OSI layer 7 — Application Layer
  • Has access to everything layer 4 has
  • in the request we get headers, query params, path params, and the HTTP method
  • waits for all network packets to arrive before deciding which server will serve the request
  • better hardware reduces the slowness mentioned above
  • can route traffic based on the URL path
  • Validates and terminates SSL

L4/NLB or L7/ALB?

  • NLB handles spiky traffic better
  • NLB exposes static IP address

Benefits of LB

  • Resilience
  • Scalability

RP vs LB

Load balancer vs Reverse proxy

  • They are components in a client-server computing architecture. Both act as intermediaries in the communication between the clients and servers, performing functions that improve efficiency. Most load balancer programs are also reverse proxy servers, which simplifies web application server architecture.
  • Load balancers are most commonly deployed when a site needs multiple servers. Some load balancers also provide session persistence. It refers to directing a client’s requests to the same backend web or application server for the duration of a “session” or the time it takes to complete a task or transaction (e.g. shopping).
  • It often makes sense to deploy a reverse proxy even with just one web server or application server.

Places where Load Balancers can be placed?

  1. Between client and Application servers.
    → Each application server might be serving some subset of users. The LB will forward a request to the appropriate app server.
    client → Load-Balancer → App-server-1 / App-server-2
  2. Between App-servers and cache. Since we can have 100s of cache servers, those needed load balancing.
  3. Between cache and DB servers. Since there can be many DB servers (up to 256), an LB in front of them is good.

Load Balancing with SSL Termination/Offloading and Validation

  • SSL passthrough passes HTTPS traffic through to a backend server without decrypting it on the load balancer. The data passes through fully encrypted, which precludes any layer 7 actions. Proxy SSL passthrough is the simplest way to configure SSL in a load balancer but is suitable only for smaller deployments.

What is SSL Offloading?

SSL offloading is the mechanism of transferring the incoming encrypted traffic from a client to a load balancer, relieving the web server of the encryption/decryption of data. A load balancer is positioned between a browser and the web server. It uses the SSL security protocol to perform either SSL termination or SSL bridging, taking this operational load off the server's shoulders.

The load balancer receives the encrypted data coming from a client. It performs decryption and sends the plain text to the server, freeing it from this time-consuming process.

Why Need SSL Offloading?

As you already know, a public key and a private key are used to encrypt and decrypt. These keys are of a larger size (2048-bit) with the RSA algorithm. Although this safeguards the site, it makes the encryption and decryption process slow due to the keys' bulk.

A session key is faster than 2048-bit keys, but when many visitors land on the same website, the server has to deal with many session keys. This burdens the server with encryption or decryption requests in a short interval of time.

This activity is CPU-intensive and consumes the web server's resources to a large extent, making the server work slowly. The SSL offloading concept was introduced to make the backend work faster.

[Extra] SSL is an encryption process that protects communications on the Internet. SSL encryption can ensure the security of user communication. SSL encryption and decryption require a lot of CPU resources and may put pressure on server resources. To balance the computational requirements for SSL encryption and decryption of traffic sent over an SSL connection, SSL offloading will transfer processing to a dedicated server. This frees up the Web server to handle other application delivery requirements.

[Extra] SSL offloading on a load balancer is now a required capability and these load balancers are also referred to as SSL load balancers. This is a load balancer that has the ability to encrypt and decrypt data transported via HTTPS, which uses the SSL protocol to secure data across the network.

How Does SSL Offloading Work?

A load balancer is used for the task of SSL offloading. The load balancer is positioned between the browser and the webserver to perform the chore in place of the server. The load balancer uses the same SSL certificate that is already issued to the server to complete this task. A load balancer can perform this job in two different ways.

1. SSL Termination

2. SSL bridging

#1. SSL Termination

SSL termination is an SSL offloading that helps in accelerating the speed of the server. This process is done by connecting the user through a secured connection to the load balancer and then connecting them from the load balancer to the server via an insecure connection.

[Extra] SSL offloading relieves a web server of the processing burden of encrypting and decrypting traffic sent via SSL. Every web browser is compatible with SSL security protocol, making SSL traffic common. The processing is offloaded to a separate server designed specifically to perform SSL acceleration or SSL termination. SSL certificates use cryptography keys for encryption. RSA keys of increasing key lengths (e.g. 1024 bits and 2048 bits) were the most common cryptography keys until a few years ago. But more efficient ECC (Elliptic Curve Cryptography) keys of shorter key lengths are replacing the RSA keys as the mechanism to encrypt traffic.

[Extra] SSL offloading, also known as SSL termination, decrypts all HTTPS traffic on the load balancer. Layer 7 actions can be carried out and the data proceeds to the backend server as plain HTTP traffic. SSL offloading allows data to be inspected as it passes between the load balancer and server. It also reduces CPU demand on an application server by decrypting data in advance. SSL offloading is vulnerable to attack, however, as the data travels unencrypted between the load balancer and application server.

The information shared between the user and the load balancer remains secure, while the exchange between the web server and the load balancer is unencrypted. The working of SSL termination is explained below:

  • A load balancer is placed between the server and the client’s browser.
  • When the client requests an HTTPS connection, a session key is created between the load balancer and the browser using the server’s public and private keys.
  • All the information that is encrypted by the client’s browser reaches the load balancer.
  • The load balancer decrypts this information using a session key and sends the unencrypted information to the server.
  • The server receives the data in unencrypted form, so it does not need to decrypt it.
  • The response by the server is sent in plain text to the load balancer.
  • It[load balancer] performs encryption[of response] on this data using the session key.
  • The browser receives data from the load balancer and uses the session key for decryption.
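The termination flow above can be mimicked with a toy symmetric cipher. To be clear about the assumptions: `toy_crypt` below is a keyed XOR keystream, NOT real TLS, and the session key is just a byte string; the point is only to show who encrypts, who decrypts, and who sees plaintext.

```python
import hashlib

# Toy stand-in for TLS session encryption: XOR with a keyed keystream.
# Symmetric, so applying it twice with the same key restores the data.
# (Real SSL termination uses actual TLS; this only mirrors the flow.)
def toy_crypt(key, data):
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

session_key = b"negotiated-session-key"   # hypothetical key from the handshake

# The client encrypts; the load balancer, not the app server, decrypts.
ciphertext = toy_crypt(session_key, b"GET /cart")
plaintext = toy_crypt(session_key, ciphertext)    # LB terminates "SSL" here
response = b"200 OK"                              # app server sees plain HTTP
to_client = toy_crypt(session_key, response)      # LB re-encrypts the response

print(plaintext)                           # b'GET /cart'
print(toy_crypt(session_key, to_client))   # b'200 OK'
```

Note how the app server never touches the key, which is exactly the workload saving (and the plaintext-hop risk) described above.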

Advantages of SSL Termination

  • The encryption and decryption of the incoming data are done through a load balancer. So, the server is free of workload.
  • SSL termination is best for websites that do not deal with the customers’ sensitive data (username, password, bank details).
  • It helps in accelerating the speed of the server.

Disadvantages of SSL Termination

  • The data between the load balancer and the server is transmitted in plain text, which makes it easy for hackers to steal sensitive information. In a way, it violates the purpose of having an SSL certificate because the secrecy of the data is compromised.
  • The server shares its keys to the load balancer which may lead to vulnerabilities.
  • It deceives the clients that their data is safe and secure throughout the communication, although encryption is lost mid-way and they do not know about this.
  • As the load balancer handles all the data, it isn’t easy to trust that all the information is still secure.

#2. SSL Bridging

  • SSL bridging is another method of SSL offloading. For websites that handle clients' sensitive information (usernames, passwords, banking details, etc.), the SSL termination technique is not appropriate.
  • Handling a large quantity of HTTPS data from the users makes these web servers work slower. For this purpose, SSL bridging is used by these websites.
  • Like SSL termination, a load balancer is used in this technique too. But the way of working is different, which is explained ahead:
  • A user sends the data through an HTTPS connection to the load balancer.
  • The load balancer receives the encrypted data and performs an SSL inspection on this information.
  • If the load balancer finds anything suspicious in the HTTPS data, it blocks that content.
  • Then, the load balancer again encrypts the data and sends it to the server. So, the data remains safe throughout the process.
  • The server then decrypts the information and sends the encrypted response to the load balancer, which is then forwarded to the client.
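The bridging flow above (decrypt, inspect, re-encrypt) can be sketched with the same kind of toy cipher; the XOR keystream and the one-entry blocklist are stand-ins, not real TLS or a real inspection engine.

```python
import hashlib

# Toy stand-in for TLS (keyed XOR keystream); real bridging uses actual TLS.
def toy_crypt(key, data):
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(a ^ b for a, b in zip(data, stream))

BLOCKLIST = [b"<script>"]   # hypothetical inspection rule

def bridge(client_key, server_key, ciphertext):
    plaintext = toy_crypt(client_key, ciphertext)   # decrypt only to inspect
    if any(bad in plaintext for bad in BLOCKLIST):
        return None                                 # block suspicious content
    return toy_crypt(server_key, plaintext)         # re-encrypt toward the server

client_key, server_key = b"client-session", b"server-session"

out = bridge(client_key, server_key, toy_crypt(client_key, b"POST /login"))
print(toy_crypt(server_key, out))   # b'POST /login'  (server decrypts it itself)
print(bridge(client_key, server_key, toy_crypt(client_key, b"<script>evil")))  # None
```

Unlike termination, the data leaving the LB is encrypted again, so the server still pays the decryption cost, matching the disadvantage listed below.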

Advantages of SSL Bridging

1. The main advantage of SSL bridging is that data security is not compromised. The information stays encrypted during the whole process.

2. It protects the server from common web-application attacks.

Disadvantages of SSL Bridging

  • The server still performs encryption and decryption itself. So, this workload is not reduced.
  • The biggest disadvantage of this method is that all authority to judge the data's security is handed to the load balancer. If the load balancer's inspection system does not work properly, there is a chance of important or safe data being blocked too.
  • The purpose of SSL bridging is to perform extra checks on the data to ensure that there is no malware included. The process includes decrypting the incoming data, inspecting it for any malicious code, and then re-encrypting it and sending it on to the webserver. Obviously, this form of offloading is meant to increase security rather than relieve the webserver of processing activities.

Local vs Global Load Balancing

  • Originally, load balancing referred to the distribution of traffic across servers in one locality — a single data centre.
  • As more and more computing is done online, load balancing has taken on a broader meaning. Global Server Load Balancing is the same in principle but the load is distributed planet-wide instead of just across a data centre. Early generation GSLB solutions relied on DNS load balancing, which limited both their performance and efficacy so it is now considered acceptable mostly for simple applications or websites.
  • Another implementation of GSLB is through a Content Delivery Network (CDN), such as the Cloudflare CDN. A global CDN service will take data from their customers’ origin servers and cache it on a geographically distributed network of servers, providing fast and reliable delivery of Internet content to users around the world.

Some advanced algorithms may be used when GSLB is linked to an ADC:

  • Server load (eg CPU)
  • Server link bandwidth
  • Least connections
  • Packet rate

Geographic Load Balancing/ Geo Load Balancing

Geographic Load Balancing redistributes application traffic across data centres in different locations for maximum efficiency and security.

What Is Geographic Load Balancing?

Amazon Route 53 — Geolocation Traffic management strategy


While local load balancing happens within a single data centre, geographic load balancing uses multiple data centres in many locations.

What Is Geographic Server Load Balancing(GSLB)?

Client → DNS → GSLB [will do health checks] → Datacenter IP

Client → Datacenter IP

Geographic server load balancing, also known as global server load balancing (GSLB), is the distribution of traffic across servers located in multiple geographies. The servers can be on-premises or hosted in a private or public cloud.

Geographic server load balancing is especially useful in times of disaster, allowing companies to recover information and avoid shutting down operations. The geographic server load balancer can detect a server failure and automatically divert requests to the other geographic locations.

Imagine a store that sells shoes through the mail to customers all over the world. If that shoe store operates out of a single location, it will take a very long time for faraway customers to submit orders and receive their shoes. During busy shopping seasons, the store might get overloaded with orders and lose the ability to fill all their customers’ orders quickly.

Now imagine that the shoe store opens several more locations all over the world. This means customers can order shoes from a nearby location, cutting down on shipping times and reducing the possibility of one store getting overloaded with orders. This is exactly what GSLB does for websites and services, making it one of the most popular load balancing solutions for companies with a global user base.

Client traffic is sent to the location that will provide the very best application performance and client experience, tailored to the location of the client and the observed availability of each location.

GSLB is the act of redirecting users requesting content to specific application instances that are closest to them, based on some distribution logic. While there are several use cases for GSLB in application environments, it is most commonly implemented to achieve one or more of the following goals for an application:

  • Reduced latency to users in geographically distributed locations by optimizing request distribution
  • Offer fault tolerance across application, network, or data-centre failures
  • Enable non-disruptive migration to another cloud and other data centres

GSLB, traditionally a DNS-based user redirection, allows service providers to optimize resource use, maximize throughput, minimize response time, and avoid overload of any computing resource. A GSLB device performs global server selection to direct client traffic to the best server for a given domain during the initial client connection.

In this example, a client is using a web browser to view Example Inc. at “”. Example Inc. has two websites: one in San Jose and one in Denver, each with identical content and available services.

Both Web sites have a load balancer configured for GSLB, with the same domain name set. These devices are also configured as the Authoritative Name Servers for that domain.

The master DNS server on Example Inc. is configured to delegate “” to “”.

The DNS resolution for this GSLB configuration is as follows:

  1. The client web browser requests the “” IP address from the local DNS.
  2. The client’s DNS asks its upstream DNS, which in turn asks the next, and so on until the address is resolved. Eventually, the request reaches an upstream DNS server that has the IP address information available or the request reaches one of the Example Inc. DNS servers.
  3. Example Inc.’s San Jose DNS tells the local DNS to query the load balancer with GSLB software as the authoritative name server for “”.
  4. The San Jose load balancer responds to the DNS request, listing the IP address with the current best service. Each load balancer with GSLB software is capable of responding to the client’s name resolution request. Since each load balancer regularly checks and communicates health and performance information with its peers, either load balancer can determine which sites are best able to serve the client’s web access needs. It can respond with a list of IP addresses for the Example Inc.’s distributed sites, which are prioritized by proximity, performance, geography, and other criteria. In this case, the San Jose load balancer knows that Example Inc. Denver currently provides better service, and lists Example Inc. Denver’s virtual server IP address first when responding to the DNS request.
  5. The client connects to Example Inc. Denver for the best service.

If the site serving the client HTTP content suddenly experiences a failure (no healthy real servers) or becomes overloaded with traffic (all real servers reach their maximum connection limit), the load balancer issues an HTTP redirect and transparently causes the client to connect to another peer site. The result is that the client gets quick, reliable service with minimal added latency and no special client-side configuration.
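The GSLB decision in the example above can be sketched as a small selection function: prefer healthy, non-overloaded sites and pick the least-loaded one. The site names and load numbers are made up to mirror the San Jose / Denver example.

```python
# Hypothetical health/load data for two GSLB sites.
SITES = {
    "san-jose": {"healthy": True, "load": 0.92},   # busy
    "denver":   {"healthy": True, "load": 0.35},   # currently provides better service
}

def pick_site(sites):
    # drop failed or fully loaded sites, then choose the least-loaded survivor
    candidates = {n: s for n, s in sites.items() if s["healthy"] and s["load"] < 1.0}
    if not candidates:
        raise RuntimeError("no healthy site available")
    return min(candidates, key=lambda n: candidates[n]["load"])

print(pick_site(SITES))   # denver
```

A real GSLB would also weigh proximity, geography, and measured performance, as the surrounding text notes; load alone is the simplest stand-in for "current best service".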



What Is DNS Geographic Load Balancing?

DNS geographic load balancing configures a domain in the Domain Name System (DNS) so requests are distributed across multiple servers in various locations.

Online algorithms for geographical load balancing reroute traffic based on data and other parameters like location, availability and performance. When the browser resolves the hostname to an IP address in DNS load balancing, the traffic will be sent to a web server that can handle the information with little latency.

💡 The load-balancing logic you're trying to implement has to do with the origin of the traffic, not with an even distribution of that traffic. Let's look at several solutions:

📝Cloudflare Load Balancing is a DNS-based load balancing solution that actively monitors server health via HTTP/HTTPS requests. Based on the results of these health checks, Cloudflare steers traffic toward healthy origin servers and away from unhealthy servers. Cloudflare Load Balancing also offers customers who reverse proxy their traffic the additional security benefit of masking their origin server’s IP address.

Leveraging a Content Delivery Network (CDN)

CDNs are adept at managing multiple active data centres globally. Our primary CDN is Fastly, which has multiple points of presence around the world and a cutting-edge IP network that relies on Anycast to guarantee that the closest data centre can consistently answer queries.

How it works: When a customer sends data to one of our systems that’s CDN-brokered, they’re talking to an HTTP proxy at a CDN point of presence close to them. Then that CDN performs basic filtering and routing (see Load Balancing 101: The Importance of Local Load Balancing). The CDN then sends the request through to us. Traffic goes to the CDN based on both geography and performance, and then the CDN can route it to multiple backend data centres based on the desired load level in each data centre or other characteristics.

Using DNS

This way of balancing traffic between multiple data centres has been around since the early days of the internet. It was much more common 10 years ago; it’s not used as much today.

How it works: As a basic example, suppose you have one hostname that returns one of four different IP addresses, one for each of your data centres. While DNS load balancing is relatively inexpensive to implement and maintain, it’s slow (changes take minutes to take effect) and sometimes unpredictable: many DNS clients hold on to their responses and keep going to the old destination even after it has been updated.
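The basic round-robin behaviour described above can be sketched as follows. The four IPs (from the TEST-NET-2 documentation range) are made up; a real DNS server typically rotates the whole answer list rather than returning a single record.

```python
# Sketch of DNS round-robin: one hostname, four data-centre IPs handed
# out in rotation, one per query.
import itertools

DATACENTER_IPS = ["198.51.100.1", "198.51.100.2", "198.51.100.3", "198.51.100.4"]
_rotation = itertools.cycle(DATACENTER_IPS)

def dns_answer():
    """Each query gets the next IP in the rotation."""
    return next(_rotation)

answers = [dns_answer() for _ in range(5)]
print(answers)  # the fifth query wraps around to the first IP
```

The unpredictability mentioned above comes from the client side: resolvers and browsers cache whichever answer they last received, so the rotation seen by the server is not the distribution seen in practice.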

Using Anycast and Border Gateway Protocol

This strategy is similar to the approach a CDN takes, just done in-house. It has become more common recently.

How it works: Each of your data centres announces itself as a possible route for a set of virtual IPs. Originating traffic then takes the shortest route to those IPs and lands at one of your data centres; instead of passing the data through to some other backend, that data centre serves the traffic itself, impersonating those IPs.

Propagation is much faster than with DNS, and you have more control over things like how you weight traffic among various locations. Because it matches the physical structure of the internet, it encourages better performance by picking a path that’s close to a data centre and close to your traffic origin. However, it requires significant internal network experience to set up and manage.

These are the essential, big-picture ways to do global load balancing. They all come with a certain degree of complexity, but more tools for managing that are becoming available all the time. In a sense, the complexity is a sign of your organization’s success: It means you’ve grown to the point where you need to solve these problems.

Global Server Load Balancers Compared to Traditional Load Balancers

While a normal load balancer (or ADC) distributes traffic across servers located in a specific data centre, a global server load balancer is capable of directing traffic across several data centres.

The other important difference is that load balancers are “in-line” with the traffic, meaning that all traffic between the client and the applications goes through the load balancer.

By comparison, GSLBs are only involved in setting up the route. Once the connection has been established, all traffic goes directly between the client and the application.

The flow is therefore as follows:

  1. The user queries the DNS service hosted by the GSLB to get the IP address of the server hosting the application
  2. The GSLB’s DNS server returns an IP address, directing the user to a data centre according to the selected GSLB distribution algorithm
  3. The user connects to the IP address provided by the GSLB’s DNS service
  4. The application server answers the user directly
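The four-step flow above can be sketched in a few lines. The distribution algorithm here is simply "lowest load wins"; real GSLBs also weigh proximity, geography, and health. The data-centre names, loads, and IPs are illustrative assumptions.

```python
# Hedged sketch of the GSLB flow: DNS picks the data centre, then the
# client talks to that data centre directly (the GSLB leaves the path).
DATACENTERS = {
    "denver":   {"ip": "203.0.113.1", "load": 0.30},
    "san-jose": {"ip": "203.0.113.2", "load": 0.85},
}

def gslb_dns_lookup(hostname):
    """Steps 1-2: the GSLB's DNS service picks a data centre and returns its IP."""
    best = min(DATACENTERS.values(), key=lambda dc: dc["load"])
    return best["ip"]

def connect(ip):
    """Steps 3-4: the client connects to that IP directly; once the
    connection is established the GSLB is no longer involved."""
    return f"connected to {ip}"

ip = gslb_dns_lookup("app.example.com")
print(connect(ip))  # Denver has the lower load -> connected to 203.0.113.1
```

This contrast with an in-line load balancer is the key point: here only the DNS lookup touches the GSLB, while all application traffic flows client-to-server.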

GSLB solutions can be used to complement load balancers, and become particularly interesting for businesses working across multiple sites.


Question: The load balancer seems like a single point of failure. What happens if the load balancer device goes down?
Second, load balancers have limits on the number of requests they can process and the number of bytes they can transfer. What happens when our distributed message queue service becomes so popular that these limits are reached?
Answer: To address high-availability concerns, load balancers use a concept of primary and secondary nodes. The primary node accepts connections and serves requests while the secondary node monitors the primary. If the primary node is unable to accept connections, the secondary node takes over.
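The primary/secondary monitoring described in the answer can be sketched as a heartbeat loop. This is a simplified model: heartbeats are simulated with booleans, and "taking over" would really mean claiming a virtual IP (e.g. via gratuitous ARP or VRRP), not flipping a field.

```python
# Hedged sketch: the secondary node promotes itself after several
# consecutive missed heartbeats from the primary.
class LoadBalancerPair:
    def __init__(self):
        self.active = "primary"
        self.missed_heartbeats = 0

    def heartbeat(self, primary_alive, failover_threshold=3):
        """Called periodically by the secondary's monitor loop."""
        if primary_alive:
            self.missed_heartbeats = 0
        else:
            self.missed_heartbeats += 1
            if self.missed_heartbeats >= failover_threshold and self.active == "primary":
                self.active = "secondary"  # secondary takes over the virtual IP
        return self.active

pair = LoadBalancerPair()
for alive in [True, False, False, False]:
    active = pair.heartbeat(alive)
print(active)  # secondary
```

A threshold of several missed heartbeats (rather than one) avoids flapping on transient network blips; the trade-off is a slightly longer failover window.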

Other Topics

Load Balancer and Certificates

Shuffle Sharding

Major Components and Features

  • Health checker — identifies healthy machines so requests are not sent to unhealthy servers
  • Connection pooling — keeps pre-warmed TCP connections in a pool
  • Stats engine — average response time, number of connections, etc.
  • Backend discovery (auto/manual) — multicast? broadcast?
  • Header manipulation
  • Proxying
  • Pluggable load-balancing algorithms
  • Metadata store
  • Request tracing
  • Handling persistent connections
  • Concurrency and locking
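Two of the components above — the health checker and a pluggable algorithm — can be sketched together. The probe function here is a stand-in for a real HTTP or TCP check, and the backend IPs are illustrative.

```python
# Hedged sketch: probe backends, keep only healthy ones in the rotation,
# then pick among them with a simple round-robin "algorithm plug-in".
def health_check(backends, probe):
    """Return the subset of backends the probe reports as healthy."""
    return [b for b in backends if probe(b)]

def pick_backend(healthy, counter):
    """Round-robin over the healthy set; other algorithms (least
    connections, weighted, etc.) could be plugged in here instead."""
    return healthy[counter % len(healthy)]

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
probe = lambda b: b != "10.0.0.2"  # pretend 10.0.0.2 is down

healthy = health_check(backends, probe)
print(healthy)                   # ['10.0.0.1', '10.0.0.3']
print(pick_backend(healthy, 0))  # 10.0.0.1
print(pick_backend(healthy, 1))  # 10.0.0.3
```

In a real load balancer the health checker runs on its own schedule and updates the rotation atomically, which is where the concurrency-and-locking item on the list comes in.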