1. Introduction
On-premises server deployments have traditionally relied on a centralized load balancer. However, as modern web and mobile applications move to the cloud, the use of multiple distributed load balancers over a server farm has become the preferred implementation as it is more robust, programmable, and cost-effective (Lu et al., 2011).
A centralized load balancer oversees all incoming and outgoing task traffic and the assignment of tasks to servers; it can therefore keep track of the task queue at each server. In a distributed setup, by contrast, each load balancer, called a dispatcher, is aware only of the tasks that it handles, so dispatchers cannot track the task queues at the servers. Additional communication between servers and dispatchers is required for the dispatchers to obtain queue information for load balancing. However, as the numbers of servers and dispatchers grow, this information exchange becomes voluminous and imposes an expensive time overhead, making it impractical for the dispatchers to keep track of all server task queues (Mitzenmacher, 2016).
The Join Idle Queue (JIQ) algorithm was proposed to attain fast response times for distributed dispatchers with low communication overhead (Mitzenmacher, 2016). It has been shown to outperform other load balancing algorithms in distributed setups (Silva Filho et al., 2017) and has become the standard. Central to JIQ is the decoupling of task assignment by dispatchers from the registration of idle servers at the dispatchers. Instead of dispatchers tracking server queues, idle servers register themselves with dispatchers. To facilitate this, each dispatcher maintains a data structure called the I-queue, which contains a list of idle servers registered at that dispatcher. When a server becomes idle, it registers itself with the I-queue of one dispatcher selected at random. In this study, we observed that in a typical JIQ scenario many dispatchers persist with zero-length I-queues. Since the servers are blind to the dispatchers' I-queues, idle servers register themselves with the I-queues of randomly selected dispatchers. The resulting distribution of idle machines across dispatcher I-queues is not uniform and allows zero-length I-queues to persist at some dispatchers. Dispatchers with empty I-queues assign incoming tasks to randomly selected servers, leading to suboptimal average response time. We hypothesized that if dispatchers make servers aware of their I-queue lengths at the time of task assignment, the servers acquire partial information on dispatcher I-queues and, when idle, can register themselves with a dispatcher that has an empty I-queue. This helps mitigate the persistence of zero-length I-queues while keeping communication overhead low. As the probability of dispatchers finding idle servers increases, the average response time improves. In this paper we propose an improvement to the JIQ algorithm, called Join Idle Queue Dispatcher I-queue Optimization (JIQ-DIO), which implements this idea.
Related work, including the JIQ algorithm and its variants, is described in Section 2. The JIQ-DIO algorithm is described in Section 3. Implementation and simulation of the JIQ-DIO algorithm in CloudSim Plus, a fork of the popular CloudSim tool (Calheiros et al., 2011) for simulating cloud computing infrastructures (Kunwar et al., 2018), is described in Section 4. The original JIQ algorithm and its other variants have also been simulated on the same platform for direct performance comparison with JIQ-DIO. Conclusions and future work are discussed in Section 5.
2. Related work
2.1. Load balancing with distributed dispatchers
Cloud computing delivers computing resources and services over the internet. An important aspect of research work in cloud computing is load balancing, which refers to the optimal allocation of service requests to evenly balance the workload across multiple servers. Load balancing improves response time and throughput for the users and leads to optimal utilization of resources and low downtime at the servers. Traditionally, load balancing is performed by a single dispatcher that makes all decisions on task allocation to servers. This classical problem of load balancing using a centralized dispatcher has been researched extensively (Khiyaita et al., 2012; Li, 2017), and analytical expressions for optimizing average response time, power consumption and cost-performance ratio in heterogeneous multi-server setups have been described (Rao et al., 2003).
Lately, however, the use of multiple distributed dispatchers over a server farm has been preferred over a centralized dispatcher. The scalability and performance advantages of decentralized load balancing were recognized in peer-to-peer file sharing systems (Gao & Min, 2009; Grosu & Chronopoulos, 2005; Yang & Garcia-Molina, 2003). Multiple heterogeneous servers distributing tasks among themselves using game theoretic algorithms, in either cooperative or non-cooperative fashion, were found to be advantageous over centralized load balancing (Al-Fares et al., 2008; Duan et al., 2014). In data centers, multi-rooted tree architectures utilizing software algorithms and communication protocols for balancing data flows across a network were found to be superior to a centralized setup (Bharti & Pattanaik, 2013; Cheung & Leung, 2018; Harchol-Balter, 2021). Similarly, distributed load balancing was found to improve the resilience and scalability of data centers hosting cloud computing services (Badonnel & Burgess, 2008; Ousterhout et al., 2013). Distributed load balancing can be achieved in many ways, such as by servers sharing load with peers (Cardellini et al., 1999; Hong et al., 2006), DNS-based load balancing (Kingman, 1961), or by a strictly hierarchical architecture where dedicated load balancers called dispatchers direct requests to servers (Ousterhout et al., 2013).
This study considers cloud computing server farms having a hierarchical setup consisting of multiple distributed dispatchers and heterogeneous servers or virtual machines (VMs) as illustrated in Fig. 1. The server farm comprises p physical servers hosting a total of $k_1 + k_2 + \cdots + k_p = K$ VMs. Service requests, or tasks, T1, T2, …, Tn are received by a router. Task arrival is modeled by a Poisson process and the processing times of tasks (or task lengths) are considered to be exponentially distributed (Narang et al., 2019). The router directs a service request to one of the m dispatchers, which assigns it to one of the K VMs. Each VM maintains a task queue from which tasks are processed in first-in-first-out (FIFO) order.
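This workload model can be illustrated with a short Java sketch. Poisson arrivals are generated by sampling exponentially distributed inter-arrival times, and task lengths are sampled from an exponential distribution; the class name, seed and number of tasks are illustrative assumptions, while the mean length of 800 and mean arrival time of 1 follow the simulation parameters used later.

```java
import java.util.Random;

/** Minimal sketch of the workload model: Poisson arrivals and exponential task lengths. */
public class WorkloadSketch {
    public static void main(String[] args) {
        Random rng = new Random(42);          // fixed seed for reproducibility (illustrative)
        double meanInterArrival = 1.0;        // mean arrival time used in the simulations
        double meanTaskLength = 800.0;        // mean cloudlet length used in the simulations
        int n = 10;                           // number of tasks to generate (illustrative)

        double clock = 0.0;
        for (int i = 1; i <= n; i++) {
            // A Poisson arrival process is equivalent to exponentially distributed inter-arrival times.
            clock += -meanInterArrival * Math.log(1.0 - rng.nextDouble());
            double length = -meanTaskLength * Math.log(1.0 - rng.nextDouble());
            System.out.printf("Task T%d: arrival=%.2f, length=%.1f%n", i, clock, length);
        }
    }
}
```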
In this scenario, a load balancing algorithm must broadly address three aspects: (A) dispatcher selection, i.e., how the router chooses a dispatcher for each incoming task; (B) task allocation, i.e., how a dispatcher assigns a task to a VM; and (C) I-queue joining, i.e., how an idle VM selects a dispatcher to register with.
2.2. A general model of Join Idle Queue
The Join Idle Queue algorithm addresses load balancing in the context of large data centers or server farms that maintain hundreds or thousands of servers managed by distributed dispatchers (Mitzenmacher, 2016). The large compute capacity ensures that at any given time there are idle VMs available to immediately process incoming service requests. The problem of load balancing is then simplified to finding an available idle VM for an incoming service request. JIQ solves this problem with minimal communication overhead between the dispatchers and VMs. In a JIQ model each dispatcher locally maintains an Idle Queue, or I-queue, which stores a list of idle VMs that are registered with it. An idle VM may be registered with the I-queue of only one dispatcher at any time. When a task arrives, a dispatcher assigns it to an idle VM from its I-queue and removes the VM from its I-queue. Notably, instead of dispatchers attempting to find idle VMs to fill their I-queues, the idle VMs have the responsibility of registering themselves with the I-queue of a dispatcher. The different variants of JIQ present various strategies by which idle VMs select a dispatcher to register themselves with. For instance, the standard JIQ algorithm, also referred to as JIQ-Random, uses the following strategies for the three key aspects of load balancing (a minimal sketch of these strategies follows the list):
A) Dispatcher selection: The router directs an incoming request to one of the dispatchers selected uniformly at random.
B) Task allocation: A dispatcher with a non-empty I-queue selects an idle VM uniformly at random from its I-queue and then removes this VM from its I-queue. A dispatcher with an empty I-queue selects a VM uniformly at random among all VMs.
C) I-queue joining: When a VM becomes idle, it selects a dispatcher uniformly at random among all dispatchers and joins its I-queue.
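As a concrete illustration of these three strategies, the following minimal Java sketch keeps each dispatcher's I-queue as a list of idle-VM ids. The data structures and method names are assumptions made for illustration and are not part of the CloudSim Plus API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Minimal sketch of the JIQ-Random strategies. */
public class JiqRandomSketch {
    static final Random RNG = new Random();

    /** Each dispatcher keeps an I-queue holding the ids of the idle VMs registered with it. */
    static class Dispatcher {
        final List<Integer> iQueue = new ArrayList<>();
    }

    // (A) Dispatcher selection: the router picks a dispatcher uniformly at random.
    static Dispatcher selectDispatcher(List<Dispatcher> dispatchers) {
        return dispatchers.get(RNG.nextInt(dispatchers.size()));
    }

    // (B) Task allocation: an idle VM from the I-queue if available, otherwise any VM at random.
    static int allocateTask(Dispatcher d, int totalVms) {
        if (!d.iQueue.isEmpty()) {
            // pick an idle VM uniformly at random from the I-queue and remove it
            return d.iQueue.remove(RNG.nextInt(d.iQueue.size()));
        }
        return RNG.nextInt(totalVms); // empty I-queue: fall back to a random VM
    }

    // (C) I-queue joining: an idle VM registers with a dispatcher chosen uniformly at random.
    static void joinIdleQueue(int vmId, List<Dispatcher> dispatchers) {
        dispatchers.get(RNG.nextInt(dispatchers.size())).iQueue.add(vmId);
    }
}
```

Note that an idle VM appears in at most one I-queue at a time: it is added once in joinIdleQueue and removed in allocateTask when a task is assigned to it.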
2.3. Variants of JIQ
The standard JIQ or JIQ-Random uses a simple uniform-at-random strategy for dispatcher selection, task allocation as well as I-queue joining. This is easy to implement as there is no need to track the state of the system. However, its performance is not optimal. A few different variants of the JIQ algorithm have been proposed to improve performance (Lu, 2018). These are summarized in Table 1.
Table 1. Strategies used by JIQ and its variants.

Algorithm | Dispatcher Selection | Task Allocation, non-empty I-queue (select from I-queue) | Task Allocation, empty I-queue (select from all VMs) | I-queue Joining
JIQ-Random | Random | Random | Random | Random
JIQ-SQ(d) | Random | Random | Random | SQ(d), shortest I-queue
JIQ-PoD | Random | Random | SQ(d) | Random
JIQ-NE | SQ(d) | Random | Random | Random
JIQ-E | Random | Random | Random | SQ(d)
JIQ-SQ(d) makes a smarter choice in the I-queue joining strategy compared to JIQ-Random:
JIQ-SQ(d) I-queue joining: When a VM becomes idle, it chooses d dispatchers at random and among those registers itself with the dispatcher which has the shortest I-queue length.
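A possible sketch of this joining rule in Java follows; the names are illustrative, and iQueues.get(i) stands for dispatcher i's I-queue of idle-VM ids.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch of JIQ-SQ(d) I-queue joining. */
public class JiqSqdJoin {
    static final Random RNG = new Random();

    static void joinShortestOfD(int vmId, List<List<Integer>> iQueues, int d) {
        // sample d distinct dispatcher indices uniformly at random
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < iQueues.size(); i++) candidates.add(i);
        Collections.shuffle(candidates, RNG);

        // among the d sampled dispatchers, keep the one with the shortest I-queue
        int best = candidates.get(0);
        for (int j = 1; j < Math.min(d, candidates.size()); j++) {
            int c = candidates.get(j);
            if (iQueues.get(c).size() < iQueues.get(best).size()) best = c;
        }
        iQueues.get(best).add(vmId); // register the idle VM with that dispatcher
    }
}
```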
JIQ-PoD, power-of-d-choices, improves upon the task allocation strategy:
JIQ-PoD task allocation: A dispatcher with a non-empty I-queue selects an idle VM uniformly at random from its I-queue and then removes this VM from its I-queue. A dispatcher with an empty I-queue selects a subset of d VMs from all VMs, and among these d VMs assigns the task to the VM with the shortest task queue.
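The empty-I-queue branch of JIQ-PoD can be sketched as follows. This is illustrative only; for brevity the d VMs are sampled with replacement, whereas an implementation may sample d distinct VMs.

```java
import java.util.Random;

/** Illustrative sketch of JIQ-PoD task allocation when the dispatcher's I-queue is empty. */
public class JiqPodAllocation {
    static final Random RNG = new Random();

    /** taskQueueLengths[v] is the current queue length of VM v; returns the chosen VM id. */
    static int powerOfDChoices(int[] taskQueueLengths, int d) {
        int best = RNG.nextInt(taskQueueLengths.length);
        for (int j = 1; j < d; j++) {
            int candidate = RNG.nextInt(taskQueueLengths.length); // sample another of the d VMs
            if (taskQueueLengths[candidate] < taskQueueLengths[best]) best = candidate;
        }
        return best; // assign the task to the sampled VM with the shortest task queue
    }
}
```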
JIQ-NE, non-empty, improves the strategy for dispatcher selection:
JIQ-NE dispatcher selection: The router chooses d dispatchers at random and among them selects the first one having a non-empty I-queue for routing a new task. If none of the d dispatchers has a non-empty I-queue, then the d-th dispatcher is chosen irrespective of its I-queue status.
JIQ-E, empty, is like JIQ-SQ(d); the difference is that JIQ-SQ(d) seeks the dispatcher with the shortest I-queue, whereas JIQ-E seeks a dispatcher with an empty I-queue:
JIQ-E I-queue joining: When a VM becomes idle, it chooses d dispatchers at random and among those registers itself with the first dispatcher that has an empty I-queue. If none of the d dispatchers has an empty I-queue, then the d-th dispatcher is chosen irrespective of its I-queue status.
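Both JIQ-NE (dispatcher selection) and JIQ-E (I-queue joining) can be expressed with the same sequential probing helper, sketched below in Java. The names are illustrative and the sketch assumes d >= 1.

```java
import java.util.Random;
import java.util.function.IntPredicate;

/** Illustrative sketch of the sequential probing used by JIQ-NE and JIQ-E. */
public class SequentialProbe {
    static final Random RNG = new Random();

    /**
     * Probes up to d randomly chosen dispatchers one at a time and returns the first one whose
     * I-queue length satisfies the predicate; if none does, the d-th probe is returned anyway.
     */
    static int probe(int[] iQueueLengths, int d, IntPredicate accept) {
        int last = -1;
        for (int j = 0; j < d; j++) {
            last = RNG.nextInt(iQueueLengths.length);
            if (accept.test(iQueueLengths[last])) return last; // stop early: condition satisfied
        }
        return last; // none of the d probes qualified: fall back to the d-th dispatcher
    }
}
```

With this helper, JIQ-NE corresponds to probe(iQueueLengths, d, len -> len > 0) for dispatcher selection, while JIQ-E corresponds to probe(iQueueLengths, d, len -> len == 0) for I-queue joining.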
Also, JIQ-SQ(d) and JIQ-PoD probe their d candidates simultaneously, whereas JIQ-NE and JIQ-E probe sequentially to reduce the communication cost.
In the same spirit, the present research preserves the general model of JIQ while improving upon the I-queue joining strategy.
3. Join Idle Queue Dispatcher I-queue Optimization
As seen in Table 1, in JIQ and its variants idle VMs register themselves with the I-queues of dispatchers selected at random or by the SQ(d) method. Through simulation studies we noted that this approach allows zero-length I-queues to persist at dispatchers, which may lead to suboptimal assignment of tasks to VMs and hence increased response time. The proposed algorithm, JIQ Dispatcher I-queue Optimization (JIQ-DIO), aims to increase the number of dispatchers that have non-empty I-queues without increasing the communication overhead between dispatchers and VMs. To achieve this, each VM keeps a list of dispatcher I-queue lengths. This list is updated opportunistically when dispatchers assign tasks to VMs: at the time of task assignment, a dispatcher also communicates its I-queue length to the VM, which the VM stores in its list of dispatcher I-queue lengths. As different dispatchers and VMs communicate during task assignment, the VMs gradually build up an approximate picture of dispatcher I-queue lengths without incurring additional communication overhead. The VMs may never build a comprehensive view of all dispatcher I-queue lengths, and the stored values may not be up to date; however, the partial view suffices for VMs to register themselves with dispatchers that have empty I-queues when they become idle. This reduces the number of dispatchers with empty I-queues, increasing the probability of tasks finding a dispatcher with a non-empty I-queue and being processed immediately. Hence, there is an improvement in the average response time.
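The piggyback mechanism can be sketched as follows. This is a minimal illustration; the class and method names are assumptions and not part of the CloudSim Plus API.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of the JIQ-DIO piggyback of I-queue lengths onto task assignments. */
public class DioPiggyback {

    /** VM-side state: dispatcher id -> last I-queue length observed by this VM (approximate, possibly stale). */
    static class VmView {
        final Map<Integer, Integer> observedIQueueLengths = new HashMap<>();
    }

    /**
     * Dispatcher side of a task assignment: after removing the idle VM from its I-queue,
     * the dispatcher sends its current I-queue length along with the task, and the VM records it.
     */
    static void assignTaskWithPiggyback(int dispatcherId, int iQueueLengthAfterRemoval, VmView vm) {
        vm.observedIQueueLengths.put(dispatcherId, iQueueLengthAfterRemoval);
    }
}
```

Because the length rides on a task-assignment message that is sent anyway, the VM's view is refreshed without any dedicated communication.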
JIQ-DIO compares with the other JIQ algorithms listed in Table 1 as follows (a sketch of the joining decision follows this list):
Dispatcher selection: Random
Task allocation:
Non-empty I-queue: Random VM from the I-queue
Empty I-queue: Randomly to any VM
I-queue joining: Look up the VM's list of dispatcher I-queue lengths. If there are dispatchers with empty I-queues in this list, select any one of them at random. Otherwise select any dispatcher at random.
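In Java, the joining decision made by a VM that has just become idle might look like the following sketch. The names are illustrative; observed is the VM's locally maintained, possibly stale map of dispatcher I-queue lengths built from the piggybacked values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative sketch of the JIQ-DIO I-queue joining decision. */
public class DioJoinDecision {
    static final Random RNG = new Random();

    /**
     * observed maps dispatcher id -> last I-queue length seen by this VM.
     * Returns the dispatcher the idle VM should register with.
     */
    static int chooseDispatcher(Map<Integer, Integer> observed, int numDispatchers) {
        List<Integer> empties = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : observed.entrySet()) {
            if (e.getValue() == 0) empties.add(e.getKey()); // dispatchers believed to have empty I-queues
        }
        if (!empties.isEmpty()) {
            return empties.get(RNG.nextInt(empties.size())); // pick one of them at random
        }
        return RNG.nextInt(numDispatchers); // otherwise fall back to any dispatcher at random
    }
}
```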
3.1. Pseudocode of JIQ-DIO
The JIQ-DIO algorithm has been simulated in CloudSim Plus following the pseudocode presented below. In CloudSim terminology tasks are called cloudlets.
1. Cloudlet generation and processing module
• Generate n cloudlets with lengths (task sizes) as per an exponential distribution
• Set the cloudlet arrival times as per a Poisson process
• For each cloudlet to be scheduled:
    • Route the cloudlet to a dispatcher selected uniformly at random
    • IF the dispatcher's I-queue is non-empty THEN
        • Select at random an idle VM from the dispatcher's I-queue
        • Assign the cloudlet to the selected VM for processing
        • Remove the selected VM from the dispatcher's I-queue
        • In the selected VM, also record the selected dispatcher's I-queue length
    • ELSE
        • Assign the cloudlet to a random VM
        • Capture how many times this happens, i.e., no idle VM found
2. Listener: Cloudlet completion module
• Decrement the task queue length at the VM
• IF (task queue length == 0) THEN
    • IF the VM's list of dispatcher I-queue lengths contains a dispatcher with an empty I-queue THEN
        • Register the idle VM with the first such dispatcher found
    • ELSE
        • Register the idle VM with a random dispatcher
4. Results
4.1. Simulation framework
The simulations in this study have been performed using CloudSim Plus, a fork of CloudSim. New modules were developed in CloudSim Plus to simulate JIQ-DIO as well as the other five JIQ variants listed in Table 1. The simulation model was described previously in Narang et al. (2021), Section 7. In summary, cloudlets are generated dynamically with arrival times modeled by a Poisson process. The cloudlets are considered independent of each other and of the same priority. Cloudlet lengths are exponentially distributed. The VMs are modeled as having heterogeneous processing capacities, in progressive increments of 1 MIPS, while being homogeneous in other machine characteristics including RAM, bandwidth and storage. The parameters used in the simulations in the present study are shown in Table 2.
Table 2. Simulation parameters.

Parameter | Value(s)
Number of hosts | 60
Number of virtual machines | 180, 360, 600
VM processing capacity (MIPS) (heterogeneous) | 20, 21, 22, …, 20+(n-1)
Number of dispatchers | 10, 36, 60, 120
SQ(d) | 0.3, 0.5, 0.6
Number of cloudlets | 10000, 30000, 50000
Cloudlet length distribution | Exponential
Mean cloudlet length | 800
Cloudlet arrival process | Poisson
Mean arrival time for cloudlets | 1, 2
4.2. Performance metrics
The algorithms are compared on the following performance metrics, which are key service-level measures for cloud-based scenarios (Narang et al., 2021).
Average response time per task (cloudlet):

$\overline{RT} = \frac{1}{n}\sum_{i=1}^{n}\left(F_i - A_i + T_{delay}\right)$

where A is the arrival time of the task, Tdelay is the transfer time of the task, and F is the completion time of the task. For the simulations in this work, Tdelay = 0.
Makespan: It is the completion time of the last cloudlet, which implies that all submitted tasks (cloudlets) have been processed by the pool of VMs.
Resource utilization: It is the average utilization of the VMs, where the utilization of a VM is defined as the difference between its busy and idle times divided by the length of execution, i.e., the makespan:

$RU = \frac{1}{n}\sum_{i=1}^{n}\frac{R_{rt,i} - R_{it,i}}{\text{makespan}}$

where $R_{rt}$ and $R_{it}$ are the VM running time and idle time respectively, and n is the number of VMs. Its value ranges between -1 (completely idle) and 1 (fully busy).
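Assuming response time is measured as completion time minus arrival time (with Tdelay = 0, as above), these metrics can be computed from the simulation output roughly as follows. This is a minimal Java sketch; the input arrays are assumed to be extracted from the finished cloudlets and VMs.

```java
/** Illustrative helpers computing the three metrics from per-cloudlet and per-VM records. */
public class MetricsSketch {

    /** arrival[i] and finish[i] are the arrival and completion times of cloudlet i; Tdelay is taken as 0. */
    static double averageResponseTime(double[] arrival, double[] finish) {
        double sum = 0.0;
        for (int i = 0; i < arrival.length; i++) sum += finish[i] - arrival[i];
        return sum / arrival.length;
    }

    /** Makespan: completion time of the last cloudlet. */
    static double makespan(double[] finish) {
        double max = 0.0;
        for (double f : finish) max = Math.max(max, f);
        return max;
    }

    /** Average resource utilization over n VMs: mean of (running - idle) / makespan, in [-1, 1]. */
    static double resourceUtilization(double[] runningTime, double[] idleTime, double makespan) {
        double sum = 0.0;
        for (int i = 0; i < runningTime.length; i++) sum += (runningTime[i] - idleTime[i]) / makespan;
        return sum / runningTime.length;
    }
}
```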
4.3. Simulation results
Simulations were performed in triplicate for each set of parameter values, and the performance metrics were summarized by their mean and standard deviation over the replicates.
Compared to standard JIQ and its variants, JIQ-DIO had a consistently better response time, and the improvement was even more pronounced as the number of VMs was increased (Fig. 2). In these simulations, JIQ-DIO led to an almost 2-fold improvement in response time with good statistical significance (p < 0.05). Interestingly, varying the number of dispatchers had little impact on the average response time, as shown by the lack of statistical significance (p > 0.05) in Fig. 3. It could be argued that the performance improvement obtained with JIQ-DIO might be transient. To test this, the number of cloudlets was increased from 10000 to 50000. Simulation results show that there is no drop in the performance of JIQ-DIO over time (Fig. 4).
Simulation results were used to compare the probability of dispatchers finding idle VMs in the various JIQ implementations. As mentioned above, our implementation maintains a counter that tracks the number of times a dispatcher has an empty I-queue when a task is routed to it. The data collected from this counter is shown in Fig. 5. JIQ-E and JIQ-NE encountered an empty I-queue 35,111 and 35,149 times respectively on average among 50,000 cloudlets, which implies a greater than 70% probability of dispatchers having empty I-queues. The frequency (out of 50,000) is 1981 or 3.96% for standard JIQ, 1970 or 3.94% for JIQ-PoD, 200 or 0.4% for JIQ-SQ(d), and 8 or 0.016% for JIQ-DIO. These results support the hypothesis that JIQ-DIO decreases the probability of dispatchers having empty I-queues, which in turn improves the average response time.
No significant differences were observed between the various JIQ variants in terms of the makespan and average resource utilization metrics.
It is worth noting that the performance of JIQ and its variants reported in previous works (Mitzenmacher, 2016; Narang et al., 2021) mostly considers a homogeneous set of VMs, whereas this work simulates a heterogeneous setup and uses a more generic simulation tool, CloudSim Plus. Hence, the simulation results concerning average response times presented here may be more generalizable than those presented previously.
5. Conclusion and future work
The Join Idle Queue algorithm is well suited for load balancing in distributed scenarios, which are now commonplace in cloud computing. Over the years a few refinements of the initial work have been proposed to yield better performance for specific scenarios. We proposed a new variant, JIQ Dispatcher I-queue Optimization (JIQ-DIO), and through simulations on a generic heterogeneous setup showed a two-fold improvement in response times across a broad range of parameters. We showed that JIQ-DIO results in a higher probability of dispatchers having access to idle machines, which leads to this performance gain. The reliability of the simulations was ensured by replicating the experiments and statistically analyzing the results. A rigorous mathematical evaluation of JIQ-DIO is left to a separate communication. In terms of future work, the model can be improved to attain a close-to-uniform distribution of idle machines among dispatchers to realize further improvements in response times. Additional optimizations can be made to improve other performance metrics, which would require taking into consideration additional parameters beyond the standard few used in this work.