Big data is a
mechanism of distributed computing that organizes and shares networked
resources, data domains, storage devices and processing power across
distributed locations. In the big data environment, information services are essential
for providing information about the various resources. This information includes
the configuration of resources, policies and agreements that are managed by both
higher-level and lower-level schedulers. Resource information aggregation schemes
are used to shrink the quantity of information exchanged between the networked
resources. Through information aggregation, the characteristics of resources
are summarized and then sent to the scheduling procedures. In this paper we
propose a separate module named RIM (Resource Information Module) for information
aggregation in each domain. The experimental results show that the proposed
aggregation schemes achieve a large reduction in exchanged information and better
resource selection, enabling improved task scheduling decisions.
Keywords: Grid Computing, Domains, Information
aggregation, Resources, Schedulers.
1. INTRODUCTION
A grid is a distributed environment that makes it
possible to share heterogeneous, loosely coupled IT resources across
organisations and geographical locations [1]. Grid network systems link
computing resources together in a way that lets someone use one
computer to access and harness the collective power of all the machines
in the environment [2]. Grid computing environments are the outcome of autonomic
provisioning of a multitude of resources and capabilities, typically
demonstrating increased computing resource utilization, access to specialized
computer systems, cost sharing, and improved management capabilities. Grid
computing supports applications in the life sciences, financial analysis,
research collaboration, engineering design, collaborative games and
bioinformatics [3].
A computational grid [4] is defined as a collection
of computers, online instruments, data archives, and networks that are
connected by a shared set of services which, when taken together, provide users
with transparent access to the entire set of resources. A computational grid is
a hardware and software infrastructure that delivers dependable, consistent,
pervasive, and inexpensive access to high-end computational capabilities. These
systems have the high computational capacity needed to perform complicated calculations.
In a computational grid [5], a large computational
task is divided up among individual machines, which run calculations in
parallel and then return results to the original computer. These individual
machines are nodes in a network, which may span various administrative domains
and may be geographically separated. Each of the nodes may be thought of as a
discrete system that can perform work and has access to a network.
Computational grids are more economical than supercomputers of equal computing power.
Because of the large number of resources in
computational grid networks, managing the resource information is difficult
[6]. Using information aggregation, the characteristics of grid resources are
grouped and then sent to the higher-level scheduling mechanisms. The scheduler
makes its decisions depending on the information collected by resource
monitoring systems. Computational resources play a major role in problem
solving and in complicated calculations. For example, these systems are used in
medical engineering, robotics, bioinformatics and weather forecasting.
The Grid model [7] consists of two levels of
schedulers. First, the central scheduler receives the request from its
higher-level mechanism and decides which resource domain is suitable for executing the
computation. At the second level, the domain scheduler selects an appropriate
resource site to solve the computational problem. Through information
aggregation, resource-related information is summarized and conveyed between
these two levels of schedulers. This reduces the quantity of information exchanged
among the grid machines.
The rest of this paper is structured as follows.
Section II reviews work related to information aggregation. Section
III presents the proposed system model and describes the
aggregation method and the resource scheduling decisions. Section IV provides an
analysis and simulation of the proposed method.
Finally, conclusions are presented in Section V.
2. RELATED WORKS
2.1 Single Point and Intra-Domain Clustering Aggregation Scheme [8]
In the single point aggregation scheme, the information vectors of the resource
sites in each domain are aggregated into a single information vector. In the
intra-domain clustering aggregation scheme, the resource sites with similar
characteristics are grouped into intra-domain clusters. All resource site
vectors belonging to an intra-domain cluster are aggregated into a single
vector. Here, the domain monitor collects the information from its resource
sites and computes an aggregation matrix. This matrix is sent to the central monitor
to make scheduling decisions.
2.2 Aggregation through BestBrokerRank policy [9]
In this model, all clusters or domains adopt a public data aggregation model that
allows both the encapsulation and the sharing of their resources and scheduling
information. Every domain has a grid broker that acts as a firewall to that
domain. Through the best broker rank policy, the central scheduler selects an
appropriate domain to execute the task. The aggregation parameters used here
are processor type, OS type and file type. Operators such as count and total
are used to aggregate the parameters. The model supports subcategories to increase the
accuracy of the resource information and fixes threshold values for aggregation.
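As a rough illustration of how count and total operators can summarise a domain, the following Python sketch groups resources by a (processor type, OS type) subcategory. The function name, the field names `processor`, `os` and `cpus`, and the sample values are assumptions made for illustration only; they are not taken from the cited work.

```python
from collections import defaultdict


def aggregate_by_category(resources):
    """Summarise resources per (processor type, OS type) subcategory.

    Returns {(processor, os): {"count": int, "total_cpus": int}}.
    """
    summary = defaultdict(lambda: {"count": 0, "total_cpus": 0})
    for res in resources:
        key = (res["processor"], res["os"])
        summary[key]["count"] += 1                 # 'count' operator
        summary[key]["total_cpus"] += res["cpus"]  # 'total' operator
    return dict(summary)


# Hypothetical domain with three resources (illustrative values only).
domain = [
    {"processor": "x86_64", "os": "Linux", "cpus": 8},
    {"processor": "x86_64", "os": "Linux", "cpus": 16},
    {"processor": "ppc64", "os": "AIX", "cpus": 4},
]
summary = aggregate_by_category(domain)
```

Only the per-subcategory summary, rather than one record per resource, would need to be shared with the central scheduler.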
2.3 Aggregation through Omnipresent scheme [10]
This approach defines an omnipresent information aggregation scheme. Omnipresent means
that everybody has a view of the aggregated information in a system; the scheme is
therefore called n-aggregation (n stands for n participants). Thus everybody is monitoring the networked
2.4 Process Status Aggregating Service with Failure Detector [11]
This method introduces a fault detection architecture called HYDRA to monitor
process failures in distributed systems. The aggregator component of HYDRA
publishes the characteristics of resources, retrieved from a GIS
(Grid Information Service), to the grid schedulers. If a user enquires about a
specific process, the aggregator sends its aggregated information. If there is
no information about its status, the resource aggregator forwards the query to
the GIS that is related to the process.
2.5 Private Information Aggregation [12]
This model presents efficient and scalable protocols for privately computing a large
range of aggregation functions based on addition, disjunction, and max/min. It
defines computationally secure protocols for summation and disjunction that are
used to aggregate the resource information.
2.6 Information Aggregation Scheme [13]
This model uses Quality of Context parameters to resolve conflicts in context
information. Here the context aggregation system detects conflicting and duplicate
information kept in the GIS (Grid Information System). If any conflict
arises, the system removes that information using quality-aware
policies. All context resource information is received at a monitor, which aggregates
the information from lower-level schedulers before forwarding it to the higher-level
schedulers.
2.7 Data Aggregation and Analysis (A Grid based approach for medicine and biology) [14]
Here, mainly two components are used. The Data Aggregator component
supports the aggregation of data deriving from heterogeneous resources (e.g.
biosensors, actuators). The Data Analyser manipulates the information gathered
from the Aggregator component and proceeds with a set of simulations.
2.8 Cluster based aggregation scheme [15]
Here the resource machines transmit data to a local scheduler (or cluster head),
which aggregates data from all the resources in its cluster and transmits the
combined information to the sink. The resource sites transmit information
to cluster heads, and each cluster head aggregates and compresses the data and
forwards it to the global schedulers. This protocol operates in two phases. Setup
phase: it involves the organization of resource nodes into clusters and the selection
of a cluster head. Steady state phase: it
involves information aggregation at the cluster head and transmission to the sink.
2.9 Aggregation through query definitions [16]
This protocol proposes inter-cluster communication. It describes query definitions
with interest messages. Query definitions are sent from the cluster head to the
resource nodes (or sink nodes).
2.10 Chain based aggregation scheme [17]
The objective of chain based data aggregation is that each resource site transmits its
information to its closest neighbour. The chain construction is based on the
computational capacity of the resources. It prevents a single point of failure. Here
all nodes have global knowledge of the entire network.
2.11 Secure hop-by-hop Data Aggregation Protocols [18]
This protocol uses a grouping technique to partition the resource sites in a tree topology into
multiple logical groups (subtrees) of similar sizes. Hop-by-hop aggregation is
performed in each group to generate a group aggregation result.
2.12 Indexing based aggregation scheme [19]
This method proposes a Multi-Attribute Addressable Network (MAAN) for resource
indexing and a Distributed Aggregation Tree (DAT) for information aggregation.
In MAAN, the resource sites are registered with a set of attribute-value pairs
and can be searched through multi-attribute range queries. To
resolve a range query, MAAN routes the query to the first resource site within
its domain range. Then the query is forwarded to the other resource sites consecutively
until it reaches the last site. In DAT, all resource sites use a well-balanced
routing algorithm to build a balanced tree in the direction of the root
site, and all nodes aggregate their information towards the root to form the global information.
2.13 Grid Information Services for Distributed Resource Sharing [20]
Grid Information Services provide fundamental mechanisms for resource
discovery and monitoring using information providers. Here an information provider
speaks two protocols. GRIP (GRid Information Protocol) is used to access
information about, or the characteristics of, resource entities. GRRP (GRid Resource
Protocol) is used to send notification messages to aggregate directory
services about the availability of resource information.
2.14 Data Consolidation and Information Aggregation [21]
In Grid networks, the scheduler sometimes needs more than one piece of data to
schedule the resources. Here the data consolidation method is introduced to transfer
groups of data among the resource sites and data replicas.
Fig.1: Resource Information Module (RIM) for Local (diagram labels: Grid Information Service, Resource Information Module)
In this system model, the resource sites are clustered into domains. Each site in
a domain is characterized by a vector Vi of resource information [7].
Resource information: computation capacity, storage capacity,
resource type, resource location, host name, resource manager type, etc.
Vector Vi = (Ci, Si, Ni, ...)
i – Resource Site
Ci – Computational Capacity (MIPS)
Si – Storage Capacity (MB)
Ni – Number of tasks queued
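One possible in-memory representation of the vector Vi is sketched below in Python. The class name, field names and sample values are assumptions chosen for illustration; only the three components Ci, Si and Ni come from the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceVector:
    """Resource information vector Vi of one resource site."""
    site_id: int     # i:  resource site identifier
    c_mips: float    # Ci: computational capacity in MIPS
    s_mb: float      # Si: storage capacity in MB
    n_queued: int    # Ni: number of tasks queued


# Hypothetical site; the values are illustrative only.
v101 = ResourceVector(site_id=101, c_mips=1800.0, s_mb=512000.0, n_queued=3)
```

Further attributes such as resource type or host name could be added as extra fields in the same way.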
Grid Information Service (GIS)
The GIS is a warehouse for storing information about the grid resources. Information services are an effective way
for resources within the grid to cope with the dynamic nature of the grid.
Within any grid, both CPU and data resources will fluctuate, depending on their
availability to process and share data. As resources become free within the
grid, they can update their status within the grid information services. This
gives clients the information they need to make intelligent decisions about which grid
resources are free to use. A grid information service delivers information
about entities in a grid. An entity is something of value to a computational
grid. Entities can be services or devices, such as software; serve as a resource,
such as a compute cluster, machines or exist as a person, group, or
In a grid environment, the schedulers assign tasks
according to policy rules or other constraints. For example, some machines may
be designated for use in medical research only. These would be identified
as having a medical research attribute, and the scheduler could be configured to
assign them only jobs that require a medical research resource. Other machines
may participate in the grid only if they are not used for military
purposes. In this situation, jobs requiring a military resource would not be
assigned to such machines. Of course, the administrators would need to impose a
classification on each kind of job through some authorization procedure to use
this kind of approach. Therefore task scheduling is performed at two levels:
(i) Global Schedulers: At the higher
level, a global scheduler decides the site (or domain) in which a task will be executed.
(ii) Local Schedulers: At the lower level, a local scheduler selects
the exact machine where the task will be executed.
The schedulers make decisions based on static and dynamic resource information, including the
computation and storage capacities, their availability, the number of tasks
queued and other parameters of interest, which are usually collected by the GIS.
(i) The Global Scheduler (GS) gets the
queued task information from the GIS. This task information contains the Task_ID and Minimum_Computational_Capacity_Needed values.
(ii) The GS fixes the Limit_Value of computational capacity corresponding to the
aggregated computational capacity (AVG) from the domain's Resource Information
Module (RIM).
(iii) The resources whose
computational capacity is higher than the AVG value are connected in a chain.
(iv) The tasks whose Minimum_Computational_Capacity_Needed
value lies within the Limit_Value are
given to the corresponding domain.
(v) These tasks are then scheduled to
the corresponding chain.
(vi) If the number of tasks is higher
than the number of resources, the remaining tasks wait until a resource
completes its task.
(vii) When a resource completes its
task execution, it is disconnected from the chain and scheduled
to the queued tasks.
(viii) Tasks that do not fall within the
limit value are assigned to another domain using the aggregated information
from the RIM of that domain.
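The steps for a single domain can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the first-come assignment order, the representation of Limit_Value as a (low, high) pair and all sample values are hypothetical. As in the steps above, a task is matched to the domain through the Limit_Value only; the sketch does not additionally compare the task's requirement against the capacity of the individual chain resource.

```python
from collections import deque


def build_chain(capacities):
    """Step (iii): chain the resources whose capacity exceeds the domain average."""
    avg = sum(capacities.values()) / len(capacities)
    return avg, [rid for rid, c in capacities.items() if c > avg]


def schedule_domain(capacities, tasks, limit):
    """Steps (iv)-(vi) for one domain.

    capacities: {resource_id: capacity in MIPS}
    tasks:      list of (task_id, minimum_capacity_needed)
    limit:      (low, high) Limit_Value band fixed by the global scheduler
    Returns (assignments, waiting, rejected).
    """
    low, high = limit
    _, chain = build_chain(capacities)
    accepted = [t for t, need in tasks if low <= need <= high]
    rejected = [t for t, need in tasks if not (low <= need <= high)]  # step (viii)
    free = deque(chain)
    assignments, waiting = {}, deque()
    for task_id in accepted:
        if free:
            assignments[task_id] = free.popleft()  # step (v)
        else:
            waiting.append(task_id)                # step (vi)
    return assignments, waiting, rejected


def on_task_complete(assignments, waiting, finished_task):
    """Step (vii): the freed resource takes the next queued task, if any."""
    resource = assignments.pop(finished_task)
    if waiting:
        assignments[waiting.popleft()] = resource
    return resource


# Hypothetical domain and task list (illustrative values only).
capacities = {101: 1500, 102: 2500, 103: 3000, 104: 1000}
tasks = [("T1", 1950), ("T2", 2050), ("T3", 2000), ("T4", 1200)]
avg, chain = build_chain(capacities)
assignments, waiting, rejected = schedule_domain(capacities, tasks, (1900, 2100))
freed = on_task_complete(assignments, waiting, "T1")
```

In this example the chain holds resources 102 and 103, T3 waits until T1 finishes, and T4 is left for another domain.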
3.4 Aggregation Variables
The computational capacities Ci of the resource sites,
measured in MIPS, in Domain D1 can be aggregated as follows:
D1 = (C1 + C2 + C3 + ... + Cn) / N
C1, C2, C3, ..., Cn – Computational
capacities of the resources in Domain 1
N – Number
of resources in Domain 1
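The averaging step can be written as a short Python function. The function name and the capacity values below are hypothetical and serve only to show how N per-resource values reduce to a single aggregated value per domain.

```python
def aggregate_capacity(capacities_mips):
    """Average computational capacity (MIPS) of the resources in one domain."""
    if not capacities_mips:
        raise ValueError("domain has no resources")
    return sum(capacities_mips) / len(capacities_mips)


# Hypothetical capacities of five resources in one domain.
domain1 = [1500, 2500, 2000, 1000, 3000]
avg = aggregate_capacity(domain1)
```

Here five values are replaced by the single value 2000.0 MIPS, which is what the domain would report to the global scheduler.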
4. EXPERIMENTAL EVALUATION
GridSim has been used to simulate the grid computing environment. The
computational resources are created in GridSim and the resource information is
stored in MySQL. In this simulation
environment we create several computational resources with different
characteristics that belong to different domains. In our experiment there are
10 resources in Domain 1 and 15 resources in Domain 2.
Fig.2: Computational Resources in Domain 1
Fig.2 shows the computational capacities of the 10 resources in Domain 1.
From the graph, resources 102, 103, 105, 108 and 109 have
computational capacity values above the average aggregated value.
These resources are formed into a chain for executing the tasks.
Fig.3: Computational Resources in Domain 2
Fig.3 shows the computational capacities of the 15 resources in Domain 2.
From the graph, resources 201, 204, 206, 209 and 212 have
computational capacity values above the average aggregated value.
These resources are formed into a chain for executing the tasks.
Fig.4: Task information in GIS
Fig.4 shows the tasks and the Minimum_Computational_Capacity_Needed to execute them. For
example, RIM1 aggregates Domain1's average computational capacity as 2028 MIPS.
Therefore the global scheduler sets the Limit_Value
to 1900 MIPS to 2100 MIPS. According to the proposed scheme, the tasks whose
Minimum_Computational_Capacity_Needed values lie within the Limit_Value are
scheduled to Domain1's resource chain. Similarly, the remaining tasks are
scheduled to the other domains. Through the aggregation values from RIM, the
selected tasks are scheduled to the resource chains in the respective domains.
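The routing of tasks to domains by Limit_Value can be sketched in Python as below. Only the Domain1 band of 1900 to 2100 MIPS comes from the example above; the Domain2 band, the task values and the function name are hypothetical, and choosing the first matching domain is an assumption of this sketch.

```python
def route_tasks(tasks, domain_limits):
    """Map each task to the first domain whose Limit_Value band contains its
    minimum required capacity; tasks that match no band map to None."""
    routing = {}
    for task_id, need in tasks:
        routing[task_id] = next(
            (d for d, (low, high) in domain_limits.items() if low <= need <= high),
            None,
        )
    return routing


domain_limits = {
    "Domain1": (1900, 2100),  # band from the example (average 2028 MIPS)
    "Domain2": (1400, 1600),  # hypothetical band for illustration
}
# Hypothetical tasks: (Task_ID, Minimum_Computational_Capacity_Needed in MIPS).
tasks = [("T1", 2000), ("T2", 1500), ("T3", 2100), ("T4", 900)]
routing = route_tasks(tasks, domain_limits)
```

A task that maps to None, such as T4 here, would stay queued until some domain reports a matching band.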
Traffic between the Grid Nodes
Fig.5 shows that the RIM based aggregation scheme reduces data traffic in the
In this paper, various resource information aggregation schemes in grid computing
have been described as related work. This work handles information aggregation
in grids as a distinct and important problem, trying to identify the main
issues, limitations, dependencies and side-effects related to the aggregation
process. The proposed information aggregation model is an
individual domain-based aggregation approach in which the resource information is
stored in multi-level schedulers.
Experimental results show the efficiency and effectiveness of the proposed
system model. Because the proposed model
reduces information exchanges between the schedulers, the data traffic in the
grid network is minimized. Finally, inter-domain communication can be
carried out in this system for executing tasks across several domains.
Foster, Carl Kesselman and Steven Tuecke, “The anatomy of the Grid”, International
Journal of High Performance Computing
Applications, Vol. 15, Issue 3, August 2001, pp. 200-222.
Fox and Dennis Gannon “Computational Grids”, Computing in Science and
Engineering Magazine, July/August 2001, pp. 75-78.
Aktaruzzaman, “Resource discovery in Computational Grids” , Proceedings of the
2005 conference on Self-Organization and Autonomic Informatics, pp.
and Varvarigos, “Scheduling efficiency of resource information aggregation in
grid networks”, Journal of future
generation computer systems, Vol. 28, Issue 1, January 2012, pp. 9-23.
and Varvarigos, “Resource Information Aggregation in Hierarchical Grid Networks”, Proceedings of the 2009 9th
IEEE/ACM International Symposium on Cluster Computing and the Grid, pp.
Rodero, Francesc Guim, Julita Corbalan, Liana Fong, “Grid broker selection
strategies using aggregated resource information”, Future Generation Computer
Systems, Vol.26, Issue 1, January 2010,
Schneider, “Information Aggregation for load balancing in a Distributed System
of Grid Services”, 2007.
Kang, “A Process Status Aggregating Service with Failure Detector”, 2005.
Mads Dam and Douglas Wikström, “Practical Private Information Aggregation in
large networks”, Journal of
Information Security Technology for Applications Lecture Notes in Computer
Science, Vol. 7127, 2012, pp 89-103.
Manzoor, Hong-Linh Truong and Schahram Dustdar “Quality Aware Context
Information Aggregation System for Pervasive Environments”, Proceedings of the
2009 International Conference on Advanced Information Networking and
Applications Workshops, 2009, pp. 266-271.
Kyriazis, Konstantinos Tserpes, George Kousiouris, Andreas Menychtas, Gregory
Katsaros and Theodora Varvarigou, “Data Aggregation and Analysis: A Grid based
approach for Medicine and Biology”, Proceedings of the 2008 IEEE International
Symposium on Parallel and Distributed Processing with Applications, 2008, pp. 841-848.
Dasgupta, K. Kalpakis and P. Namjoshi, “An
Efficient Clustering-based Heuristic for Data Gathering and Aggregation in Sensor Networks”, IEEE
Conference on Wireless Communications and Networking, Vol. 3, March 2003, pp.
Chatterjea and Paul Havinga, “A Dynamic Data Aggregation Scheme for Wireless Sensor Networks”, 14th Workshop on
Circuits, Systems and Signal Processing, 2003.
Lindsey and Cauligi S. Raghavendra , “PEGASIS: Power-Efficient Gathering in
Sensor Information Systems” Proceedings of IEEE Aerospace conference, Vol.3,
2002, pp. 1125-1130.
Yang, Xinran Wang, Sencun Zhu and Guohong Cao, “SDAP: A Secure Hop-by-Hop Data
Aggregation Protocol for Sensor Networks”, Proceedings of the 7th ACM
international symposium on Mobile ad hoc networking and computing, 2006, pp.
Kai, “Distributed Indexing and Aggregation techniques for Peer-to-Peer and Grid
Computing” December 2006.
Czajkowski, Steven Fitzgerald, Ian Foster and Carl Kesselman, “Grid Information
Services for distributed resource sharing”, Proceedings of 10th IEEE
International Symposium on High Performance Distributed Computing (HPDC-10),
and Varvarigos, “Data Consolidation and Information Aggregation in Grid
Networks”, Advances in Grid Computing, Chapter 6, February 28 2011.
M.MANICKA RAJA is
working as an Assistant Professor in Department of Computer Science and Engineering,
SCAD Institute of Technology and currently pursuing Ph.D. in Anna University,
Chennai. His areas of interest are Internet of Things and Big Data.
working as a Professor & Head in Department of Information Technology, Karpagam
College of Engineering. His areas of interest are Cloud Computing, Big Data and
Internet of Things.