Computación y Sistemas
Online version ISSN 2007-9737, print version ISSN 1405-5546
Comp. y Sist. vol. 13 no. 2, Ciudad de México, Oct./Dec. 2009
Doctoral thesis summary
Reactive Scheduling of DAG Applications on Heterogeneous and Dynamic Distributed Computing Systems
Mapeo de Aplicaciones Paralelas tipo DAG en Sistemas Distribuidos Heterogéneos y Dinámicos
Graduate: Jesús Israel Hernández Hernández
Institute for Computing Systems Architecture
School of Informatics
University of Edinburgh, UK.
j.i.hernandez@sms.ed.ac.uk
Supervisor: Murray Cole
Institute for Computing Systems Architecture
School of Informatics
University of Edinburgh, UK.
mic@inf.ed.ac.uk
Graduated on December 4, 2008
Abstract
Emerging computational platforms enable a set of geographically distributed computers with different capabilities to be linked together and used in a coordinated fashion to solve a single parallel application concurrently. Effective scheduling mechanisms are essential to exploit the tremendous potential of the computational resources offered by such platforms. We consider the problem of scheduling parallel applications, which are often abstracted as directed acyclic graphs (DAGs) in which vertices represent application tasks and edges represent data dependencies between tasks. The core scheduling issue is that the availability and performance of the resources, which are already heterogeneous by nature, can be expected to vary dynamically, even during the course of an execution. This thesis summary presents the main results of the Global Task Positioning (GTP) mapping method, which is based on the cyclic use of a static mapping method over time. We place strong emphasis on three key aspects, which we believe are central to addressing the dynamic nature of the problem: reactivity, data-aware components, and fault tolerance.
Keywords: Parallel processing, heterogeneous computing, task scheduling, DAG scheduling, fault tolerance.
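The abstract describes applications as DAGs whose vertices are tasks and whose edges are data dependencies, mapped onto heterogeneous resources. The following minimal sketch illustrates that abstraction with a toy task graph and a simple upward-rank list-scheduling heuristic in the spirit of HEFT-style static mappers; it is not the GTP method, and all task names, work figures, and processor speeds are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical DAG: task -> computation cost, edge -> data volume between tasks.
tasks = {"t1": 10, "t2": 20, "t3": 15, "t4": 5}
edges = {("t1", "t2"): 4, ("t1", "t3"): 6, ("t2", "t4"): 3, ("t3", "t4"): 2}
procs = {"p1": 1.0, "p2": 2.0}       # processor -> relative speed

children, parents = defaultdict(list), defaultdict(list)
for (u, v) in edges:
    children[u].append(v)
    parents[v].append(u)

@lru_cache(maxsize=None)
def upward_rank(t):
    """Remaining work from t to the exit task; drives the scheduling priority."""
    succ = children[t]
    return tasks[t] + (max(edges[(t, c)] + upward_rank(c) for c in succ) if succ else 0)

# List scheduling: visit tasks by decreasing rank (parents always precede children)
# and greedily pick the processor that finishes each task earliest.
# Communication costs between processors are ignored here for brevity.
ready = {p: 0.0 for p in procs}      # time at which each processor becomes free
finish = {}                          # task -> finish time
for t in sorted(tasks, key=upward_rank, reverse=True):
    earliest = max((finish[u] for u in parents[t]), default=0.0)
    best = min(procs, key=lambda p: max(ready[p], earliest) + tasks[t] / procs[p])
    start = max(ready[best], earliest)
    finish[t] = start + tasks[t] / procs[best]
    ready[best] = finish[t]
    print(f"{t} -> {best}: start={start:.1f}, finish={finish[t]:.1f}")
```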
Resumen
Emerging computational platforms allow computational resources connected to a high-speed network and located at geographically distributed sites to be shared in order to solve an application concurrently. In this context, task scheduling mechanisms become essential to exploit this tremendous potential of computational resources. Our research considers the problem of mapping parallel applications, frequently represented as directed acyclic graphs (DAGs), onto distributed, heterogeneous and dynamic computing environments. The central issue is that the availability and performance of the computational resources may vary over time, even before the execution of the application has finished. We place special emphasis on three key aspects, which we believe are essential for dealing with the dynamic nature of the problem: adaptability, information reuse and fault tolerance. This thesis summary shares the experience gained in the area and presents the main results of the GTP (Global Task Positioning) parallel application mapping method and its variants.
Keywords: Parallel computing, heterogeneous computing, task mapping, fault tolerance.
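Both abstracts stress that GTP re-applies a static mapping method cyclically as resource availability and performance change. The sketch below shows the general shape of such a reactive loop, with hypothetical task costs, processor speeds and monitoring readings; it is only an illustration of the idea, not the GTP algorithm itself.

```python
# Minimal sketch of cyclic/reactive remapping: a static mapping is recomputed
# whenever monitored processor speeds drift, and tasks that have not yet
# started are moved to the new assignment. All figures below are hypothetical.

def static_map(pending, speeds):
    """Greedy static mapping: largest pending task first, onto the processor
    with the smallest projected completion time."""
    load = {p: 0.0 for p in speeds}
    mapping = {}
    for task, work in sorted(pending.items(), key=lambda kv: -kv[1]):
        p = min(load, key=lambda q: load[q] + work / speeds[q])
        mapping[task] = p
        load[p] += work / speeds[p]
    return mapping

pending = {"t1": 30.0, "t2": 20.0, "t3": 10.0, "t4": 25.0}   # unfinished tasks -> work
speeds = {"p1": 1.0, "p2": 1.0}                              # last known speeds
mapping = static_map(pending, speeds)

# Monitoring cycles: if a speed estimate drifts by more than 25%, remap the
# still-pending tasks (reactivity to the dynamic behaviour of the resources).
for cycle, observed in enumerate([{"p1": 1.0, "p2": 1.1},
                                  {"p1": 0.4, "p2": 1.2}], start=1):
    drifted = any(abs(observed[p] - speeds[p]) / speeds[p] > 0.25 for p in speeds)
    speeds = observed
    if drifted:
        mapping = static_map(pending, speeds)
    print(f"cycle {cycle}: mapping = {mapping}")
```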