Journal of Applied Research and Technology
On-line version ISSN 2448-6736; print version ISSN 1665-6423
J. Appl. Res. Technol. vol. 7, no. 3, Mexico City, Dec. 2009
Acceleration of association-rule-based Markov decision processes
Ma. de G. García-Hernández*1, J. Ruiz-Pinales2, A. Reyes-Ballesteros3, E. Onaindía4, J. Gabriel Aviña-Cervantes5, S. Ledesma6
1,2,5,6 Universidad de Guanajuato, Comunidad de Palo Blanco s/n, C.P. 36885, Salamanca, Guanajuato, México, garciag@salamanca.ugto.mx, pinales@salamanca.ugto.mx, avina@salamanca.ugto.mx, selo@salamanca.ugto.mx.
3 Instituto de Investigaciones Eléctricas, Reforma 113, C.P. 62490, Temixco, Morelos, México, areyes@iie.org.mx
4 Universidad Politécnica de Valencia, DSIC, Camino de Vera s/n, 46022, Valencia, Spain, onaindia@dsic.upv.es
ABSTRACT
In this paper, we present a new approach to the estimation of Markov decision processes based on efficient association-rule mining techniques such as Apriori. To speed up the solution of the resulting association-rule-based Markov decision process, several accelerating procedures, such as asynchronous updates and prioritization using a static ordering, have been applied. A new criterion for state reordering, in decreasing order of maximum reward, is also compared with a modified topological reordering algorithm. Experimental results obtained on a finite state- and action-space stochastic shortest-path problem demonstrate the feasibility of the new approach.
Keywords: Markov decision processes, association rules, acceleration procedures.
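The acceleration procedures named in the abstract can be illustrated with a minimal sketch. The following Python code is not the authors' implementation; it only shows, on a hypothetical toy MDP, what asynchronous (in-place) value iteration with a static state ordering looks like, where the static order follows the abstract's criterion of decreasing maximum immediate reward. The data layout (`P`, `R` dictionaries) and all names are assumptions made for the example.

```python
def async_value_iteration(states, actions, P, R, gamma=0.95, tol=1e-6):
    """Asynchronous value iteration with a static state ordering.

    P[s][a] -- list of (next_state, probability) pairs.
    R[s][a] -- immediate reward for taking action a in state s.
    """
    # Static prioritization: sort states by decreasing maximum immediate
    # reward (the reordering criterion proposed in the abstract).
    order = sorted(states,
                   key=lambda s: max(R[s][a] for a in actions),
                   reverse=True)
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        # Asynchronous (Gauss-Seidel) sweep: each backup immediately
        # reuses the freshest values already written into V.
        for s in order:
            best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

# Toy two-state example: s0 transitions to the absorbing state s1.
states, actions = ["s0", "s1"], ["a"]
P = {"s0": {"a": [("s1", 1.0)]}, "s1": {"a": [("s1", 1.0)]}}
R = {"s0": {"a": 1.0}, "s1": {"a": 0.0}}
V = async_value_iteration(states, actions, P, R)
```

Because each sweep reuses values updated earlier in the same pass, a good static ordering (high-reward states first, or a topological order as in the compared algorithm) can propagate value information through the state space in fewer sweeps than synchronous value iteration.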