Computación y Sistemas
On-line version ISSN 2007-9737 · Print version ISSN 1405-5546
Comp. y Sist. vol. 13, no. 1, Ciudad de México, Jul./Sep. 2009
Articles
AsistO: A Qualitative MDP-based Recommender System for Power Plant Operation
AsistO: Un Sistema de Recomendaciones basado en MDPs Cualitativos para la Operación de Plantas Generadoras
Alberto Reyes1, L. Enrique Sucar2 and Eduardo F. Morales2
1 Instituto de Investigaciones Eléctricas; Av. Reforma 113, Palmira, Cuernavaca, Morelos, 62490, México; areyes@iie.org.mx
2 INAOE; Luis Enrique Erro 1, Sta. Ma. Tonantzintla, Puebla 72840, México; esucar@inaoep.mx, emorales@inaoep.mx
Article received on July 15, 2008
Accepted on April 03, 2009
Abstract
This paper proposes a novel and practical model-based learning approach with iterative refinement for solving continuous (and hybrid) Markov decision processes. Initially, an approximate model is learned using conventional sampling methods and solved to obtain a policy. The approximate model is then iteratively refined, using the variance of the utility values as the partition criterion. In the learning phase, initial reward and transition functions are obtained by sampling the state-action space. The samples are used to induce a decision tree that predicts reward values, from which an initial partition of the state space is built. The samples are also used to induce a factored MDP. The state abstraction is then refined by splitting states only where the split is locally important. The main contributions of this paper are the use of sampling to construct an abstraction, and a local refinement process of the state abstraction based on utility variance. The proposed technique was tested in AsistO, an intelligent recommender system for power plant operation, where we solved two versions of a complex hybrid continuous-discrete problem. We show how our technique approximates a solution even in cases where standard methods explode computationally.
Keywords: Recommender systems, power plants, Markov decision processes, abstractions.
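The following is a minimal, self-contained Python sketch of the sample / partition / solve / refine loop summarized in the abstract, on a toy one-dimensional control problem. It is an illustration under stated assumptions, not the authors' AsistO implementation: the simulator, the hard-coded initial cut points (standing in for the reward decision tree), and the split test (comparing mean one-step utilities on the two halves of an interval, as a proxy for the utility-variance criterion) are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)
GAMMA, N_SAMPLES = 0.95, 5000
ACTIONS = (-0.05, +0.05)                       # two actions: shift left / shift right

def simulate(s, a):
    # Toy continuous dynamics on [0, 1]: noisy shift, reward near the set-point 0.8.
    s2 = float(np.clip(s + a + rng.normal(0.0, 0.01), 0.0, 1.0))
    return s2, (1.0 if 0.75 <= s2 <= 0.85 else 0.0)

# 1. Learning phase: sample the state-action space.
samples = []
for _ in range(N_SAMPLES):
    s, a = rng.uniform(0.0, 1.0), int(rng.integers(len(ACTIONS)))
    s2, r = simulate(s, ACTIONS[a])
    samples.append((s, a, s2, r))

# 2. Initial partition of the state space. The paper induces a decision tree that
#    predicts reward; here the resulting interval boundaries are simply hard-coded.
cuts = [0.0, 0.75, 0.85, 1.0]

def abstract(s, cuts):
    # Index of the interval of the partition that contains s.
    i = int(np.searchsorted(cuts, s, side="right")) - 1
    return min(max(i, 0), len(cuts) - 2)

# 3.-4. Fit an abstract (discrete) MDP from the samples and solve it by value iteration.
def solve(cuts):
    n, m = len(cuts) - 1, len(ACTIONS)
    counts, rew, visits = np.zeros((n, m, n)), np.zeros((n, m)), np.zeros((n, m))
    for s, a, s2, r in samples:
        i, j = abstract(s, cuts), abstract(s2, cuts)
        counts[i, a, j] += 1; rew[i, a] += r; visits[i, a] += 1
    for i in range(n):                          # unvisited (state, action) pairs: self-loop
        for a in range(m):
            if visits[i, a] == 0: counts[i, a, i], visits[i, a] = 1, 1
    P, R = counts / visits[:, :, None], rew / visits
    V = np.zeros(n)
    for _ in range(500):                        # Bellman backups: V = max_a (R + gamma * P V)
        V = np.max(R + GAMMA * (P @ V), axis=1)
    return V

# 5. Refinement: split an interval only where the split is locally important, i.e. where
#    the one-step utilities of the samples in its two halves disagree.
def refine(cuts, V, threshold=0.5):
    new_cuts = list(cuts)
    for i in range(len(cuts) - 1):
        lo, hi = cuts[i], cuts[i + 1]
        mid = 0.5 * (lo + hi)
        left = [r + GAMMA * V[abstract(s2, cuts)] for s, _, s2, r in samples if lo <= s < mid]
        right = [r + GAMMA * V[abstract(s2, cuts)] for s, _, s2, r in samples if mid <= s < hi]
        if left and right and abs(np.mean(left) - np.mean(right)) > threshold:
            new_cuts.append(mid)
    return sorted(set(new_cuts))

for _ in range(4):                              # iterate solve / refine until no split is needed
    V = solve(cuts)
    new_cuts = refine(cuts, V)
    if new_cuts == cuts:
        break
    cuts = new_cuts
print(len(cuts) - 1, "abstract states after refinement; V =", np.round(solve(cuts), 3))

The same loop applies unchanged when the partition is multi-dimensional (hyper-rectangles instead of intervals); only the abstraction function and the split test need to index more than one state variable.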
Resumen
Este artículo propone una técnica novedosa y práctica de aprendizaje basada en modelos con refinamiento iterativo para resolver procesos de decisión de Markov (MDPs) continuos. Inicialmente, se aprende un modelo aproximado usando métodos de muestreo convencionales, el cual se resuelve para obtener una política. Iterativamente, el modelo aproximado se refina con base en la varianza de los valores de la utilidad esperada. En la fase de aprendizaje, se obtienen las funciones de recompensa inmediata y de transición mediante muestras del tipo estado-acción. Éstas primero se usan para inducir un árbol de decisión que predice los valores de recompensa y a partir del cual se construye una partición inicial del espacio de estados. Posteriormente, las muestras también se usan para inducir un MDP factorizado. Finalmente, la abstracción del espacio de estados resultante se refina dividiendo aquellos estados donde pueda haber cambios en la política. Las contribuciones principales de este trabajo son el uso de datos para construir una abstracción inicial, y el proceso de refinamiento local basado en la varianza de la utilidad. La técnica propuesta fue probada en AsistO, un sistema inteligente de recomendaciones para la operación de plantas generadoras de electricidad, donde resolvimos dos versiones de un problema complejo con variables híbridas continuas y discretas. Aquí mostramos cómo nuestra técnica aproxima una solución aun en casos donde los métodos estándar explotan computacionalmente.
Palabras clave: Sistemas de recomendaciones, plantas generadoras, procesos de decisión de Markov, abstracciones.
Acknowledgments
This work was supported jointly by the Instituto de Investigaciones Eléctricas, Mexico, and by CONACYT under Project No. 47968.