Van Vaerenbergh, K., Rodriguez, A., Gagliolo, M., Vrancx, P., Nowé, A., Stoev, J., Goossens, S., Pinte, G., and Symens, W. Improving wet clutch engagement with reinforcement learning. In International Joint Conference on Neural Networks, IJCNN 2012. Accepted. [ bib | event ]
Gagliolo, M., Van Vaerenbergh, K., Rodriguez, A., Nowé, A., Goossens, S., Pinte, G., and Symens, W. (2011b). Policy search reinforcement learning for automatic wet clutch engagement. In 15th International Conference on System Theory, Control and Computing - ICSTCC 2011, pages 250-255. IEEE. [ bib | event | slides | http | .pdf ]
In most existing motion control algorithms, a reference trajectory is tracked, based on a continuous measurement of the system's response. In many industrial applications, however, it is either not possible or too expensive to install sensors which measure the system's output over the complete stroke: instead, the motion can only be detected at certain discrete positions. The control objective in these systems is often not to track a complete trajectory accurately, but rather to achieve a given state at the sensor locations (e.g. to pass by the sensor at a given time, or with a given speed). Model-based control strategies are not suited for the control of these systems, due to the lack of sensor data. We are currently investigating the potential of a non-model-based learning strategy, Reinforcement Learning (RL), in dealing with this kind of discrete sensor information. Here, we describe ongoing experiments with a wet clutch, which has to be engaged smoothly yet quickly, without any feedback on piston position.
Pinte, G., Stoev, J., Symens, W., Dutta, A., Zhong, Y., Wyns, B., De Keyser, R., Depraetere, B., Swevers, J., Gagliolo, M., and Nowé, A. (2011). Learning strategies for wet clutch control. In 15th International Conference on System Theory, Control and Computing - ICSTCC 2011, pages 467-474. IEEE. [ bib | event ]
This paper presents an overview of model-based (Iterative Learning Control, Model Predictive Control and Iterative Optimization) and non-model-based (Genetic-based Machine Learning and Reinforcement Learning) learning strategies for the control of wet clutches. Based on theoretical considerations and a validation on an experimental test bench containing wet clutches, the benefits and drawbacks of the different strategies are compared. Although after convergence a good engagement quality can be obtained by all strategies, only model-based strategies are suited for online applicability. The convergence time for non-model-based strategies is too long such that they can only be applied during an offline calibration phase.
Rodriguez, A., Gagliolo, M., Vrancx, P., Grau, R., and Nowé, A. (2011). Improving the performance of continuous action reinforcement learning automata. 9th European Workshop on Reinforcement Learning, EWRL 2011. [ bib | event | .pdf ]
Learning automata (LA) are policy iteration reinforcement learners that exhibit nice convergence properties in discrete action settings. Recently, a novel formulation for continuous action reinforcement learning automata was proposed (CARLA), featuring an interesting exploration strategy, which is well suited for learning from sparse reward signals. In this paper, we propose an improvement of the algorithm, and evaluate its impact on performance comparing with the original version, as well as with another continuous LA. The experimental evaluation is carried out both in simulation and on a real control setup, showing a clear advantage of our method, which also performs well in distributed control settings.
Gagliolo, M., Van Vaerenbergh, K., Rodriguez, A., Nowé, A., Goossens, S., Pinte, G., and Symens, W. (2011a). Policy gradient methods for controlling systems with discrete sensor information. 20th Annual Belgian-Dutch Conference on Machine Learning - BeNeLearn, online proceedings, p. 115-116. [ bib | event | .pdf ]
In most existing motion control algorithms, a reference trajectory is tracked, based on a continuous measurement of the system's response. In many industrial applications, however, it is either not possible or too expensive to install sensors which measure the system's output over the complete stroke: instead, the motion can only be detected at certain discrete positions. The control objective in these systems is often not to track a complete trajectory accurately, but rather to achieve a given state at the sensor locations (e.g. to pass by the sensor at a given time, or with a given speed). Model-based control strategies are not suited for the control of these systems, due to the lack of sensor data. We are currently investigating the potential of a non-model-based learning strategy, Reinforcement Learning (RL), in dealing with this kind of discrete sensor information. Here, we describe experiments with a simple yet challenging system, where a single sensor detects the passage of a mass being pushed by a linear motor.
Gagliolo, M. and Schmidhuber, J. (2011). Algorithm portfolio selection as a bandit problem with unbounded losses. Annals of Mathematics and Artificial Intelligence, 61(2):49-86. [ bib | DOI | .pdf ]
We propose a method that learns to allocate computation time to a given set of algorithms, of unknown performance, with the aim of solving a given sequence of problem instances in a minimum time. Analogous meta-learning techniques are typically based on models of algorithm performance, learned during a separate offline training sequence which can be prohibitively expensive. We adopt instead an online approach, named GambleTA, in which algorithm performance models are iteratively updated, and used to guide allocation on a sequence of problem instances. GambleTA is a general method for selecting among two or more alternative algorithm portfolios. Each portfolio has its own way of allocating computation time to the available algorithms, possibly based on performance models, in which case its performance is expected to improve over time, as more runtime data becomes available. The resulting exploration-exploitation trade-off is represented as a bandit problem. In our previous work, the algorithms corresponded to the arms of the bandit, and allocations evaluated by the different portfolios were mixed, using a solver for the bandit problem with expert advice, but this required the setting of an arbitrary bound on algorithm runtimes, invalidating the optimal regret of the solver. In this paper, we propose a simpler version of GambleTA, in which the allocators correspond to the arms, such that a single portfolio is selected for each instance. The selection is represented as a bandit problem with partial information, and an unknown bound on losses. We devise a solver for this game, proving a bound on its expected regret. We present experiments based on results from several solver competitions, in various domains, comparing GambleTA with another online method.
Goossens, S., Pinte, G., Symens, W., Gagliolo, M., Rodriguez, A., and Nowé, A. (2011). Reinforcement learning for repetitive systems with discrete sensors. In 30th Benelux Meeting on Systems and Control - Book of Abstracts, page 149, Gent, Belgium. Universiteit Gent. [ bib | event | slides | .pdf ]
Gagliolo, M. and Legrand, C. (2010). Algorithm survival analysis. In Bartz-Beielstein, T., Chiarandini, M., Paquete, L., and Preuss, M., editors, Experimental Methods for the Analysis of Optimization Algorithms, pages 161-184. Springer, Berlin, Heidelberg. [ bib | DOI | .pdf ]
Algorithm selection is typically based on models of algorithm performance,learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which models of the runtime distributions of the available algorithms are iteratively updated and used to guide the allocation of computational resources, while solving a sequence of problem instances. The models are estimated using survival analysis techniques, which allow us to reduce computation time, censoring the runtimes of the slower algorithms. Here, we review the statistical aspects of our online selection method, discussing the bias induced in the runtime distributions (RTD) models by the competition of different algorithms on the same problem instances.
Gagliolo, M. and Schmidhuber, J. (2010). Algorithm selection as a bandit problem with unbounded losses. In Blum, C. and Battiti, R., editors, Learning and Intelligent Optimization. 4th International Conference, LION 4, Venice, Italy, January 18-22, 2010. Selected Papers, volume 6073 of LNCS, pages 82-96, Berlin, Heidelberg. Springer. [ bib | DOI | arXiv | event | slides | techrep | .pdf ]
Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance model is iteratively updated and used to guide selection on a sequence of problem instances. The resulting exploration-exploitation trade-off was represented as a bandit problem with expert advice, using an existing solver for this game, but this required the setting of an arbitrary bound on algorithm runtimes, thus invalidating the optimal regret of the solver. In this paper, we propose a simpler framework for representing algorithm selection as a bandit problem, with partial information, and an unknown bound on losses. We adapt an existing solver to this game, proving a bound on its expected regret, which holds also for the resulting algorithm selection technique. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark.
Gagliolo, M. (2010). Online Dynamic Algorithm Portfolios. PhD thesis, IDSIA/University of Lugano, Lugano, Switzerland. [ bib | slides | .pdf ]
This thesis presents methods for minimizing the computational effort of problem solving. Rather than looking at a particular algorithm, we consider the issue of computational complexity at a higher level, and propose techniques that, given a set of candidate algorithms, of unknown performance, learn to use these algorithms while solving a sequence of problem instances, with the aim of solving all instances in a minimum time. An analogous meta-level approach to problem solving has been adopted in many different fields, with different aims and terminology. A widely accepted term to describe it is algorithm selection. Algorithm portfolios represent a more general framework, in which computation time is allocated to a set of algorithms running on one or more processors.Automating algorithm selection is an old dream of the AI community, which has been brought closer to reality in the last decade. Most available selection techniques are based on a model of algorithm performance, assumed to be available, or learned during a separate offline training sequence, which is often prohibitively expensive. The model is used to perform a static allocation of resources, with no feedback from the actual execution of the algorithms. There is a trade-off between the performance of model-based selection, and the cost of learning the model. In this thesis, we formulate this trade-off as a bandit problem.
We propose GambleTA, a fully dynamic and online algorithm portfolio selection technique, with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions. A redundant set of time allocators uses the partially trained model to optimize machine time shares for the algorithms, in order to minimize runtime. A bandit problem solver picks the allocator to use on each instance, gradually increasing the impact of the best time allocators as the model improves. A similar approach is adopted for learning restart strategies online (GambleR). In both cases, the runtime distributions are modeled using survival analysis techniques; unsuccessful runs are correctly considered as censored runtime observations, allowing to save further computation time.
The methods proposed are validated with several experiments, mostly based on data from solver competitions, displaying a robust performance in a variety of settings, and showing that rough performance models already allow to allocate resources efficiently, reducing the risk of wasting computation time.
Gagliolo, M., Legrand, C., and Birattari, M. (2009). Mixed-effects modeling of optimisation algorithm performance. In Stützle, T., Birattari, M., and Hoos, H. H., editors, Engineering Stochastic Local Search Algorithms. Designing, Implementing and Analyzing Effective Heuristics, Second International Workshop, SLS 2009, Brussels, Belgium, September 3-4, 2009. Proceedings, volume 5752 of LNCS, pages 150-154. Springer. [ bib | DOI | event | poster | .pdf ]
The learning curves of optimisation algorithms, plotting the evolution of the objective vs. runtime spent. can be viewed as a sample of longitudinal data. In this paper we describe mixed-effects modeling, a standard technique in longitudinal data analysis, and give an example of its application to algorithm performance modeling.
Gagliolo, M. and Schmidhuber, J. (2009). Towards distributed algorithm portfolios. In Corchado, J. M., Rodríguez, S., Llinas, J., and Molina, J. M., editors, International Symposium on Distributed Computing and Artificial Intelligence 2008 (DCAI 2008), volume 50 of Advances in Soft Computing, pages 634-643. Springer. [ bib | DOI | event | slides | .pdf ]
In recent work we have developed an online algorithm selection technique, in which a model of algorithm performance is learned incrementally while being used. The resulting exploration-exploitation trade-off is solved as a bandit problem. The candidate solvers are run in parallel on a single machine, as an algorithm portfolio, and computation time is shared among them according to their expected performances. In this paper, we extend our technique to the more interesting and practical case of multiple CPUs.
Schmidhuber, J., Wierstra, D., Gagliolo, M., and Gomez, F. (2007). Training recurrent networks by evolino. Neural computation, 19(3):757-779. [ bib | DOI | .pdf ]
In recent years, gradient-based LSTM recurrent neural networks (RNNs) solved many previously RNN-unlearnable tasks. Sometimes, however, gradient information is of little use for training RNNs, due to numerous local minima. For such cases, we present a novel method: EVOlution of systems with LINear Outputs (Evolino). Evolino evolves weights to the nonlinear, hidden nodes of RNNs while computing optimal linear mappings from hidden state to output, using methods such as pseudo-inverse-based linear regression. If we instead use quadratic programming to maximize the margin, we obtain the first evolutionary recurrent support vector machines. We show that Evolino-based LSTM can solve tasks that Echo State nets (Jaeger, 2004a) cannot and achieves higher accuracy in certain continuous function generation tasks than conventional gradient descent RNNs, including gradient-based LSTM.
Gagliolo, M. (2007). Universal search. Scholarpedia, 2(11):2575. [ bib | DOI | http ]
Gagliolo, M. and Schmidhuber, J. (2007). Learning restart strategies. In Veloso, M. M., editor, IJCAI 2007 - Twentieth International Joint Conference on Artificial Intelligence, vol. 1, pages 792-797. AAAI Press. [ bib | event | slides | .pdf ]
Restart strategies are commonly used for minimizing the computational cost of randomized algorithms, but require prior knowledge of the run-time distribution in order to be effective. We propose a portfolio of two strategies, one fixed, with a provable bound on performance, the other based on a model of run-time distribution, updated as the two strategies are run on a sequence of problems. Computational resources are allocated probabilistically to the two strategies, based on their performances, using a well-known K-armed bandit problem solver. We present bounds on the performance of the resulting technique, and experiments with a satisfiability problem solver, showing rapid convergence to a near-optimal execution time.
Gagliolo, M. and Schmidhuber, J. (2006d). Learning dynamic algorithm portfolios. Annals of Mathematics and Artificial Intelligence, 47(3):295-328. AI&MATH 2006 Special Issue. [ bib | DOI | slides | techrep | .pdf ]
Algorithm selection can be performed using a model of runtime distribution, learned during a preliminary training phase. There is a trade-off between the performance of model-based algorithm selection, and the cost of learning the model. In this paper, we treat this trade-off in the context of bandit problems. We propose a fully dynamic and online algorithm selection technique, with no separate training phase: all candidate algorithms are run in parallel, while a model incrementally learns their runtime distributions. A redundant set of time allocators uses the partially trained model to propose machine time shares for the algorithms. A bandit problem solver mixes the model-based shares with a uniform share, gradually increasing the impact of the best time allocators as the model improves. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark; and with a set of solvers for the Auction Winner Determination problem.
Gagliolo, M. and Schmidhuber, J. (2006b). Gambling in a computationally expensive casino: Algorithm selection as a bandit problem. Online Trading of Exploration and Exploitation - NIPS 2006 Workshop. [ bib | event | .pdf ]
Automating algorithm selection and parameter tuning is an old dream of the AI community, which has been brought closer to reality in the last decade. Most available techniques are either oblivious, with no knowledge transfer across different problems; or are based on a model of algorithm performance, learned in a separate offline training sequence, which is often prohibitively expensive. We describe recent work in which the problem is treated in a fully online setting. A model of algorithm performance can be learned and used to reduce the cost of learning it. The resulting exploration-exploitation trade-off can be treated in the context of Bandit problems.
Gagliolo, M. and Schmidhuber, J. (2006c). Impact of censored sampling on the performance of restart strategies. In Benhamou, F., editor, Principle and Practice of Constraint Programming - CP 2006, 12th International Conference, CP 2006, Nantes, France, September 25-29, 2006, Proceedings, volume 4204 of LNCS, pages 167-181. Springer. [ bib | DOI | event | slides | .pdf ]
Algorithm selection, algorithm portfolios, and randomized restarts, can profit from a probabilistic model of algorithm run-time, to be estimated from data gathered by solving a set of experiments. Censored sampling offers a principled way of reducing this initial training time. We study the trade-off between training time and model precision by varying the censoring threshold, and analyzing the consequent impact on the performance of an optimal restart strategy, based on an estimated model of runtime distribution. We present experiments with a SAT solver on a graph-coloring benchmark. Due to the heavy-tailed runtime distribution, a modest censoring can already reduce training time by a few orders of magnitudes. The nature of the optimization process underlying the restart strategy renders its performance surprisingly robust, also to more aggressive censoring.
Gagliolo, M. (2006). Dynamic meta learning. 50th Anniversary Summit of Artificial Intelligence, Monte Veritá, Switzerland. [ bib | event | poster | .pdf ]
Schmidhuber, J., Gagliolo, M., Wierstra, D., and Gomez, F. (2006). Evolino for recurrent support vector machines. In Verleysen, M., editor, ESANN '06 - 14 th European Symposium on Artificial Neural Networks, pages 593-598, Evere, Belgium. d-side. [ bib | arXiv | event | techrep ]
Gagliolo, M. and Schmidhuber, J. (2006a). Dynamic algorithm portfolios. AI&MATH '06 - Ninth International Symposium on Artificial Intelligence and Mathematics. [ bib | event | slides | .pdf ]
Gagliolo, M. and Schmidhuber, J. (2005b). Towards life-long meta learning. Inductive Transfer : 10 Years Later - NIPS 2005 workshop. [ bib | event | .pdf ]
Gagliolo, M. and Schmidhuber, J. (2005a). A neural network model for inter-problem adaptive online time allocation. In Duch, W., Kacprzyk, J., Oja, E., and Zadrożny, S., editors, Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005, 15th International Conference, Warsaw, Poland, September 11-15, 2005, Proceedings, Part 2, volume 3697 of LNCS, pages 7-12. Springer. [ bib | DOI | event | slides | .pdf ]
One aim of Meta-learning techniques is to minimize the time needed for problem solving, and the effort of parameter hand-tuning, by automating algorithm selection. The predictive model of algorithm performance needed for this task often requires long training times. We address the problem in an online fashion, running multiple algorithms in parallel on a sequence of tasks, continually updating their relative priorities according to a neural model that maps their current state to the expected time to the solution. The model itself is updated at the end of each task, based on the actual performance of each algorithm. Censored sampling allows us to train the model effectively, without need of additional exploration after each task's solution. We present a preliminary experiment in which this new inter-problem technique learns to outperform a previously proposed intra-problem heuristic.
Gagliolo, M., Zhumatiy, V., and Schmidhuber, J. (2004). Adaptive online time allocation to search algorithms. In Boulicaut, J., F.Esposito, Giannotti, F., and Pedreschi, D., editors, Machine Learning: ECML 2004. Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, September 20-24, 2004, volume 3201 of LNCS, pages 134-143. Springer. [ bib | DOI | event | slides | techrep ]
Given is a search problem or a sequence of search problems, as well as a set of potentially useful search algorithms. We propose a general framework for online allocation of computation time to search algorithms based on experience with their performance so far. In an example instantiation, we use simple linear extrapolation of performance for allocating time to various simultaneously running genetic algorithms characterized by different parameter values. Despite the large number of searchers tested in parallel, on various tasks this rather general approach compares favorably to a more specialized state-of-the-art heuristic; in one case it is nearly two orders of magnitude faster.
Schmidhuber, J., Zhumatiy, V., and Gagliolo, M. (2004). Bias-optimal incremental learning of control sequences for virtual robots. In et al., F. G., editor, Proc. 8th Conference on Intelligent Autonomous Systems, IAS-8, pages 658-665, Amsterdam, NL. IOS Press. [ bib ]
Anguita, D. and Gagliolo, M. (2002). Mdl based model selection for relevance vector regression. In Dorronsoro, J., editor, Artificial Neural Networks - ICANN 2002, volume 2415 of LNCS, pages 468-473. Springer. [ bib | DOI | .pdf ]
Relevance Vector regression is a form of Support Vector regression, recently proposed by M.E.Tipping, which allows a sparse representation of the data. The Bayesian learning algorithm proposed by the author leaves the partially open question of how to automatically choose the optimal model. In this paper we describe a model selection criterion inspired by the Minimum Description Length (MDL) principle. We show that our proposal is effective in finding the optimal kernel parameter both on an artificial dataset and a real-world application.
This file was generated by bibtex2html 1.96.