Reinforcement Learning and Approximate Dynamic Programming


Reinforcement learning (RL) and adaptive dynamic programming (ADP) have been among the most critical research fields in science and engineering for modern complex systems.

This book describes the latest RL and ADP techniques for decision and control in human-engineered systems, covering both single-player decision and control and multi-player games.

Edited by the pioneers of RL and ADP research, the book brings together ideas and methods from many fields and provides broad and timely guidance on controlling a wide variety of systems, such as robots, industrial processes, and financial decision-making.



Best computer science books

Computer Science Illuminated

Designed to present a breadth-first coverage of the field of computer science.

Introduction to Data Compression (4th Edition) (The Morgan Kaufmann Series in Multimedia Information and Systems)

Each edition of Introduction to Data Compression has widely been considered the best introduction and reference text on the art and science of data compression, and the fourth edition continues in this tradition. Data compression techniques and technology are ever-evolving, with new applications in image, speech, text, audio, and video.

Computers as Components: Principles of Embedded Computing System Design (3rd Edition) (The Morgan Kaufmann Series in Computer Architecture and Design)

Computers as Components: Principles of Embedded Computing System Design, 3e, presents essential knowledge on embedded systems technology and techniques. Updated for today's embedded systems design methods, this edition features new examples including digital signal processing, multimedia, and cyber-physical systems.

Computation and Storage in the Cloud: Understanding the Trade-Offs

Computation and Storage in the Cloud is the first comprehensive and systematic work investigating the issue of the computation and storage trade-off in the cloud in order to reduce the overall application cost. Scientific applications are usually computation- and data-intensive, where complex computation tasks take a long time to execute and the generated datasets are often terabytes or petabytes in size.

Extra info for Reinforcement Learning and Approximate Dynamic Programming for Feedback Control (IEEE Press Series on Computational Intelligence, Volume 17)

Sample text

In most applications today, we do not actually include a random term (e) in the action network, but stochastic exploration of the physical world is an important part of animal learning, and may become more important in challenging future applications. These fundamental methods are described in great detail in the Handbook of Intelligent Control [4]. Many applications, variations, and special cases have appeared since, in [6] and in this book, for example. But there is still a basic choice between approximating J∗ (as in HDP), approximating λ (as in DHP), and approximating J∗ while accounting for gradient error (as in GDHP), with or without a dependence on the actions u(t) (as in the action-dependent variations).
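As a minimal sketch of the HDP option above — training a critic to approximate J∗ directly — the following toy example performs TD-style updates of a linear-in-features critic. The one-dimensional plant, utility function, feature set, and all parameter values are invented for illustration; they are not taken from the book.

```python
# Toy HDP-style critic update (all names and numbers are illustrative assumptions).
# The critic approximates J*(x) with a linear model J_hat(x) = w0 + w1*x + w2*x^2,
# trained toward the Bellman target U(x, u) + gamma * J_hat(x_next).

GAMMA = 0.9   # discount factor
LR = 0.05     # critic learning rate

def features(x):
    return [1.0, x, x * x]

def j_hat(w, x):
    return sum(wi * fi for wi, fi in zip(w, features(x)))

def utility(x, u):
    # One-step cost: distance from the origin plus control effort.
    return x * x + 0.1 * u * u

def step(x, u):
    # Assumed toy dynamics: a slightly unstable linear plant.
    return 1.1 * x + u

def hdp_critic_update(w, x, u):
    """One TD-style HDP update: move J_hat(x) toward U(x,u) + gamma*J_hat(x')."""
    x_next = step(x, u)
    target = utility(x, u) + GAMMA * j_hat(w, x_next)
    error = target - j_hat(w, x)
    return [wi + LR * error * fi for wi, fi in zip(w, features(x))], error

w = [0.0, 0.0, 0.0]
for _ in range(200):
    w, err = hdp_critic_update(w, x=1.0, u=-0.5)
print(round(j_hat(w, 1.0), 3))  # prints 2.488
```

Repeated updates at a single state contract the Bellman error geometrically here, so the critic value settles at the fixed point of the target equation.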

Unlike ADP, they do not offer a true brain-like real-time learning option. Even in using NMPC in receding-horizon control, one can often improve performance by training a critic network to evaluate the final state X(T).

SOME BASIC CHALLENGES IN IMPLEMENTING ADP

Among the crucial choices in using ADP are:

• discrete time versus continuous time,
• how to account for the effect of unseen variables,
• offline controller design versus real-time learning,
• "model-based methods" like HDP and DHP versus "model-free methods" like ADHDP and Q-learning,
• how to approximate the value function effectively,
• how to pick u(t) at each time t even knowing the value function,
• how to use RLADP to build effective cooperative multiagent systems.
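The "model-free" option mentioned in the list (ADHDP, Q-learning) can be illustrated with a minimal tabular Q-learning sketch. The five-state chain task, the reward, and all parameter values below are invented for illustration — the book's ADHDP methods use neural-network critics rather than a table.

```python
import random

# Tabular Q-learning on a tiny 5-state chain (task and parameters are
# illustrative assumptions). States 0..4; action 0 = left, 1 = right;
# reward 1.0 only on reaching state 4. The update needs only sampled
# transitions, never a model of step() — hence "model-free".
N_STATES, GOAL = 5, 4
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.2

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)
for _ in range(500):
    s = 0
    while True:
        # Epsilon-greedy exploration, then a sample-based TD update of Q.
        a = rng.randrange(2) if rng.random() < EPS else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        Q[s][a] += ALPHA * (r + GAMMA * (0.0 if done else max(Q[s2])) - Q[s][a])
        s = s2
        if done:
            break

greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)  # greedy policy moves right toward the goal in every state
```

After enough episodes the greedy policy is [1, 1, 1, 1], i.e., always step toward the goal, with Q-values that decay by the discount factor per step away from it.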

One would expect deep learning to yield similar benefits here, especially if there is more research in this area. Equation (1.14) could become an alternative general-purpose method for ADP, if powerful enough methods were found to adapt the basis functions φj here or in linearized DHP (LDHP), defined by:

λ̂_i(X) = Σ_{j=1}^{n} W_ij φ_j    (1.15)

In discussions at an NSF workshop on ADP in Mexico, Van Roy suggested that we could solve this problem by using nonlinear programming somehow. James Momoh suggested that his new implementation of interior-point methods for nonlinear programming might make this practical, but there has been no follow-up on this possibility.
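Once the basis functions φj are fixed, the linear LDHP-style form λ̂_i(X) = Σ_j W_ij φ_j quoted above can be fitted by ordinary least squares from samples. In the sketch below, the two-dimensional state, the quadratic basis, and the target costate field are all invented assumptions used only to make the example concrete.

```python
import numpy as np

# Least-squares fit of a linear costate model lambda_hat_i(X) = sum_j W_ij * phi_j(X)
# (basis, target field, and all names here are illustrative assumptions).

def phi(x):
    # Fixed basis: constant, linear, and quadratic terms of a 2-D state.
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])

def true_lambda(x):
    # Target costate field: gradient of J(X) = x1^2 + 0.5*x2^2 + x1*x2.
    x1, x2 = x
    return np.array([2 * x1 + x2, x2 + x1])

rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=(200, 2))
Phi = np.array([phi(x) for x in xs])          # (200, 6) design matrix
Lam = np.array([true_lambda(x) for x in xs])  # (200, 2) sampled costates

# Solve min_W ||Phi @ W.T - Lam||^2 by linear least squares.
W = np.linalg.lstsq(Phi, Lam, rcond=None)[0].T  # (2, 6) weight matrix

x_test = np.array([0.3, -0.2])
print(np.allclose(W @ phi(x_test), true_lambda(x_test), atol=1e-8))  # prints True
```

Because the target field lies in the span of the basis, the fit is exact here; the open problem the text points to is adapting the φj themselves when no such basis is known in advance.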
