DSpace Repository

Complexity Estimates and Reductions to Discounting for Total and Average-Reward Markov Decision Processes and Stochastic Games


dc.contributor.advisor Feinberg, Eugene A. en_US
dc.contributor.author Huang, Jefferson en_US
dc.contributor.other Department of Applied Mathematics and Statistics en_US
dc.date.accessioned 2017-09-20T16:52:08Z
dc.date.available 2017-09-20T16:52:08Z
dc.date.issued 2016-12-01 en_US
dc.identifier.uri http://hdl.handle.net/11401/77166 en_US
dc.description 125 pg. en_US
dc.description.abstract Recently there has been a resurgence of interest in the complexity of algorithms for Markov decision processes (MDPs) and stochastic games. Much of this work was inspired by recent groundbreaking results on the complexity of policy iteration algorithms for MDPs under the Blum-Shub-Smale (BSS) model of computation. In particular, for discounted MDPs with a fixed discount factor, Yinyu Ye showed that the number of arithmetic operations needed by two classic variants of policy iteration can be bounded above by a polynomial in the number of state-action pairs only. A natural question is whether a similar complexity estimate exists for the value iteration algorithm, which is another classic approach to computing optimal policies for MDPs. Our first main contribution is a negative answer to this question. Using a deterministic MDP with four state-action pairs, we show that under the BSS model there is no upper bound on the number of iterations needed by value iteration to return an optimal policy. We also show that the same example implies the same result for a broad class of so-called optimistic policy iteration algorithms, which includes algorithms of interest to the reinforcement learning community such as λ-policy iteration and modified policy iteration. Another natural question is whether Ye's approach can yield results for MDPs under other optimality criteria. Our second main contribution is a formulation of conditions, which to our knowledge are the most general ones known, under which MDPs and two-player zero-sum stochastic games with Borel state and action spaces can be reduced to discounted ones. For undiscounted total-reward MDPs and stochastic games, the transformations we formulate are based on an idea due to Alan Hoffman. For average rewards, the transformations are extensions to Borel state and action spaces of a transformation proposed recently for finite stochastic games.
In addition to implying the existence of ε-optimal policies for total and average rewards, these reductions lead to estimates of the number of arithmetic operations needed to compute optimal policies for such models with finite state and action sets, as well as complexity estimates for computing ε-optimal policies for MDPs with Euclidean state and action spaces. en_US
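The value iteration algorithm whose complexity the abstract discusses can be sketched as follows for a finite deterministic discounted MDP. The two-state instance, rewards, and discount factor below are illustrative assumptions for exposition only; they are not the four-state-action-pair counterexample constructed in the dissertation.

```python
# Minimal value-iteration sketch for a finite deterministic discounted MDP.
# The MDP data and discount factor here are illustrative assumptions,
# not the counterexample from the dissertation.

def value_iteration(rewards, transitions, gamma=0.9, tol=1e-8):
    """rewards[s][a]: immediate reward for action a in state s;
    transitions[s][a]: next state (deterministic dynamics).
    Returns (approximate optimal values, greedy policy)."""
    n = len(rewards)
    v = [0.0] * n
    while True:
        # One Bellman optimality update per state.
        new_v = [max(rewards[s][a] + gamma * v[transitions[s][a]]
                     for a in range(len(rewards[s])))
                 for s in range(n)]
        done = max(abs(new_v[s] - v[s]) for s in range(n)) < tol
        v = new_v
        if done:
            break
    # Extract a greedy (hence near-optimal) stationary policy.
    policy = [max(range(len(rewards[s])),
                  key=lambda a: rewards[s][a] + gamma * v[transitions[s][a]])
              for s in range(n)]
    return v, policy

# Two-state example: in each state, action 0 stays put, action 1 switches.
rewards = [[1.0, 0.0], [0.0, 2.0]]
transitions = [[0, 1], [1, 0]]
values, policy = value_iteration(rewards, transitions)
```

For this instance the greedy policy stays in state 0 and switches out of state 1, with values approximately 10 and 11. The number of sweeps needed to reach a fixed tolerance grows as the discount factor approaches 1, which is consistent with the abstract's point that, unlike the polynomial bounds Ye obtained for policy iteration, no iteration bound for value iteration can depend on the number of state-action pairs alone.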
dc.description.sponsorship This work is sponsored by the Stony Brook University Graduate School in compliance with the requirements for the completion of the degree. en_US
dc.format Monograph en_US
dc.format.medium Electronic Resource en_US
dc.language.iso en_US en_US
dc.publisher The Graduate School, Stony Brook University: Stony Brook, NY. en_US
dc.subject.lcsh Operations research -- Mathematics en_US
dc.subject.other algorithm, complexity, control, Markov decision process, policy, stochastic game en_US
dc.title Complexity Estimates and Reductions to Discounting for Total and Average-Reward Markov Decision Processes and Stochastic Games en_US
dc.type Dissertation en_US
dc.mimetype application/pdf en_US
dc.contributor.committeemember Mitchell, Joseph S. B. en_US
dc.contributor.committeemember Hu, Jiaqiao en_US
dc.contributor.committeemember Gao, Jie en_US

