## Markov Decision Processes with Multiple Objectives

*Krishnendu Chatterjee, Rupak Majumdar, and
Thomas A. Henzinger*

We consider Markov decision processes (MDPs) with multiple discounted
reward objectives. Such MDPs occur in design problems where one
wishes to simultaneously optimize several criteria, for example,
latency and power. The possible trade-offs between the different
objectives are characterized by the Pareto curve. We show that every
Pareto-optimal point can be achieved by a memoryless strategy;
however, unlike in the single-objective case, the memoryless strategy
may require randomization. Moreover, we show that the Pareto curve
can be approximated in polynomial time in the size of the MDP.
Additionally, we study the problem if a given value vector is
realizable by any strategy, and show that it can be decided in
polynomial time; but the question whether it is realizable by a
deterministic memoryless strategy is NP-complete. These results
provide efficient algorithms for design exploration in MDP models with
multiple objectives.

*Proceedings of the
23rd International Conference on Theoretical Aspects of Computer Science*
(STACS),
Lecture Notes in Computer Science 3884,
Springer, 2006, pp. 325-336.

Download inofficial, sometimes updated
PostScript /
PDF document.
© 2006 Springer.