Department of Computer Science, Faculty of Mathematical Sciences, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran
Received: 13 Bahman 1396; Revised: 22 Ordibehesht 1397; Accepted: 22 Aban 1397
Abstract
Background and Objectives: Numerical iterative methods are widely used to compute reachability probabilities and expected rewards in probabilistic model checking of Markov decision processes (MDPs). Several approaches have been proposed to improve the performance of these iterative methods. Reducing the total number of iterations is an important goal, and this paper pursues it with several techniques.

Methods: We consider MDPs with different degrees of non-determinism and show that modified policy iteration outperforms the other standard iterative methods as the degree of non-determinism increases. We propose several novel methods to improve the performance of modified policy iteration. Our approach is to define criteria that avoid useless iterations and value updates. The first criterion uses the graphical structure of the model to determine the number of iterations performed after each policy modification. The second is a dynamic criterion that determines when a policy should be updated. The third approach uses several priority heaps to select states for value updates.

Results: Our proposed methods are implemented in the PRISM model checker and applied to several standard case studies. The experiments show that the running times of our approaches are lower than those of the standard and previous methods for most case studies, mainly because our approaches reduce the total numbers of iterations and updates. While the running times are reduced, the precision of the computations is preserved in most cases.

Conclusion: The proposed techniques reduce the number of iterations and accelerate convergence to the optimal policies. They also prioritize the computations to reduce the total number of updates.
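For readers unfamiliar with the baseline method, the sketch below illustrates the general scheme of modified policy iteration for maximal reachability probabilities, which the paper's criteria refine. It is a minimal illustrative sketch, not the paper's PRISM implementation: the mdp interface (states, actions, succ) and the parameters k and epsilon are hypothetical names introduced here; the paper replaces the fixed sweep count k with structure-based and dynamic criteria.

def modified_policy_iteration(mdp, goal, k=20, epsilon=1e-6):
    """Hypothetical interface: mdp.states is an iterable of states,
    mdp.actions(s) returns the enabled actions of s (assumed non-empty),
    and mdp.succ(s, a) returns (successor, probability) pairs."""
    # Values start at 1 for goal states and 0 elsewhere; goal states are
    # never updated, so their value stays fixed at 1.
    value = {s: 1.0 if s in goal else 0.0 for s in mdp.states}
    policy = {s: next(iter(mdp.actions(s))) for s in mdp.states if s not in goal}

    while True:
        # Partial policy evaluation: k value-update sweeps under the
        # fixed policy, instead of solving the linear system exactly.
        for _ in range(k):
            for s in policy:
                value[s] = sum(p * value[t] for t, p in mdp.succ(s, policy[s]))

        # Policy improvement: switch each state to a greedy action
        # whenever it improves the current value by more than epsilon.
        changed = False
        for s in policy:
            best_a = max(mdp.actions(s),
                         key=lambda a: sum(p * value[t] for t, p in mdp.succ(s, a)))
            best_v = sum(p * value[t] for t, p in mdp.succ(s, best_a))
            if best_v > value[s] + epsilon:
                policy[s] = best_a
                changed = True
        if not changed:
            return value, policy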