nolnolon/MDP-Modelling-Example

MDP Modelling

| Modelling requirements | Specifications |
| --- | --- |
| A system consists of 2 components that both need to be functioning for the whole system to work | $s = (i, j)$ |
| Each component can be in any of 10 states indicating the level of deterioration; state 10 = failure | $i \in \lbrace 1, \dots, 10 \rbrace$, $j \in \lbrace 1, \dots, 10 \rbrace$, so $S_i = \lbrace 1, \dots, 10 \rbrace$, $S_j = \lbrace 1, \dots, 10 \rbrace$, $S = S_i \times S_j$ |
| Every time unit there is a 10% probability for each component that it deteriorates to the next state | $p = 0.1$, $p' = 1 - p$ |
| A reward of 1 is received when the system is functioning and no component is being repaired | $r = 1$ |
| There are two types of repair: preventive and corrective. Preventive repair costs 5 per component, corrective repair 25 per component | $r = -5$ for preventive, $r = -25$ for corrective |
| It is possible to repair both components at the same time, preventing additional downtime | $a_0$ (do nothing), $a_1$ (repair $i$ preventively), $a_2$ (repair $j$ preventively), $a_3$ (repair both preventively), $a_4$ (repair $i$ correctively), $a_5$ (repair $j$ correctively and $i$ preventively, or repair $i$ correctively and $j$ preventively), $a_6$ (repair both correctively) |
| The objective is to maximize the long-run average reward | $g = \sum_{s \in S} \pi_s \cdot r(s, (\pi, s))$ |

Corrective repair happens immediately when one of the components has failed. Repair takes 1 time unit; afterwards the repaired component(s) is/are in state 1. While one component is being repaired, the other one does not deteriorate.
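For reference, the objective above is usually handled through the average-reward optimality equation. The notation below ($h$ for a relative value function, $V_n$ for the value-iteration iterates, $A(s)$ for the actions allowed in state $s$) is assumed here and does not come from the assignment text; the equation holds under the usual conditions, for example a unichain and aperiodic model.

$$
g + h(s) = \max_{a \in A(s)} \Big[ r(s, a) + \sum_{s' \in S} P(s' \mid s, a) \, h(s') \Big]
$$

Value iteration repeatedly applies the right-hand side:

$$
V_{n+1}(s) = \max_{a \in A(s)} \Big[ r(s, a) + \sum_{s' \in S} P(s' \mid s, a) \, V_n(s') \Big]
$$

The differences $V_{n+1}(s) - V_n(s)$ then bracket the optimal value: $\min_s \big(V_{n+1}(s) - V_n(s)\big) \le g \le \max_s \big(V_{n+1}(s) - V_n(s)\big)$, so the iteration can stop once these two bounds are close enough.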

Tasks

c) Allow for preventive repair of 1 component if the other has failed. Determine using value iteration the optimal policy. Give its value and plot the optimal policy. Interpret your results.

d) Now allow for preventive repair of 1 or 2 components in any state. Determine using value iteration the optimal policy. Give its value and plot the optimal policy. Interpret your results.

Answers

Transition probabilities:

| Transition probability | Value |
| --- | --- |
| $P((i, j), (i, j)) = p'p'$ | $0.81$ |
| $P((i, j), (\min(i+1, 10), j)) = pp'$ | $0.09$ |
| $P((i, j), (i, \min(j+1, 10))) = p'p$ | $0.09$ |
| $P((i, j), (\min(i+1, 10), \min(j+1, 10))) = pp$ | $0.01$ |
| $P((10, j), (1, j)) = 1$ | $1$ |
| $P((i, 10), (i, 1)) = 1$ | $1$ |
| $P((10, 10), (1, 1)) = 1$ | $1$ |
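The first four rows of the table (the "do nothing" case) can be sketched in Python as follows. The function name `deterioration_step` and its parameters are illustrative and not taken from the repository code; the sketch assumes the two components deteriorate independently, as the products $p'p'$, $pp'$, $p'p$ and $pp$ suggest.

```python
def deterioration_step(i, j, p=0.1, failed=10):
    """Next-state distribution of (i, j) for one time unit under 'do nothing'.

    Hypothetical helper: each component moves one level up with
    probability p, independently of the other, capped at the failed level.
    """
    dist = {}
    for di, prob_i in ((0, 1 - p), (1, p)):
        for dj, prob_j in ((0, 1 - p), (1, p)):
            nxt = (min(i + di, failed), min(j + dj, failed))
            # Accumulate, since capping can map two outcomes to one state.
            dist[nxt] = dist.get(nxt, 0.0) + prob_i * prob_j
    return dist


# Example: the four possible successors of state (3, 7).
for nxt, prob in deterioration_step(3, 7).items():
    print(nxt, round(prob, 2))
```

The repair rows of the table are deterministic, so they need no distribution: a repaired component goes to state 1 and the other component keeps its level.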

Actions, rewards and costs

| Action | Reward/cost |
| --- | --- |
| $a_0$ | $r_0 = 1$ |
| $a_1$ | $r_1 = -5$ |
| $a_2$ | $r_2 = -5$ |
| $a_3$ | $r_3 = (-5) \cdot 2 = -10$ |
| $a_4$ | $r_4 = -25$ |
| $a_5$ | $r_5 = -25 + (-5) = -30$ |
| $a_6$ | $r_6 = (-25) \cdot 2 = -50$ |
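The arithmetic behind these values can be checked with a small sketch. The function `action_reward` and the two constants are hypothetical names introduced for illustration; the sketch assumes that no running reward is collected in a time unit in which any repair takes place, in line with the requirement that the reward of 1 is only received when no component is being repaired.

```python
PREVENTIVE_COST = 5    # cost per preventively repaired component
CORRECTIVE_COST = 25   # cost per correctively repaired component


def action_reward(n_preventive, n_corrective):
    """One-step reward given how many components are repaired each way."""
    if n_preventive == 0 and n_corrective == 0:
        return 1  # system is running and nothing is being repaired
    return -(PREVENTIVE_COST * n_preventive + CORRECTIVE_COST * n_corrective)


print(action_reward(0, 0))  # a_0
print(action_reward(2, 0))  # a_3
print(action_reward(1, 1))  # a_5
print(action_reward(0, 2))  # a_6
```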

c)

In this case, we only consider the actions $A = \lbrace a_0, a_4, a_5, a_6 \rbrace$. The optimal long-run average reward is 0.5857. When both components are functioning, the optimal action is always to do nothing, because that is the only available action (Fig. C). When one component has failed while the other is still in state 1, the optimal action is corrective repair of the failed component only ($a_4$). Once the functioning component has reached state 2 or beyond when the other one fails, the optimal action is to also repair the functioning component preventively ($a_5$).

[Figure C: plot of the optimal policy for c) (image `p_c`)]

d)

In this case, the available actions are the full action set $A = \lbrace a_0, a_1, a_2, a_3, a_4, a_5, a_6 \rbrace$. The optimal long-run average reward is 0.8542. This is considerably higher than the 0.5857 obtained in c), because it is possible to take preventive action and reset components to state 1 more often.
Whenever a component reaches state 8 or 9, the optimal action is to repair it preventively, which is cheaper than waiting for a failure ($-5 > -25$) (Fig.1(b)). When both components get close to the failed state, in states $(8,7)$, $(7,8)$ and $(8,8)$, the optimal action is to preventively repair both. When one component has failed, the optimal actions for the other component differ from those of the optimal policy in c): the policy here favours corrective repair only ($a_4$) as long as the functioning component is in state 6 or below.

[Figure: plot of the optimal policy for d) (image `p_d`)]
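A minimal value-iteration sketch for both settings is given below. It is not the repository's implementation: the function names, the encoding of an action as a pair of booleans `(repair_i, repair_j)`, the stopping tolerance, and the convention that a repair step earns only the (negative) repair cost are all assumptions made for illustration. The numbers it prints therefore need not match the values reported above exactly.

```python
from itertools import product

N = 10        # deterioration levels; level N means "failed"
P_DET = 0.1   # per-step deterioration probability of one component
STATES = list(product(range(1, N + 1), repeat=2))


def allowed_actions(state, variant):
    """Actions are pairs (repair_i, repair_j); assumed encoding."""
    failed = tuple(level == N for level in state)
    actions = []
    for act in product((False, True), repeat=2):
        # A failed component must be repaired immediately.
        if any(f and not a for f, a in zip(failed, act)):
            continue
        preventive = any(a and not f for f, a in zip(failed, act))
        # Variant "c": preventive repair only next to a corrective one.
        if variant == "c" and preventive and not any(failed):
            continue
        actions.append(act)
    return actions


def outcome(state, act):
    """Return (reward, {next_state: probability}) for one time unit."""
    i, j = state
    if not any(act):
        # System runs: reward 1, independent deterioration.
        dist = {}
        for di, pi in ((0, 1 - P_DET), (1, P_DET)):
            for dj, pj in ((0, 1 - P_DET), (1, P_DET)):
                nxt = (min(i + di, N), min(j + dj, N))
                dist[nxt] = dist.get(nxt, 0.0) + pi * pj
        return 1.0, dist
    # Repair step: 25 per failed component, 5 per functioning one.
    cost = sum(25 if level == N else 5 for level, a in zip(state, act) if a)
    nxt = tuple(1 if a else level for level, a in zip(state, act))
    return -float(cost), {nxt: 1.0}


def value_iteration(variant, tol=1e-6, max_iter=20000):
    """Relative value iteration; returns (gain estimate, greedy policy)."""
    model = {s: [(a,) + outcome(s, a) for a in allowed_actions(s, variant)]
             for s in STATES}
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        new_v, policy = {}, {}
        for s, options in model.items():
            scored = [(r + sum(p * v[t] for t, p in dist.items()), a)
                      for a, r, dist in options]
            new_v[s], policy[s] = max(scored, key=lambda x: x[0])
        diffs = [new_v[s] - v[s] for s in STATES]
        # Subtract a reference value so the iterates stay bounded.
        ref = new_v[(1, 1)]
        v = {s: new_v[s] - ref for s in STATES}
        if max(diffs) - min(diffs) < tol:
            break
    # The gain lies between min(diffs) and max(diffs); report the midpoint.
    return 0.5 * (max(diffs) + min(diffs)), policy


if __name__ == "__main__":
    for variant in ("c", "d"):
        gain, policy = value_iteration(variant)
        print(variant, round(gain, 4), policy[(10, 5)], policy[(8, 8)])
```

Because the action set in d) contains every action available in c), the gain returned for `"d"` should be at least as large as the one for `"c"`, which gives a quick sanity check on any implementation.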

Assignment 2 (c & d), Dynamic Programming and Reinforcement Learning (November 2025)
