MDP Modelling

Modelling requirements	Specifications
A system consists of 2 components that both need to be functioning for the whole system to work	$s = (i, j)$
Each component can be in any of 10 states indicating the level of deterioration; state 10 = failure	$i \in \{1, \dots, 10\}$, $j \in \{1, \dots, 10\}$, so $S_i = \{1, \dots, 10\}$, $S_j = \{1, \dots, 10\}$, $S = S_i \times S_j$
Every time unit there is a 10% probability for each component that it deteriorates to the next state	$p = 0.1$, $p' = 1 - p$
A reward of 1 is received when the system is functioning and no component is being repaired	$r = 1$
There are two types of repair: preventive and corrective. Preventive repair costs 5 per component, corrective repair 25 per component	$r = -5$ for preventive, $r = -25$ for corrective
It is possible to repair both components at the same time, preventing additional downtime	$a_0$ (do nothing), $a_1$ (repair $i$ preventively), $a_2$ (repair $j$ preventively), $a_3$ (repair both preventively), $a_4$ (repair $i$ correctively), $a_5$ (repair $j$ correctively and $i$ preventively, or repair $i$ correctively and $j$ preventively), $a_6$ (repair both correctively)
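The objective is to maximize the long-run average reward	$g = \sum_{s \in S} \pi_s \cdot r(s, a_s)$, with $\pi_s$ the stationary probability of state $s$ under the policy and $a_s$ the action the policy takes in $s$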

Corrective repair happens immediately when one of the components has failed. Repair takes 1 time unit, after which every repaired component is in state 1. While one component is being repaired, the other one does not deteriorate.

Tasks

c) Allow for preventive repair of 1 component if the other has failed. Determine using value iteration the optimal policy. Give its value and plot the optimal policy. Interpret your results.

d) Now allow for preventive repair of 1 or 2 components in any state. Determine using value iteration the optimal policy. Give its value and plot the optimal policy. Interpret your results.

Answers

Transition probabilities:

transition probability	value
$P((i,j),(i,j)) = p'p'$	$0.81$
$P((i,j),(\min(i+1,10), j)) = pp'$	$0.09$
$P((i,j),(i, \min(j+1,10))) = p'p$	$0.09$
$P((i,j),(\min(i+1,10), \min(j+1,10))) = pp$	$0.01$
$P((10,j),(1,j)) = 1$	$1$
$P((i,10),(i,1)) = 1$	$1$
$P((10,10),(1,1)) = 1$	$1$
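
The first four rows of the table can be checked with a few lines of Python. This is a minimal sketch, assuming both components are functioning ($i, j < 10$) and nothing is repaired; the function name `do_nothing_transitions` is made up for this illustration and is not taken from `dprl_a2(c_d).py`.

```python
from itertools import product

P_DETERIORATE = 0.1  # p in the table above
FAILED = 10          # failure state


def do_nothing_transitions(i, j, p=P_DETERIORATE):
    """Next-state distribution for a functioning system (i, j < 10) when nothing is repaired."""
    if i >= FAILED or j >= FAILED:
        raise ValueError("a failed component is repaired; it does not deteriorate further")
    dist = {}
    # Each component independently stays (step 0) or deteriorates by one state (step 1).
    for step_i, step_j in product((0, 1), repeat=2):
        prob = (p if step_i else 1 - p) * (p if step_j else 1 - p)
        nxt = (min(i + step_i, FAILED), min(j + step_j, FAILED))
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist


# Print the distribution for one example state.
for state, prob in sorted(do_nothing_transitions(3, 9).items()):
    print(state, round(prob, 2))
```

The four probabilities always sum to 1, which is a quick sanity check before the distribution is used in value iteration.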

Actions, rewards and costs

	actions	reward/cost
	$a_0$	$r_0 = 1$
	$a_1$	$r_1 = -5$
	$a_2$	$r_2 = -5$
	$a_3$	$r_3 = (-5) \cdot 2 = -10$
	$a_4$	$r_4 = -25$
	$a_5$	$r_5 = -25 + (-5) = -30$
	$a_6$	$r_6 = -25 * 2 = -50$
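
For reference, one common way to write value iteration for the average-reward criterion with these transition probabilities and rewards is given below. The symbols $V_n$, $A(s)$ (the actions allowed in state $s$) and $\varepsilon$ are notation introduced here, not taken from the assignment, and the bounds assume the optimal average reward $g^*$ is the same for every starting state.

$$
V_{n+1}(s) = \max_{a \in A(s)} \Big[ r(s,a) + \sum_{s' \in S} P(s' \mid s, a)\, V_n(s') \Big], \qquad V_0 \equiv 0
$$

$$
\min_{s \in S} \big( V_{n+1}(s) - V_n(s) \big) \;\le\; g^* \;\le\; \max_{s \in S} \big( V_{n+1}(s) - V_n(s) \big)
$$

Iteration can stop once the gap between the upper and lower bound drops below $\varepsilon$; the maximizing actions in the last step then give an approximately optimal policy.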

c)

In this case, we only consider the actions $A = \{a_0, a_4, a_5, a_6\}$. The optimal long-run average reward is 0.5857. When both components are functioning, the optimal action is always to do nothing, because that is the only available action (Fig.C). When one component has failed and the other one is still in state 1, the optimal action is corrective repair of the failed component only ($a_4$). Once the functioning component has reached state 2 or higher when the other one fails, the optimal action is to also repair the functioning component preventively ($a_5$).
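
The restricted action set can be made concrete with a small lookup. The following is a sketch under stated assumptions: the function name `options_c` is hypothetical, `a4` is used for corrective repair of whichever component has failed, and the rewards are those from the table of actions.

```python
FAILED = 10

# Rewards from the table of actions (negative values are costs).
REWARD = {"a0": 1, "a4": -25, "a5": -30, "a6": -50}


def options_c(i, j):
    """Return {action: (reward, next_state)} for state (i, j) in scenario c).

    next_state is None for a0, because the outcome of doing nothing is random.
    """
    i_failed, j_failed = i == FAILED, j == FAILED
    if i_failed and j_failed:
        # Both components are repaired correctively and restart in state 1.
        return {"a6": (REWARD["a6"], (1, 1))}
    if i_failed:
        # a4 leaves j where it is; a5 also resets j preventively.
        return {"a4": (REWARD["a4"], (1, j)), "a5": (REWARD["a5"], (1, 1))}
    if j_failed:
        return {"a4": (REWARD["a4"], (i, 1)), "a5": (REWARD["a5"], (1, 1))}
    # Both components are functioning: doing nothing is the only option.
    return {"a0": (REWARD["a0"], None)}


print(options_c(4, 7))
print(options_c(10, 2))
```

In a state such as $(10, 2)$ the choice is between paying 25 and continuing from $(1, 2)$, or paying 30 and continuing from $(1, 1)$; value iteration decides whether the extra 5 is worth the reset.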

d)

In this case, the available actions are the full action set $A = \{a_0, a_1, a_2, a_3, a_4, a_5, a_6\}$. The optimal long-run average reward is 0.8542. This is significantly higher than in c), because preventive repair can now be used to reset components to state 1 before they fail.
Whenever a component reaches state 8 or 9, the optimal action is to repair it preventively, which is cheaper than waiting for a failure ($-5 > -25$) (Fig.1(b)). When both components are close to the failure state, in $(8,7)$, $(7,8)$ and $(8,8)$, the optimal action is to preventively repair both. When one component has failed, the optimal action for the other component differs from the optimal policy in c): the current policy favours corrective repair only ($a_4$) up to state 6 of the functioning component.
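
The comparison between c) and d) can be explored with a compact value iteration. This is a minimal sketch and not the code in `dprl_a2(c_d).py`: the names `options` and `value_iteration`, the stopping tolerance and the modelling of `a4` as corrective repair of whichever component failed are all assumptions, so the printed values need not match the figures reported above.

```python
import itertools

P, FAILED = 0.1, 10
STATES = list(itertools.product(range(1, FAILED + 1), repeat=2))


def options(state, preventive_anytime):
    """Yield (action, reward, {next_state: prob}) for every action allowed in `state`."""
    i, j = state
    if i == FAILED and j == FAILED:
        yield "a6", -50, {(1, 1): 1.0}
    elif i == FAILED or j == FAILED:
        # Corrective repair of the failed component; the other one is frozen.
        yield "a4", -25, {((1, j) if i == FAILED else (i, 1)): 1.0}
        yield "a5", -30, {(1, 1): 1.0}
    else:
        dist = {}
        for di, dj in itertools.product((0, 1), repeat=2):
            dist[(i + di, j + dj)] = (P if di else 1 - P) * (P if dj else 1 - P)
        yield "a0", 1, dist
        if preventive_anytime:  # scenario d) only
            yield "a1", -5, {(1, j): 1.0}
            yield "a2", -5, {(i, 1): 1.0}
            yield "a3", -10, {(1, 1): 1.0}


def value_iteration(preventive_anytime, eps=1e-6, max_iter=20_000):
    """Return (estimated average reward, greedy policy) using span-based stopping."""
    v = {s: 0.0 for s in STATES}
    lo = hi = 0.0
    policy = {}
    for _ in range(max_iter):
        new_v = {}
        for s in STATES:
            value, action = max(
                ((r + sum(p * v[t] for t, p in dist.items()), a)
                 for a, r, dist in options(s, preventive_anytime)),
                key=lambda pair: pair[0],
            )
            new_v[s], policy[s] = value, action
        diffs = [new_v[s] - v[s] for s in STATES]
        lo, hi = min(diffs), max(diffs)
        # Subtract a reference value so the numbers stay bounded.
        ref = new_v[(1, 1)]
        v = {s: new_v[s] - ref for s in STATES}
        if hi - lo < eps:
            break
    return (lo + hi) / 2, policy


g_c, policy_c = value_iteration(preventive_anytime=False)
g_d, policy_d = value_iteration(preventive_anytime=True)
print(f"c) average reward ~ {g_c:.4f}")
print(f"d) average reward ~ {g_d:.4f}")
```

Because scenario d) only adds actions to those available in c), its optimal average reward cannot be lower. The dictionaries `policy_c` and `policy_d` map each state $(i, j)$ to an action label and can be plotted as a 10 by 10 grid.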

Assignment 2 (c & d), Dynamic Programming and Reinforcement Learning (November 2025)

nolnolon/MDP-Modelling-Example
