| Modelling requirements | Specifications |
|---|---|
| A system consists of 2 components that both need to be functioning for the whole system to work | |
| Each component can be in any of 10 states indicating the level of deterioration; state 10 = failure | |
| Every time unit there is a 10% probability for each component that it deteriorates to the next state | |
| A reward of 1 is received when the system is functioning and no component is being repaired | |
| There are two types of repair: preventive and corrective. Preventive repair costs 5 per component, corrective repair 25 per component | |
| It is possible to repair both components at the same time, preventing additional downtime | |
| The objective is to maximize the long-run average reward | |

Corrective repair happens immediately when one of the components has failed. A repair takes 1 time unit, after which each repaired component is in state 1. While one component is being repaired, the other one does not deteriorate.

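The dynamics described above can be written out as one-step transition probabilities. This is a sketch that assumes the two components deteriorate independently of each other, which the requirements suggest but do not state explicitly. The symbols $x_1, x_2$ (current component states) and $x_1', x_2'$ (next states) are notation introduced here.

$$
P\big((x_1', x_2') \mid (x_1, x_2),\ \text{no repair}\big) =
\begin{cases}
0.9 \cdot 0.9 = 0.81 & \text{if } (x_1', x_2') = (x_1, x_2) \\
0.1 \cdot 0.9 = 0.09 & \text{if } (x_1', x_2') = (x_1 + 1, x_2) \\
0.9 \cdot 0.1 = 0.09 & \text{if } (x_1', x_2') = (x_1, x_2 + 1) \\
0.1 \cdot 0.1 = 0.01 & \text{if } (x_1', x_2') = (x_1 + 1, x_2 + 1)
\end{cases}
\qquad x_1, x_2 < 10
$$

Under a repair action the transition is deterministic: every repaired component moves to state 1 and a component that is not repaired keeps its current state.
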
c) Allow for preventive repair of 1 component if the other has failed. Use value iteration to determine the optimal policy. Give its value and plot the optimal policy. Interpret your results.

d) Now allow for preventive repair of 1 or 2 components in any state. Use value iteration to determine the optimal policy. Give its value and plot the optimal policy. Interpret your results.

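Both questions can be approached with value iteration for the long-run average reward. The following is a minimal sketch, not the assignment's reference solution. It assumes that the components deteriorate independently, that a time unit spent on repair yields no reward and only the repair cost, and that a failed component must always be repaired. All names (`actions`, `outcome`, `value_iteration`, `preventive_anywhere`) are hypothetical.

```python
from itertools import product

N, P = 10, 0.1              # levels 1..N (N = failure); deterioration probability
C_PREV, C_CORR = 5.0, 25.0  # repair cost per component
STATES = list(product(range(1, N + 1), repeat=2))

def actions(s, preventive_anywhere):
    """Feasible repair decisions (r1, r2); a failed component must be repaired."""
    failed = [x == N for x in s]
    if not any(failed) and not preventive_anywhere:
        return [(False, False)]  # part c): no repair unless a component has failed
    return [a for a in product((False, True), repeat=2)
            if all(r or not f for r, f in zip(a, failed))]

def outcome(s, a):
    """Return (immediate reward, {next_state: probability}) for state s, action a."""
    if any(a):
        # Repair takes one time unit: no reward, and the other component is frozen.
        cost = sum(C_CORR if x == N else C_PREV for x, r in zip(s, a) if r)
        return -cost, {tuple(1 if r else x for x, r in zip(s, a)): 1.0}
    dist = {}
    for d1, d2 in product((0, 1), repeat=2):  # assumed independent deterioration
        dist[(s[0] + d1, s[1] + d2)] = (P if d1 else 1 - P) * (P if d2 else 1 - P)
    return 1.0, dist

def value_iteration(preventive_anywhere, eps=1e-6, max_iter=20_000):
    """Iterate until the span of successive value differences drops below eps."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        new_v, policy = {}, {}
        for s in STATES:
            best = None
            for a in actions(s, preventive_anywhere):
                r, dist = outcome(s, a)
                q = r + sum(p * v[t] for t, p in dist.items())
                if best is None or q > best:
                    best, policy[s] = q, a
            new_v[s] = best
        diffs = [new_v[s] - v[s] for s in STATES]
        # Subtract the value of a reference state to keep the numbers bounded.
        v = {s: new_v[s] - new_v[(1, 1)] for s in STATES}
        if max(diffs) - min(diffs) < eps:
            break
    gain = 0.5 * (max(diffs) + min(diffs))  # estimate of the average reward
    return gain, policy

gain_c, policy_c = value_iteration(preventive_anywhere=False)  # question c)
gain_d, policy_d = value_iteration(preventive_anywhere=True)   # question d)
print("average reward c):", gain_c)
print("average reward d):", gain_d)
```

Because the action set in d) contains every action allowed in c), the average reward found for d) should be at least as large as the one for c), which gives a simple sanity check on the implementation.
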

| transition probability | value |
|---|---|

| actions | reward/cost |
|---|---|

In this case, we only consider actions A =

In this case, the available actions are the full action set.
Whenever a component reaches state 8-9, the optimal action is to repair it preventively, which avoids the higher cost of a corrective repair (25 instead of 5 per component).

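A policy of this kind can be inspected as a 10 × 10 grid with one cell per joint state. The sketch below renders a policy as text. The `threshold_policy` helper and its threshold of 8 are purely illustrative placeholders, not the computed optimal policy, and all names are hypothetical.

```python
N = 10  # deterioration levels 1..N, level N = failure

# One character per action: no repair, repair component 1, component 2, or both.
SYMBOLS = {(False, False): ".", (True, False): "1",
           (False, True): "2", (True, True): "B"}

def threshold_policy(threshold):
    """Hypothetical policy: repair every component at or above `threshold`."""
    return {(x1, x2): (x1 >= threshold, x2 >= threshold)
            for x1 in range(1, N + 1) for x2 in range(1, N + 1)}

def render_policy(policy):
    """One text row per level of component 1, one column per level of component 2."""
    rows = []
    for x1 in range(1, N + 1):
        rows.append("".join(SYMBOLS[policy[(x1, x2)]] for x2 in range(1, N + 1)))
    return "\n".join(rows)

grid = render_policy(threshold_policy(8))
print(grid)
```

Any dictionary that maps a state `(x1, x2)` to a pair of repair flags can be passed to `render_policy`, so the same function works for a policy obtained from value iteration.
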
Assignment 2 (c & d), Dynamic Programming and Reinforcement Learning (November 2025)