# Generations

## Basics

- We use an ACP (atomic commit protocol: E3PC, Paxos, or similar) to ensure that
  each xact is either committed or aborted everywhere.
- However, if a minority fails, we want to continue working. Thus we are
  ...
  change, in particular A and B could throw C off the cluster. We need some
  causal relationship between these events to make sure apply is safe.
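
To make the "minority fails, we keep working" rule concrete, here is a minimal C sketch of a majority check over a node set. It assumes `nodemask_t` is a 64-bit mask in which bit (i - 1) stands for node i. `mask_count` and `is_majority` are hypothetical helpers, not taken from the code this document describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one bit per node; bit (i - 1) set means node i is present. */
typedef uint64_t nodemask_t;

/* Number of nodes in the mask. */
int mask_count(nodemask_t mask)
{
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* True if the nodes in 'alive' form a strict majority of 'cluster'. */
bool is_majority(nodemask_t alive, nodemask_t cluster)
{
    return 2 * mask_count(alive & cluster) > mask_count(cluster);
}
```

Take nodes A, B and C as 1, 2 and 3. If C is lost, {A, B} is still a majority and can keep committing. C on its own cannot.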

## The algorithm

The goal is to avoid reordering of conflicting xacts. We don't want to always
wait for PREPARE confirmations from all nodes before committing; however, dealing with
...

Some data structures:

```c
struct Generation {
    int64 num;          /* generation number */
    nodemask_t members; /* generation members */
    ...
struct GenState {
    ...
     */
    Generation last_vote;
};
```
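
Below is a compilable sketch of `Generation` together with the membership test that backends use later (`IsMemberOfGen`). Only the field names come from the structure above. The bit layout of `nodemask_t`, the `NodeId` type and the `node_bit` helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit (i - 1) of the mask stands for node i, ids 1..64. */
typedef uint64_t nodemask_t;
typedef int NodeId;

typedef struct Generation {
    int64_t num;        /* generation number */
    nodemask_t members; /* generation members */
} Generation;

/* Hypothetical helper: mask containing only the given node. */
nodemask_t node_bit(NodeId id)
{
    assert(id >= 1 && id <= 64);
    return (nodemask_t) 1 << (id - 1);
}

/* Is the node one of the generation's members? */
bool IsMemberOfGen(NodeId node, Generation gen)
{
    return (gen.members & node_bit(node)) != 0;
}
```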

The voting procedure:
In addition to the structures above, when conducting a vote the following
```c
struct Vote {
    NodeId voter;
    Generation last_online_in;
};
struct Campaign {
    Generation proposed_gen;
    Vote[] collected_votes; /* register received votes here */
} my_campaign;
```
is also kept in shmem.

Initially we set the first generation to <1, all nodes>, in which everyone is recovered
(last_online_in = 1).
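
This is a sketch of the initial state. It assumes a hypothetical `GenStateSketch` that keeps only `current_gen` and `last_online_in`, leaving out `last_vote` and the other fields. It also assumes a bit-per-node `nodemask_t`. `all_nodes_mask` and `initial_gen_state` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t nodemask_t; /* assumption: bit (i - 1) stands for node i */

typedef struct Generation {
    int64_t num;
    nodemask_t members;
} Generation;

/* Hypothetical subset of the per-node generation state. */
typedef struct GenStateSketch {
    Generation current_gen;
    Generation last_online_in; /* last generation in which this node was online */
} GenStateSketch;

/* Mask with nodes 1..n_nodes set (1 <= n_nodes <= 64). */
nodemask_t all_nodes_mask(int n_nodes)
{
    assert(n_nodes >= 1 && n_nodes <= 64);
    return n_nodes == 64 ? UINT64_MAX : (((nodemask_t) 1 << n_nodes) - 1);
}

/* Initial state: generation <1, all nodes>, and everyone is recovered in it. */
GenStateSketch initial_gen_state(int n_nodes)
{
    Generation first = { .num = 1, .members = all_nodes_mask(n_nodes) };
    GenStateSketch s = { .current_gen = first, .last_online_in = first };
    return s;
}
```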

- Whenever a node decides to change generation (i.e. wants to join the cluster), it
  ...
- Processing of the messages above by the election initiator:
  On VoteGenNumTooLow, restart the election with a number of at least the
  received last_vote.num + 1 (the local last_vote.num is adjusted accordingly).

  On VoteOk, remember the vote in collected_votes if we are still conducting
  the election with this num. If a majority is collected, the vote is successful:
  calculate the donors, i.e. the members of the latest generation among the
  last_online_in values in the votes:

  ```c
  {
      Generation latest_gen = { .num = 0 }
      foreach v in my_campaign->collected_votes {
          ...
          donors = latest_gen.members
      }
  }
  ```
  Then execute ConsiderGenSwitch(my_campaign->proposed_gen, donors) and broadcast
  CurrentGenIs<current_gen, donors>.
- On receipt of CurrentGenIs<gen, donors>, ConsiderGenSwitch(gen, donors) is always executed.
  ...
  proposed_members.
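
The initiator side of the vote can be sketched as follows. `Campaign` is a hypothetical fixed-size version of the structure above. Two choices are assumptions:

- The majority is counted against `proposed_gen.members`.
- `next_election_num` picks a number above both sides' `last_vote.num`. The text only requires at least the received `last_vote.num + 1`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64

typedef uint64_t nodemask_t; /* assumption: bit (i - 1) stands for node i */
typedef int NodeId;

typedef struct Generation {
    int64_t num;
    nodemask_t members;
} Generation;

typedef struct Vote {
    NodeId voter;
    Generation last_online_in;
} Vote;

/* Hypothetical fixed-size Campaign: votes are kept in an array. */
typedef struct Campaign {
    Generation proposed_gen;
    Vote collected_votes[MAX_NODES];
    int n_votes;
} Campaign;

int count_nodes(nodemask_t m)
{
    int n = 0;
    for (; m != 0; m &= m - 1) /* clear the lowest set bit each round */
        n++;
    return n;
}

/* VoteGenNumTooLow: choose a number above both our own last_vote.num and
 * the voter's (one possible choice; the text requires at least received + 1). */
int64_t next_election_num(int64_t local_last_vote_num, int64_t received_last_vote_num)
{
    int64_t highest = local_last_vote_num > received_last_vote_num
                          ? local_last_vote_num : received_last_vote_num;
    return highest + 1;
}

/* VoteOk: store the vote if it answers the election we are still running
 * and the voter has not voted yet. Returns true if the vote was stored. */
bool record_vote(Campaign *c, int64_t election_num, Vote v)
{
    if (election_num != c->proposed_gen.num)
        return false; /* reply to an election we no longer conduct */
    for (int i = 0; i < c->n_votes; i++)
        if (c->collected_votes[i].voter == v.voter)
            return false; /* duplicate vote */
    if (c->n_votes == MAX_NODES)
        return false;
    c->collected_votes[c->n_votes++] = v;
    return true;
}

/* Assumption: the majority is counted against the proposed members. */
bool has_majority(const Campaign *c)
{
    return 2 * c->n_votes > count_nodes(c->proposed_gen.members);
}

/* Donors: members of the latest generation any voter was online in. */
nodemask_t calc_donors(const Campaign *c)
{
    Generation latest_gen = { .num = 0, .members = 0 };
    for (int i = 0; i < c->n_votes; i++)
        if (c->collected_votes[i].last_online_in.num > latest_gen.num)
            latest_gen = c->collected_votes[i].last_online_in;
    return latest_gen.members;
}
```

`calc_donors` follows the spirit of the `latest_gen`/`donors` loop in the text. If several votes share the highest `last_online_in.num`, the first one collected wins.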

## Generation switching procedure

This procedure is executed whenever a node learns about the existence
of a generation higher than its current one (via CurrentGenIs, the START_REPLICATION
command, PREPARE, arrival of a parallel safe, or PREPARE replies):

```c
bool ConsiderGenSwitch(Generation gen, nodemask_t donors) {
    LWLockAcquire(GenLock, LW_EXCLUSIVE);
    if (genstate->current_gen.num >= gen.num) {
    ...
void EnableMyself() {
    ...
     * Now backends and walreceivers may proceed */
    genstate->status = ONLINE;
}
```
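
Most of ConsiderGenSwitch is elided here, so this sketch shows only the visible behaviour. A switch happens only for a strictly higher generation number, and EnableMyself eventually sets ONLINE. The sketch is single-threaded; the original holds GenLock in exclusive mode instead. Several details are assumptions:

- the reduced `GenState`;
- its `donors` field;
- the two-value `GenStatus`;
- setting RECOVERY on a switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t nodemask_t; /* assumption: one bit per node */

typedef struct Generation {
    int64_t num;
    nodemask_t members;
} Generation;

/* Assumption: only the two statuses mentioned in the text. */
typedef enum { RECOVERY, ONLINE } GenStatus;

/* Reduced, hypothetical GenState; the original lives in shmem. */
typedef struct GenState {
    Generation current_gen;
    nodemask_t donors; /* hypothetical: donors announced with current_gen */
    GenStatus status;
} GenState;

/* Single-threaded sketch: the original holds GenLock in LW_EXCLUSIVE mode. */
bool ConsiderGenSwitch(GenState *gs, Generation gen, nodemask_t donors)
{
    if (gs->current_gen.num >= gen.num)
        return false; /* we already know this or a newer generation */
    gs->current_gen = gen;
    gs->donors = donors;
    gs->status = RECOVERY; /* assumption: out of service until EnableMyself */
    return true;
}

void EnableMyself(GenState *gs)
{
    /* Now backends and walreceivers may proceed. */
    gs->status = ONLINE;
}
```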

## Backend actions

- While writing PREPARE to WAL, lock GenLock in shared mode and
  - if !IsMemberOfGen(me, genstate->current_gen), bail out with 'not a member of current gen'
  - if genstate->status == RECOVERY, bail out with 'node is in recovery'
...
because if, e.g., we had BC, then the sausage A-B-C, and the clique convention tells us
that in this case the quorum must be AB, the next gen might exclude C even if C is alive
and connected to B.
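
The PREPARE-time checks listed for backends can be sketched as a function that returns the bail-out message, or NULL if PREPARE may proceed. `CheckPrepareAllowed` and the reduced `GenState` are hypothetical, and the bit-per-node `nodemask_t` is an assumption. Taking GenLock in shared mode is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h> /* strcmp, for comparing returned messages */

typedef uint64_t nodemask_t; /* assumption: bit (i - 1) stands for node i */
typedef int NodeId;

typedef struct Generation {
    int64_t num;
    nodemask_t members;
} Generation;

typedef enum { RECOVERY, ONLINE } GenStatus; /* assumed subset of statuses */

/* Hypothetical reduced GenState. */
typedef struct GenState {
    Generation current_gen;
    GenStatus status;
} GenState;

bool IsMemberOfGen(NodeId node, Generation gen)
{
    assert(node >= 1 && node <= 64);
    return ((gen.members >> (node - 1)) & 1) != 0;
}

/* Checks done while writing PREPARE (GenLock is not modelled here).
 * Returns NULL if PREPARE may proceed, else the bail-out message. */
const char *CheckPrepareAllowed(const GenState *gs, NodeId me)
{
    if (!IsMemberOfGen(me, gs->current_gen))
        return "not a member of current gen";
    if (gs->status == RECOVERY)
        return "node is in recovery";
    return NULL;
}
```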
331348
## Walreceiver

```c
enum
{
    REPLMODE_RECOVERY, /* stream all origins */
    ...
HandleCommit(record, rcv_ctx) {
    ...
    }
}
```
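
Only REPLMODE_RECOVERY is visible in this excerpt. The sketch below shows how a replication mode could drive origin filtering. The second mode, REPLMODE_NORMAL, and its "only the sender's own xacts" semantics are hypothetical, as is `should_stream`.

```c
#include <assert.h>
#include <stdbool.h>

typedef int NodeId;

enum ReplMode {
    REPLMODE_RECOVERY, /* stream all origins */
    REPLMODE_NORMAL    /* hypothetical: stream only xacts originated at the sender */
};

/* Should a change that originated at 'origin' be streamed by 'sender'? */
bool should_stream(enum ReplMode mode, NodeId sender, NodeId origin)
{
    switch (mode) {
    case REPLMODE_RECOVERY:
        return true;
    case REPLMODE_NORMAL:
        return origin == sender;
    }
    return false;
}
```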

## Liveness

As said above, anyone can propose any generation at any time, and we ought to be
safe. However, to make sure the system is live, sane generations should be