an oversight in the Bao security argument: security relies on the chunk counter

Antonio Marcedone and Balachandar Kesavan have reported a mistake in the Bao security argument related to seeking. The security argument can be repaired without any changes to Bao, so this mistake is not a vulnerability, but nonetheless it was a significant flaw in my reasoning. It indicates that I need to look carefully for similar mistakes and ultimately formalize the security argument into a proof.

Here's my summary of their report. Consider a 2 KiB file whose BLAKE3/Bao tree looks like this:

```
      root
      /  \
chunk_a  chunk_b
```

Now imagine an attacker tries to construct the following _invalid_ tree representing a 3 KiB file:

```
          root
          /  \
     parent  chunk_b
      /  \
chunk_c  chunk_d
```

Here the attacker has kept nodes `root` and `chunk_b` the same, but they've replaced `chunk_a` with some new, arbitrary 2 KiB subtree. Note that the attacker cannot make the CV of the new `parent` node match the old CV of `chunk_a`, which is still the left half of the unchanged `root` node, so this tree cannot be a valid Bao encoding. (A collision on any of these CV's would mean that BLAKE3 itself was broken.)

However, suppose the attacker presents this invalid encoding with a length header of 3072 (3*1024), claims that it matches the original root hash, and invites the victim to seek to the start of the final chunk (content offset 2048). In this case the victim will read only the nodes `root` and `chunk_b`. The security question is, will these nodes validate? If so, the attacker will have successfully tricked the victim into validating content bytes that differ from the original input. (The bytes in question were present in the original but at a different offset, and the victim would also validate an incorrect total length for the file.)

In fact, thankfully, `chunk_b` will not validate. The `root` node will, because its CV/hash doesn't directly depend on the total input length. But `chunk_b` will not, because moving it to a new position in the tree changes the value of its **chunk counter** a.k.a. the `t` parameter to the compression function.

This is all well and good...but I never actually considered the chunk counter when I wrote Bao's security argument. In fact, an [earlier version of Bao](https://github.com/oconnor663/bao/blob/0.6.0/docs/spec.md) didn't include chunk counters at all and was broken in this case. The chunk counter was added later, for [unrelated reasons](https://github.com/oconnor663/bao/blob/0.7.0/docs/spec.md#why-does-bao-set-the-node-offset-parameter-for-each-chunk). See also [Zooko's original suggestion](https://twitter.com/zooko/status/1060218512990363650) and section 7.5 of [the BLAKE3 paper](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf). So what we have here is a pretty dramatic oversight in the design, where I missed an important class of attacks, but I got lucky and didn't get burned.

The generalized version of this attack exploits the fact that that traversing the right face of the tree comes with some ambiguity about the size of the left subtrees that we're "skipping over". The number of parent nodes in the path from the root to the rightmost chunk is equal to the number of 1-bits in the binary representation of the rightmost chunk index (i.e. the total number of chunks minus one), and an attacker in this scenario has to preserve the number of nodes in that path, but _which 1-bits those nodes represent_ is potentially ambiguous. In the case above the attacker tries to pass off chunk index 0b1 (1) as chunk index 0b10 (2), but they could also have tried 0b100 (4) or 0b1000 (8), or any other index with a single 1-bit set. Similarly, a tree with six chunks would have a rightmost chunk index of 0b101 (5), which could be ambiguous with 0b11 (3), 0b110 (6), or any larger number with two 1-bits. Disambiguating/decolliding these cases requires some node along the path to incorporate the total size of the tree in its CV. The chunk counter does this, and any Bao-like tree hash design without a chunk counter (for example, with the goal of allowing subtree caching) would need to incorporate the total size somewhere else, perhaps by having parent nodes include their own height in the compression function parameters.

Notably, the proof frameworks laid out by [_Sufficient conditions for sound tree and sequential hashing modes_](https://eprint.iacr.org/2009/210.pdf) and [_Sound Hashing Modes of Arbitrary Functions, Permutations, and Block Ciphers_](https://tosc.iacr.org/index.php/ToSC/article/view/7366), which we used in the BLAKE3 paper, aren't concerned with this property. For the purposes of those frameworks, the fact that `parent` can't be constructed to collide with `chunk_a` (i.e. "tree-decodability" or "radical-decodability") is the end of the story. The modified tree can never represent the hash of any input, so we don't need to consider it. But for Bao's purposes, we need something stronger. The victim might never even look at `parent`, but they still need to be certain that none of the chunks they _do_ look at have been moved. I need to formalize the specific properties Bao is aiming for here and then (hopefully) prove that Bao achieves them.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

an oversight in the Bao security argument: security relies on the chunk counter #41

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

an oversight in the Bao security argument: security relies on the chunk counter #41

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions