Skip to content

an oversight in the Bao security argument: security relies on the chunk counter #41

@oconnor663

Description

@oconnor663

Antonio Marcedone and Balachandar Kesavan have reported a mistake in the Bao security argument related to seeking. The security argument can be repaired without any changes to Bao, so this mistake is not a vulnerability, but nonetheless it was a significant flaw in my reasoning. It indicates that I need to look carefully for similar mistakes and ultimately formalize the security argument into a proof.

Here's my summary of their report. Consider a 2 KiB file whose BLAKE3/Bao tree looks like this:

      root
      /  \
chunk_a  chunk_b

Now imagine an attacker tries to construct the following invalid tree representing a 3 KiB file:

          root
          /  \
     parent  chunk_b
      /  \
chunk_c  chunk_d

Here the attacker has kept nodes root and chunk_b the same, but they've replaced chunk_a with some new, arbitrary 2 KiB subtree. Note that the attacker cannot make the CV of the new parent node match the old CV of chunk_a, which is still the left half of the unchanged root node, so this tree cannot be a valid Bao encoding. (A collision on any of these CV's would mean that BLAKE3 itself was broken.)

However, suppose the attacker presents this invalid encoding with a length header of 3072 (3*1024), claims that it matches the original root hash, and invites the victim to seek to the start of the final chunk (content offset 2048). In this case the victim will read only the nodes root and chunk_b. The security question is, will these nodes validate? If so, the attacker will have successfully tricked the victim into validating content bytes that differ from the original input. (The bytes in question were present in the original but at a different offset, and the victim would also validate an incorrect total length for the file.)

In fact, thankfully, chunk_b will not validate. The root node will, because its CV/hash doesn't directly depend on the total input length. But chunk_b will not, because moving it to a new position in the tree changes the value of its chunk counter a.k.a. the t parameter to the compression function.

This is all well and good...but I never actually considered the chunk counter when I wrote Bao's security argument. In fact, an earlier version of Bao didn't include chunk counters at all and was broken in this case. The chunk counter was added later, for unrelated reasons. See also Zooko's original suggestion and section 7.5 of the BLAKE3 paper. So what we have here is a pretty dramatic oversight in the design, where I missed an important class of attacks, but I got lucky and didn't get burned.

The generalized version of this attack exploits the fact that that traversing the right face of the tree comes with some ambiguity about the size of the left subtrees that we're "skipping over". The number of parent nodes in the path from the root to the rightmost chunk is equal to the number of 1-bits in the binary representation of the rightmost chunk index (i.e. the total number of chunks minus one), and an attacker in this scenario has to preserve the number of nodes in that path, but which 1-bits those nodes represent is potentially ambiguous. In the case above the attacker tries to pass off chunk index 0b1 (1) as chunk index 0b10 (2), but they could also have tried 0b100 (4) or 0b1000 (8), or any other index with a single 1-bit set. Similarly, a tree with six chunks would have a rightmost chunk index of 0b101 (5), which could be ambiguous with 0b11 (3), 0b110 (6), or any larger number with two 1-bits. Disambiguating/decolliding these cases requires some node along the path to incorporate the total size of the tree in its CV. The chunk counter does this, and any Bao-like tree hash design without a chunk counter (for example, with the goal of allowing subtree caching) would need to incorporate the total size somewhere else, perhaps by having parent nodes include their own height in the compression function parameters.

Notably, the proof frameworks laid out by Sufficient conditions for sound tree and sequential hashing modes and Sound Hashing Modes of Arbitrary Functions, Permutations, and Block Ciphers, which we used in the BLAKE3 paper, aren't concerned with this property. For the purposes of those frameworks, the fact that parent can't be constructed to collide with chunk_a (i.e. "tree-decodability" or "radical-decodability") is the end of the story. The modified tree can never represent the hash of any input, so we don't need to consider it. But for Bao's purposes, we need something stronger. The victim might never even look at parent, but they still need to be certain that none of the chunks they do look at have been moved. I need to formalize the specific properties Bao is aiming for here and then (hopefully) prove that Bao achieves them.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions