linux-fscrypt.vger.kernel.org archive mirror
From: Eric Biggers <ebiggers@kernel.org>
To: Boris Burkov <boris@bur.io>
Cc: fstests@vger.kernel.org, linux-fscrypt@vger.kernel.org,
	linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v5 4/4] generic: test fs-verity EFBIG scenarios
Date: Thu, 16 Sep 2021 14:18:34 -0700
Message-ID: <YUO0qg3bqxsCy/iT@sol.localdomain>
In-Reply-To: <b1d116cd4d0ea74b9cd86f349c672021e005a75c.1631558495.git.boris@bur.io>

On Mon, Sep 13, 2021 at 11:44:37AM -0700, Boris Burkov wrote:
> +_fsv_scratch_begin_subtest "way too big: fail on first merkle block"
> +# have to go back by 4096 from max to not hit the fsverity MAX_LEVELS check.
> +truncate -s $(($max_sz - 4095)) $fsv_file
> +_fsv_enable $fsv_file |& _filter_scratch

This is actually a kernel bug, so please don't work around it in the test :-(

It will be fixed by the kernel patch
https://lore.kernel.org/linux-fscrypt/20210916203424.113376-1-ebiggers@kernel.org

> +
> +# The goal of this second test is to make a big enough file that we trip the
> +# EFBIG codepath, but not so big that we hit it immediately as soon as we try
> +# to write a Merkle leaf. Because of the layout of the Merkle tree that
> +# fs-verity uses, this is a bit complicated to compute dynamically.
> +
> +# The layout of the Merkle tree has the leaf nodes last, but writes them first.
> +# To get an interesting overflow, we need the start of L0 to be < MAX but the
> +# end of the merkle tree (EOM) to be past MAX. Ideally, the start of L0 is only
> +# just smaller than MAX, so that we don't have to write many blocks to blow up,
> +# but we take some liberties with adding alignments rather than computing them
> +# correctly, so we under-estimate the perfectly sized file.
> +
> +# We make the following assumptions to arrive at a Merkle tree layout:
> +# The Merkle tree is stored past EOF aligned to 64k.
> +# 4K blocks and pages
> +# Merkle tree levels aligned to the block (not pictured)
> +# SHA-256 hashes (32 bytes; 128 hashes per block/page)
> +# 64 bit max file size (and thus 8 levels)
> +
> +# 0                        EOF round-to-64k L7L6L5 L4   L3    L2    L1  L0 MAX  EOM
> +# |-------------------------|               ||-|--|---|----|-----|------|--|!!!!!|
> +
> +# Given this structure, we can compute the size of the file that yields the
> +# desired properties. (NB the diagram skips the block alignment of each level)
> +# sz + 64k + sz/128^8 + 4k + sz/128^7 + 4k + ... + sz/128^2 + 4k < MAX
> +# sz + 64k + 7(4k) + sz/128^8 + sz/128^7 + ... + sz/128^2 < MAX
> +# sz + 92k + sz/128^8 + sz/128^7 + ... + sz/128^2 < MAX
> +# (128^8)sz + (128^8)92k + sz + (128)sz + (128^2)sz + ... + (128^6)sz < (128^8)MAX
> +# sz(128^8 + 128^6 + 128^5 + 128^4 + 128^3 + 128^2 + 128 + 1) < (128^8)(MAX - 92k)
> +# sz < (128^8/(128^8 + (128^6 + ... + 128 + 1)))(MAX - 92k)
> +#
> +# Do the actual calculation with 'bc' and 20 digits of precision.
> +# set -f prevents the * from being expanded into the files in the cwd.
> +set -f
> +calc="scale=20; ($max_sz - 94208) * ((128^8) / (1 + 128 + 128^2 + 128^3 + 128^4 + 128^5 + 128^6 + 128^8))"
> +sz=$(echo $calc | $BC -q | cut -d. -f1)
> +set +f

It's hard to follow the above explanation.  I'm still wondering whether it could
be simplified a lot.  Maybe something like the following:

# The goal of this second test is to make a big enough file that we trip the
# EFBIG codepath, but not so big that we hit it immediately when writing the
# first Merkle leaf.
#
# The Merkle tree is stored with the leaf node level (L0) last, but it is
# written first.  To get an interesting overflow, we need the maximum file size
# (MAX) to be in the middle of L0 -- ideally near the beginning of L0 so that we
# don't have to write many blocks before getting an error.
# 
# With SHA-256 and 4K blocks, there are 128 hashes per block.  Thus, ignoring
# padding, L0 is 1/128 of the file size while the other levels in total are
# 1/128**2 + 1/128**3 + 1/128**4 + ... = 1/16256 of the file size.  So still
# ignoring padding, for L0 to start exactly at MAX, the file size must be s such
# that s + s/16256 = MAX, i.e. s = MAX * (16256/16257).  Then to get a file size
# where MAX occurs *near* the start of L0 rather than *at* the start, we can
# just subtract an overestimate of the padding: 64K after the file contents,
# then 4K per level, where considering 8 levels is sufficient.
sz=$(echo "scale=20; $max_sz * (16256/16257) - 65536 - 4096*8" | $BC -q | cut -d. -f1)


That gives a size only 4103 bytes different from your calculation, and IMO is
much easier to understand.
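A rough shell sketch for reproducing that comparison: it evaluates both formulas, then models the tree layout to check where MAX lands for the simplified size. It assumes max_sz = 2^63 - 1 purely as an illustrative value and uses GNU bc in place of fstests' $BC; `calc` is a hypothetical helper. The layout model is the one listed in the quoted patch (tree at EOF rounded up to 64K, 4K blocks, 128 SHA-256 hashes per block, root level first and L0 last), so the check is only as accurate as those assumptions.

```shell
#!/bin/bash
# Rough sketch, not part of the patch.  Assumptions: max_sz = 2^63 - 1 is an
# illustrative value, and GNU bc stands in for fstests' $BC.  The layout model
# follows the quoted patch: tree starts at EOF rounded up to 64K, 4K blocks,
# 128 SHA-256 hashes per block, root level first and L0 (leaves) last.

max_sz=9223372036854775807

# Hypothetical helper: integer arithmetic in bc (default scale=0, so '/'
# truncates).  Used because bash's 64-bit arithmetic would overflow here.
calc() { echo "$1" | bc -q; }

# v5 formula.  Quoting the expression stops the shell from globbing '*',
# so 'set -f' is not needed.
boris=$(echo "scale=20; ($max_sz - 94208) * ((128^8) / (1 + 128 + 128^2 + 128^3 + 128^4 + 128^5 + 128^6 + 128^8))" | bc -q | cut -d. -f1)

# Simplified formula from this reply.
eric=$(echo "scale=20; $max_sz * (16256/16257) - 65536 - 4096*8" | bc -q | cut -d. -f1)

echo "v5 size:         $boris"
echo "simplified size: $eric"
echo "difference:      $(calc "$boris - $eric") bytes"

# Model the tree for the simplified size and locate L0.
sz=$eric
blocks=$(calc "($sz + 4095) / 4096")   # data blocks, one hash each
l0=$(calc "($blocks + 127) / 128")     # blocks in L0
upper=0                                # total blocks in L1 and above
lvl=$l0
while [ "$lvl" != 1 ]; do
    lvl=$(calc "($lvl + 127) / 128")
    upper=$(calc "$upper + $lvl")
done
tree_start=$(calc "($sz + 65535) / 65536 * 65536")
l0_start=$(calc "$tree_start + $upper * 4096")
l0_end=$(calc "$l0_start + $l0 * 4096")

echo "L0 starts $(calc "$max_sz - $l0_start") bytes before MAX"
# GNU bc extension: relational and '&&' operators print 1 or 0.
if [ "$(calc "$l0_start < $max_sz && $max_sz < $l0_end")" = 1 ]; then
    echo "MAX falls inside L0"
fi
```

Run it with bash; the exact numbers printed depend on the max_sz plugged in, so substitute the real limit for the filesystem under test.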

- Eric


Thread overview: 6+ messages
2021-09-13 18:44 [PATCH v5 0/4] tests for btrfs fsverity Boris Burkov
2021-09-13 18:44 ` [PATCH v5 1/4] btrfs: test btrfs specific fsverity corruption Boris Burkov
2021-09-13 18:44 ` [PATCH v5 2/4] generic/574: corrupt btrfs merkle tree data Boris Burkov
2021-09-13 18:44 ` [PATCH v5 3/4] btrfs: test verity orphans with dmlogwrites Boris Burkov
2021-09-13 18:44 ` [PATCH v5 4/4] generic: test fs-verity EFBIG scenarios Boris Burkov
2021-09-16 21:18   ` Eric Biggers [this message]
