Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: David Sterba <dsterba@suse.cz>
To: fdmanana@kernel.org
Cc: linux-btrfs@vger.kernel.org, ce3g8jdj@umail.furryterror.org
Subject: Re: [PATCH] Btrfs: fix corruption reading shared and compressed extents after hole punching
Date: Wed, 27 Feb 2019 15:29:04 +0100
Message-ID: <20190227142904.GM24609@twin.jikos.cz> (raw)
In-Reply-To: <20190214151720.23563-1-fdmanana@kernel.org>

On Thu, Feb 14, 2019 at 03:17:20PM +0000, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> In the past we had data corruption when reading compressed extents that
> are shared within the same file and they are consecutive, this got fixed
> by commit 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and
> shared extents") and by commit 808f80b46790f ("Btrfs: update fix for read
> corruption of compressed and shared extents"). However there was a case
> that was missing in those fixes, which is when the shared and compressed
> extents are referenced with a non-zero offset. The following shell script
> creates a reproducer for this issue:
> 
>   #!/bin/bash
> 
>   mkfs.btrfs -f /dev/sdc &> /dev/null
>   mount -o compress /dev/sdc /mnt/sdc
> 
>   # Create a file with 3 consecutive compressed extents, each has an
>   # uncompressed size of 128Kb and a compressed size of 4Kb.
>   for ((i = 1; i <= 3; i++)); do
>       head -c 4096 /dev/zero
>       for ((j = 1; j <= 31; j++)); do
>           head -c 4096 /dev/zero | tr '\0' "\377"
>       done
>   done > /mnt/sdc/foobar
>   sync
> 
>   echo "Digest after file creation:   $(md5sum /mnt/sdc/foobar)"
> 
>   # Clone the first extent into offsets 128K and 256K.
>   xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar
>   xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar
>   sync
> 
>   echo "Digest after cloning:         $(md5sum /mnt/sdc/foobar)"
> 
>   # Punch holes into the regions that are already full of zeroes.
>   xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar
>   xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar
>   xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar
>   sync
> 
>   echo "Digest after hole punching:   $(md5sum /mnt/sdc/foobar)"
> 
>   echo "Dropping page cache..."
>   sysctl -q vm.drop_caches=1
>   echo "Digest after hole punching:   $(md5sum /mnt/sdc/foobar)"
> 
>   umount /dev/sdc
> 
> When running the script we get the following output:
> 
>   Digest after file creation:   5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
>   linked 131072/131072 bytes at offset 131072
>   128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec)
>   linked 131072/131072 bytes at offset 262144
>   128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec)
>   Digest after cloning:         5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
>   Digest after hole punching:   5a0888d80d7ab1fd31c229f83a3bbcc8  /mnt/sdc/foobar
>   Dropping page cache...
>   Digest after hole punching:   fba694ae8664ed0c2e9ff8937e7f1484  /mnt/sdc/foobar
> 
> This happens because after reading all the pages of the extent in the
> range from 128K to 256K for example, we read the hole at offset 256K
> and then when reading the page at offset 260K we don't submit the
> existing bio, which is responsible for filling all the page in the
> range 128K to 256K only, therefore adding the pages from range 260K
> to 384K to the existing bio and submitting it after iterating over the
> entire range. Once the bio completes, the uncompressed data fills only
> the pages in the range 128K to 256K because there's no more data read
> from disk, leaving the pages in the range 260K to 384K unfilled. It is
> just a slightly different variant of what was solved by commit
> 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and shared
> extents").
> 
> Fix this by forcing a bio submit, during readpages(), whenever we find a
> compressed extent map for a page that is different from the extent map
> for the previous page or has a different starting offset (in case it's
> the same compressed extent), instead of the extent map's original start
> offset.
> 
> A test case for fstests follows soon.
> 
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Tested-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Cc: stable@vger.kernel.org # 4.3+
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Added to misc-next and scheduled for 5.1, thanks. And thanks to Zygo for
the report.

      reply index

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-14 15:17 fdmanana
2019-02-27 14:29 ` David Sterba [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190227142904.GM24609@twin.jikos.cz \
    --to=dsterba@suse.cz \
    --cc=ce3g8jdj@umail.furryterror.org \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox