From: David Sterba <email@example.com> To: firstname.lastname@example.org Cc: email@example.com, firstname.lastname@example.org Subject: Re: [PATCH] Btrfs: fix corruption reading shared and compressed extents after hole punching Date: Wed, 27 Feb 2019 15:29:04 +0100 Message-ID: <20190227142904.GM24609@twin.jikos.cz> (raw) In-Reply-To: <email@example.com> On Thu, Feb 14, 2019 at 03:17:20PM +0000, firstname.lastname@example.org wrote: > From: Filipe Manana <email@example.com> > > In the past we had data corruption when reading compressed extents that > are shared within the same file and they are consecutive, this got fixed > by commit 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and > shared extents") and by commit 808f80b46790f ("Btrfs: update fix for read > corruption of compressed and shared extents"). However there was a case > that was missing in those fixes, which is when the shared and compressed > extents are referenced with a non-zero offset. The following shell script > creates a reproducer for this issue: > > #!/bin/bash > > mkfs.btrfs -f /dev/sdc &> /dev/null > mount -o compress /dev/sdc /mnt/sdc > > # Create a file with 3 consecutive compressed extents, each has an > # uncompressed size of 128Kb and a compressed size of 4Kb. > for ((i = 1; i <= 3; i++)); do > head -c 4096 /dev/zero > for ((j = 1; j <= 31; j++)); do > head -c 4096 /dev/zero | tr '\0' "\377" > done > done > /mnt/sdc/foobar > sync > > echo "Digest after file creation: $(md5sum /mnt/sdc/foobar)" > > # Clone the first extent into offsets 128K and 256K. > xfs_io -c "reflink /mnt/sdc/foobar 0 128K 128K" /mnt/sdc/foobar > xfs_io -c "reflink /mnt/sdc/foobar 0 256K 128K" /mnt/sdc/foobar > sync > > echo "Digest after cloning: $(md5sum /mnt/sdc/foobar)" > > # Punch holes into the regions that are already full of zeroes. > xfs_io -c "fpunch 0 4K" /mnt/sdc/foobar > xfs_io -c "fpunch 128K 4K" /mnt/sdc/foobar > xfs_io -c "fpunch 256K 4K" /mnt/sdc/foobar > sync > > echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)" > > echo "Dropping page cache..." > sysctl -q vm.drop_caches=1 > echo "Digest after hole punching: $(md5sum /mnt/sdc/foobar)" > > umount /dev/sdc > > When running the script we get the following output: > > Digest after file creation: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar > linked 131072/131072 bytes at offset 131072 > 128 KiB, 1 ops; 0.0033 sec (36.960 MiB/sec and 295.6830 ops/sec) > linked 131072/131072 bytes at offset 262144 > 128 KiB, 1 ops; 0.0015 sec (78.567 MiB/sec and 628.5355 ops/sec) > Digest after cloning: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar > Digest after hole punching: 5a0888d80d7ab1fd31c229f83a3bbcc8 /mnt/sdc/foobar > Dropping page cache... > Digest after hole punching: fba694ae8664ed0c2e9ff8937e7f1484 /mnt/sdc/foobar > > This happens because after reading all the pages of the extent in the > range from 128K to 256K for example, we read the hole at offset 256K > and then when reading the page at offset 260K we don't submit the > existing bio, which is responsible for filling all the page in the > range 128K to 256K only, therefore adding the pages from range 260K > to 384K to the existing bio and submitting it after iterating over the > entire range. Once the bio completes, the uncompressed data fills only > the pages in the range 128K to 256K because there's no more data read > from disk, leaving the pages in the range 260K to 384K unfilled. It is > just a slightly different variant of what was solved by commit > 005efedf2c7d0 ("Btrfs: fix read corruption of compressed and shared > extents"). > > Fix this by forcing a bio submit, during readpages(), whenever we find a > compressed extent map for a page that is different from the extent map > for the previous page or has a different starting offset (in case it's > the same compressed extent), instead of the extent map's original start > offset. > > A test case for fstests follows soon. > > Reported-by: Zygo Blaxell <firstname.lastname@example.org> > Tested-by: Zygo Blaxell <email@example.com> > Cc: firstname.lastname@example.org # 4.3+ > Signed-off-by: Filipe Manana <email@example.com> Added to misc-next and scheduled for 5.1, thanks. And thanks to Zygo for the report.
prev parent reply index Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-02-14 15:17 fdmanana 2019-02-27 14:29 ` David Sterba [this message]
Reply instructions: You may reply publically to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190227142904.GM24609@twin.jikos.cz \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-BTRFS Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \ firstname.lastname@example.org email@example.com public-inbox-index linux-btrfs Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs AGPL code for this site: git clone https://public-inbox.org/ public-inbox