linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>
Subject: [PATCH 4.14 34/53] Btrfs: fix incremental send failure after deduplication
Date: Mon,  5 Aug 2019 15:02:59 +0200	[thread overview]
Message-ID: <20190805124931.988306131@linuxfoundation.org> (raw)
In-Reply-To: <20190805124927.973499541@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

commit b4f9a1a87a48c255bb90d8a6c3d555a1abb88130 upstream.

When doing an incremental send operation we can fail if we previously did
deduplication operations against a file that exists in both snapshots. In
that case we will fail the send operation with -EIO and print a message
to dmesg/syslog like the following:

  BTRFS error (device sdc): Send: inconsistent snapshot, found updated \
  extent for inode 257 without updated inode item, send root is 258, \
  parent root is 257

This requires that we deduplicate to the same file in both snapshots for
the same amount of times on each snapshot. The issue happens because a
deduplication only updates the iversion of an inode and does not update
any other field of the inode, therefore if we deduplicate the file on
each snapshot for the same amount of time, the inode will have the same
iversion value (stored as the "sequence" field on the inode item) on both
snapshots, therefore it will be seen as unchanged between in the send
snapshot while there are new/updated/deleted extent items when comparing
to the parent snapshot. This makes the send operation return -EIO and
print an error message.

Example reproducer:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our first file. The first half of the file has several 64Kb
  # extents while the second half as a single 512Kb extent.
  $ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo
  $ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo

  # Create the base snapshot and the parent send stream from it.
  $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
  $ btrfs send -f /tmp/1.snap /mnt/mysnap1

  # Create our second file, that has exactly the same data as the first
  # file.
  $ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar

  # Create the second snapshot, used for the incremental send, before
  # doing the file deduplication.
  $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2

  # Now before creating the incremental send stream:
  #
  # 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This
  #    will drop several extent items and add a new one, also updating
  #    the inode's iversion (sequence field in inode item) by 1, but not
  #    any other field of the inode;
  #
  # 2) Deduplicate into a different subrange of file foo in snapshot
  #    mysnap2. This will replace an extent item with a new one, also
  #    updating the inode's iversion by 1 but not any other field of the
  #    inode.
  #
  # After these two deduplication operations, the inode items, for file
  # foo, are identical in both snapshots, but we have different extent
  # items for this inode in both snapshots. We want to check this doesn't
  # cause send to fail with an error or produce an incorrect stream.

  $ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo
  $ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo

  # Create the incremental send stream.
  $ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2
  ERROR: send ioctl failed with -5: Input/output error

This issue started happening back in 2015 when deduplication was updated
to not update the inode's ctime and mtime and update only the iversion.
Back then we would hit a BUG_ON() in send, but later in 2016 send was
updated to return -EIO and print the error message instead of doing the
BUG_ON().

A test case for fstests follows soon.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933
Fixes: 1c919a5e13702c ("btrfs: don't update mtime/ctime on deduped inodes")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/send.c |   77 ++++++++++----------------------------------------------
 1 file changed, 15 insertions(+), 62 deletions(-)

--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -6130,68 +6130,21 @@ static int changed_extent(struct send_ct
 {
 	int ret = 0;
 
-	if (sctx->cur_ino != sctx->cmp_key->objectid) {
-
-		if (result == BTRFS_COMPARE_TREE_CHANGED) {
-			struct extent_buffer *leaf_l;
-			struct extent_buffer *leaf_r;
-			struct btrfs_file_extent_item *ei_l;
-			struct btrfs_file_extent_item *ei_r;
-
-			leaf_l = sctx->left_path->nodes[0];
-			leaf_r = sctx->right_path->nodes[0];
-			ei_l = btrfs_item_ptr(leaf_l,
-					      sctx->left_path->slots[0],
-					      struct btrfs_file_extent_item);
-			ei_r = btrfs_item_ptr(leaf_r,
-					      sctx->right_path->slots[0],
-					      struct btrfs_file_extent_item);
-
-			/*
-			 * We may have found an extent item that has changed
-			 * only its disk_bytenr field and the corresponding
-			 * inode item was not updated. This case happens due to
-			 * very specific timings during relocation when a leaf
-			 * that contains file extent items is COWed while
-			 * relocation is ongoing and its in the stage where it
-			 * updates data pointers. So when this happens we can
-			 * safely ignore it since we know it's the same extent,
-			 * but just at different logical and physical locations
-			 * (when an extent is fully replaced with a new one, we
-			 * know the generation number must have changed too,
-			 * since snapshot creation implies committing the current
-			 * transaction, and the inode item must have been updated
-			 * as well).
-			 * This replacement of the disk_bytenr happens at
-			 * relocation.c:replace_file_extents() through
-			 * relocation.c:btrfs_reloc_cow_block().
-			 */
-			if (btrfs_file_extent_generation(leaf_l, ei_l) ==
-			    btrfs_file_extent_generation(leaf_r, ei_r) &&
-			    btrfs_file_extent_ram_bytes(leaf_l, ei_l) ==
-			    btrfs_file_extent_ram_bytes(leaf_r, ei_r) &&
-			    btrfs_file_extent_compression(leaf_l, ei_l) ==
-			    btrfs_file_extent_compression(leaf_r, ei_r) &&
-			    btrfs_file_extent_encryption(leaf_l, ei_l) ==
-			    btrfs_file_extent_encryption(leaf_r, ei_r) &&
-			    btrfs_file_extent_other_encoding(leaf_l, ei_l) ==
-			    btrfs_file_extent_other_encoding(leaf_r, ei_r) &&
-			    btrfs_file_extent_type(leaf_l, ei_l) ==
-			    btrfs_file_extent_type(leaf_r, ei_r) &&
-			    btrfs_file_extent_disk_bytenr(leaf_l, ei_l) !=
-			    btrfs_file_extent_disk_bytenr(leaf_r, ei_r) &&
-			    btrfs_file_extent_disk_num_bytes(leaf_l, ei_l) ==
-			    btrfs_file_extent_disk_num_bytes(leaf_r, ei_r) &&
-			    btrfs_file_extent_offset(leaf_l, ei_l) ==
-			    btrfs_file_extent_offset(leaf_r, ei_r) &&
-			    btrfs_file_extent_num_bytes(leaf_l, ei_l) ==
-			    btrfs_file_extent_num_bytes(leaf_r, ei_r))
-				return 0;
-		}
-
-		inconsistent_snapshot_error(sctx, result, "extent");
-		return -EIO;
-	}
+	/*
+	 * We have found an extent item that changed without the inode item
+	 * having changed. This can happen either after relocation (where the
+	 * disk_bytenr of an extent item is replaced at
+	 * relocation.c:replace_file_extents()) or after deduplication into a
+	 * file in both the parent and send snapshots (where an extent item can
+	 * get modified or replaced with a new one). Note that deduplication
+	 * updates the inode item, but it only changes the iversion (sequence
+	 * field in the inode item) of the inode, so if a file is deduplicated
+	 * the same amount of times in both the parent and send snapshots, its
+	 * iversion becames the same in both snapshots, whence the inode item is
+	 * the same on both snapshots.
+	 */
+	if (sctx->cur_ino != sctx->cmp_key->objectid)
+		return 0;
 
 	if (!sctx->cur_inode_new_gen && !sctx->cur_inode_deleted) {
 		if (result != BTRFS_COMPARE_TREE_DELETED)



  parent reply	other threads:[~2019-08-05 13:08 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-05 13:02 [PATCH 4.14 00/53] 4.14.137-stable review Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 01/53] ARM: riscpc: fix DMA Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 02/53] ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200 Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 03/53] ARM: dts: rockchip: Make rk3288-veyron-mickeys emmc work again Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 04/53] ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 05/53] ftrace: Enable trampoline when rec count returns back to one Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 06/53] kernel/module.c: Only return -EEXIST for modules that have finished loading Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 07/53] MIPS: lantiq: Fix bitfield masking Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 08/53] dmaengine: rcar-dmac: Reject zero-length slave DMA requests Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 09/53] clk: tegra210: fix PLLU and PLLU_OUT1 Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 10/53] fs/adfs: super: fix use-after-free bug Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 11/53] btrfs: fix minimum number of chunk errors for DUP Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 12/53] cifs: Fix a race condition with cifs_echo_request Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 13/53] ceph: fix improper use of smp_mb__before_atomic() Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 14/53] ceph: return -ERANGE if virtual xattr value didnt fit in buffer Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 15/53] ACPI: blacklist: fix clang warning for unused DMI table Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 16/53] scsi: zfcp: fix GCC compiler warning emitted with -Wmaybe-uninitialized Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 17/53] x86: kvm: avoid constant-conversion warning Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 18/53] ACPI: fix false-positive -Wuninitialized warning Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 19/53] be2net: Signal that the device cannot transmit during reconfiguration Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 20/53] x86/apic: Silence -Wtype-limits compiler warnings Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 21/53] x86: math-emu: Hide clang warnings for 16-bit overflow Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 22/53] mm/cma.c: fail if fixed declaration cant be honored Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 23/53] coda: add error handling for fget Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 24/53] coda: fix build using bare-metal toolchain Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 25/53] uapi linux/coda_psdev.h: move upc_req definition from uapi to kernel side headers Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 26/53] drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 27/53] ipc/mqueue.c: only perform resource calculation if user valid Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 28/53] xen/pv: Fix a boot up hang revealed by int3 self test Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 29/53] x86/kvm: Dont call kvm_spurious_fault() from .fixup Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 30/53] x86/paravirt: Fix callee-saved function ELF sizes Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 31/53] x86, boot: Remove multiple copy of static function sanitize_boot_params() Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 32/53] drm/nouveau: fix memory leak in nouveau_conn_reset() Greg Kroah-Hartman
2019-08-05 13:02 ` [PATCH 4.14 33/53] kbuild: initialize CLANG_FLAGS correctly in the top Makefile Greg Kroah-Hartman
2019-08-05 13:02 ` Greg Kroah-Hartman [this message]
2019-08-05 13:03 ` [PATCH 4.14 35/53] Btrfs: fix race leading to fs corruption after transaction abort Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 36/53] mmc: dw_mmc: Fix occasional hang after tuning on eMMC Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 37/53] gpiolib: fix incorrect IRQ requesting of an active-low lineevent Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 38/53] IB/hfi1: Fix Spectre v1 vulnerability Greg Kroah-Hartman
2019-08-05 14:16   ` Gustavo A. R. Silva
2019-08-26  9:06     ` Greg Kroah-Hartman
2019-08-26 21:23       ` Gustavo A. R. Silva
2019-08-05 13:03 ` [PATCH 4.14 39/53] selinux: fix memory leak in policydb_init() Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 40/53] s390/dasd: fix endless loop after read unit address configuration Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 41/53] parisc: Fix build of compressed kernel even with debug enabled Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 42/53] drivers/perf: arm_pmu: Fix failure path in PM notifier Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 43/53] nbd: replace kill_bdev() with __invalidate_device() again Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 44/53] xen/swiotlb: fix condition for calling xen_destroy_contiguous_region() Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 45/53] IB/mlx5: Fix unreg_umr to ignore the mkey state Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 46/53] IB/mlx5: Use direct mkey destroy command upon UMR unreg failure Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 47/53] IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 48/53] IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 49/53] IB/hfi1: Check for error on call to alloc_rsm_map_table Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 50/53] eeprom: at24: make spd world-readable again Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 51/53] objtool: Support GCC 9 cold subfunction naming scheme Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 52/53] gcc-9: properly declare the {pv,hv}clock_page storage Greg Kroah-Hartman
2019-08-05 13:03 ` [PATCH 4.14 53/53] x86/vdso: Prevent segfaults due to hoisted vclock reads Greg Kroah-Hartman
2019-08-06  1:06 ` [PATCH 4.14 00/53] 4.14.137-stable review shuah
2019-08-06  3:34 ` Naresh Kamboju
2019-08-06  7:16 ` Jack Wang
2019-08-06 15:49 ` Guenter Roeck
2019-08-06 18:29 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190805124931.988306131@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).