linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>
Subject: [PATCH 5.4 10/57] btrfs: fix partial loss of prealloc extent past i_size after fsync
Date: Mon,  4 May 2020 19:57:14 +0200	[thread overview]
Message-ID: <20200504165457.355495113@linuxfoundation.org> (raw)
In-Reply-To: <20200504165456.783676004@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

commit f135cea30de5f74d5bfb5116682073841fb4af8f upstream.

When we have an inode with a prealloc extent that starts at an offset
lower than the i_size and there is another prealloc extent that starts at
an offset beyond i_size, we can end up losing part of the first prealloc
extent (the part that starts at i_size) and have an implicit hole if we
fsync the file and then have a power failure.

Consider the following example with comments explaining how and why it
happens.

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our test file with 2 consecutive prealloc extents, each with a
  # size of 128Kb, and covering the range from 0 to 256Kb, with a file
  # size of 0.
  $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo
  $ xfs_io -c "falloc -k 128K 128K" /mnt/foo

  # Fsync the file to record both extents in the log tree.
  $ xfs_io -c "fsync" /mnt/foo

  # Now do a redudant extent allocation for the range from 0 to 64Kb.
  # This will merely increase the file size from 0 to 64Kb. Instead we
  # could also do a truncate to set the file size to 64Kb.
  $ xfs_io -c "falloc 0 64K" /mnt/foo

  # Fsync the file, so we update the inode item in the log tree with the
  # new file size (64Kb). This also ends up setting the number of bytes
  # for the first prealloc extent to 64Kb. This is done by the truncation
  # at btrfs_log_prealloc_extents().
  # This means that if a power failure happens after this, a write into
  # the file range 64Kb to 128Kb will not use the prealloc extent and
  # will result in allocation of a new extent.
  $ xfs_io -c "fsync" /mnt/foo

  # Now set the file size to 256K with a truncate and then fsync the file.
  # Since no changes happened to the extents, the fsync only updates the
  # i_size in the inode item at the log tree. This results in an implicit
  # hole for the file range from 64Kb to 128Kb, something which fsck will
  # complain when not using the NO_HOLES feature if we replay the log
  # after a power failure.
  $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo

So instead of always truncating the log to the inode's current i_size at
btrfs_log_prealloc_extents(), check first if there's a prealloc extent
that starts at an offset lower than the i_size and with a length that
crosses the i_size - if there is one, just make sure we truncate to a
size that corresponds to the end offset of that prealloc extent, so
that we don't lose the part of that extent that starts at i_size if a
power failure happens.

A test case for fstests follows soon.

Fixes: 31d11b83b96f ("Btrfs: fix duplicate extents after fsync of file with prealloc extents")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/tree-log.c |   43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4242,6 +4242,9 @@ static int btrfs_log_prealloc_extents(st
 	const u64 ino = btrfs_ino(inode);
 	struct btrfs_path *dst_path = NULL;
 	bool dropped_extents = false;
+	u64 truncate_offset = i_size;
+	struct extent_buffer *leaf;
+	int slot;
 	int ins_nr = 0;
 	int start_slot;
 	int ret;
@@ -4256,9 +4259,43 @@ static int btrfs_log_prealloc_extents(st
 	if (ret < 0)
 		goto out;
 
+	/*
+	 * We must check if there is a prealloc extent that starts before the
+	 * i_size and crosses the i_size boundary. This is to ensure later we
+	 * truncate down to the end of that extent and not to the i_size, as
+	 * otherwise we end up losing part of the prealloc extent after a log
+	 * replay and with an implicit hole if there is another prealloc extent
+	 * that starts at an offset beyond i_size.
+	 */
+	ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY);
+	if (ret < 0)
+		goto out;
+
+	if (ret == 0) {
+		struct btrfs_file_extent_item *ei;
+
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		ei = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+		if (btrfs_file_extent_type(leaf, ei) ==
+		    BTRFS_FILE_EXTENT_PREALLOC) {
+			u64 extent_end;
+
+			btrfs_item_key_to_cpu(leaf, &key, slot);
+			extent_end = key.offset +
+				btrfs_file_extent_num_bytes(leaf, ei);
+
+			if (extent_end > i_size)
+				truncate_offset = extent_end;
+		}
+	} else {
+		ret = 0;
+	}
+
 	while (true) {
-		struct extent_buffer *leaf = path->nodes[0];
-		int slot = path->slots[0];
+		leaf = path->nodes[0];
+		slot = path->slots[0];
 
 		if (slot >= btrfs_header_nritems(leaf)) {
 			if (ins_nr > 0) {
@@ -4296,7 +4333,7 @@ static int btrfs_log_prealloc_extents(st
 				ret = btrfs_truncate_inode_items(trans,
 							 root->log_root,
 							 &inode->vfs_inode,
-							 i_size,
+							 truncate_offset,
 							 BTRFS_EXTENT_DATA_KEY);
 			} while (ret == -EAGAIN);
 			if (ret)



  parent reply	other threads:[~2020-05-04 18:02 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04 17:57 [PATCH 5.4 00/57] 5.4.39-rc1 review Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 01/57] dma-buf: Fix SET_NAME ioctl uapi Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 02/57] drm/edid: Fix off-by-one in DispID DTD pixel clock Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 03/57] drm/amd/display: Fix green screen issue after suspend Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 04/57] drm/qxl: qxl_release leak in qxl_draw_dirty_fb() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 05/57] drm/qxl: qxl_release leak in qxl_hw_surface_alloc() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 06/57] drm/qxl: qxl_release use after free Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 07/57] NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 08/57] btrfs: fix transaction leak in btrfs_recover_relocation Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 09/57] btrfs: fix block group leak when removing fails Greg Kroah-Hartman
2020-05-04 17:57 ` Greg Kroah-Hartman [this message]
2020-05-04 17:57 ` [PATCH 5.4 11/57] btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 12/57] mmc: cqhci: Avoid false "cqhci: CQE stuck on" by not open-coding timeout loop Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 13/57] mmc: sdhci-xenon: fix annoying 1.8V regulator warning Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 14/57] mmc: sdhci-pci: Fix eMMC driver strength for BYT-based controllers Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 15/57] mmc: sdhci-msm: Enable host capabilities pertains to R1b response Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 16/57] mmc: meson-mx-sdio: Set MMC_CAP_WAIT_WHILE_BUSY Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 17/57] mmc: meson-mx-sdio: remove the broken ->card_busy() op Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 18/57] crypto: caam - fix the address of the last entry of S/G Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 19/57] ALSA: hda/realtek - Two front mics on a Lenovo ThinkCenter Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 20/57] ALSA: usb-audio: Correct a typo of NuPrime DAC-10 USB ID Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 21/57] ALSA: hda/hdmi: fix without unlocked before return Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 22/57] ALSA: line6: Fix POD HD500 audio playback Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 23/57] ALSA: pcm: oss: Place the plugin buffer overflow checks correctly Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 24/57] i2c: amd-mp2-pci: Fix Oops in amd_mp2_pci_init() error handling Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 25/57] Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 26/57] dlmfs_file_write(): fix the bogosity in handling non-zero *ppos Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 27/57] IB/rdmavt: Always return ERR_PTR from rvt_create_mmap_info() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 28/57] PM: ACPI: Output correct message on target power state Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 29/57] PM: hibernate: Freeze kernel threads in software_resume() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 30/57] dm verity fec: fix hash block number in verity_fec_decode Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 31/57] dm writecache: fix data corruption when reloading the target Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 32/57] dm multipath: use updated MPATHF_QUEUE_IO on mapping for bio-based mpath Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 33/57] ARM: dts: imx6qdl-sr-som-ti: indicate powering off wifi is safe Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 34/57] scsi: qla2xxx: set UNLOADING before waiting for session deletion Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 35/57] scsi: qla2xxx: check UNLOADING before posting async work Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 36/57] RDMA/mlx5: Set GRH fields in query QP on RoCE Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 37/57] RDMA/mlx4: Initialize ib_spec on the stack Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 38/57] RDMA/siw: Fix potential siw_mem refcnt leak in siw_fastreg_mr() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 39/57] RDMA/core: Prevent mixed use of FDs between shared ufiles Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 40/57] RDMA/core: Fix race between destroy and release FD object Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 41/57] RDMA/cm: Fix ordering of xa_alloc_cyclic() in ib_create_cm_id() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 42/57] RDMA/cm: Fix an error check in cm_alloc_id_priv() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 43/57] i2c: iproc: generate stop event for slave writes Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 44/57] vfio: avoid possible overflow in vfio_iommu_type1_pin_pages Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 45/57] vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 46/57] iommu/qcom: Fix local_base status check Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 47/57] scsi: target/iblock: fix WRITE SAME zeroing Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 48/57] iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 49/57] i2c: aspeed: Avoid i2c interrupt status clear race condition Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 50/57] ALSA: opti9xx: shut up gcc-10 range warning Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 51/57] Fix use after free in get_tree_bdev() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 52/57] nvme: prevent double free in nvme_alloc_ns() error handling Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 53/57] nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 54/57] dmaengine: dmatest: Fix iteration non-stop logic Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 5.4 55/57] dmaengine: dmatest: Fix process hang when reading wait parameter Greg Kroah-Hartman
2020-05-04 17:58 ` [PATCH 5.4 56/57] arm64: vdso: Add -fasynchronous-unwind-tables to cflags Greg Kroah-Hartman
2020-05-04 17:58 ` [PATCH 5.4 57/57] selinux: properly handle multiple messages in selinux_netlink_send() Greg Kroah-Hartman
2020-05-05  8:38 ` [PATCH 5.4 00/57] 5.4.39-rc1 review Jon Hunter
2020-05-05 15:19 ` Naresh Kamboju
2020-05-05 15:43 ` Guenter Roeck
2020-05-05 15:44 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200504165457.355495113@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).