All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>
Subject: [PATCH 4.14 07/26] btrfs: fix partial loss of prealloc extent past i_size after fsync
Date: Mon,  4 May 2020 19:57:21 +0200	[thread overview]
Message-ID: <20200504165444.168138017@linuxfoundation.org> (raw)
In-Reply-To: <20200504165442.494398840@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

commit f135cea30de5f74d5bfb5116682073841fb4af8f upstream.

When we have an inode with a prealloc extent that starts at an offset
lower than the i_size and there is another prealloc extent that starts at
an offset beyond i_size, we can end up losing part of the first prealloc
extent (the part that starts at i_size) and have an implicit hole if we
fsync the file and then have a power failure.

Consider the following example with comments explaining how and why it
happens.

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  # Create our test file with 2 consecutive prealloc extents, each with a
  # size of 128Kb, and covering the range from 0 to 256Kb, with a file
  # size of 0.
  $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo
  $ xfs_io -c "falloc -k 128K 128K" /mnt/foo

  # Fsync the file to record both extents in the log tree.
  $ xfs_io -c "fsync" /mnt/foo

  # Now do a redudant extent allocation for the range from 0 to 64Kb.
  # This will merely increase the file size from 0 to 64Kb. Instead we
  # could also do a truncate to set the file size to 64Kb.
  $ xfs_io -c "falloc 0 64K" /mnt/foo

  # Fsync the file, so we update the inode item in the log tree with the
  # new file size (64Kb). This also ends up setting the number of bytes
  # for the first prealloc extent to 64Kb. This is done by the truncation
  # at btrfs_log_prealloc_extents().
  # This means that if a power failure happens after this, a write into
  # the file range 64Kb to 128Kb will not use the prealloc extent and
  # will result in allocation of a new extent.
  $ xfs_io -c "fsync" /mnt/foo

  # Now set the file size to 256K with a truncate and then fsync the file.
  # Since no changes happened to the extents, the fsync only updates the
  # i_size in the inode item at the log tree. This results in an implicit
  # hole for the file range from 64Kb to 128Kb, something which fsck will
  # complain when not using the NO_HOLES feature if we replay the log
  # after a power failure.
  $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo

So instead of always truncating the log to the inode's current i_size at
btrfs_log_prealloc_extents(), check first if there's a prealloc extent
that starts at an offset lower than the i_size and with a length that
crosses the i_size - if there is one, just make sure we truncate to a
size that corresponds to the end offset of that prealloc extent, so
that we don't lose the part of that extent that starts at i_size if a
power failure happens.

A test case for fstests follows soon.

Fixes: 31d11b83b96f ("Btrfs: fix duplicate extents after fsync of file with prealloc extents")
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/tree-log.c |   43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4155,6 +4155,9 @@ static int btrfs_log_prealloc_extents(st
 	const u64 ino = btrfs_ino(inode);
 	struct btrfs_path *dst_path = NULL;
 	bool dropped_extents = false;
+	u64 truncate_offset = i_size;
+	struct extent_buffer *leaf;
+	int slot;
 	int ins_nr = 0;
 	int start_slot;
 	int ret;
@@ -4169,9 +4172,43 @@ static int btrfs_log_prealloc_extents(st
 	if (ret < 0)
 		goto out;
 
+	/*
+	 * We must check if there is a prealloc extent that starts before the
+	 * i_size and crosses the i_size boundary. This is to ensure later we
+	 * truncate down to the end of that extent and not to the i_size, as
+	 * otherwise we end up losing part of the prealloc extent after a log
+	 * replay and with an implicit hole if there is another prealloc extent
+	 * that starts at an offset beyond i_size.
+	 */
+	ret = btrfs_previous_item(root, path, ino, BTRFS_EXTENT_DATA_KEY);
+	if (ret < 0)
+		goto out;
+
+	if (ret == 0) {
+		struct btrfs_file_extent_item *ei;
+
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+		ei = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+
+		if (btrfs_file_extent_type(leaf, ei) ==
+		    BTRFS_FILE_EXTENT_PREALLOC) {
+			u64 extent_end;
+
+			btrfs_item_key_to_cpu(leaf, &key, slot);
+			extent_end = key.offset +
+				btrfs_file_extent_num_bytes(leaf, ei);
+
+			if (extent_end > i_size)
+				truncate_offset = extent_end;
+		}
+	} else {
+		ret = 0;
+	}
+
 	while (true) {
-		struct extent_buffer *leaf = path->nodes[0];
-		int slot = path->slots[0];
+		leaf = path->nodes[0];
+		slot = path->slots[0];
 
 		if (slot >= btrfs_header_nritems(leaf)) {
 			if (ins_nr > 0) {
@@ -4209,7 +4246,7 @@ static int btrfs_log_prealloc_extents(st
 				ret = btrfs_truncate_inode_items(trans,
 							 root->log_root,
 							 &inode->vfs_inode,
-							 i_size,
+							 truncate_offset,
 							 BTRFS_EXTENT_DATA_KEY);
 			} while (ret == -EAGAIN);
 			if (ret)



  parent reply	other threads:[~2020-05-04 18:14 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04 17:57 [PATCH 4.14 00/26] 4.14.179-rc1 review Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 01/26] ext4: fix special inode number checks in __ext4_iget() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 02/26] drm/edid: Fix off-by-one in DispID DTD pixel clock Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 03/26] drm/qxl: qxl_release leak in qxl_draw_dirty_fb() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 04/26] drm/qxl: qxl_release leak in qxl_hw_surface_alloc() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 05/26] drm/qxl: qxl_release use after free Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 06/26] btrfs: fix block group leak when removing fails Greg Kroah-Hartman
2020-05-04 17:57 ` Greg Kroah-Hartman [this message]
2020-05-04 17:57 ` [PATCH 4.14 08/26] mmc: sdhci-xenon: fix annoying 1.8V regulator warning Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 09/26] mmc: sdhci-pci: Fix eMMC driver strength for BYT-based controllers Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 10/26] ALSA: hda/realtek - Two front mics on a Lenovo ThinkCenter Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 11/26] ALSA: hda/hdmi: fix without unlocked before return Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 12/26] ALSA: pcm: oss: Place the plugin buffer overflow checks correctly Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 13/26] PM: ACPI: Output correct message on target power state Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 14/26] PM: hibernate: Freeze kernel threads in software_resume() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 15/26] dm verity fec: fix hash block number in verity_fec_decode Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 16/26] RDMA/mlx5: Set GRH fields in query QP on RoCE Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 17/26] RDMA/mlx4: Initialize ib_spec on the stack Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 18/26] vfio: avoid possible overflow in vfio_iommu_type1_pin_pages Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 19/26] vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn() Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 20/26] iommu/qcom: Fix local_base status check Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 21/26] scsi: target/iblock: fix WRITE SAME zeroing Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 22/26] iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 23/26] ALSA: opti9xx: shut up gcc-10 range warning Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 24/26] nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 25/26] dmaengine: dmatest: Fix iteration non-stop logic Greg Kroah-Hartman
2020-05-04 17:57 ` [PATCH 4.14 26/26] selinux: properly handle multiple messages in selinux_netlink_send() Greg Kroah-Hartman
     [not found] ` <20200504165442.494398840-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-05-05  8:37   ` [PATCH 4.14 00/26] 4.14.179-rc1 review Jon Hunter
2020-05-05  8:37     ` Jon Hunter
2020-05-05 15:35 ` Naresh Kamboju
2020-05-05 15:48 ` Guenter Roeck
2020-05-05 15:53 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200504165444.168138017@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.