All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Josef Bacik <josef@toxicpanda.com>, Qu Wenruo <wqu@suse.com>,
	David Sterba <dsterba@suse.com>,
	Anand Jain <anand.jain@oracle.com>
Subject: [PATCH 5.4 29/85] btrfs: Ensure we trim ranges across block group boundary
Date: Mon, 12 Oct 2020 15:26:52 +0200	[thread overview]
Message-ID: <20201012132634.255502969@linuxfoundation.org> (raw)
In-Reply-To: <20201012132632.846779148@linuxfoundation.org>

From: Qu Wenruo <wqu@suse.com>

commit 6b7faadd985c990324b5b5bd18cc4ba5c395eb65 upstream.

[BUG]
When deleting large files (which cross block group boundary) with
discard mount option, we find some btrfs_discard_extent() calls only
trimmed part of its space, not the whole range:

  btrfs_discard_extent: type=0x1 start=19626196992 len=2144530432 trimmed=1073741824 ratio=50%

type:		bbio->map_type, in above case, it's SINGLE DATA.
start:		Logical address of this trim
len:		Logical length of this trim
trimmed:	Physically trimmed bytes
ratio:		trimmed / len

Thus leaving some unused space not discarded.

[CAUSE]
When discard mount option is specified, after a transaction is fully
committed (super block written to disk), we begin to cleanup pinned
extents in the following call chain:

btrfs_commit_transaction()
|- btrfs_finish_extent_commit()
   |- find_first_extent_bit(unpin, 0, &start, &end, EXTENT_DIRTY);
   |- btrfs_discard_extent()

However, pinned extents are recorded in an extent_io_tree, which can
merge adjacent extent states.

When a large file gets deleted and it has adjacent file extents across
block group boundary, we will get a large merged range like this:

      |<---    BG1    --->|<---      BG2     --->|
      |//////|<--   Range to discard   --->|/////|

To discard that range, we have the following calls:

  btrfs_discard_extent()
  |- btrfs_map_block()
  |  Returned bbio will end at BG1's end. As btrfs_map_block()
  |  never returns result across block group boundary.
  |- btrfs_issuse_discard()
     Issue discard for each stripe.

So we will only discard the range in BG1, not the remaining part in BG2.

Furthermore, this bug is not that reliably observed, for above case, if
there is no other extent in BG2, BG2 will be empty and btrfs will trim
all space of BG2, covering up the bug.

[FIX]
- Allow __btrfs_map_block_for_discard() to modify @length parameter
  btrfs_map_block() uses its @length paramter to notify the caller how
  many bytes are mapped in current call.
  With __btrfs_map_block_for_discard() also modifing the @length,
  btrfs_discard_extent() now understands when to do extra trim.

- Call btrfs_map_block() in a loop until we hit the range end Since we
  now know how many bytes are mapped each time, we can iterate through
  each block group boundary and issue correct trim for each range.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/btrfs/extent-tree.c |   41 +++++++++++++++++++++++++++++++----------
 fs/btrfs/volumes.c     |    6 ++++--
 2 files changed, 35 insertions(+), 12 deletions(-)

--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1305,8 +1305,10 @@ static int btrfs_issue_discard(struct bl
 int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 			 u64 num_bytes, u64 *actual_bytes)
 {
-	int ret;
+	int ret = 0;
 	u64 discarded_bytes = 0;
+	u64 end = bytenr + num_bytes;
+	u64 cur = bytenr;
 	struct btrfs_bio *bbio = NULL;
 
 
@@ -1315,15 +1317,23 @@ int btrfs_discard_extent(struct btrfs_fs
 	 * associated to its stripes that don't go away while we are discarding.
 	 */
 	btrfs_bio_counter_inc_blocked(fs_info);
-	/* Tell the block device(s) that the sectors can be discarded */
-	ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, bytenr, &num_bytes,
-			      &bbio, 0);
-	/* Error condition is -ENOMEM */
-	if (!ret) {
-		struct btrfs_bio_stripe *stripe = bbio->stripes;
+	while (cur < end) {
+		struct btrfs_bio_stripe *stripe;
 		int i;
 
+		num_bytes = end - cur;
+		/* Tell the block device(s) that the sectors can be discarded */
+		ret = btrfs_map_block(fs_info, BTRFS_MAP_DISCARD, cur,
+				      &num_bytes, &bbio, 0);
+		/*
+		 * Error can be -ENOMEM, -ENOENT (no such chunk mapping) or
+		 * -EOPNOTSUPP. For any such error, @num_bytes is not updated,
+		 * thus we can't continue anyway.
+		 */
+		if (ret < 0)
+			goto out;
 
+		stripe = bbio->stripes;
 		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
 			u64 bytes;
 			struct request_queue *req_q;
@@ -1340,10 +1350,19 @@ int btrfs_discard_extent(struct btrfs_fs
 						  stripe->physical,
 						  stripe->length,
 						  &bytes);
-			if (!ret)
+			if (!ret) {
 				discarded_bytes += bytes;
-			else if (ret != -EOPNOTSUPP)
-				break; /* Logic errors or -ENOMEM, or -EIO but I don't know how that could happen JDM */
+			} else if (ret != -EOPNOTSUPP) {
+				/*
+				 * Logic errors or -ENOMEM, or -EIO, but
+				 * unlikely to happen.
+				 *
+				 * And since there are two loops, explicitly
+				 * go to out to avoid confusion.
+				 */
+				btrfs_put_bbio(bbio);
+				goto out;
+			}
 
 			/*
 			 * Just in case we get back EOPNOTSUPP for some reason,
@@ -1353,7 +1372,9 @@ int btrfs_discard_extent(struct btrfs_fs
 			ret = 0;
 		}
 		btrfs_put_bbio(bbio);
+		cur += num_bytes;
 	}
+out:
 	btrfs_bio_counter_dec(fs_info);
 
 	if (actual_bytes)
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5676,12 +5676,13 @@ void btrfs_put_bbio(struct btrfs_bio *bb
  * replace.
  */
 static int __btrfs_map_block_for_discard(struct btrfs_fs_info *fs_info,
-					 u64 logical, u64 length,
+					 u64 logical, u64 *length_ret,
 					 struct btrfs_bio **bbio_ret)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
 	struct btrfs_bio *bbio;
+	u64 length = *length_ret;
 	u64 offset;
 	u64 stripe_nr;
 	u64 stripe_nr_end;
@@ -5715,6 +5716,7 @@ static int __btrfs_map_block_for_discard
 
 	offset = logical - em->start;
 	length = min_t(u64, em->start + em->len - logical, length);
+	*length_ret = length;
 
 	stripe_len = map->stripe_len;
 	/*
@@ -6129,7 +6131,7 @@ static int __btrfs_map_block(struct btrf
 
 	if (op == BTRFS_MAP_DISCARD)
 		return __btrfs_map_block_for_discard(fs_info, logical,
-						     *length, bbio_ret);
+						     length, bbio_ret);
 
 	ret = btrfs_get_io_geometry(fs_info, op, logical, *length, &geom);
 	if (ret < 0)



  parent reply	other threads:[~2020-10-12 13:41 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-12 13:26 [PATCH 5.4 00/85] 5.4.71-rc1 review Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 01/85] fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 02/85] Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 03/85] fbcon: Fix global-out-of-bounds read in fbcon_get_font() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 04/85] Revert "ravb: Fixed to be able to unload modules" Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 05/85] io_uring: Fix resource leaking when kill the process Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 06/85] io_uring: Fix missing smp_mb() in io_cancel_async_work() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 07/85] io_uring: Fix remove irrelevant req from the task_list Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 08/85] io_uring: Fix double list add in io_queue_async_work() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 09/85] net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 10/85] drm/nouveau/mem: guard against NULL pointer access in mem_del Greg Kroah-Hartman
2020-10-12 13:26   ` Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 11/85] vhost: Dont call access_ok() when using IOTLB Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 12/85] vhost: Use vhost_get_used_size() in vhost_vring_set_addr() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 13/85] usermodehelper: reset umask to default before executing user process Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 14/85] Platform: OLPC: Fix memleak in olpc_ec_probe Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 15/85] platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 16/85] platform/x86: thinkpad_acpi: initialize tp_nvram_state variable Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 17/85] bpf: Fix sysfs export of empty BTF section Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 18/85] bpf: Prevent .BTF section elimination Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 19/85] platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 20/85] platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 21/85] driver core: Fix probe_count imbalance in really_probe() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 22/85] perf test session topology: Fix data path Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 23/85] perf top: Fix stdio interface input handling with glibc 2.28+ Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 24/85] i2c: i801: Exclude device from suspend direct complete optimization Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 25/85] arm64: dts: stratix10: add status to qspi dts node Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 26/85] Btrfs: send, allow clone operations within the same file Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 27/85] Btrfs: send, fix emission of invalid " Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 28/85] btrfs: volumes: Use more straightforward way to calculate map length Greg Kroah-Hartman
2020-10-12 13:26 ` Greg Kroah-Hartman [this message]
2020-10-12 13:26 ` [PATCH 5.4 30/85] btrfs: fix RWF_NOWAIT write not failling when we need to cow Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 31/85] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 32/85] nvme-core: put ctrl ref when module ref get fail Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 33/85] macsec: avoid use-after-free in macsec_handle_frame() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 34/85] mm/khugepaged: fix filemap page_to_pgoff(page) != offset Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 35/85] net: introduce helper sendpage_ok() in include/linux/net.h Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 5.4 36/85] tcp: use sendpage_ok() to detect misused .sendpage Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 37/85] nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 38/85] xfrmi: drop ignore_df check before updating pmtu Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 39/85] cifs: Fix incomplete memory allocation on setxattr path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 40/85] i2c: meson: fix clock setting overwrite Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 41/85] i2c: meson: fixup rate calculation with filter delay Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 42/85] i2c: owl: Clear NACK and BUS error bits Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 43/85] sctp: fix sctp_auth_init_hmacs() error path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 44/85] team: set dev->needed_headroom in team_setup_by_port() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 45/85] net: team: fix memory leak in __team_options_register Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 46/85] openvswitch: handle DNAT tuple collision Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 47/85] drm/amdgpu: prevent double kfree ttm->sg Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 48/85] iommu/vt-d: Fix lockdep splat in iommu_flush_dev_iotlb() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 49/85] xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 50/85] xfrm: clone XFRMA_REPLAY_ESN_VAL " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 51/85] xfrm: clone XFRMA_SEC_CTX " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 52/85] xfrm: clone whole liftime_cur structure " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 53/85] net: stmmac: removed enabling eee in EEE set callback Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 54/85] platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 55/85] xfrm: Use correct address family in xfrm_state_find Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 56/85] iavf: use generic power management Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 57/85] iavf: Fix incorrect adapter get in iavf_resume Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 58/85] net: ethernet: cavium: octeon_mgmt: use phy_start and phy_stop Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 59/85] bonding: set dev->needed_headroom in bond_setup_by_slave() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 60/85] mdio: fix mdio-thunder.c dependency & build error Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 61/85] mlxsw: spectrum_acl: Fix mlxsw_sp_acl_tcam_group_add()s error path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 62/85] r8169: fix RTL8168f/RTL8411 EPHY config Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 63/85] net: usb: ax88179_178a: fix missing stop entry in driver_info Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 64/85] virtio-net: dont disable guest csum when disable LRO Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 65/85] net/mlx5: Avoid possible free of command entry while timeout comp handler Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 66/85] net/mlx5: Fix request_irqs error flow Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 67/85] net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 68/85] net/mlx5e: Fix VLAN cleanup flow Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 69/85] net/mlx5e: Fix VLAN create flow Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 70/85] rxrpc: Fix rxkad token xdr encoding Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 71/85] rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 72/85] rxrpc: Fix some missing _bh annotations on locking conn->state_lock Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 73/85] rxrpc: The server keyring isnt network-namespaced Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 74/85] rxrpc: Fix server keyring leak Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 75/85] perf: Fix task_function_call() error handling Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 76/85] mmc: core: dont set limits.discard_granularity as 0 Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 77/85] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 78/85] tcp: fix receive window update in tcp_add_backlog() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 79/85] net/core: check length before updating Ethertype in skb_mpls_{push,pop} Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 80/85] net/tls: race causes kernel panic Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 81/85] net/mlx5e: Fix drivers declaration to support GRE offload Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 82/85] Input: ati_remote2 - add missing newlines when printing module parameters Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 83/85] net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 84/85] net_sched: defer tcf_idr_insert() in tcf_action_init_1() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 5.4 85/85] net_sched: commit action insertions together Greg Kroah-Hartman
2020-10-12 18:27 ` [PATCH 5.4 00/85] 5.4.71-rc1 review Jon Hunter
2020-10-13  5:52 ` Naresh Kamboju
2020-10-13 16:41 ` Guenter Roeck
2020-10-14  1:24 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201012132634.255502969@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=anand.jain@oracle.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nborisov@suse.com \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.