linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.4 03/23] btrfs: fix race causing unnecessary inode logging during link and rename
Date: Fri,  6 Aug 2021 10:16:35 +0200	[thread overview]
Message-ID: <20210806081112.226646648@linuxfoundation.org> (raw)
In-Reply-To: <20210806081112.104686873@linuxfoundation.org>

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit de53d892e5c51dfa0a158e812575a75a6c991f39 ]

When we are doing a rename or a link operation for an inode that was logged
in the previous transaction and that transaction is still committing, we
have a time window where we incorrectly consider that the inode was logged
previously in the current transaction and therefore decide to log it to
update it in the log. The following steps give an example on how this
happens during a link operation:

1) Inode X is logged in transaction 1000, so its logged_trans field is set
   to 1000;

2) Task A starts to commit transaction 1000;

3) The state of transaction 1000 is changed to TRANS_STATE_UNBLOCKED;

4) Task B starts a link operation for inode X, and as a consequence it
   starts transaction 1001;

5) Task A is still committing transaction 1000, therefore the value stored
   at fs_info->last_trans_committed is still 999;

6) Task B calls btrfs_log_new_name(), it reads a value of 999 from
   fs_info->last_trans_committed and because the logged_trans field of
   inode X has a value of 1000, the function does not return immediately,
   instead it proceeds to logging the inode, which should not happen
   because the inode was logged in the previous transaction (1000) and
   not in the current one (1001).

This is not a functional problem, just wasted time and space logging an
inode that does not need to be logged, contributing to higher latency
for link and rename operations.

So fix this by comparing the inodes' logged_trans field with the
generation of the current transaction instead of comparing with the value
stored in fs_info->last_trans_committed.

This case is often hit when running dbench for a long enough duration, as
it does lots of rename operations.

This patch belongs to a patch set that is comprised of the following
patches:

  btrfs: fix race causing unnecessary inode logging during link and rename
  btrfs: fix race that results in logging old extents during a fast fsync
  btrfs: fix race that causes unnecessary logging of ancestor inodes
  btrfs: fix race that makes inode logging fallback to transaction commit
  btrfs: fix race leading to unnecessary transaction commit when logging inode
  btrfs: do not block inode logging for so long during transaction commit

Performance results are mentioned in the change log of the last patch.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/btrfs/tree-log.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 53607156b008..9912b5231bcf 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -6437,7 +6437,6 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
 			struct btrfs_inode *inode, struct btrfs_inode *old_dir,
 			struct dentry *parent)
 {
-	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_log_ctx ctx;
 
 	/*
@@ -6451,8 +6450,8 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
 	 * if this inode hasn't been logged and directory we're renaming it
 	 * from hasn't been logged, we don't need to log it
 	 */
-	if (inode->logged_trans <= fs_info->last_trans_committed &&
-	    (!old_dir || old_dir->logged_trans <= fs_info->last_trans_committed))
+	if (inode->logged_trans < trans->transid &&
+	    (!old_dir || old_dir->logged_trans < trans->transid))
 		return;
 
 	btrfs_init_log_ctx(&ctx, &inode->vfs_inode);
-- 
2.30.2




  parent reply	other threads:[~2021-08-06  8:20 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-06  8:16 [PATCH 5.4 00/23] 5.4.139-rc1 review Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 01/23] btrfs: delete duplicated words + other fixes in comments Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 02/23] btrfs: do not commit logs and transactions during link and rename operations Greg Kroah-Hartman
2021-08-06  8:16 ` Greg Kroah-Hartman [this message]
2021-08-06  8:16 ` [PATCH 5.4 04/23] btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 05/23] regulator: rt5033: Fix n_voltages settings for BUCK and LDO Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 06/23] spi: stm32h7: fix full duplex irq handler handling Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 07/23] ASoC: tlv320aic31xx: fix reversed bclk/wclk master bits Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 08/23] r8152: Fix potential PM refcount imbalance Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 09/23] qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union() Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 10/23] net: Fix zero-copy head len calculation Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 11/23] nvme: fix nvme_setup_command metadata trace event Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 12/23] ACPI: fix NULL pointer dereference Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 13/23] Revert "spi: mediatek: fix fifo rx mode" Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 14/23] Revert "Bluetooth: Shutdown controller after workqueues are flushed or cancelled" Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 15/23] firmware: arm_scmi: Ensure drivers provide a probe function Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 16/23] firmware: arm_scmi: Add delayed response status check Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 17/23] Revert "watchdog: iTCO_wdt: Account for rebooting on second timeout" Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 18/23] bpf: Inherit expanded/patched seen count from old aux data Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 19/23] bpf: Do not mark insn as seen under speculative path verification Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 20/23] bpf: Fix leakage under speculation on mispredicted branches Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 21/23] bpf: Test_verifier, add alu32 bounds tracking tests Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 22/23] bpf, selftests: Add a verifier test for assigning 32bit reg states to 64bit ones Greg Kroah-Hartman
2021-08-06  8:16 ` [PATCH 5.4 23/23] bpf, selftests: Adjust few selftest outcomes wrt unreachable code Greg Kroah-Hartman
2021-08-06 18:58 ` [PATCH 5.4 00/23] 5.4.139-rc1 review Guenter Roeck
2021-08-07 10:41 ` Sudip Mukherjee
2021-08-07 18:40 ` Naresh Kamboju
2021-08-08  3:27 ` Aakash Hemadri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210806081112.226646648@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).