From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Josef Bacik <jbacik@fb.com>,
	Filipe Manana <fdmanana@suse.com>,
	David Sterba <dsterba@suse.com>,
	Anand Jain <anand.jain@oracle.com>
Subject: [PATCH 4.14 01/30] btrfs: always wait on ordered extents at fsync time
Date: Mon, 25 Oct 2021 21:14:21 +0200
Message-ID: <20211025190923.166779214@linuxfoundation.org>
In-Reply-To: <20211025190922.089277904@linuxfoundation.org>

From: Josef Bacik <jbacik@fb.com>

commit b5e6c3e170b77025b5f6174258c7ad71eed2d4de upstream.

There's currently a priority inversion in btrfs fsync.  In some cases we
collect outstanding ordered extents onto a list and only wait on them at
the very last second.  However, this "very last second" falls inside of
a transaction handle, so a task in a lower priority cgroup can end up
holding the transaction open for longer than needed.  If a high priority
cgroup is also trying to fsync(), it will see latency.

Fix this by always waiting on the ordered extents at fsync time, rather
than while holding a transaction open.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/btrfs/file.c |   56 ++++----------------------------------------------------
 1 file changed, 4 insertions(+), 52 deletions(-)
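
Illustration only, not part of the patch: a minimal user-space sketch of
why moving the wait matters.  It models how long the transaction is held
open under the old and the new ordering.  The struct, the function names
and the millisecond costs are all hypothetical and do not exist in btrfs;
the model only assumes, as the changelog states, that the old wait fell
inside the transaction handle.

```c
#include <assert.h>

/* Hypothetical model, not btrfs code: all times are in milliseconds. */
struct fsync_cost {
	unsigned int ordered_io_ms;	/* waiting for ordered extents */
	unsigned int log_ms;		/* logging the inode */
};

/*
 * Old non-full-sync ordering: the wait for ordered extents happened
 * while the transaction was held open, so it adds to the hold time.
 */
static unsigned int hold_ms_wait_inside(struct fsync_cost c)
{
	return c.log_ms + c.ordered_io_ms;
}

/*
 * Patched ordering: the wait is done up front, so the ordered I/O time
 * no longer counts against the time the transaction is held open.
 */
static unsigned int hold_ms_wait_before(struct fsync_cost c)
{
	return c.log_ms;
}

/* Time the transaction is no longer held open in this model. */
static unsigned int hold_ms_saved(struct fsync_cost c)
{
	unsigned int before = hold_ms_wait_inside(c);
	unsigned int after = hold_ms_wait_before(c);

	assert(before >= after);
	return before - after;
}
```

In this model the saving equals the ordered I/O wait itself, which is the
part a throttled, lower priority task would otherwise stretch out while
other fsync() callers wait on the same transaction.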

--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2102,53 +2102,12 @@ int btrfs_sync_file(struct file *file, l
 	atomic_inc(&root->log_batch);
 	full_sync = test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
 			     &BTRFS_I(inode)->runtime_flags);
+
 	/*
-	 * We might have have had more pages made dirty after calling
-	 * start_ordered_ops and before acquiring the inode's i_mutex.
+	 * We have to do this here to avoid the priority inversion of waiting on
+	 * IO of a lower priority task while holding a transaction open.
 	 */
-	if (full_sync) {
-		/*
-		 * For a full sync, we need to make sure any ordered operations
-		 * start and finish before we start logging the inode, so that
-		 * all extents are persisted and the respective file extent
-		 * items are in the fs/subvol btree.
-		 */
-		ret = btrfs_wait_ordered_range(inode, start, len);
-	} else {
-		/*
-		 * Start any new ordered operations before starting to log the
-		 * inode. We will wait for them to finish in btrfs_sync_log().
-		 *
-		 * Right before acquiring the inode's mutex, we might have new
-		 * writes dirtying pages, which won't immediately start the
-		 * respective ordered operations - that is done through the
-		 * fill_delalloc callbacks invoked from the writepage and
-		 * writepages address space operations. So make sure we start
-		 * all ordered operations before starting to log our inode. Not
-		 * doing this means that while logging the inode, writeback
-		 * could start and invoke writepage/writepages, which would call
-		 * the fill_delalloc callbacks (cow_file_range,
-		 * submit_compressed_extents). These callbacks add first an
-		 * extent map to the modified list of extents and then create
-		 * the respective ordered operation, which means in
-		 * tree-log.c:btrfs_log_inode() we might capture all existing
-		 * ordered operations (with btrfs_get_logged_extents()) before
-		 * the fill_delalloc callback adds its ordered operation, and by
-		 * the time we visit the modified list of extent maps (with
-		 * btrfs_log_changed_extents()), we see and process the extent
-		 * map they created. We then use the extent map to construct a
-		 * file extent item for logging without waiting for the
-		 * respective ordered operation to finish - this file extent
-		 * item points to a disk location that might not have yet been
-		 * written to, containing random data - so after a crash a log
-		 * replay will make our inode have file extent items that point
-		 * to disk locations containing invalid data, as we returned
-		 * success to userspace without waiting for the respective
-		 * ordered operation to finish, because it wasn't captured by
-		 * btrfs_get_logged_extents().
-		 */
-		ret = start_ordered_ops(inode, start, end);
-	}
+	ret = btrfs_wait_ordered_range(inode, start, len);
 	if (ret) {
 		up_write(&BTRFS_I(inode)->dio_sem);
 		inode_unlock(inode);
@@ -2283,13 +2242,6 @@ int btrfs_sync_file(struct file *file, l
 				goto out;
 			}
 		}
-		if (!full_sync) {
-			ret = btrfs_wait_ordered_range(inode, start, len);
-			if (ret) {
-				btrfs_end_transaction(trans);
-				goto out;
-			}
-		}
 		ret = btrfs_commit_transaction(trans);
 	} else {
 		ret = btrfs_end_transaction(trans);


