All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Dave Chinner <dchinner@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Sasha Levin <sashal@kernel.org>,
	linux-xfs@vger.kernel.org
Subject: [PATCH AUTOSEL 5.9 33/33] xfs: don't allow NOWAIT DIO across extent boundaries
Date: Wed, 25 Nov 2020 10:35:50 -0500	[thread overview]
Message-ID: <20201125153550.810101-33-sashal@kernel.org> (raw)
In-Reply-To: <20201125153550.810101-1-sashal@kernel.org>

From: Dave Chinner <dchinner@redhat.com>

[ Upstream commit 883a790a84401f6f55992887fd7263d808d4d05d ]

Jens has reported a situation where partial direct IOs can be issued
and completed yet still return -EAGAIN. We don't want this to report
a short IO as we want XFS to complete user DIO entirely or not at
all.

This partial IO situation can occur on a write IO that is split
across an allocated extent and a hole, and the second mapping is
returning EAGAIN because allocation would be required.

The trivial reproducer:

$ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec)
pwrite: Resource temporarily unavailable
$

The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done
the first 4kB write:

 xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000
 iomap_apply:          dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
 xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
 xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
 xfs_iomap_found:      dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1
 iomap_apply_dstmap:   dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY

Here the first iomap loop has mapped the first 4kB of the file and
issued the IO, and we enter the second iomap_apply loop:

 iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
 xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
 xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin

And we exit with -EAGAIN out because we hit the allocate case trying
to make the second 4kB block.

Then IO completes on the first 4kB and the original IO context
completes and unlocks the inode, returning -EAGAIN to userspace:

 xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096
 xfs_iunlock:          dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write

There are other vectors to the same problem when we re-enter the
mapping code if we have to make multiple mappinfs under NOWAIT
conditions. e.g. failing trylocks, COW extents being found,
allocation being required, and so on.

Avoid all these potential problems by only allowing IOMAP_NOWAIT IO
to go ahead if the mapping we retrieve for the IO spans an entire
allocated extent. This avoids the possibility of subsequent mappings
to complete the IO from triggering NOWAIT semantics by any means as
NOWAIT IO will now only enter the mapping code once per NOWAIT IO.

Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/xfs/xfs_iomap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 3abb8b9d6f4c6..7b9ff824e82d4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -706,6 +706,23 @@ xfs_ilock_for_iomap(
 	return 0;
 }
 
+/*
+ * Check that the imap we are going to return to the caller spans the entire
+ * range that the caller requested for the IO.
+ */
+static bool
+imap_spans_range(
+	struct xfs_bmbt_irec	*imap,
+	xfs_fileoff_t		offset_fsb,
+	xfs_fileoff_t		end_fsb)
+{
+	if (imap->br_startoff > offset_fsb)
+		return false;
+	if (imap->br_startoff + imap->br_blockcount < end_fsb)
+		return false;
+	return true;
+}
+
 static int
 xfs_direct_write_iomap_begin(
 	struct inode		*inode,
@@ -766,6 +783,18 @@ xfs_direct_write_iomap_begin(
 	if (imap_needs_alloc(inode, flags, &imap, nimaps))
 		goto allocate_blocks;
 
+	/*
+	 * NOWAIT IO needs to span the entire requested IO with a single map so
+	 * that we avoid partial IO failures due to the rest of the IO range not
+	 * covered by this map triggering an EAGAIN condition when it is
+	 * subsequently mapped and aborting the IO.
+	 */
+	if ((flags & IOMAP_NOWAIT) &&
+	    !imap_spans_range(&imap, offset_fsb, end_fsb)) {
+		error = -EAGAIN;
+		goto out_unlock;
+	}
+
 	xfs_iunlock(ip, lockmode);
 	trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap);
 	return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
-- 
2.27.0


  parent reply	other threads:[~2020-11-25 15:42 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-25 15:35 [PATCH AUTOSEL 5.9 01/33] HID: uclogic: Add ID for Trust Flex Design Tablet Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 02/33] HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 03/33] HID: cypress: Support Varmilo Keyboards' media hotkeys Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 04/33] HID: add support for Sega Saturn Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 05/33] Input: i8042 - allow insmod to succeed on devices without an i8042 controller Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 06/33] HID: hid-sensor-hub: Fix issue with devices with no report ID Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 07/33] staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 08/33] HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 09/33] dmaengine: xilinx_dma: use readl_poll_timeout_atomic variant Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 10/33] x86/xen: don't unbind uninitialized lock_kicker_irq Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 11/33] kunit: fix display of failed expectations for strings Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 12/33] HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 13/33] HID: Add Logitech Dinovo Edge battery quirk Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 14/33] proc: don't allow async path resolution of /proc/self components Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 15/33] nvme: free sq/cq dbbuf pointers when dbbuf set fails Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 16/33] io_uring: handle -EOPNOTSUPP on path resolution Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 17/33] net: stmmac: dwmac_lib: enlarge dma reset timeout Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 18/33] vdpasim: fix "mac_pton" undefined error Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 19/33] vhost: add helper to check if a vq has been setup Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 20/33] vhost scsi: alloc cmds per vq instead of session Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 21/33] vhost scsi: fix cmd completion race Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 22/33] vhost scsi: add lun parser helper Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 17:48   ` Paolo Bonzini
2020-11-25 17:48     ` Paolo Bonzini
2020-11-25 18:01     ` Sasha Levin
2020-11-25 18:01       ` Sasha Levin
2020-11-25 18:08       ` Paolo Bonzini
2020-11-25 18:08         ` Paolo Bonzini
2020-11-29  4:13         ` Sasha Levin
2020-11-29  4:13           ` Sasha Levin
2020-11-29 17:34           ` Paolo Bonzini
2020-11-29 17:34             ` Paolo Bonzini
2020-11-29 21:06             ` Sasha Levin
2020-11-29 21:06               ` Sasha Levin
2020-11-30  8:33               ` Paolo Bonzini
2020-11-30  8:33                 ` Paolo Bonzini
2020-11-30 13:28                 ` Greg KH
2020-11-30 13:28                   ` Greg KH
2020-11-30 13:52                   ` Paolo Bonzini
2020-11-30 13:52                     ` Paolo Bonzini
2020-11-30 13:57                     ` Greg KH
2020-11-30 13:57                       ` Greg KH
2020-11-30 14:00                       ` Paolo Bonzini
2020-11-30 14:00                         ` Paolo Bonzini
2020-11-30 17:34                         ` Sasha Levin
2020-11-30 17:34                           ` Sasha Levin
2020-11-30 17:38                 ` Sasha Levin
2020-11-30 17:38                   ` Sasha Levin
2020-11-30 17:52                   ` Paolo Bonzini
2020-11-30 17:52                     ` Paolo Bonzini
2020-11-30 19:44                     ` Mike Christie
2020-11-30 20:29                       ` Paolo Bonzini
2020-11-30 20:29                         ` Paolo Bonzini
2020-11-30 23:59                         ` Sasha Levin
2020-11-30 23:59                           ` Sasha Levin
2020-12-04  8:27                           ` Paolo Bonzini
2020-12-04  8:27                             ` Paolo Bonzini
2020-12-04 15:49                             ` Sasha Levin
2020-12-04 15:49                               ` Sasha Levin
2020-12-04 16:12                               ` Joe Perches
2020-12-04 16:12                                 ` Joe Perches
2020-12-04 17:08                               ` Paolo Bonzini
2020-12-04 17:08                                 ` Paolo Bonzini
2020-12-05 20:59                                 ` Sasha Levin
2020-12-05 20:59                                   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 23/33] vhost scsi: Add support for LUN resets Sasha Levin
2020-11-25 15:35   ` Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 24/33] cpuidle: tegra: Annotate tegra_pm_set_cpu_in_lp2() with RCU_NONIDLE Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 25/33] dmaengine: pl330: _prep_dma_memcpy: Fix wrong burst size Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 26/33] scsi: libiscsi: Fix NOP race condition Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 27/33] scsi: target: iscsi: Fix cmd abort fabric stop race Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 28/33] lockdep: Put graph lock/unlock under lock_recursion protection Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 29/33] perf/x86: fix sysfs type mismatches Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 30/33] xtensa: uaccess: Add missing __user to strncpy_from_user() prototype Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 31/33] x86/dumpstack: Do not try to access user space code of other tasks Sasha Levin
2020-11-25 15:35 ` [PATCH AUTOSEL 5.9 32/33] net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset Sasha Levin
2020-11-25 15:35 ` Sasha Levin [this message]
2020-11-25 21:52   ` [PATCH AUTOSEL 5.9 33/33] xfs: don't allow NOWAIT DIO across extent boundaries Dave Chinner
2020-11-25 23:46     ` Sasha Levin
2020-11-26  7:13       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201125153550.810101-33-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.