Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
* RFC: hold i_rwsem until aio completes
@ 2020-01-14 16:12 Christoph Hellwig
  2020-01-14 16:12 ` [PATCH 01/12] mm: fix a comment in sys_swapon Christoph Hellwig
                   ` (15 more replies)
  0 siblings, 16 replies; 39+ messages in thread
From: Christoph Hellwig @ 2020-01-14 16:12 UTC (permalink / raw)
  To: linux-xfs, linux-fsdevel, Waiman Long, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Will Deacon, Andrew Morton,
	linux-ext4, cluster-devel
  Cc: linux-kernel, linux-mm

Hi all,

Asynchronous read/write operations currently use a rather magic locking
scheme, were access to file data is normally protected using a rw_semaphore,
but if we are doing aio where the syscall returns to userspace before the
I/O has completed we also use an atomic_t to track the outstanding aio
ops.  This scheme has lead to lots of subtle bugs in file systems where
didn't wait to the count to reach zero, and due to its adhoc nature also
means we have to serialize direct I/O writes that are smaller than the
file system block size.

All this is solved by releasing i_rwsem only when the I/O has actually
completed, but doings so is against to mantras of Linux locking primites:

 (1) no unlocking by another process than the one that acquired it
 (2) no return to userspace with locks held

It actually happens we have various places that work around this.  A few
callers do non-owner unlocks of rwsems, which are pretty nasty for
PREEMPT_RT as the owner tracking doesn't work.  OTOH the file system
freeze code has both problems and works around them a little better,
although in a somewhat awkward way, in that it releases the lockdep
object when returning to userspace, and reacquires it when done, and
also clears the rwsem owner when returning to userspace, and then sets
the new onwer before unlocking.

This series tries to follow that scheme, also it doesn't fully work.  The
first issue is that the rwsem code has a bug where it doesn't properly
handle clearing the owner.  This series has a patch to fix that, but it
is ugly and might not be correct so some help is needed.  Second I/O
completions often come from interrupt context, which means the re-acquire
is recorded as from irq context, leading to warnings about incorrect
contexts.  I wonder if we could just have a bit in lockdep that says
returning to userspace is ok for this particular lock?  That would also
clean up the fsfreeze situation a lot.

Let me know what you think of all this.  While I converted all the iomap
using file systems only XFS is actually tested.

Diffstat:

 24 files changed, 144 insertions(+), 180 deletions(-)

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, back to index

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-14 16:12 RFC: hold i_rwsem until aio completes Christoph Hellwig
2020-01-14 16:12 ` [PATCH 01/12] mm: fix a comment in sys_swapon Christoph Hellwig
2020-02-10 23:29   ` Andrew Morton
2020-02-12  7:37     ` Christoph Hellwig
2020-01-14 16:12 ` [PATCH 02/12] locking/rwsem: Exit early when held by an anonymous owner Christoph Hellwig
2020-01-14 18:17   ` Waiman Long
2020-01-14 18:25     ` Christoph Hellwig
2020-01-14 18:33       ` Waiman Long
2020-01-14 18:55       ` Waiman Long
2020-01-14 16:12 ` [PATCH 03/12] xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read Christoph Hellwig
2020-01-14 16:12 ` [PATCH 04/12] gfs2: move setting current->backing_dev_info Christoph Hellwig
2020-01-14 16:12 ` [PATCH 05/12] gfs2: fix O_SYNC write handling Christoph Hellwig
2020-02-06 15:31   ` [Cluster-devel] " Andreas Gruenbacher
2020-01-14 16:12 ` [PATCH 06/12] iomap: pass a flags value to iomap_dio_rw Christoph Hellwig
2020-01-14 16:12 ` [PATCH 07/12] iomap: allow holding i_rwsem until aio completion Christoph Hellwig
2020-01-14 16:12 ` [PATCH 08/12] ext4: hold i_rwsem until AIO completes Christoph Hellwig
2020-01-14 21:50   ` Theodore Y. Ts'o
2020-01-15  6:48     ` Christoph Hellwig
2020-01-14 16:12 ` [PATCH 09/12] gfs2: " Christoph Hellwig
2020-01-14 16:12 ` [PATCH 10/12] xfs: " Christoph Hellwig
2020-01-14 16:12 ` [PATCH 11/12] xfs: don't set IOMAP_DIO_SYNCHRONOUS for unaligned I/O Christoph Hellwig
2020-01-14 16:12 ` [PATCH 12/12] iomap: remove the inode_dio_begin/end calls Christoph Hellwig
2020-01-14 18:47 ` RFC: hold i_rwsem until aio completes Matthew Wilcox
2020-01-15  6:54   ` Christoph Hellwig
2020-01-14 19:27 ` Jason Gunthorpe
2020-01-15  6:56   ` Christoph Hellwig
2020-01-15 13:24     ` Jason Gunthorpe
2020-01-15 14:33       ` Peter Zijlstra
2020-01-15 14:49         ` Jason Gunthorpe
2020-01-15 19:03           ` Waiman Long
2020-01-15 19:07             ` Christoph Hellwig
2020-01-18 22:40         ` Matthew Wilcox
2020-01-15 15:36       ` Christoph Hellwig
2020-01-15 16:26         ` Jason Gunthorpe
2020-01-16 14:00 ` Jan Kara
2020-02-03 17:44   ` Christoph Hellwig
2020-01-18  9:28 ` Dave Chinner
2020-02-03 17:46   ` Christoph Hellwig
2020-02-03 23:02     ` Dave Chinner

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git