From: Christoph Hellwig <hch@lst.de> To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, Waiman Long <longman@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, linux-ext4@vger.kernel.org, cluster-devel@redhat.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: RFC: hold i_rwsem until aio completes Date: Tue, 14 Jan 2020 17:12:13 +0100 [thread overview] Message-ID: <20200114161225.309792-1-hch@lst.de> (raw) Hi all, Asynchronous read/write operations currently use a rather magic locking scheme, were access to file data is normally protected using a rw_semaphore, but if we are doing aio where the syscall returns to userspace before the I/O has completed we also use an atomic_t to track the outstanding aio ops. This scheme has lead to lots of subtle bugs in file systems where didn't wait to the count to reach zero, and due to its adhoc nature also means we have to serialize direct I/O writes that are smaller than the file system block size. All this is solved by releasing i_rwsem only when the I/O has actually completed, but doings so is against to mantras of Linux locking primites: (1) no unlocking by another process than the one that acquired it (2) no return to userspace with locks held It actually happens we have various places that work around this. A few callers do non-owner unlocks of rwsems, which are pretty nasty for PREEMPT_RT as the owner tracking doesn't work. OTOH the file system freeze code has both problems and works around them a little better, although in a somewhat awkward way, in that it releases the lockdep object when returning to userspace, and reacquires it when done, and also clears the rwsem owner when returning to userspace, and then sets the new onwer before unlocking. This series tries to follow that scheme, also it doesn't fully work. The first issue is that the rwsem code has a bug where it doesn't properly handle clearing the owner. This series has a patch to fix that, but it is ugly and might not be correct so some help is needed. Second I/O completions often come from interrupt context, which means the re-acquire is recorded as from irq context, leading to warnings about incorrect contexts. I wonder if we could just have a bit in lockdep that says returning to userspace is ok for this particular lock? That would also clean up the fsfreeze situation a lot. Let me know what you think of all this. While I converted all the iomap using file systems only XFS is actually tested. Diffstat: 24 files changed, 144 insertions(+), 180 deletions(-)
WARNING: multiple messages have this Message-ID (diff)
From: Christoph Hellwig <hch@lst.de> To: cluster-devel.redhat.com Subject: [Cluster-devel] RFC: hold i_rwsem until aio completes Date: Tue, 14 Jan 2020 17:12:13 +0100 [thread overview] Message-ID: <20200114161225.309792-1-hch@lst.de> (raw) Hi all, Asynchronous read/write operations currently use a rather magic locking scheme, were access to file data is normally protected using a rw_semaphore, but if we are doing aio where the syscall returns to userspace before the I/O has completed we also use an atomic_t to track the outstanding aio ops. This scheme has lead to lots of subtle bugs in file systems where didn't wait to the count to reach zero, and due to its adhoc nature also means we have to serialize direct I/O writes that are smaller than the file system block size. All this is solved by releasing i_rwsem only when the I/O has actually completed, but doings so is against to mantras of Linux locking primites: (1) no unlocking by another process than the one that acquired it (2) no return to userspace with locks held It actually happens we have various places that work around this. A few callers do non-owner unlocks of rwsems, which are pretty nasty for PREEMPT_RT as the owner tracking doesn't work. OTOH the file system freeze code has both problems and works around them a little better, although in a somewhat awkward way, in that it releases the lockdep object when returning to userspace, and reacquires it when done, and also clears the rwsem owner when returning to userspace, and then sets the new onwer before unlocking. This series tries to follow that scheme, also it doesn't fully work. The first issue is that the rwsem code has a bug where it doesn't properly handle clearing the owner. This series has a patch to fix that, but it is ugly and might not be correct so some help is needed. Second I/O completions often come from interrupt context, which means the re-acquire is recorded as from irq context, leading to warnings about incorrect contexts. I wonder if we could just have a bit in lockdep that says returning to userspace is ok for this particular lock? That would also clean up the fsfreeze situation a lot. Let me know what you think of all this. While I converted all the iomap using file systems only XFS is actually tested. Diffstat: 24 files changed, 144 insertions(+), 180 deletions(-)
next reply other threads:[~2020-01-14 16:12 UTC|newest] Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-01-14 16:12 Christoph Hellwig [this message] 2020-01-14 16:12 ` [Cluster-devel] RFC: hold i_rwsem until aio completes Christoph Hellwig 2020-01-14 16:12 ` [PATCH 01/12] mm: fix a comment in sys_swapon Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-02-10 23:29 ` Andrew Morton 2020-02-10 23:29 ` [Cluster-devel] " Andrew Morton 2020-02-12 7:37 ` Christoph Hellwig 2020-02-12 7:37 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 02/12] locking/rwsem: Exit early when held by an anonymous owner Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 18:17 ` Waiman Long 2020-01-14 18:17 ` [Cluster-devel] " Waiman Long 2020-01-14 18:25 ` Christoph Hellwig 2020-01-14 18:25 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 18:33 ` Waiman Long 2020-01-14 18:33 ` [Cluster-devel] " Waiman Long 2020-01-14 18:55 ` Waiman Long 2020-01-14 18:55 ` [Cluster-devel] " Waiman Long 2020-01-14 16:12 ` [PATCH 03/12] xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 04/12] gfs2: move setting current->backing_dev_info Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 05/12] gfs2: fix O_SYNC write handling Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-27 9:03 ` Christoph Hellwig 2020-01-28 16:57 ` Bob Peterson 2020-02-06 15:31 ` Andreas Gruenbacher 2020-02-06 15:31 ` Andreas Gruenbacher 2020-02-06 15:31 ` Andreas Gruenbacher 2020-01-14 16:12 ` [PATCH 06/12] iomap: pass a flags value to iomap_dio_rw Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 07/12] iomap: allow holding i_rwsem until aio completion Christoph Hellwig 2020-01-14 16:12 ` [PATCH 08/12] ext4: hold i_rwsem until AIO completes Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 21:50 ` Theodore Y. Ts'o 2020-01-14 21:50 ` [Cluster-devel] " Theodore Y. Ts'o 2020-01-15 6:48 ` Christoph Hellwig 2020-01-15 6:48 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 09/12] gfs2: " Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 10/12] xfs: " Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 11/12] xfs: don't set IOMAP_DIO_SYNCHRONOUS for unaligned I/O Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 16:12 ` [PATCH 12/12] iomap: remove the inode_dio_begin/end calls Christoph Hellwig 2020-01-14 16:12 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 18:47 ` RFC: hold i_rwsem until aio completes Matthew Wilcox 2020-01-14 18:47 ` [Cluster-devel] " Matthew Wilcox 2020-01-15 6:54 ` Christoph Hellwig 2020-01-15 6:54 ` [Cluster-devel] " Christoph Hellwig 2020-01-14 19:27 ` Jason Gunthorpe 2020-01-14 19:27 ` [Cluster-devel] " Jason Gunthorpe 2020-01-15 6:56 ` Christoph Hellwig 2020-01-15 6:56 ` [Cluster-devel] " Christoph Hellwig 2020-01-15 13:24 ` Jason Gunthorpe 2020-01-15 13:24 ` [Cluster-devel] " Jason Gunthorpe 2020-01-15 14:33 ` Peter Zijlstra 2020-01-15 14:33 ` [Cluster-devel] " Peter Zijlstra 2020-01-15 14:49 ` Jason Gunthorpe 2020-01-15 14:49 ` [Cluster-devel] " Jason Gunthorpe 2020-01-15 19:03 ` Waiman Long 2020-01-15 19:03 ` [Cluster-devel] " Waiman Long 2020-01-15 19:07 ` Christoph Hellwig 2020-01-15 19:07 ` [Cluster-devel] " Christoph Hellwig 2020-01-18 22:40 ` Matthew Wilcox 2020-01-18 22:40 ` [Cluster-devel] " Matthew Wilcox 2020-01-15 15:36 ` Christoph Hellwig 2020-01-15 15:36 ` [Cluster-devel] " Christoph Hellwig 2020-01-15 16:26 ` Jason Gunthorpe 2020-01-15 16:26 ` [Cluster-devel] " Jason Gunthorpe 2020-01-16 14:00 ` Jan Kara 2020-01-16 14:00 ` [Cluster-devel] " Jan Kara 2020-02-03 17:44 ` Christoph Hellwig 2020-02-03 17:44 ` [Cluster-devel] " Christoph Hellwig 2020-01-18 9:28 ` Dave Chinner 2020-01-18 9:28 ` [Cluster-devel] " Dave Chinner 2020-02-03 17:46 ` Christoph Hellwig 2020-02-03 17:46 ` [Cluster-devel] " Christoph Hellwig 2020-02-03 23:02 ` Dave Chinner 2020-02-03 23:02 ` [Cluster-devel] " Dave Chinner
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200114161225.309792-1-hch@lst.de \ --to=hch@lst.de \ --cc=akpm@linux-foundation.org \ --cc=cluster-devel@redhat.com \ --cc=linux-ext4@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=linux-xfs@vger.kernel.org \ --cc=longman@redhat.com \ --cc=mingo@redhat.com \ --cc=peterz@infradead.org \ --cc=tglx@linutronix.de \ --cc=will@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.