All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] gfs2: Prevent endless loops in gfs2_file_buffered_write
@ 2021-11-10 17:44 ` Andreas Gruenbacher
  0 siblings, 0 replies; 4+ messages in thread
From: Andreas Gruenbacher @ 2021-11-10 17:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andreas Gruenbacher, Alexander Viro, Catalin Marinas,
	linux-fsdevel, cluster-devel

Hi Linus,

in commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for
buffered I/O"), I've managed to introduce a hang in gfs2 due to the
following check in iomap_write_iter:

  if (unlikely(fault_in_iov_iter_readable(i, bytes))) {

which fails if any of the iov iterator cannot be faulted in for reading.
At the filesystem level, we're retrying the rest of the write if any of
the iov iterator can be faulted in, so we can end up in a loop without
ever making progress.  The fix in iomap_write_iter would be as follows:

  if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {

The same bug exists in generic_perform_write, but I'm not aware of any
callers of generic_perform_write that have page faults turned off.

Is this fix still appropriate for 5.16, or should we work around it in
the filesystem as below for now?

A related post-5.16 option would be to turn the pre-faulting in
iomap_write_iter and generic_perform_write into post-faulting, but at
the very least, that still needs a bit of performance analysis:

  https://lore.kernel.org/linux-fsdevel/20211026094430.3669156-1-agruenba@redhat.com/
  https://lore.kernel.org/linux-fsdevel/20211027212138.3722977-1-agruenba@redhat.com/

Thanks,
Andreas

---
 fs/gfs2/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c486b702e00f..3e718cfc19a7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1013,6 +1013,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_holder *statfs_gh = NULL;
 	size_t prev_count = 0, window_size = 0;
+	size_t orig_count = iov_iter_count(from);
 	size_t read = 0;
 	ssize_t ret;
 
@@ -1057,6 +1058,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	if (inode == sdp->sd_rindex)
 		gfs2_glock_dq_uninit(statfs_gh);
 
+	from->count = orig_count - read;
 	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
 		size_t leftover;
 
@@ -1064,6 +1066,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 		leftover = fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
 		if (leftover != window_size) {
+			from->count = min(from->count, window_size - leftover);
 			if (!gfs2_holder_queued(gh)) {
 				if (read)
 					goto out_uninit;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [Cluster-devel] [RFC] gfs2: Prevent endless loops in gfs2_file_buffered_write
@ 2021-11-10 17:44 ` Andreas Gruenbacher
  0 siblings, 0 replies; 4+ messages in thread
From: Andreas Gruenbacher @ 2021-11-10 17:44 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Linus,

in commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for
buffered I/O"), I've managed to introduce a hang in gfs2 due to the
following check in iomap_write_iter:

  if (unlikely(fault_in_iov_iter_readable(i, bytes))) {

which fails if any of the iov iterator cannot be faulted in for reading.
At the filesystem level, we're retrying the rest of the write if any of
the iov iterator can be faulted in, so we can end up in a loop without
ever making progress.  The fix in iomap_write_iter would be as follows:

  if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {

The same bug exists in generic_perform_write, but I'm not aware of any
callers of generic_perform_write that have page faults turned off.

Is this fix still appropriate for 5.16, or should we work around it in
the filesystem as below for now?

A related post-5.16 option would be to turn the pre-faulting in
iomap_write_iter and generic_perform_write into post-faulting, but at
the very least, that still needs a bit of performance analysis:

  https://lore.kernel.org/linux-fsdevel/20211026094430.3669156-1-agruenba at redhat.com/
  https://lore.kernel.org/linux-fsdevel/20211027212138.3722977-1-agruenba at redhat.com/

Thanks,
Andreas

---
 fs/gfs2/file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index c486b702e00f..3e718cfc19a7 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1013,6 +1013,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_holder *statfs_gh = NULL;
 	size_t prev_count = 0, window_size = 0;
+	size_t orig_count = iov_iter_count(from);
 	size_t read = 0;
 	ssize_t ret;
 
@@ -1057,6 +1058,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 	if (inode == sdp->sd_rindex)
 		gfs2_glock_dq_uninit(statfs_gh);
 
+	from->count = orig_count - read;
 	if (should_fault_in_pages(ret, from, &prev_count, &window_size)) {
 		size_t leftover;
 
@@ -1064,6 +1066,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 		leftover = fault_in_iov_iter_readable(from, window_size);
 		gfs2_holder_disallow_demote(gh);
 		if (leftover != window_size) {
+			from->count = min(from->count, window_size - leftover);
 			if (!gfs2_holder_queued(gh)) {
 				if (read)
 					goto out_uninit;
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC] gfs2: Prevent endless loops in gfs2_file_buffered_write
  2021-11-10 17:44 ` [Cluster-devel] " Andreas Gruenbacher
@ 2021-11-12 19:15   ` Catalin Marinas
  -1 siblings, 0 replies; 4+ messages in thread
From: Catalin Marinas @ 2021-11-12 19:15 UTC (permalink / raw)
  To: Andreas Gruenbacher
  Cc: Linus Torvalds, Alexander Viro, linux-fsdevel, cluster-devel

Hi Andreas,

On Wed, Nov 10, 2021 at 06:44:57PM +0100, Andreas Gruenbacher wrote:
> in commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for
> buffered I/O"), I've managed to introduce a hang in gfs2 due to the
> following check in iomap_write_iter:
> 
>   if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
> 
> which fails if any of the iov iterator cannot be faulted in for reading.
> At the filesystem level, we're retrying the rest of the write if any of
> the iov iterator can be faulted in, so we can end up in a loop without
> ever making progress.  The fix in iomap_write_iter would be as follows:
> 
>   if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {

My preference would be to check against the 'bytes' returned as above.
This allows the write to progress as much as possible rather than
stopping if any of the iovs cannot be faulted in. It would be more
consistent for MTE as well if we keep the fault-in check at the
beginning of the user buffers only. I mentioned it here:

https://lore.kernel.org/all/YYQk9L0D57QHc0gE@arm.com/

> The same bug exists in generic_perform_write, but I'm not aware of any
> callers of generic_perform_write that have page faults turned off.

Similar reason as above, though one may argue it's a slight ABI change.

> A related post-5.16 option would be to turn the pre-faulting in
> iomap_write_iter and generic_perform_write into post-faulting, but at
> the very least, that still needs a bit of performance analysis:
> 
>   https://lore.kernel.org/linux-fsdevel/20211026094430.3669156-1-agruenba@redhat.com/
>   https://lore.kernel.org/linux-fsdevel/20211027212138.3722977-1-agruenba@redhat.com/

I don't think that's urgent. At least generic_perform_write() will make
progress with a fault-in that checks the beginning of the buffer, even
with sub-page faults. For fault_in_iov_writable() I'll add an arch
callback, probe_user_writable_safe() or something (hopefully next week).

There is the direct I/O case but IIUC the user buffer is accessed via
the kernel mapping (kmap) and that cannot fault on access. I may have
missed something though.

For the search_ioctl() function in btrfs I thought of introducing
fault_in_writable_exact() that checks each sub-page granule in the arch
callback. This shouldn't be used on performance critical paths.

-- 
Catalin

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Cluster-devel] [RFC] gfs2: Prevent endless loops in gfs2_file_buffered_write
@ 2021-11-12 19:15   ` Catalin Marinas
  0 siblings, 0 replies; 4+ messages in thread
From: Catalin Marinas @ 2021-11-12 19:15 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Andreas,

On Wed, Nov 10, 2021 at 06:44:57PM +0100, Andreas Gruenbacher wrote:
> in commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for
> buffered I/O"), I've managed to introduce a hang in gfs2 due to the
> following check in iomap_write_iter:
> 
>   if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
> 
> which fails if any of the iov iterator cannot be faulted in for reading.
> At the filesystem level, we're retrying the rest of the write if any of
> the iov iterator can be faulted in, so we can end up in a loop without
> ever making progress.  The fix in iomap_write_iter would be as follows:
> 
>   if (unlikely(fault_in_iov_iter_readable(i, bytes) == bytes)) {

My preference would be to check against the 'bytes' returned as above.
This allows the write to progress as much as possible rather than
stopping if any of the iovs cannot be faulted in. It would be more
consistent for MTE as well if we keep the fault-in check at the
beginning of the user buffers only. I mentioned it here:

https://lore.kernel.org/all/YYQk9L0D57QHc0gE at arm.com/

> The same bug exists in generic_perform_write, but I'm not aware of any
> callers of generic_perform_write that have page faults turned off.

Similar reason as above, though one may argue it's a slight ABI change.

> A related post-5.16 option would be to turn the pre-faulting in
> iomap_write_iter and generic_perform_write into post-faulting, but at
> the very least, that still needs a bit of performance analysis:
> 
>   https://lore.kernel.org/linux-fsdevel/20211026094430.3669156-1-agruenba at redhat.com/
>   https://lore.kernel.org/linux-fsdevel/20211027212138.3722977-1-agruenba at redhat.com/

I don't think that's urgent. At least generic_perform_write() will make
progress with a fault-in that checks the beginning of the buffer, even
with sub-page faults. For fault_in_iov_writable() I'll add an arch
callback, probe_user_writable_safe() or something (hopefully next week).

There is the direct I/O case but IIUC the user buffer is accessed via
the kernel mapping (kmap) and that cannot fault on access. I may have
missed something though.

For the search_ioctl() function in btrfs I thought of introducing
fault_in_writable_exact() that checks each sub-page granule in the arch
callback. This shouldn't be used on performance critical paths.

-- 
Catalin



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-11-12 19:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-10 17:44 [RFC] gfs2: Prevent endless loops in gfs2_file_buffered_write Andreas Gruenbacher
2021-11-10 17:44 ` [Cluster-devel] " Andreas Gruenbacher
2021-11-12 19:15 ` Catalin Marinas
2021-11-12 19:15   ` [Cluster-devel] " Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.