linux-fsdevel.vger.kernel.org archive mirror
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"ocfs2-devel@oss.oracle.com" <ocfs2-devel@oss.oracle.com>,
	Josef Bacik <josef@toxicpanda.com>, Will Deacon <will@kernel.org>
Subject: Re: [RFC][arm64] possible infinite loop in btrfs search_ioctl()
Date: Thu, 21 Oct 2021 16:42:33 +0200
Message-ID: <CAHc6FU5xTMOxuiEDyc9VO_V98=bvoDc-0OFi4jsGPgWJWjRJWQ@mail.gmail.com>
In-Reply-To: <YXE7fhDkqJbfDk6e@arm.com>

On Thu, Oct 21, 2021 at 12:06 PM Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Thu, Oct 21, 2021 at 02:46:10AM +0200, Andreas Gruenbacher wrote:
> > On Tue, Oct 12, 2021 at 1:59 AM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Mon, Oct 11, 2021 at 2:08 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > >
> > > > +#ifdef CONFIG_ARM64_MTE
> > > > +#define FAULT_GRANULE_SIZE     (16)
> > > > +#define FAULT_GRANULE_MASK     (~(FAULT_GRANULE_SIZE-1))
> > >
> > > [...]
> > >
> > > > If this looks in the right direction, I'll do some proper patches
> > > > tomorrow.
> > >
> > > Looks fine to me. It's going to be quite expensive and bad for caches, though.
> > >
> > > That said, fault_in_writable() is _supposed_ to all be for the slow
> > > path when things go south and the normal path didn't work out, so I
> > > think it's fine.
> >
> > Let me get back to this; I'm actually not convinced that we need to
> > worry about sub-page-size fault granules in fault_in_pages_readable or
> > fault_in_pages_writeable.
> >
> > From a filesystem point of view, we can get into trouble when a
> > user-space read or write triggers a page fault while we're holding
> > filesystem locks, and that page fault ends up calling back into the
> > filesystem. To deal with that, we're performing those user-space
> > accesses with page faults disabled.
>
> Yes, this makes sense.
>
> > When a page fault would occur, we
> > get back an error instead, and then we try to fault in the offending
> > pages. If a page is resident and we still get a fault trying to access
> > it, trying to fault in the same page again isn't going to help and we
> > have a true error.
>
> You can't be sure the second fault is a true error. The unlocked
> fault_in_*() may race with some LRU scheme making the pte not accessible
> or a write-back making it clean/read-only. copy_to_user() with
> pagefault_disabled() fails again but that's a benign fault. The
> filesystem should re-attempt the fault-in (gup would correct the pte),
> disable page faults and copy_to_user(), potentially in an infinite loop.
> If you bail out on the second/third uaccess following a fault_in_*()
> call, you may get some unexpected errors (though very rare). Maybe the
> filesystems avoid this problem somehow but I couldn't figure it out.

Good point, we can indeed only bail out if both the user copy and the
fault-in fail.
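
As a rough illustration of that rule, here is a user-space model of the
retry loop (not kernel code). All sim_* names and the three page states
are made up for the sketch; sim_copy_nofault() stands in for a user copy
with page faults disabled and sim_fault_in() for a fault_in_*() call,
both returning the number of bytes they could not handle. The loop only
gives up when the copy stopped short and the fault-in made no progress
either:

```c
#include <stddef.h>

#define SIM_PAGE_SIZE	4096u
#define SIM_NR_PAGES	4u

/* Hypothetical page states for this model only. */
enum sim_state { SIM_RESIDENT, SIM_NOT_RESIDENT, SIM_UNMAPPED };

static enum sim_state sim_pages[SIM_NR_PAGES];

/* Models a user copy with page faults disabled; returns bytes NOT copied. */
static size_t sim_copy_nofault(size_t off, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t pos = off + done;
		size_t page = pos / SIM_PAGE_SIZE;
		size_t chunk = SIM_PAGE_SIZE - pos % SIM_PAGE_SIZE;

		if (page >= SIM_NR_PAGES || sim_pages[page] != SIM_RESIDENT)
			break;
		if (chunk > len - done)
			chunk = len - done;
		done += chunk;
	}
	return len - done;
}

/* Models a fault-in call; returns bytes NOT faulted in. */
static size_t sim_fault_in(size_t off, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t pos = off + done;
		size_t page = pos / SIM_PAGE_SIZE;
		size_t chunk = SIM_PAGE_SIZE - pos % SIM_PAGE_SIZE;

		if (page >= SIM_NR_PAGES || sim_pages[page] == SIM_UNMAPPED)
			break;
		sim_pages[page] = SIM_RESIDENT;
		if (chunk > len - done)
			chunk = len - done;
		done += chunk;
	}
	return len - done;
}

/* Returns the number of bytes copied before a true error (or all of len). */
static size_t sim_copy_retry(size_t off, size_t len)
{
	size_t copied = 0;

	while (copied < len) {
		size_t want = len - copied;
		size_t left = sim_copy_nofault(off + copied, want);

		copied += want - left;
		if (!left)
			break;
		/* The copy stopped short; bail out only if fault-in fails too. */
		if (sim_fault_in(off + copied, left) == left)
			break;
	}
	return copied;
}
```

In this model a successful fault-in always makes the next copy succeed,
so the loop terminates; it does not model the LRU or write-back races
described above.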

But probing the entire memory range at sub-page fault granularity in
the fault-in functions still doesn't make sense. Those functions only
need to guarantee that we'll be able to make progress eventually. From
that point of view, it should be enough to probe the first byte of the
requested memory range: when one of those functions reports that the
next N bytes should be accessible, this really means that the first
byte surely isn't permanently inaccessible and that the rest is likely
accessible. fault_in_readable() and fault_in_writeable() already work
that way, so this only leaves fault_in_safe_writeable() to worry about.
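
To make the first-byte argument concrete, here is a small user-space
model (again not kernel code). The gr_* names are hypothetical, and the
16-byte granule just mirrors FAULT_GRANULE_SIZE from the quoted patch.
The sketch assumes two things that the real code may or may not
provide: the nofault copy reports exactly how many bytes it copied
before the fault, and the caller resumes at the first uncopied byte
rather than restarting from the beginning:

```c
#include <stdbool.h>
#include <stddef.h>

#define GR_GRANULE_SIZE	16u	/* same value as FAULT_GRANULE_SIZE above */
#define GR_NR_GRANULES	8u

/* true = accesses to this granule always fault (hypothetical model state) */
static bool gr_poisoned[GR_NR_GRANULES];

static bool gr_byte_ok(size_t addr)
{
	size_t granule = addr / GR_GRANULE_SIZE;

	return granule < GR_NR_GRANULES && !gr_poisoned[granule];
}

/* Models a nofault user copy with exact progress; returns bytes NOT copied. */
static size_t gr_copy_nofault(size_t addr, size_t len)
{
	size_t done = 0;

	while (done < len && gr_byte_ok(addr + done))
		done++;
	return len - done;
}

/* Probes only the first byte: returns 0 if it is accessible, else len. */
static size_t gr_fault_in_first_byte(size_t addr, size_t len)
{
	return (!len || gr_byte_ok(addr)) ? 0 : len;
}

/* Returns the number of bytes copied. */
static size_t gr_copy_probe_first(size_t addr, size_t len)
{
	size_t copied = 0;

	while (copied < len) {
		size_t want = len - copied;
		size_t left = gr_copy_nofault(addr + copied, want);

		copied += want - left;
		if (!left)
			break;
		/* The probe starts at the byte the copy stopped on. */
		if (gr_fault_in_first_byte(addr + copied, left))
			break;
	}
	return copied;
}
```

Under those two assumptions each iteration either copies at least one
byte or fails the probe, so the loop cannot spin. A caller that
restarts from the original address after a short copy would not get
that guarantee from this model.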

> > We're clearly looking at memory at a page
> > granularity; faults at a sub-page level don't matter at this level of
> > abstraction (but they do show similar error behavior). To avoid
> > getting stuck, when it gets a short result or -EFAULT, the filesystem
> > implements the following backoff strategy: first, it tries to fault in
> > a number of pages. When the read or write still doesn't make progress,
> > it scales back and faults in a single page. Finally, when that still
> > doesn't help, it gives up. This strategy is needed for actual page
> > faults, but it also handles sub-page faults appropriately as long as
> > the user-space access functions give sensible results.
>
> As I said above, I think with this approach there's a small chance of
> incorrectly reporting an error when the fault is recoverable. If you
> change it to an infinite loop, you'd run into the sub-page fault
> problem.

Yes, I see now, thanks.
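
For reference, the backoff strategy from my earlier message can be
sketched as the following user-space model. The bk_* names, the
four-page window, and the BK_STUCK state (fault-in reports success but
the copy still fails) are assumptions made for illustration, not the
actual filesystem code:

```c
#include <stdbool.h>
#include <stddef.h>

#define BK_PAGE_SIZE	4096u
#define BK_NR_PAGES	8u
#define BK_BIG_WINDOW	(4u * BK_PAGE_SIZE)	/* "a number of pages" */

/* Hypothetical page states; BK_STUCK never becomes copyable. */
enum bk_state { BK_RESIDENT, BK_NOT_RESIDENT, BK_STUCK };

static enum bk_state bk_pages[BK_NR_PAGES];
static unsigned int bk_fault_in_calls;

/* Models a nofault user copy; returns bytes NOT copied. */
static size_t bk_copy_nofault(size_t off, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t pos = off + done;
		size_t page = pos / BK_PAGE_SIZE;
		size_t chunk = BK_PAGE_SIZE - pos % BK_PAGE_SIZE;

		if (page >= BK_NR_PAGES || bk_pages[page] != BK_RESIDENT)
			break;
		if (chunk > len - done)
			chunk = len - done;
		done += chunk;
	}
	return len - done;
}

/* Models fault-in: makes every non-resident page in the range resident. */
static void bk_fault_in(size_t off, size_t len)
{
	size_t pos;

	bk_fault_in_calls++;
	for (pos = off; pos < off + len;
	     pos = (pos / BK_PAGE_SIZE + 1) * BK_PAGE_SIZE) {
		size_t page = pos / BK_PAGE_SIZE;

		if (page < BK_NR_PAGES && bk_pages[page] == BK_NOT_RESIDENT)
			bk_pages[page] = BK_RESIDENT;
	}
}

/* Returns the number of bytes copied before giving up (or all of len). */
static size_t bk_copy_backoff(size_t off, size_t len)
{
	size_t copied = 0, window = 0;

	while (copied < len) {
		size_t want = len - copied;
		size_t left = bk_copy_nofault(off + copied, want);
		bool progress = left < want;

		copied += want - left;
		if (!left)
			break;
		if (progress || !window)
			window = BK_BIG_WINDOW;	/* first: several pages */
		else if (window > BK_PAGE_SIZE)
			window = BK_PAGE_SIZE;	/* then: a single page */
		else
			break;			/* finally: give up */
		bk_fault_in(off + copied, left < window ? left : window);
	}
	return copied;
}
```

As pointed out above, giving up at the last step can report an error
for a fault that was actually recoverable.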

> There are some places with such infinite loops: futex_wake_op(),
> search_ioctl() in the btrfs code. I still have to get my head around
> generic_perform_write() but I think we get away here because it faults
> in the page with a get_user() rather than gup (and copy_from_user() is
> guaranteed to make progress if any bytes can still be accessed).

Thanks,
Andreas


