All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Paul Mackerras <paulus@ozlabs.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org
Subject: Re: [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 23:39:56 +0200	[thread overview]
Message-ID: <CAHc6FU6a8SLmHfMoS7NUDKboWpVEGBKyC46pU_brx3y8crbEXA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgumKBhggjyR7Ff6V8VKxaJK1yA-LpWdzZFSqFyqYq0Dw@mail.gmail.com>

On Thu, Aug 19, 2021 at 10:14 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 19, 2021 at 12:41 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
> > the pages array? That would match fault_in_iov_iter_writeable, which
> > is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
> > vmas.
>
> I don't understand what you mean.. GUP already skips VM_IO (and
> VM_PFNMAP) pages. It just returns EFAULT.
>
> We could make it return another error. We already have DAX and
> FOLL_LONGTERM returning -EOPNOTSUPP.
>
> Of course, I think some code ends up always just returning "number of
> pages looked up" and might return 0 for "no pages" rather than the
> error for the first page.
>
> So we may end up having interfaces that then lose that explanation
> error code, but I didn't check.
>
> But we couldn't make it just say "skip them and try later addresses",
> if that is what you meant. THAT makes no sense - that would just make
> GUP look up some other address than what was asked for.

get_user_pages has a start and a nr_pages argument, which specifies an
address range from start to start + nr_pages * PAGE_SIZE. If pages !=
NULL, it adds a pointer to that array for each PAGE_SIZE subpage. I
was thinking of skipping over VM_IO vmas in that process, so when the
range starts in a mappable vma, runs into a VM_IO vma, and ends in a
mappable vma, the pages in the pages array would be discontiguous;
they would only cover the mappable vmas. But that would make it
difficult to make sense of what's in the pages array. So scratch that
idea.

> > > I also do still think that even regardless of that, we want to just
> > > add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> > > and then you can use the regular get_user_pages().
> > >
> > > That at least gives us the full _normal_ page handling stuff.
> >
> > And it does fix the generic/208 failure.
>
> Good. So I think the approach is usable, even if we might have corner
> cases left.
>
> So I think the remaining issue is exactly things like VM_IO and
> VM_PFNMAP. Do the fstests have test-cases for things like this? It
> _is_ quite specialized, it might be a good idea to have that.
>
> Of course, doing direct-IO from special memory regions with zerocopy
> might be something special people actually want to do. But I think
> we've had that VM_IO flag testing there basically forever, so I don't
> think it has ever worked (for some definition of "ever").

The v6 patch queue should handle those cases acceptably well for now,
but I don't think we have tests covering that at all.

Thanks,
Andreas


WARNING: multiple messages have this Message-ID (diff)
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kvm-ppc@vger.kernel.org, Paul Mackerras <paulus@ozlabs.org>,
	cluster-devel <cluster-devel@redhat.com>, Jan Kara <jack@suse.cz>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 23:39:56 +0200	[thread overview]
Message-ID: <CAHc6FU6a8SLmHfMoS7NUDKboWpVEGBKyC46pU_brx3y8crbEXA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgumKBhggjyR7Ff6V8VKxaJK1yA-LpWdzZFSqFyqYq0Dw@mail.gmail.com>

On Thu, Aug 19, 2021 at 10:14 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 19, 2021 at 12:41 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
> > the pages array? That would match fault_in_iov_iter_writeable, which
> > is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
> > vmas.
>
> I don't understand what you mean.. GUP already skips VM_IO (and
> VM_PFNMAP) pages. It just returns EFAULT.
>
> We could make it return another error. We already have DAX and
> FOLL_LONGTERM returning -EOPNOTSUPP.
>
> Of course, I think some code ends up always just returning "number of
> pages looked up" and might return 0 for "no pages" rather than the
> error for the first page.
>
> So we may end up having interfaces that then lose that explanation
> error code, but I didn't check.
>
> But we couldn't make it just say "skip them and try later addresses",
> if that is what you meant. THAT makes no sense - that would just make
> GUP look up some other address than what was asked for.

get_user_pages has a start and a nr_pages argument, which specifies an
address range from start to start + nr_pages * PAGE_SIZE. If pages !=
NULL, it adds a pointer to that array for each PAGE_SIZE subpage. I
was thinking of skipping over VM_IO vmas in that process, so when the
range starts in a mappable vma, runs into a VM_IO vma, and ends in a
mappable vma, the pages in the pages array would be discontiguous;
they would only cover the mappable vmas. But that would make it
difficult to make sense of what's in the pages array. So scratch that
idea.

> > > I also do still think that even regardless of that, we want to just
> > > add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> > > and then you can use the regular get_user_pages().
> > >
> > > That at least gives us the full _normal_ page handling stuff.
> >
> > And it does fix the generic/208 failure.
>
> Good. So I think the approach is usable, even if we might have corner
> cases left.
>
> So I think the remaining issue is exactly things like VM_IO and
> VM_PFNMAP. Do the fstests have test-cases for things like this? It
> _is_ quite specialized, it might be a good idea to have that.
>
> Of course, doing direct-IO from special memory regions with zerocopy
> might be something special people actually want to do. But I think
> we've had that VM_IO flag testing there basically forever, so I don't
> think it has ever worked (for some definition of "ever").

The v6 patch queue should handle those cases acceptably well for now,
but I don't think we have tests covering that at all.

Thanks,
Andreas


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

WARNING: multiple messages have this Message-ID (diff)
From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 23:39:56 +0200	[thread overview]
Message-ID: <CAHc6FU6a8SLmHfMoS7NUDKboWpVEGBKyC46pU_brx3y8crbEXA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgumKBhggjyR7Ff6V8VKxaJK1yA-LpWdzZFSqFyqYq0Dw@mail.gmail.com>

On Thu, Aug 19, 2021 at 10:14 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 19, 2021 at 12:41 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
> > the pages array? That would match fault_in_iov_iter_writeable, which
> > is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
> > vmas.
>
> I don't understand what you mean.. GUP already skips VM_IO (and
> VM_PFNMAP) pages. It just returns EFAULT.
>
> We could make it return another error. We already have DAX and
> FOLL_LONGTERM returning -EOPNOTSUPP.
>
> Of course, I think some code ends up always just returning "number of
> pages looked up" and might return 0 for "no pages" rather than the
> error for the first page.
>
> So we may end up having interfaces that then lose that explanation
> error code, but I didn't check.
>
> But we couldn't make it just say "skip them and try later addresses",
> if that is what you meant. THAT makes no sense - that would just make
> GUP look up some other address than what was asked for.

get_user_pages has a start and a nr_pages argument, which specifies an
address range from start to start + nr_pages * PAGE_SIZE. If pages !=
NULL, it adds a pointer to that array for each PAGE_SIZE subpage. I
was thinking of skipping over VM_IO vmas in that process, so when the
range starts in a mappable vma, runs into a VM_IO vma, and ends in a
mappable vma, the pages in the pages array would be discontiguous;
they would only cover the mappable vmas. But that would make it
difficult to make sense of what's in the pages array. So scratch that
idea.

> > > I also do still think that even regardless of that, we want to just
> > > add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> > > and then you can use the regular get_user_pages().
> > >
> > > That at least gives us the full _normal_ page handling stuff.
> >
> > And it does fix the generic/208 failure.
>
> Good. So I think the approach is usable, even if we might have corner
> cases left.
>
> So I think the remaining issue is exactly things like VM_IO and
> VM_PFNMAP. Do the fstests have test-cases for things like this? It
> _is_ quite specialized, it might be a good idea to have that.
>
> Of course, doing direct-IO from special memory regions with zerocopy
> might be something special people actually want to do. But I think
> we've had that VM_IO flag testing there basically forever, so I don't
> think it has ever worked (for some definition of "ever").

The v6 patch queue should handle those cases acceptably well for now,
but I don't think we have tests covering that at all.

Thanks,
Andreas



WARNING: multiple messages have this Message-ID (diff)
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Paul Mackerras <paulus@ozlabs.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org
Subject: Re: [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 21:39:56 +0000	[thread overview]
Message-ID: <CAHc6FU6a8SLmHfMoS7NUDKboWpVEGBKyC46pU_brx3y8crbEXA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wgumKBhggjyR7Ff6V8VKxaJK1yA-LpWdzZFSqFyqYq0Dw@mail.gmail.com>

On Thu, Aug 19, 2021 at 10:14 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Aug 19, 2021 at 12:41 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
> > the pages array? That would match fault_in_iov_iter_writeable, which
> > is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
> > vmas.
>
> I don't understand what you mean.. GUP already skips VM_IO (and
> VM_PFNMAP) pages. It just returns EFAULT.
>
> We could make it return another error. We already have DAX and
> FOLL_LONGTERM returning -EOPNOTSUPP.
>
> Of course, I think some code ends up always just returning "number of
> pages looked up" and might return 0 for "no pages" rather than the
> error for the first page.
>
> So we may end up having interfaces that then lose that explanation
> error code, but I didn't check.
>
> But we couldn't make it just say "skip them and try later addresses",
> if that is what you meant. THAT makes no sense - that would just make
> GUP look up some other address than what was asked for.

get_user_pages has a start and a nr_pages argument, which specifies an
address range from start to start + nr_pages * PAGE_SIZE. If pages !NULL, it adds a pointer to that array for each PAGE_SIZE subpage. I
was thinking of skipping over VM_IO vmas in that process, so when the
range starts in a mappable vma, runs into a VM_IO vma, and ends in a
mappable vma, the pages in the pages array would be discontiguous;
they would only cover the mappable vmas. But that would make it
difficult to make sense of what's in the pages array. So scratch that
idea.

> > > I also do still think that even regardless of that, we want to just
> > > add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> > > and then you can use the regular get_user_pages().
> > >
> > > That at least gives us the full _normal_ page handling stuff.
> >
> > And it does fix the generic/208 failure.
>
> Good. So I think the approach is usable, even if we might have corner
> cases left.
>
> So I think the remaining issue is exactly things like VM_IO and
> VM_PFNMAP. Do the fstests have test-cases for things like this? It
> _is_ quite specialized, it might be a good idea to have that.
>
> Of course, doing direct-IO from special memory regions with zerocopy
> might be something special people actually want to do. But I think
> we've had that VM_IO flag testing there basically forever, so I don't
> think it has ever worked (for some definition of "ever").

The v6 patch queue should handle those cases acceptably well for now,
but I don't think we have tests covering that at all.

Thanks,
Andreas

  reply	other threads:[~2021-08-19 21:40 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-03 19:18 [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks Andreas Gruenbacher
2021-08-03 19:18 ` Andreas Gruenbacher
2021-08-03 19:18 ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 01/12] iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] [PATCH v5 01/12] iov_iter: Fix iov_iter_get_pages{, _alloc} " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 02/12] powerpc/kvm: Fix kvm_use_magic_page Andreas Gruenbacher
2021-08-03 19:18   ` Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:43   ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Linus Torvalds
2021-08-03 19:43     ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Linus Torvalds
2021-08-03 19:43     ` [Ocfs2-devel] " Linus Torvalds
2021-08-03 20:57   ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Al Viro
2021-08-03 20:57     ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Al Viro
2021-08-03 20:57     ` [Ocfs2-devel] " Al Viro
2021-08-03 21:38     ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Andreas Gruenbacher
2021-08-03 21:38       ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Andreas Gruenbacher
2021-08-03 21:38       ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 04/12] Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 05/12] iov_iter: Introduce fault_in_iov_iter_writeable Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-04 15:24   ` Andreas Gruenbacher
2021-08-04 15:24     ` [Cluster-devel] " Andreas Gruenbacher
2021-08-04 15:24     ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 06/12] gfs2: Add wrapper for iomap_file_buffered_write Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 07/12] gfs2: Fix mmap + page fault deadlocks for buffered I/O Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 08/12] iomap: Fix iomap_dio_rw return value for user copies Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 09/12] iomap: Support restarting direct I/O requests after user copy failures Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 10/12] iomap: Add done_before argument to iomap_dio_rw Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 11/12] iov_iter: Introduce noio flag to disable page faults Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 12/12] gfs2: Fix mmap + page fault deadlocks for direct I/O Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:45 ` [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks Linus Torvalds
2021-08-03 19:45   ` Linus Torvalds
2021-08-03 19:45   ` [Cluster-devel] " Linus Torvalds
2021-08-03 19:45   ` [Ocfs2-devel] " Linus Torvalds
2021-08-16 19:14   ` Andreas Gruenbacher
2021-08-16 19:14     ` Andreas Gruenbacher
2021-08-16 19:14     ` [Cluster-devel] " Andreas Gruenbacher
2021-08-16 19:14     ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-18 21:49     ` Linus Torvalds
2021-08-18 21:49       ` Linus Torvalds
2021-08-18 21:49       ` [Cluster-devel] " Linus Torvalds
2021-08-18 21:49       ` [Ocfs2-devel] " Linus Torvalds
2021-08-19 19:40       ` Andreas Gruenbacher
2021-08-19 19:40         ` Andreas Gruenbacher
2021-08-19 19:40         ` [Cluster-devel] " Andreas Gruenbacher
2021-08-19 19:40         ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-19 20:14         ` Linus Torvalds
2021-08-19 20:14           ` Linus Torvalds
2021-08-19 20:14           ` [Cluster-devel] " Linus Torvalds
2021-08-19 20:14           ` [Ocfs2-devel] " Linus Torvalds
2021-08-19 21:39           ` Andreas Gruenbacher [this message]
2021-08-19 21:39             ` Andreas Gruenbacher
2021-08-19 21:39             ` [Cluster-devel] " Andreas Gruenbacher
2021-08-19 21:39             ` [Ocfs2-devel] " Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHc6FU6a8SLmHfMoS7NUDKboWpVEGBKyC46pU_brx3y8crbEXA@mail.gmail.com \
    --to=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=paulus@ozlabs.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.