All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Paul Mackerras <paulus@ozlabs.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org
Subject: Re: [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 21:40:49 +0200	[thread overview]
Message-ID: <CAHc6FU5HHFwuJBCNuU0e_N0ehaFrzbUrCuTJyaLNC4qxwfazYA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=whBCm3G5yibbvQsTn00fA16a688NTU_geQV158DnRy+bQ@mail.gmail.com>

On Wed, Aug 18, 2021 at 11:49 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> [ Sorry for the delay, I was on the road and this fell through the cracks ]

No harm done, I was busy enough implementing your previous suggestions.

> On Mon, Aug 16, 2021 at 12:14 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > On Tue, Aug 3, 2021 at 9:45 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Hmm. Have you tried to figure out why that "still returns 0" happens?
> >
> > The call stack is:
> >
> > gup_pte_range
> > gup_pmd_range
> > gup_pud_range
> > gup_p4d_range
> > gup_pgd_range
> > lockless_pages_from_mm
> > internal_get_user_pages_fast
> > get_user_pages_fast
> > iov_iter_get_pages
> > __bio_iov_iter_get_pages
> > bio_iov_iter_get_pages
> > iomap_dio_bio_actor
> > iomap_dio_actor
> > iomap_apply
> > iomap_dio_rw
> > gfs2_file_direct_write
> >
> > In gup_pte_range, pte_special(pte) is true and so we return 0.
>
> Ok, so that is indeed something that the fast-case can't handle,
> because some of the special code wants to have the mm_lock so that it
> can look at the vma flags (eg "vm_normal_page()" and friends.
>
> That said, some of these cases even the full GUP won't ever handle,
> simply because a mapping doesn't necessarily even _have_ a 'struct
> page' associated with it if it's a VM_IO mapping.
>
> So it turns out that you can't just always do
> fault_in_iov_iter_readable() and then assume that you can do
> iov_iter_get_pages() and repeat until successful.
>
> We could certainly make get_user_pages_fast() handle a few more cases,
> but I get the feeling that we need to have separate error cases for
> EFAULT - no page exists - and the "page exists, but cannot be mapped
> as a 'struct page'" case.

Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
the pages array? That would match fault_in_iov_iter_writeable, which
is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
vmas.

The other strategy I've added is to scale back the page fault windows
to a single page if faulting in multiple pages didn't help, and to
give up if the I/O operation still fails after that. So pathological
cases won't loop indefinitely anymore at least.

> I also do still think that even regardless of that, we want to just
> add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> and then you can use the regular get_user_pages().
>
> That at least gives us the full _normal_ page handling stuff.

And it does fix the generic/208 failure.

Thanks,
Andreas


WARNING: multiple messages have this Message-ID (diff)
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kvm-ppc@vger.kernel.org, Paul Mackerras <paulus@ozlabs.org>,
	cluster-devel <cluster-devel@redhat.com>, Jan Kara <jack@suse.cz>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 21:40:49 +0200	[thread overview]
Message-ID: <CAHc6FU5HHFwuJBCNuU0e_N0ehaFrzbUrCuTJyaLNC4qxwfazYA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=whBCm3G5yibbvQsTn00fA16a688NTU_geQV158DnRy+bQ@mail.gmail.com>

On Wed, Aug 18, 2021 at 11:49 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> [ Sorry for the delay, I was on the road and this fell through the cracks ]

No harm done, I was busy enough implementing your previous suggestions.

> On Mon, Aug 16, 2021 at 12:14 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > On Tue, Aug 3, 2021 at 9:45 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Hmm. Have you tried to figure out why that "still returns 0" happens?
> >
> > The call stack is:
> >
> > gup_pte_range
> > gup_pmd_range
> > gup_pud_range
> > gup_p4d_range
> > gup_pgd_range
> > lockless_pages_from_mm
> > internal_get_user_pages_fast
> > get_user_pages_fast
> > iov_iter_get_pages
> > __bio_iov_iter_get_pages
> > bio_iov_iter_get_pages
> > iomap_dio_bio_actor
> > iomap_dio_actor
> > iomap_apply
> > iomap_dio_rw
> > gfs2_file_direct_write
> >
> > In gup_pte_range, pte_special(pte) is true and so we return 0.
>
> Ok, so that is indeed something that the fast-case can't handle,
> because some of the special code wants to have the mm_lock so that it
> can look at the vma flags (eg "vm_normal_page()" and friends.
>
> That said, some of these cases even the full GUP won't ever handle,
> simply because a mapping doesn't necessarily even _have_ a 'struct
> page' associated with it if it's a VM_IO mapping.
>
> So it turns out that you can't just always do
> fault_in_iov_iter_readable() and then assume that you can do
> iov_iter_get_pages() and repeat until successful.
>
> We could certainly make get_user_pages_fast() handle a few more cases,
> but I get the feeling that we need to have separate error cases for
> EFAULT - no page exists - and the "page exists, but cannot be mapped
> as a 'struct page'" case.

Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
the pages array? That would match fault_in_iov_iter_writeable, which
is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
vmas.

The other strategy I've added is to scale back the page fault windows
to a single page if faulting in multiple pages didn't help, and to
give up if the I/O operation still fails after that. So pathological
cases won't loop indefinitely anymore at least.

> I also do still think that even regardless of that, we want to just
> add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> and then you can use the regular get_user_pages().
>
> That at least gives us the full _normal_ page handling stuff.

And it does fix the generic/208 failure.

Thanks,
Andreas


_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

WARNING: multiple messages have this Message-ID (diff)
From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 21:40:49 +0200	[thread overview]
Message-ID: <CAHc6FU5HHFwuJBCNuU0e_N0ehaFrzbUrCuTJyaLNC4qxwfazYA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=whBCm3G5yibbvQsTn00fA16a688NTU_geQV158DnRy+bQ@mail.gmail.com>

On Wed, Aug 18, 2021 at 11:49 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> [ Sorry for the delay, I was on the road and this fell through the cracks ]

No harm done, I was busy enough implementing your previous suggestions.

> On Mon, Aug 16, 2021 at 12:14 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > On Tue, Aug 3, 2021 at 9:45 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Hmm. Have you tried to figure out why that "still returns 0" happens?
> >
> > The call stack is:
> >
> > gup_pte_range
> > gup_pmd_range
> > gup_pud_range
> > gup_p4d_range
> > gup_pgd_range
> > lockless_pages_from_mm
> > internal_get_user_pages_fast
> > get_user_pages_fast
> > iov_iter_get_pages
> > __bio_iov_iter_get_pages
> > bio_iov_iter_get_pages
> > iomap_dio_bio_actor
> > iomap_dio_actor
> > iomap_apply
> > iomap_dio_rw
> > gfs2_file_direct_write
> >
> > In gup_pte_range, pte_special(pte) is true and so we return 0.
>
> Ok, so that is indeed something that the fast-case can't handle,
> because some of the special code wants to have the mm_lock so that it
> can look at the vma flags (eg "vm_normal_page()" and friends.
>
> That said, some of these cases even the full GUP won't ever handle,
> simply because a mapping doesn't necessarily even _have_ a 'struct
> page' associated with it if it's a VM_IO mapping.
>
> So it turns out that you can't just always do
> fault_in_iov_iter_readable() and then assume that you can do
> iov_iter_get_pages() and repeat until successful.
>
> We could certainly make get_user_pages_fast() handle a few more cases,
> but I get the feeling that we need to have separate error cases for
> EFAULT - no page exists - and the "page exists, but cannot be mapped
> as a 'struct page'" case.

Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
the pages array? That would match fault_in_iov_iter_writeable, which
is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
vmas.

The other strategy I've added is to scale back the page fault windows
to a single page if faulting in multiple pages didn't help, and to
give up if the I/O operation still fails after that. So pathological
cases won't loop indefinitely anymore at least.

> I also do still think that even regardless of that, we want to just
> add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> and then you can use the regular get_user_pages().
>
> That at least gives us the full _normal_ page handling stuff.

And it does fix the generic/208 failure.

Thanks,
Andreas



WARNING: multiple messages have this Message-ID (diff)
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Paul Mackerras <paulus@ozlabs.org>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	cluster-devel <cluster-devel@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org
Subject: Re: [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks
Date: Thu, 19 Aug 2021 19:40:49 +0000	[thread overview]
Message-ID: <CAHc6FU5HHFwuJBCNuU0e_N0ehaFrzbUrCuTJyaLNC4qxwfazYA@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=whBCm3G5yibbvQsTn00fA16a688NTU_geQV158DnRy+bQ@mail.gmail.com>

On Wed, Aug 18, 2021 at 11:49 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> [ Sorry for the delay, I was on the road and this fell through the cracks ]

No harm done, I was busy enough implementing your previous suggestions.

> On Mon, Aug 16, 2021 at 12:14 PM Andreas Gruenbacher
> <agruenba@redhat.com> wrote:
> >
> > On Tue, Aug 3, 2021 at 9:45 PM Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > Hmm. Have you tried to figure out why that "still returns 0" happens?
> >
> > The call stack is:
> >
> > gup_pte_range
> > gup_pmd_range
> > gup_pud_range
> > gup_p4d_range
> > gup_pgd_range
> > lockless_pages_from_mm
> > internal_get_user_pages_fast
> > get_user_pages_fast
> > iov_iter_get_pages
> > __bio_iov_iter_get_pages
> > bio_iov_iter_get_pages
> > iomap_dio_bio_actor
> > iomap_dio_actor
> > iomap_apply
> > iomap_dio_rw
> > gfs2_file_direct_write
> >
> > In gup_pte_range, pte_special(pte) is true and so we return 0.
>
> Ok, so that is indeed something that the fast-case can't handle,
> because some of the special code wants to have the mm_lock so that it
> can look at the vma flags (eg "vm_normal_page()" and friends.
>
> That said, some of these cases even the full GUP won't ever handle,
> simply because a mapping doesn't necessarily even _have_ a 'struct
> page' associated with it if it's a VM_IO mapping.
>
> So it turns out that you can't just always do
> fault_in_iov_iter_readable() and then assume that you can do
> iov_iter_get_pages() and repeat until successful.
>
> We could certainly make get_user_pages_fast() handle a few more cases,
> but I get the feeling that we need to have separate error cases for
> EFAULT - no page exists - and the "page exists, but cannot be mapped
> as a 'struct page'" case.

Hmm, what if GUP is made to skip VM_IO vmas without adding anything to
the pages array? That would match fault_in_iov_iter_writeable, which
is modeled after __mm_populate and which skips VM_IO and VM_PFNMAP
vmas.

The other strategy I've added is to scale back the page fault windows
to a single page if faulting in multiple pages didn't help, and to
give up if the I/O operation still fails after that. So pathological
cases won't loop indefinitely anymore at least.

> I also do still think that even regardless of that, we want to just
> add a FOLL_NOFAULT flag that just disables calling handle_mm_fault(),
> and then you can use the regular get_user_pages().
>
> That at least gives us the full _normal_ page handling stuff.

And it does fix the generic/208 failure.

Thanks,
Andreas

  reply	other threads:[~2021-08-19 19:41 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-03 19:18 [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks Andreas Gruenbacher
2021-08-03 19:18 ` Andreas Gruenbacher
2021-08-03 19:18 ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 01/12] iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] [PATCH v5 01/12] iov_iter: Fix iov_iter_get_pages{, _alloc} " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 02/12] powerpc/kvm: Fix kvm_use_magic_page Andreas Gruenbacher
2021-08-03 19:18   ` Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:43   ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Linus Torvalds
2021-08-03 19:43     ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Linus Torvalds
2021-08-03 19:43     ` [Ocfs2-devel] " Linus Torvalds
2021-08-03 20:57   ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Al Viro
2021-08-03 20:57     ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Al Viro
2021-08-03 20:57     ` [Ocfs2-devel] " Al Viro
2021-08-03 21:38     ` [PATCH v5 03/12] Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Andreas Gruenbacher
2021-08-03 21:38       ` [Cluster-devel] [PATCH v5 03/12] Turn fault_in_pages_{readable, writeable} into fault_in_{readable, writeable} Andreas Gruenbacher
2021-08-03 21:38       ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 04/12] Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 05/12] iov_iter: Introduce fault_in_iov_iter_writeable Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-04 15:24   ` Andreas Gruenbacher
2021-08-04 15:24     ` [Cluster-devel] " Andreas Gruenbacher
2021-08-04 15:24     ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 06/12] gfs2: Add wrapper for iomap_file_buffered_write Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 07/12] gfs2: Fix mmap + page fault deadlocks for buffered I/O Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 08/12] iomap: Fix iomap_dio_rw return value for user copies Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 09/12] iomap: Support restarting direct I/O requests after user copy failures Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 10/12] iomap: Add done_before argument to iomap_dio_rw Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 11/12] iov_iter: Introduce noio flag to disable page faults Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:18 ` [PATCH v5 12/12] gfs2: Fix mmap + page fault deadlocks for direct I/O Andreas Gruenbacher
2021-08-03 19:18   ` [Cluster-devel] " Andreas Gruenbacher
2021-08-03 19:18   ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-03 19:45 ` [PATCH v5 00/12] gfs2: Fix mmap + page fault deadlocks Linus Torvalds
2021-08-03 19:45   ` Linus Torvalds
2021-08-03 19:45   ` [Cluster-devel] " Linus Torvalds
2021-08-03 19:45   ` [Ocfs2-devel] " Linus Torvalds
2021-08-16 19:14   ` Andreas Gruenbacher
2021-08-16 19:14     ` Andreas Gruenbacher
2021-08-16 19:14     ` [Cluster-devel] " Andreas Gruenbacher
2021-08-16 19:14     ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-18 21:49     ` Linus Torvalds
2021-08-18 21:49       ` Linus Torvalds
2021-08-18 21:49       ` [Cluster-devel] " Linus Torvalds
2021-08-18 21:49       ` [Ocfs2-devel] " Linus Torvalds
2021-08-19 19:40       ` Andreas Gruenbacher [this message]
2021-08-19 19:40         ` Andreas Gruenbacher
2021-08-19 19:40         ` [Cluster-devel] " Andreas Gruenbacher
2021-08-19 19:40         ` [Ocfs2-devel] " Andreas Gruenbacher
2021-08-19 20:14         ` Linus Torvalds
2021-08-19 20:14           ` Linus Torvalds
2021-08-19 20:14           ` [Cluster-devel] " Linus Torvalds
2021-08-19 20:14           ` [Ocfs2-devel] " Linus Torvalds
2021-08-19 21:39           ` Andreas Gruenbacher
2021-08-19 21:39             ` Andreas Gruenbacher
2021-08-19 21:39             ` [Cluster-devel] " Andreas Gruenbacher
2021-08-19 21:39             ` [Ocfs2-devel] " Andreas Gruenbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHc6FU5HHFwuJBCNuU0e_N0ehaFrzbUrCuTJyaLNC4qxwfazYA@mail.gmail.com \
    --to=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=paulus@ozlabs.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.