linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alistair Popple <apopple@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: <linux-mm@kvack.org>, <akpm@linux-foundation.org>,
	<rcampbell@nvidia.com>, <linux-doc@vger.kernel.org>,
	<nouveau@lists.freedesktop.org>, <hughd@google.com>,
	<linux-kernel@vger.kernel.org>, <dri-devel@lists.freedesktop.org>,
	<hch@infradead.org>, <bskeggs@redhat.com>, <jgg@nvidia.com>,
	<shakeelb@google.com>, <jhubbard@nvidia.com>,
	<willy@infradead.org>, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v10 07/10] mm: Device exclusive memory access
Date: Thu, 10 Jun 2021 10:18:25 +1000	[thread overview]
Message-ID: <2773835.D95cIkl9rl@nvdebian> (raw)
In-Reply-To: <YMDmsha6GDtUf3Vs@t490s>

On Thursday, 10 June 2021 2:05:06 AM AEST Peter Xu wrote:
> On Wed, Jun 09, 2021 at 07:38:04PM +1000, Alistair Popple wrote:
> > On Wednesday, 9 June 2021 4:33:52 AM AEST Peter Xu wrote:
> > > On Mon, Jun 07, 2021 at 05:58:52PM +1000, Alistair Popple wrote:

[...]

> > For thp this means we could end up passing
> > tail pages to rmap_walk(), however it doesn't actually walk them.
> >
> > Based on the results of previous testing I had done I assumed rmap_walk()
> > filtered out tail pages. It does, and I didn't hit the BUG_ON above, but the
> > filtering was not as deliberate as assumed.
> >
> > I've gone back and looked at what was happening in my earlier tests and the
> > tail pages get filtered because the VMA is not getting locked in
> > page_lock_anon_vma_read() due to failing this check:
> >
> >       anon_mapping = (unsigned long)READ_ONCE(page->mapping);
> >       if ((anon_mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
> >               goto out;
> >
> > And now I'm not sure it makes sense to read page->mapping of a tail page. So
> > it might be best if we explicitly ignore any tail pages returned from GUP, at
> > least for now (a future series will improve thp support such as adding a pmd
> > version for exclusive entries).
> 
> I feel like it's illegal to access page->mapping of tail pages; I looked at
> what happens if we call page_anon_vma() on a tail page:
> 
> struct anon_vma *page_anon_vma(struct page *page)
> {
>         unsigned long mapping;
> 
>         page = compound_head(page);
>         mapping = (unsigned long)page->mapping;
>         if ((mapping & PAGE_MAPPING_FLAGS) != PAGE_MAPPING_ANON)
>                 return NULL;
>         return __page_rmapping(page);
> }
> 
> It'll just take the head's mapping instead.  It makes sense since the tail page
> shouldn't have a different value against the head page, afaiu.

Right, it makes no sense to look at ->mapping on a tail page because the field
is used for something else. On the 1st tail page it is ->compound_nr and on the
2nd tail page it is ->deferred_list. See the definitions of compound_nr() and
page_deferred_list() respectively. I suppose on the rest of the pages it could
be anything.

I think in practice it is probably ok - iuc bit 0 won't be set for compound_nr
and certainly not for deferred_list->next (a pointer). But none of that seems
intentional, so it would be better to be explicit and not walk the tail pages.

> It would be great if thp experts could chim in.  Before that happens, I agree
> with you that a safer approach is to explicitly not walk a tail page for its
> rmap (and I think the rmap of a tail page will be the same of the head
> anyways.. since they seem to share the anon_vma as quoted).
> >
> > > So... for thp mappings, wondering whether we should do normal GUP (without
> > > SPLIT), pass in always normal or head pages into rmap_walk(), but then
> > > unconditionally split_huge_pmd_address() in page_make_device_exclusive_one()?
> >
> > That could work (although I think GUP will still return tail pages - see
> > follow_trans_huge_pmd() which is called from follow_pmd_mask() in gup).
> 
> Agreed.
> 
> > The main problem is split_huge_pmd_address() unconditionally calls a mmu
> > notifier so I would need to plumb in passing an owner everywhere which could
> > get messy.
> 
> Could I ask why?  split_huge_pmd_address() will notify with CLEAR, so I'm a bit
> confused why we need to pass over the owner.

Sure, it is the same reason we need to pass it for the exclusive notifier.
Any invalidation during the make exclusive operation will break the mmu read
side critical section forcing a retry of the operation. The owner field is what
is used to filter out invalidations (such as the exclusive invalidation) that
don't need to be retried.
 
> I thought plumb it right before your EXCLUSIVE notifier init would work?

I did try this just to double check and it doesn't work due to the unconditional
notifier.

> ---8<---
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a94d9aed9d95..360ce86f3822 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2042,6 +2042,12 @@ static bool page_make_device_exclusive_one(struct page *page,
>         swp_entry_t entry;
>         pte_t swp_pte;
> 
> +       /*
> +        * Make sure thps split as device exclusive entries only support pte
> +        * level for now.
> +        */
> +       split_huge_pmd_address(vma, address, false, page);
> +
>         mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, vma,
>                                       vma->vm_mm, address, min(vma->vm_end,
>                                       address + page_size(page)), args->owner);
> ---8<---
> 
> Thanks,
> 
> --
> Peter Xu
> 





  reply	other threads:[~2021-06-10  0:18 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-07  7:58 [PATCH v10 00/10] Add support for SVM atomics in Nouveau Alistair Popple
2021-06-07  7:58 ` [PATCH v10 01/10] mm: Remove special swap entry functions Alistair Popple
2021-06-07  7:58 ` [PATCH v10 02/10] mm/swapops: Rework swap entry manipulation code Alistair Popple
2021-06-07  7:58 ` [PATCH v10 03/10] mm/rmap: Split try_to_munlock from try_to_unmap Alistair Popple
2021-06-07  7:58 ` [PATCH v10 04/10] mm/rmap: Split migration into its own function Alistair Popple
2021-06-07  7:58 ` [PATCH v10 05/10] mm: Rename migrate_pgmap_owner Alistair Popple
2021-06-08 15:16   ` Peter Xu
2021-06-07  7:58 ` [PATCH v10 06/10] mm/memory.c: Allow different return codes for copy_nonpresent_pte() Alistair Popple
2021-06-08 15:19   ` Peter Xu
2021-06-07  7:58 ` [PATCH v10 07/10] mm: Device exclusive memory access Alistair Popple
2021-06-08 18:33   ` Peter Xu
2021-06-09  9:38     ` Alistair Popple
2021-06-09 16:05       ` Peter Xu
2021-06-10  0:18         ` Alistair Popple [this message]
2021-06-10 18:04           ` Peter Xu
2021-06-10 14:21             ` Alistair Popple
2021-06-10 23:04               ` Peter Xu
2021-06-10 23:17                 ` Alistair Popple
2021-06-11  1:00                   ` Peter Xu
2021-06-11  3:43                     ` Alistair Popple
2021-06-11 15:01                       ` Peter Xu
2021-06-15  3:08                         ` Alistair Popple
2021-06-15 16:25                           ` Peter Xu
2021-06-16  2:47                             ` Alistair Popple
2021-06-07  7:58 ` [PATCH v10 08/10] mm: Selftests for exclusive device memory Alistair Popple
2021-06-07  7:58 ` [PATCH v10 09/10] nouveau/svm: Refactor nouveau_range_fault Alistair Popple
2021-06-07  7:58 ` [PATCH v10 10/10] nouveau/svm: Implement atomic SVM access Alistair Popple

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2773835.D95cIkl9rl@nvdebian \
    --to=apopple@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=bskeggs@redhat.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=hughd@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nouveau@lists.freedesktop.org \
    --cc=peterx@redhat.com \
    --cc=rcampbell@nvidia.com \
    --cc=shakeelb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).