nvdimm.lists.linux.dev archive mirror
From: Joao Martins <joao.m.martins@oracle.com>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-mm@kvack.org, Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Matthew Wilcox <willy@infradead.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Jane Chu <jane.chu@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>, Christoph Hellwig <hch@lst.de>,
	nvdimm@lists.linux.dev, linux-doc@vger.kernel.org
Subject: Re: [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages
Date: Tue, 31 Aug 2021 13:34:04 +0100
Message-ID: <cda6d8fb-bd48-a3de-9d4e-96e4a43ebe58@oracle.com>
In-Reply-To: <20210830130741.GO1200268@ziepe.ca>

On 8/30/21 2:07 PM, Jason Gunthorpe wrote:
> On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote:
>> On 8/27/21 5:25 PM, Jason Gunthorpe wrote:
>>> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
>>>
>>>>  #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
>>>>  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>  			     unsigned long end, unsigned int flags,
>>>>  			     struct page **pages, int *nr)
>>>>  {
>>>> -	int nr_start = *nr;
>>>> +	int refs, nr_start = *nr;
>>>>  	struct dev_pagemap *pgmap = NULL;
>>>>  	int ret = 1;
>>>>  
>>>>  	do {
>>>> -		struct page *page = pfn_to_page(pfn);
>>>> +		struct page *head, *page = pfn_to_page(pfn);
>>>> +		unsigned long next = addr + PAGE_SIZE;
>>>>  
>>>>  		pgmap = get_dev_pagemap(pfn, pgmap);
>>>>  		if (unlikely(!pgmap)) {
>>>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>  			ret = 0;
>>>>  			break;
>>>>  		}
>>>> -		SetPageReferenced(page);
>>>> -		pages[*nr] = page;
>>>> -		if (unlikely(!try_grab_page(page, flags))) {
>>>> -			undo_dev_pagemap(nr, nr_start, flags, pages);
>>>> +
>>>> +		head = compound_head(page);
>>>> +		/* @end is assumed to be limited at most one compound page */
>>>> +		if (PageHead(head))
>>>> +			next = end;
>>>> +		refs = record_subpages(page, addr, next, pages + *nr);
>>>> +
>>>> +		SetPageReferenced(head);
>>>> +		if (unlikely(!try_grab_compound_head(head, refs, flags))) {
>>>> +			if (PageHead(head))
>>>> +				ClearPageReferenced(head);
>>>> +			else
>>>> +				undo_dev_pagemap(nr, nr_start, flags, pages);
>>>>  			ret = 0;
>>>>  			break;
>>>
>>> Why is this special cased for devmap?
>>>
>>> Shouldn't everything processing pud/pmds/etc use the same basic loop
>>> that is similar in idea to the 'for_each_compound_head' scheme in
>>> unpin_user_pages_dirty_lock()?
>>>
>>> Doesn't that work for all the special page type cases here?
>>
>> We are iterating over PFNs to create an array of base pages (regardless of page table
>> type), rather than iterating over an array of pages to work on. 
> 
> That is part of it, yes, but the slow bit here is to minimally find
> the head pages and do the atomics on them, much like the
> unpin_user_pages_dirty_lock()
> 
> I would think this should be designed similar to how things work on
> the unpin side.
> 
I don't think it's the same thing. The part you describe as 'minimally find
the head pages' carries a visible overhead in unpin_user_pages(), because we
have to check that each page belongs to the same head page, given that you
can pass an arbitrary set of pages. That cost is not in gup-fast right now,
AIUI. In our gup-fast 'handler', by contrast, you already know that you are
processing a contiguous chunk of pages. If anything, we are closer to
unpin_user_page_range*() than to unpin_user_pages().
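
To make the cost difference concrete, here is a minimal userspace sketch (not
kernel code). struct toy_page, run_in_array(), run_in_range() and
toy_make_compound() are all hypothetical names made up for illustration; they
only model the idea that an arbitrary page array needs a per-entry head
comparison, while a contiguous range can compute the run length
arithmetically.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only what is needed to find the head. */
struct toy_page {
	struct toy_page *head;	/* points to itself for a head or base page */
	unsigned int nr;	/* pages in the compound; only read on the head */
};

/*
 * Array case (unpin_user_pages()-like): the caller may pass any mix of
 * pages, so every entry has to be compared against the current head.
 */
static size_t run_in_array(struct toy_page **pages, size_t npages, size_t i)
{
	struct toy_page *head = pages[i]->head;
	size_t n = 1;

	while (i + n < npages && pages[i + n]->head == head)
		n++;
	return n;
}

/*
 * Range case (unpin_user_page_range*()-like): the pages are known to be
 * contiguous, so the run length is plain arithmetic on the head.
 */
static size_t run_in_range(struct toy_page *start, size_t npages, size_t i)
{
	struct toy_page *page = start + i;
	struct toy_page *head = page->head;
	size_t left_in_compound = (size_t)(head + head->nr - page);
	size_t left_in_range = npages - i;

	return left_in_compound < left_in_range ? left_in_compound
						: left_in_range;
}

/* Initialise map[first..first+nr) as one compound page of nr subpages. */
static void toy_make_compound(struct toy_page *map, size_t first,
			      unsigned int nr)
{
	for (unsigned int k = 0; k < nr; k++) {
		map[first + k].head = &map[first];
		map[first + k].nr = (k == 0) ? nr : 0;
	}
}
```

In this model run_in_array() touches every entry of the run, whereas
run_in_range() only looks at the first page and its head, which is the
property the gup-fast handler can rely on.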

> Sweep the page tables to find a proper start/end - eg even if a
> compound is spread across multiple pte/pmd/pud/etc we should find a
> linear range of starting PFN (ie starting page*) and npages across as
> much of the page tables as we can manage. This is the same as where
> things end up in the unpin case where all the contiguous PFNs are
> grouped together into a range.
> 
> Then 'assign' that range to the output array which requires walking
> over each compound_head in the range and pinning it, then writing out
> the tail pages to the output struct page array.
> 
> And this approach should apply universally no matter what is under the
> pte's - ie huge pages, THPs and devmaps should all be treated the same
> way. Currently each case is different, like above which is unique to
> device_huge.
> 
Only devmap gup-fast is different IIUC.

Switching to similar iteration logic to unpin would look something like
this (still untested):

        for_each_compound_range(index, &page, npages, head, refs) {
                pgmap = get_dev_pagemap(pfn + *nr, pgmap);
                if (unlikely(!pgmap)) {
                        undo_dev_pagemap(nr, nr_start, flags, pages);
                        ret = 0;
                        break;
                }

                SetPageReferenced(head);
                if (unlikely(!try_grab_compound_head(head, refs, flags))) {
                        if (PageHead(head))
                                ClearPageReferenced(head);
                        else
                                undo_dev_pagemap(nr, nr_start, flags, pages);
                        ret = 0;
                        break;
                }

                record_subpages(page + *nr, addr,
                                addr + (refs << PAGE_SHIFT), pages + *nr);
                *nr += refs;
                addr += (refs << PAGE_SHIFT);
        }


But it looks a tad more complex and does not really align with the
rest of gup-fast.

All in all, I am dealing with the fact that 1) devmap pmds/puds may not
be represented with compound pages and 2) we temporarily grab a dev_pagemap
reference prior to pinning the page. Those two items are what make this
different from THPs/HugeTLB (which do share the same logic), and they are
what led me to *slightly* improve gup_device_huge().
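
For readers following along, a minimal userspace sketch of the "grab the head
refcount once for a group of subpages" idea follows. It is not the patch
itself: struct demo_page, demo_record_subpages(), demo_try_grab_head() and
demo_gup_huge() are hypothetical stand-ins, and the refcount rule is
deliberately simplified. When the devmap is not backed by compound pages,
the same code degenerates to head == page and a range of one page, i.e.
refs == 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL

struct demo_page {
	struct demo_page *head;	/* points to itself for a head or base page */
	int refcount;		/* meaningful on the head only */
};

/* Fill one output slot per base page in [addr, end); return the count. */
static int demo_record_subpages(struct demo_page *page, unsigned long addr,
				unsigned long end, struct demo_page **pages)
{
	int nr;

	for (nr = 0; addr != end; addr += DEMO_PAGE_SIZE)
		pages[nr++] = page++;
	return nr;
}

/* Simplified grab: a single refcount update covers all refs subpages. */
static bool demo_try_grab_head(struct demo_page *head, int refs)
{
	if (head->refcount <= 0)	/* assumed rule: page is being freed */
		return false;
	head->refcount += refs;
	return true;
}

/* Returns 1 on success and 0 on failure, like the gup-fast helpers. */
static int demo_gup_huge(struct demo_page *page, unsigned long addr,
			 unsigned long end, struct demo_page **pages, int *nr)
{
	struct demo_page *head = page->head;
	int refs = demo_record_subpages(page, addr, end, pages + *nr);

	if (!demo_try_grab_head(head, refs))
		return 0;	/* *nr is untouched, so the slots are ignored */
	*nr += refs;
	return 1;
}
```

The point of the sketch is only the ordering: the subpages are recorded
first, and the head is then updated once with the total, instead of once per
subpage.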
