nvdimm.lists.linux.dev archive mirror
From: Joao Martins <joao.m.martins@oracle.com>
To: Jason Gunthorpe <jgg@ziepe.ca>, Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org, Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Matthew Wilcox <willy@infradead.org>,
	John Hubbard <jhubbard@nvidia.com>,
	Jane Chu <jane.chu@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>, Christoph Hellwig <hch@lst.de>,
	nvdimm@lists.linux.dev, linux-doc@vger.kernel.org
Subject: Re: [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages
Date: Thu, 23 Sep 2021 17:51:04 +0100
Message-ID: <8c23586a-eb3b-11a6-e72a-dcc3faad4e96@oracle.com>
In-Reply-To: <20210831170526.GP1200268@ziepe.ca>

On 8/31/21 6:05 PM, Jason Gunthorpe wrote:
> On Tue, Aug 31, 2021 at 01:34:04PM +0100, Joao Martins wrote:
>> On 8/30/21 2:07 PM, Jason Gunthorpe wrote:
>>> On Fri, Aug 27, 2021 at 07:34:54PM +0100, Joao Martins wrote:
>>>> On 8/27/21 5:25 PM, Jason Gunthorpe wrote:
>>>>> On Fri, Aug 27, 2021 at 03:58:13PM +0100, Joao Martins wrote:
>>>>>
>>>>>>  #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
>>>>>>  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>>>  			     unsigned long end, unsigned int flags,
>>>>>>  			     struct page **pages, int *nr)
>>>>>>  {
>>>>>> -	int nr_start = *nr;
>>>>>> +	int refs, nr_start = *nr;
>>>>>>  	struct dev_pagemap *pgmap = NULL;
>>>>>>  	int ret = 1;
>>>>>>  
>>>>>>  	do {
>>>>>> -		struct page *page = pfn_to_page(pfn);
>>>>>> +		struct page *head, *page = pfn_to_page(pfn);
>>>>>> +		unsigned long next = addr + PAGE_SIZE;
>>>>>>  
>>>>>>  		pgmap = get_dev_pagemap(pfn, pgmap);
>>>>>>  		if (unlikely(!pgmap)) {
>>>>>> @@ -2252,16 +2265,25 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
>>>>>>  			ret = 0;
>>>>>>  			break;
>>>>>>  		}
>>>>>> -		SetPageReferenced(page);
>>>>>> -		pages[*nr] = page;
>>>>>> -		if (unlikely(!try_grab_page(page, flags))) {
>>>>>> -			undo_dev_pagemap(nr, nr_start, flags, pages);
>>>>>> +
>>>>>> +		head = compound_head(page);
>>>>>> +		/* @end is assumed to be limited at most one compound page */
>>>>>> +		if (PageHead(head))
>>>>>> +			next = end;
>>>>>> +		refs = record_subpages(page, addr, next, pages + *nr);
>>>>>> +
>>>>>> +		SetPageReferenced(head);
>>>>>> +		if (unlikely(!try_grab_compound_head(head, refs, flags))) {
>>>>>> +			if (PageHead(head))
>>>>>> +				ClearPageReferenced(head);
>>>>>> +			else
>>>>>> +				undo_dev_pagemap(nr, nr_start, flags, pages);
>>>>>>  			ret = 0;
>>>>>>  			break;
>>>>>
>>>>> Why is this special cased for devmap?
>>>>>
>>>>> Shouldn't everything processing pud/pmds/etc use the same basic loop
>>>>> that is similar in idea to the 'for_each_compound_head' scheme in
>>>>> unpin_user_pages_dirty_lock()?
>>>>>
>>>>> Doesn't that work for all the special page type cases here?
>>>>
>>>> We are iterating over PFNs to create an array of base pages (regardless of page table
>>>> type), rather than iterating over an array of pages to work on. 
>>>
>>> That is part of it, yes, but the slow bit here is to minimally find
>>> the head pages and do the atomics on them, much like the
>>> unpin_user_pages_dirty_lock()
>>>
>>> I would think this should be designed similar to how things work on
>>> the unpin side.
>>>
>> I don't think it's the same thing. The bit you say 'minimally find the
>> head pages' carries a visible overhead in unpin_user_pages() as we are
>> checking each of the pages belongs to the same head page -- because you
>> can pass an arbitrary set of pages. This does have a cost which is not
>> in gup-fast right now AIUI. Whereas in our gup-fast 'handler' you
>> already know that you are processing a contiguous chunk of pages.
>> If anything, we are closer to unpin_user_page_range*()
>> than unpin_user_pages().
> 
> Yes, that is what I mean, it is very similar to the range case as we
> don't even know that a single compound spans a pud/pmd. So you end up
> doing the same loop to find the compound boundaries.
> 
> Under GUP slow we can also aggregate multiple page table entries, eg a
> split huge page could be processed as a single 2M range operation even
> if it is broken to 4K PTEs.

/me nods
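
To make the range-style grouping discussed above concrete, here is a
minimal userspace sketch (not kernel code). It assumes every compound
page covers a fixed, naturally aligned number of PFNs; the names
group_range_by_head() and struct head_refs are hypothetical and exist
only for this illustration. It splits one contiguous PFN range at
compound boundaries and reports how many references each head would
need, so a single atomic per head would suffice.

```c
#include <stddef.h>

/* Hypothetical result record: one entry per compound head touched. */
struct head_refs {
	unsigned long head_pfn;	/* first PFN of the compound page */
	unsigned long refs;	/* subpages of that compound in the range */
};

/*
 * Split the contiguous range [pfn, pfn + npages) at compound boundaries.
 * Assumes compound pages are naturally aligned and all of the same size,
 * which is a simplification made for this sketch.
 * Returns the number of entries written to @out.
 */
static size_t group_range_by_head(unsigned long pfn, unsigned long npages,
				  unsigned long pages_per_compound,
				  struct head_refs *out, size_t max_out)
{
	size_t n = 0;

	if (!pages_per_compound)
		return 0;

	while (npages && n < max_out) {
		/* Round down to the head of the compound containing pfn. */
		unsigned long head = pfn - (pfn % pages_per_compound);
		/* Pages left before the next compound boundary. */
		unsigned long left = head + pages_per_compound - pfn;
		unsigned long refs = npages < left ? npages : left;

		out[n].head_pfn = head;
		out[n].refs = refs;
		n++;

		pfn += refs;
		npages -= refs;
	}
	return n;
}
```

With 512 pages per compound, a 516-page range starting at PFN 510 yields
three groups: 2 references on head 0, 512 on head 512 and 2 on head 1024.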

FWIW, I have a follow-up patch pursuing a similar optimization (to fix
the gup-slow case) that I need to get into better shape. Contrary to what
the cover letter says, I probably won't wait until this series is done.

>> Switching to similar iteration logic to unpin would look something like
>> this (still untested):
>>
>>         for_each_compound_range(index, &page, npages, head, refs) {
>>                 pgmap = get_dev_pagemap(pfn + *nr, pgmap);
> 
> I recall talking to DanW about this and we agreed it was unnecessary
> here to hold the pgmap and should be deleted.

Yep, I remember that conversation[0]. It was a long time ago, and I am
not sure what progress has been made there since the last posting. Dan,
any thoughts?

[0]
https://lore.kernel.org/linux-mm/161604050866.1463742.7759521510383551055.stgit@dwillia2-desk3.amr.corp.intel.com/

So ... if pgmap accounting was removed from gup-fast then this patch
would be a lot simpler and we could perhaps just fall back to the regular
hugepage case (THP, HugeTLB) like your suggestion at the top. See below
the scissors mark at the end for a ballpark of the changes.

So far my options seem to be:

1) this patch, which leverages the existing iteration logic;
2) switching to for_each_compound_range() (see my previous reply); or
3) waiting for Dan to remove @pgmap accounting in gup-fast and using
   something similar to what is below the scissors mark.

What do you think would be the best course of action?

--->8---

++static int __gup_device_compound(unsigned long addr, unsigned long pfn,
++                               unsigned long mask)
++{
++      pfn += ((addr & ~mask) >> PAGE_SHIFT);
++
++      return PageCompound(pfn_to_page(pfn));
++}
++
  static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
                                 unsigned long end, unsigned int flags,
                                 struct page **pages, int *nr)
@@@ -2428,8 -2428,8 +2433,10 @@@ static int gup_huge_pmd(pmd_t orig, pmd
        if (pmd_devmap(orig)) {
                if (unlikely(flags & FOLL_LONGTERM))
                        return 0;
--              return __gup_device_huge_pmd(orig, pmdp, addr, end, flags,
--                                           pages, nr);
++
++              if (!__gup_device_compound(addr, pmd_pfn(orig), PMD_MASK))
++                      return __gup_device_huge_pmd(orig, pmdp, addr, end,
++                                                   flags, pages, nr);
        }

        page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
@@@ -2462,8 -2462,8 +2469,10 @@@ static int gup_huge_pud(pud_t orig, pud
        if (pud_devmap(orig)) {
                if (unlikely(flags & FOLL_LONGTERM))
                        return 0;
--              return __gup_device_huge_pud(orig, pudp, addr, end, flags,
--                                           pages, nr);
++
++              if (!__gup_device_compound(addr, pud_pfn(orig), PUD_MASK))
++                      return __gup_device_huge_pud(orig, pudp, addr, end,
++                                                   flags, pages, nr);
        }

        page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
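
For readers following the offset arithmetic in __gup_device_compound()
and in the `page = pmd_page(orig) + ...` lines, here is a small userspace
sketch of the same computation. The constants assume 4K base pages and
2M PMD mappings (as on x86-64), which is an assumption of this example;
subpage_pfn() is a hypothetical helper name, not a kernel function.

```c
#include <stdint.h>

/* Assumed geometry for this sketch: 4K base pages, 2M PMD mappings. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PMD_SHIFT	21
#define SKETCH_PMD_MASK		(~((UINT64_C(1) << SKETCH_PMD_SHIFT) - 1))

/*
 * Given the first PFN of a huge mapping and a virtual address inside it,
 * return the PFN that backs @addr. (addr & ~mask) is the byte offset of
 * @addr within the mapping; shifting it right by the page shift turns
 * that offset into a count of base pages.
 */
static uint64_t subpage_pfn(uint64_t base_pfn, uint64_t addr, uint64_t mask)
{
	return base_pfn + ((addr & ~mask) >> SKETCH_PAGE_SHIFT);
}
```

For example, an address 0x3000 bytes into a 2M mapping whose first PFN is
0x100000 resolves to PFN 0x100003, i.e. the fourth base page.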

Thread overview: 48+ messages
2021-08-27 14:58 [PATCH v4 00/14] mm, sparse-vmemmap: Introduce compound devmaps for device-dax Joao Martins
2021-08-27 14:58 ` [PATCH v4 01/14] memory-failure: fetch compound_head after pgmap_pfn_valid() Joao Martins
2021-08-27 14:58 ` [PATCH v4 02/14] mm/page_alloc: split prep_compound_page into head and tail subparts Joao Martins
2021-08-27 14:58 ` [PATCH v4 03/14] mm/page_alloc: refactor memmap_init_zone_device() page init Joao Martins
2021-08-27 14:58 ` [PATCH v4 04/14] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
2021-08-27 15:33   ` Christoph Hellwig
2021-08-27 16:00     ` Joao Martins
2021-09-01  9:44       ` Christoph Hellwig
2021-09-09  9:38         ` Joao Martins
2021-08-27 14:58 ` [PATCH v4 05/14] device-dax: use ALIGN() for determining pgoff Joao Martins
2021-08-27 14:58 ` [PATCH v4 06/14] device-dax: ensure dev_dax->pgmap is valid for dynamic devices Joao Martins
2021-11-05  0:31   ` Dan Williams
2021-11-05 12:09     ` Joao Martins
2021-11-05 16:14       ` Joao Martins
2021-11-05 16:46       ` Dan Williams
2021-11-05 18:11         ` Joao Martins
2021-08-27 14:58 ` [PATCH v4 07/14] device-dax: compound devmap support Joao Martins
2021-11-05  0:38   ` Dan Williams
2021-11-05 14:10     ` Joao Martins
2021-11-05 16:41       ` Dan Williams
2021-08-27 14:58 ` [PATCH v4 08/14] mm/gup: grab head page refcount once for group of subpages Joao Martins
2021-08-27 16:25   ` Jason Gunthorpe
2021-08-27 18:34     ` Joao Martins
2021-08-30 13:07       ` Jason Gunthorpe
2021-08-31 12:34         ` Joao Martins
2021-08-31 17:05           ` Jason Gunthorpe
2021-09-23 16:51             ` Joao Martins [this message]
2021-09-28 18:01               ` Jason Gunthorpe
2021-09-29 11:50                 ` Joao Martins
2021-09-29 19:34                   ` Jason Gunthorpe
2021-09-30  3:01                     ` Alistair Popple
2021-09-30 17:54                       ` Joao Martins
2021-09-30 21:55                         ` Jason Gunthorpe
2021-10-18 18:36                       ` Jason Gunthorpe
2021-10-18 18:37                   ` Jason Gunthorpe
2021-10-08 11:54   ` Jason Gunthorpe
2021-10-11 15:53     ` Joao Martins
2021-10-13 17:41       ` Jason Gunthorpe
2021-10-13 19:18         ` Joao Martins
2021-10-13 19:43           ` Jason Gunthorpe
2021-10-14 17:56             ` Joao Martins
2021-10-14 18:06               ` Jason Gunthorpe
2021-08-27 14:58 ` [PATCH v4 09/14] mm/sparse-vmemmap: add a pgmap argument to section activation Joao Martins
2021-08-27 14:58 ` [PATCH v4 10/14] mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper Joao Martins
2021-08-27 14:58 ` [PATCH v4 11/14] mm/hugetlb_vmemmap: move comment block to Documentation/vm Joao Martins
2021-08-27 14:58 ` [PATCH v4 12/14] mm/sparse-vmemmap: populate compound devmaps Joao Martins
2021-08-27 14:58 ` [PATCH v4 13/14] mm/page_alloc: reuse tail struct pages for " Joao Martins
2021-08-27 14:58 ` [PATCH v4 14/14] mm/sparse-vmemmap: improve memory savings for compound pud geometry Joao Martins
