All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>,
	akpm@linux-foundation.org, tglx@linutronix.de,
	hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
	kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, songmuchun@bytedance.com
Subject: Re: [PATCH v2 6/9] mm: free user PTE page table pages
Date: Wed, 1 Sep 2021 18:13:07 +0200	[thread overview]
Message-ID: <7789261d-6a64-c47b-be6c-c9be680e5d33@redhat.com> (raw)
In-Reply-To: <20210901153247.GJ1721383@nvidia.com>

On 01.09.21 17:32, Jason Gunthorpe wrote:
> On Wed, Sep 01, 2021 at 03:57:09PM +0200, David Hildenbrand wrote:
>> On 01.09.21 15:53, Jason Gunthorpe wrote:
>>> On Thu, Aug 19, 2021 at 11:18:55AM +0800, Qi Zheng wrote:
>>>
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index 2630ed1bb4f4..30757f3b176c 100644
>>>> +++ b/mm/gup.c
>>>> @@ -500,6 +500,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>>    	if (unlikely(pmd_bad(*pmd)))
>>>>    		return no_page_table(vma, flags);
>>>> +	if (!pte_try_get(mm, pmd))
>>>> +		return no_page_table(vma, flags);
>>>> +
>>>>    	ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
>>>
>>> This is not good on a performance path, the pte_try_get() is
>>> locking/locking the same lock that pte_offset_map_lock() is getting.
>>
>> Yes, and we really need patch #8, anything else is just confusing reviewers.
> 
> It is a bit better with patch 8, but it is still not optimal, we don't
> need to do the atomic work at all if the entire ptep is accessed while
> locked. So the above is stil not what I would expect here, even with
> RCU.
> 
> eg I would expect that this kind of change would work first with the
> existing paired acessors, ie
> 
> 	pte = pte_offset_map(pmd, address);
> 	pte_unmap(pte);
> 
> Should handle the refcount under the covers, and same kind of idea for
> the _locked/_unlocked varient.

See my other mail.

> 
> Only places that don't already use that pairing should get modified.
> 
> To do this we have to extend the API so that pte_offset_map() can
> fail, or very cleverly return some kind of global non-present pte page
> (I wonder if the zero page would work?)

I explored both ideas (returning NULL, return a specially prepared page) 
and it didn't work in some cases where we unmap+remap etc.

> 
>>> Also, I don't really understand how this scheme works with
>>> get_user_pages_fast.
>>
>> With the RCU change it in #8 it should work just fine, because RCU
>> synchronize has to wait either until all other CPUs have left the RCU read
>> section, or re-enabled interrupts.
> 
> So at this point in the series fast gup is broken, that does mean the
> series presentation really needs to be reworked. The better
> presentation is to add the API changes, with a
> no-functional-difference implementation, push the new API in well
> split patches to all the consumption sites, then change the API to
> have the new semantics.

Exactly my thoughts.

> 
> RCU and refcount to free the page levels seems like a reasonable
> approach, but I have to say I haven't thought it through fully - are
> all the contexts that have the pte deref safe to do call_rcu?


Very good question. I'd assume so.


-- 
Thanks,

David / dhildenb


  reply	other threads:[~2021-09-01 16:13 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-19  3:18 [PATCH v2 0/9] Free user PTE page table pages Qi Zheng
2021-08-19  3:18 ` [PATCH v2 1/9] mm: introduce pmd_install() helper Qi Zheng
2021-08-24 16:26   ` David Hildenbrand
2021-08-25 16:20     ` Qi Zheng
2021-08-25 16:32       ` David Hildenbrand
2021-08-26  3:04         ` Qi Zheng
2021-08-19  3:18 ` [PATCH v2 2/9] mm: remove redundant smp_wmb() Qi Zheng
2021-08-19  3:18 ` [PATCH v2 3/9] mm: rework the parameter of lock_page_or_retry() Qi Zheng
2021-08-19  3:18 ` [PATCH v2 4/9] mm: move pte_alloc{,_map,_map_lock}() to a separate file Qi Zheng
2021-08-19  3:18 ` [PATCH v2 5/9] mm: pte_refcount infrastructure Qi Zheng
2021-08-19  3:18 ` [PATCH v2 6/9] mm: free user PTE page table pages Qi Zheng
2021-08-19  7:01   ` David Hildenbrand
2021-08-19 10:18     ` [External] " Qi Zheng
2021-09-01 13:53   ` Jason Gunthorpe
2021-09-01 13:57     ` David Hildenbrand
2021-09-01 15:32       ` Jason Gunthorpe
2021-09-01 16:13         ` David Hildenbrand [this message]
2021-09-01 16:16           ` Jason Gunthorpe
2021-09-01 16:19             ` David Hildenbrand
2021-09-01 17:10               ` Jason Gunthorpe
2021-09-01 17:49                 ` David Hildenbrand
2021-09-01 17:55                   ` Jason Gunthorpe
2021-09-01 17:58                     ` David Hildenbrand
2021-09-01 18:09                       ` Jason Gunthorpe
2021-09-01 18:10                         ` David Hildenbrand
2021-09-02  7:04                     ` Qi Zheng
2021-09-02  6:53         ` Qi Zheng
2021-08-19  3:18 ` [PATCH v2 7/9] mm: add THP support for pte_ref Qi Zheng
2021-08-19  3:18 ` [PATCH v2 8/9] mm: free PTE page table by using rcu mechanism Qi Zheng
2021-08-19  3:18 ` [PATCH v2 9/9] mm: use mmu_gather to free PTE page table Qi Zheng
2021-09-01 12:32 ` [PATCH v2 0/9] Free user PTE page table pages David Hildenbrand
2021-09-01 16:07   ` Jason Gunthorpe
2021-09-01 16:10     ` David Hildenbrand
2021-09-02  3:37   ` Qi Zheng
2021-09-15 14:52   ` Qi Zheng
2021-09-15 14:59     ` Jason Gunthorpe
2021-09-16  5:32       ` Qi Zheng
2021-09-16  8:30         ` David Hildenbrand
2021-09-16  8:41           ` Qi Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7789261d-6a64-c47b-be6c-c9be680e5d33@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=jgg@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mika.penttila@nextfour.com \
    --cc=songmuchun@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=vdavydov.dev@gmail.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.