linuxppc-dev.lists.ozlabs.org archive mirror
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	akpm@linux-foundation.org, catalin.marinas@arm.com,
	will@kernel.org
Cc: dalias@libc.org, linux-ia64@vger.kernel.org,
	linux-sh@vger.kernel.org, linux-kernel@vger.kernel.org,
	James.Bottomley@HansenPartnership.com, linux-mm@kvack.org,
	paulus@samba.org, sparclinux@vger.kernel.org,
	agordeev@linux.ibm.com, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, arnd@arndb.de,
	ysato@users.sourceforge.jp, deller@gmx.de,
	borntraeger@linux.ibm.com, gor@linux.ibm.com, hca@linux.ibm.com,
	linux-arm-kernel@lists.infradead.org, tsbogend@alpha.franken.de,
	linux-parisc@vger.kernel.org, linux-mips@vger.kernel.org,
	svens@linux.ibm.com, linuxppc-dev@lists.ozlabs.org,
	davem@davemloft.net
Subject: Re: [PATCH 2/3] mm: rmap: Fix CONT-PTE/PMD size hugetlb issue when migration
Date: Sun, 8 May 2022 17:19:43 +0800	[thread overview]
Message-ID: <1fad03a6-98cf-1b0e-e012-82dc6466c7d2@linux.alibaba.com> (raw)
In-Reply-To: <e8b56f7d-ad95-7938-21a5-55caedbbb354@linux.alibaba.com>



On 5/7/2022 10:33 AM, Baolin Wang wrote:
> 
> 
> On 5/7/2022 1:56 AM, Mike Kravetz wrote:
>> On 5/5/22 20:39, Baolin Wang wrote:
>>>
>>> On 5/6/2022 7:53 AM, Mike Kravetz wrote:
>>>> On 4/29/22 01:14, Baolin Wang wrote:
>>>>> Some architectures (like ARM64) support CONT-PTE/PMD size hugetlb,
>>>>> which means they support not only PMD/PUD size hugetlb (2M and 1G)
>>>>> but also CONT-PTE/PMD size (64K and 32M) when a 4K page size is
>>>>> specified.
>>>> <snip>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 6fdd198..7cf2408 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1924,13 +1924,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>                        break;
>>>>>                    }
>>>>>                }
>>>>> +
>>>>> +            /* Nuke the hugetlb page table entry */
>>>>> +            pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>>>>>            } else {
>>>>>                flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>>>> +            /* Nuke the page table entry. */
>>>>> +            pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>>>            }
>>>>
>>>> On arm64 with CONT-PTE/PMD the returned pteval will have dirty or
>>>> young set if ANY of the PTE/PMDs had dirty or young set.
>>>
>>> Right.
>>>
>>>>
>>>>> -        /* Nuke the page table entry. */
>>>>> -        pteval = ptep_clear_flush(vma, address, pvmw.pte);
>>>>> -
>>>>>            /* Set the dirty flag on the folio now the pte is gone. */
>>>>>            if (pte_dirty(pteval))
>>>>>                folio_mark_dirty(folio);
>>>>> @@ -2015,7 +2017,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>>>>                pte_t swp_pte;
>>>>>                  if (arch_unmap_one(mm, vma, address, pteval) < 0) {
>>>>> -                set_pte_at(mm, address, pvmw.pte, pteval);
>>>>> +                if (folio_test_hugetlb(folio))
>>>>> +                    set_huge_pte_at(mm, address, pvmw.pte, pteval);
>>>>
>>>> And, we will use that pteval for ALL the PTE/PMDs here.  So, we
>>>> would set the dirty or young bit in ALL PTE/PMDs.
>>>>
>>>> Could that cause any issues?  Maybe more of a question for the
>>>> arm64 people.
>>>
>>> I don't think this will cause any issues, since a hugetlb page cannot
>>> be split and we should not lose the dirty or young state if any
>>> subpage had it set. Meanwhile, we already do this in hugetlb.c:
>>>
>>> pte = huge_ptep_get_and_clear(mm, address, ptep);
>>> tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
>>> if (huge_pte_dirty(pte))
>>>      set_page_dirty(page);
>>>
>>
>> Agree that it 'should not' cause issues.  It just seems inconsistent.
>> This is not a problem specifically with your patch, just the handling of
>> CONT-PTE/PMD entries.
>>
>> There does not appear to be an arm64 specific version of huge_ptep_get()
>> that takes CONT-PTE/PMD into account.  So, huge_ptep_get() would only
>> return the one specific value.  It would not take into account the dirty
>> or young bits of CONT-PTE/PMDs like your new version of
>> huge_ptep_get_and_clear().  Is that correct?  Or am I missing something?
> 
> Yes, you are right.
> 
>>
>> If I am correct, then code like the following may not work:
>>
>> static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
>>                  unsigned long addr, unsigned long end, struct mm_walk *walk)
>> {
>>          pte_t huge_pte = huge_ptep_get(pte);
>>          struct numa_maps *md;
>>          struct page *page;
>>
>>          if (!pte_present(huge_pte))
>>                  return 0;
>>
>>          page = pte_page(huge_pte);
>>
>>          md = walk->private;
>>          gather_stats(page, md, pte_dirty(huge_pte), 1);
>>          return 0;
>> }
> 
> Right, this is inconsistent with the current huge_ptep_get() interface,
> as you said. So I think we can define an arch-specific huge_ptep_get()
> interface for arm64, with some sample code like below. What do you think?

After some investigation, I sent out an RFC patch set [1] to address this
issue. We can discuss it in that thread. Thanks.

[1] https://lore.kernel.org/all/cover.1651998586.git.baolin.wang@linux.alibaba.com/
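
For readers following the thread, the inconsistency discussed above can be
shown with a small userspace model. This is only a sketch under stated
assumptions, not kernel code and not what the RFC does: struct model_pte,
MODEL_CONT_PTES, model_get_first() and model_get_folded() are hypothetical
names made up for the illustration. The first helper models a read that
returns only the entry it is pointed at, the second models a read that folds
the dirty and young bits of every entry in the contiguous range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_CONT_PTES 16	/* assumed: 16 x 4K entries form one 64K range */

struct model_pte {
	bool present;
	bool dirty;
	bool young;
};

/* Models a plain read: only the first entry of the range is looked at. */
static struct model_pte model_get_first(const struct model_pte *ptep)
{
	return ptep[0];
}

/*
 * Models a range-aware read: start from the first entry and OR in the
 * dirty and young bits of every other entry in the contiguous range.
 */
static struct model_pte model_get_folded(const struct model_pte *ptep,
					 size_t ncontig)
{
	struct model_pte folded;
	size_t i;

	assert(ncontig > 0);
	folded = ptep[0];
	for (i = 1; i < ncontig; i++) {
		folded.dirty |= ptep[i].dirty;
		folded.young |= ptep[i].young;
	}
	return folded;
}
```

If only one entry in the middle of the range (say index 5) has its dirty bit
set, model_get_first() reports the range as clean while model_get_folded()
reports it as dirty. That is the mismatch a caller such as
gather_hugetlb_stats() would see if the read side does not fold the bits the
way the clear side does.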
