From: Steven Price <steven.price@arm.com>
To: "Anshuman Khandual" <anshuman.khandual@arm.com>,
	linux-mm@kvack.org, "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Mark Rutland" <Mark.Rutland@arm.com>,
	x86@kernel.org, "Arnd Bergmann" <arnd@arndb.de>,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Andy Lutomirski" <luto@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"James Morse" <james.morse@arm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Will Deacon" <will@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linux-arm-kernel@lists.infradead.org, "Liang,
	Kan" <kan.liang@linux.intel.com>
Subject: Re: [PATCH v9 11/21] mm: pagewalk: Add p4d_entry() and pgd_entry()
Date: Mon, 29 Jul 2019 13:17:42 +0100
Message-ID: <63a86424-9a8e-4528-5880-138f0009e462@arm.com>
In-Reply-To: <b61435a3-0da0-de57-0993-b1fffeca3ca9@arm.com>

On 28/07/2019 13:33, Anshuman Khandual wrote:
> 
> 
> On 07/22/2019 09:12 PM, Steven Price wrote:
>> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
>> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
>> no users. We're about to add users so reintroduce them, along with
>> p4d_entry() as we now have 5 levels of tables.
>>
>> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
>> PUD-sized transparent hugepages") already re-added pud_entry() but with
>> different semantics to the other callbacks. Since there have never
>> been upstream users of this, revert the semantics back to match the
>> other callbacks. This means pud_entry() is called for all entries, not
>> just transparent huge pages.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  include/linux/mm.h | 15 +++++++++------
>>  mm/pagewalk.c      | 27 ++++++++++++++++-----------
>>  2 files changed, 25 insertions(+), 17 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 0334ca97c584..b22799129128 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1432,15 +1432,14 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>>  
>>  /**
>>   * mm_walk - callbacks for walk_page_range
>> - * @pud_entry: if set, called for each non-empty PUD (2nd-level) entry
>> - *	       this handler should only handle pud_trans_huge() puds.
>> - *	       the pmd_entry or pte_entry callbacks will be used for
>> - *	       regular PUDs.
>> - * @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
>> + * @pgd_entry: if set, called for each non-empty PGD (top-level) entry
>> + * @p4d_entry: if set, called for each non-empty P4D entry
>> + * @pud_entry: if set, called for each non-empty PUD entry
>> + * @pmd_entry: if set, called for each non-empty PMD entry
>>   *	       this handler is required to be able to handle
>>   *	       pmd_trans_huge() pmds.  They may simply choose to
>>   *	       split_huge_page() instead of handling it explicitly.
>> - * @pte_entry: if set, called for each non-empty PTE (4th-level) entry
>> + * @pte_entry: if set, called for each non-empty PTE (lowest-level) entry
>>   * @pte_hole: if set, called for each hole at all levels
>>   * @hugetlb_entry: if set, called for each hugetlb entry
>>   * @test_walk: caller specific callback function to determine whether
>> @@ -1455,6 +1454,10 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>>   * (see the comment on walk_page_range() for more details)
>>   */
>>  struct mm_walk {
>> +	int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
>> +			 unsigned long next, struct mm_walk *walk);
>> +	int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
>> +			 unsigned long next, struct mm_walk *walk);
>>  	int (*pud_entry)(pud_t *pud, unsigned long addr,
>>  			 unsigned long next, struct mm_walk *walk);
>>  	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>> index c3084ff2569d..98373a9f88b8 100644
>> --- a/mm/pagewalk.c
>> +++ b/mm/pagewalk.c
>> @@ -90,15 +90,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>>  		}
>>  
>>  		if (walk->pud_entry) {
>> -			spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
>> -
>> -			if (ptl) {
>> -				err = walk->pud_entry(pud, addr, next, walk);
>> -				spin_unlock(ptl);
>> -				if (err)
>> -					break;
>> -				continue;
>> -			}
>> +			err = walk->pud_entry(pud, addr, next, walk);
>> +			if (err)
>> +				break;
> 
> But will not this still encounter possible THP entries when walking user
> page tables (valid walk->vma) in which case still needs to get a lock.
> OR will the callback take care of it ?

This is what I mean in the commit message by:
> Since there have never
> been upstream users of this, revert the semantics back to match the
> other callbacks. This means pud_entry() is called for all entries, not
> just transparent huge pages.

So the expectation is that the callback supplied by the caller takes care
of any locking it needs, for example by calling pud_trans_huge_lock()
itself.
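To make that concrete, here is a minimal userspace sketch of the new contract. Everything in it (struct model_walk, model_pud_trans_huge_lock() and the other model_* names) is a hypothetical stand-in, not kernel API. The walker no longer takes any lock, so the callback runs for every non-empty entry and must lock on its own before treating an entry as a huge (leaf) PUD, much as a real callback might call pud_trans_huge_lock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types: stand-ins for pud_t and struct mm_walk. */
enum model_pud_kind { MODEL_PUD_NONE, MODEL_PUD_TABLE, MODEL_PUD_HUGE };

struct model_walk {
	/* New semantics: called for every non-empty entry. */
	int (*pud_entry)(enum model_pud_kind *pud, unsigned long addr,
			 struct model_walk *walk);
	bool ptl_held;		/* stands in for the page-table spinlock */
	int huge_seen;
	int table_seen;
};

/* Stand-in for pud_trans_huge_lock(): "takes" the lock and returns true
 * only when the entry is a huge (leaf) entry. */
static bool model_pud_trans_huge_lock(enum model_pud_kind *pud,
				      struct model_walk *walk)
{
	if (*pud != MODEL_PUD_HUGE)
		return false;
	walk->ptl_held = true;
	return true;
}

static void model_spin_unlock(struct model_walk *walk)
{
	walk->ptl_held = false;
}

/* Walker in the new style: no locking, callback for every entry.
 * Descending to the PMD level is not modelled. */
static int model_walk_pud_range(enum model_pud_kind *puds, size_t n,
				unsigned long start, unsigned long size,
				struct model_walk *walk)
{
	for (size_t i = 0; i < n; i++) {
		int err;

		if (puds[i] == MODEL_PUD_NONE)
			continue;	/* the kernel would report a hole */
		err = walk->pud_entry(&puds[i], start + i * size, walk);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: it must handle both kinds of entry and take the
 * lock itself before inspecting a huge entry. */
static int example_pud_entry(enum model_pud_kind *pud, unsigned long addr,
			     struct model_walk *walk)
{
	(void)addr;
	if (model_pud_trans_huge_lock(pud, walk)) {
		assert(walk->ptl_held);
		walk->huge_seen++;
		model_spin_unlock(walk);
		return 0;
	}
	walk->table_seen++;	/* a regular table entry; nothing to lock */
	return 0;
}
```

In this model the callback keeps the same shape whether or not the entry is huge. Only the locked branch differs, which is the responsibility that moves from the walker to the callback.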

However, having checked again, it appears that mm/hmm.c now does use
this callback (merged in v5.2-rc1).

Jérôme, are you happy with this change in semantics? It looks like
hmm_vma_walk_pud() should deal gracefully with both normal and large
pages, although I'm unsure whether you are relying on the lock taken by
pud_trans_huge_lock() in the walker.
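For comparison, here is a similar hypothetical model of the pre-patch behaviour. The walker took the lock itself and invoked pud_entry() only for trans-huge entries, so a callback written against those semantics could assume the lock was held. That assumption is what the question above is about. The old_* names are illustrative stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the pre-patch pud_entry semantics. */
enum old_pud_kind { OLD_PUD_NONE, OLD_PUD_TABLE, OLD_PUD_HUGE };

struct old_walk {
	int (*pud_entry)(enum old_pud_kind *pud, unsigned long addr,
			 struct old_walk *walk);
	bool ptl_held;	/* stand-in for the lock from pud_trans_huge_lock() */
	int calls;
};

static int old_walk_pud_range(enum old_pud_kind *puds, size_t n,
			      unsigned long start, unsigned long size,
			      struct old_walk *walk)
{
	for (size_t i = 0; i < n; i++) {
		int err;

		if (puds[i] == OLD_PUD_NONE)
			continue;
		/* The walker takes the lock, and only for huge entries. */
		if (walk->pud_entry && puds[i] == OLD_PUD_HUGE) {
			walk->ptl_held = true;
			err = walk->pud_entry(&puds[i], start + i * size, walk);
			walk->ptl_held = false;
			if (err)
				return err;
			continue;	/* no descent below a huge entry */
		}
		/* Regular entries: the PMD-level walk would run here (omitted). */
	}
	return 0;
}

/* A callback written for the old semantics may rely on the lock. */
static int old_style_pud_entry(enum old_pud_kind *pud, unsigned long addr,
			       struct old_walk *walk)
{
	(void)addr;
	assert(walk->ptl_held);
	assert(*pud == OLD_PUD_HUGE);
	walk->calls++;
	return 0;
}
```

A callback like old_style_pud_entry() would trip its assertions under the new walker, because it would also see regular entries with no lock held.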

Thanks,

Steve
