linux-kernel.vger.kernel.org archive mirror
From: "Thomas Hellström (VMware)" <thomas_os@shipmail.org>
To: Steven Price <steven.price@arm.com>
Cc: "Andy Lutomirski" <luto@kernel.org>,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
	"Arnd Bergmann" <arnd@arndb.de>, "Borislav Petkov" <bp@alien8.de>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"James Morse" <james.morse@arm.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Will Deacon" <will@kernel.org>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	"Mark Rutland" <Mark.Rutland@arm.com>,
	"Liang, Kan" <kan.liang@linux.intel.com>,
	"Zong Li" <zong.li@sifive.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry()
Date: Thu, 12 Dec 2019 12:23:44 +0100	[thread overview]
Message-ID: <13280f9e-6f03-e1fd-659a-31462ba185b0@shipmail.org> (raw)
In-Reply-To: <20191206135316.47703-12-steven.price@arm.com>

On 12/6/19 2:53 PM, Steven Price wrote:
> pgd_entry() and pud_entry() were removed by commit 0b1fbfe50006c410
> ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were
> no users. We're about to add users so reintroduce them, along with
> p4d_entry() as we now have 5 levels of tables.
>
> Note that commit a00cc7d9dd93d66a ("mm, x86: add support for
> PUD-sized transparent hugepages") already re-added pud_entry() but with
> different semantics to the other callbacks. Since there have never
> been upstream users of this, revert the semantics back to match the
> other callbacks. This means pud_entry() is called for all entries, not
> just transparent huge pages.

In another thread we were discussing a means of rerunning a level (in
case of a race), or continuing after a level, based on the callback's
return value. The change was fairly invasive.
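
A minimal user-space sketch of that idea, assuming a two-level toy table:
the upper-level callback returns an action code that tells the walker to
descend into the entry, to continue past it, or to rerun it after a race.
The toy_* names and TOY_WALK_* codes are hypothetical illustrations, not
an existing kernel API and not necessarily what the other thread proposed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes a level callback could use to steer the walk. */
enum toy_walk_action {
	TOY_WALK_DESCEND = 0,  /* walk the next level below this entry */
	TOY_WALK_CONTINUE,     /* entry fully handled: move on without descending */
	TOY_WALK_AGAIN,        /* entry changed under us (a race): rerun it */
};

#define NR_UPPER 4
#define NR_LOWER 8

struct toy_walk {
	enum toy_walk_action (*upper_entry)(struct toy_walk *w, int idx);
	int upper_calls;   /* how often the upper-level callback ran */
	int lower_visits;  /* how many lower-level entries were walked */
	int races_left;    /* demo state: how many races to simulate */
};

static void toy_walk_range(struct toy_walk *w)
{
	assert(w->upper_entry != NULL);
	for (int i = 0; i < NR_UPPER; i++) {
		enum toy_walk_action act;

		do {
			w->upper_calls++;
			act = w->upper_entry(w, i);
		} while (act == TOY_WALK_AGAIN);  /* rerun this level */

		if (act == TOY_WALK_CONTINUE)
			continue;                     /* continue after this level */

		w->lower_visits += NR_LOWER;      /* stand-in for walking the level below */
	}
}

/* Demo callback: entry 1 is handled in place, entry 2 "races" once. */
static enum toy_walk_action demo_upper_entry(struct toy_walk *w, int idx)
{
	if (idx == 2 && w->races_left > 0) {
		w->races_left--;
		return TOY_WALK_AGAIN;
	}
	return idx == 1 ? TOY_WALK_CONTINUE : TOY_WALK_DESCEND;
}
```

With one simulated race on entry 2 and entry 1 handled in place, the
upper callback runs 5 times and 3 of the 4 entries are descended into.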


> Tested-by: Zong Li <zong.li@sifive.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   include/linux/pagewalk.h | 19 +++++++++++++------
>   mm/pagewalk.c            | 27 ++++++++++++++++-----------
>   2 files changed, 29 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index 6ec82e92c87f..06790f23957f 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -8,15 +8,15 @@ struct mm_walk;
>   
>   /**
>    * mm_walk_ops - callbacks for walk_page_range
> - * @pud_entry:		if set, called for each non-empty PUD (2nd-level) entry
> - *			this handler should only handle pud_trans_huge() puds.
> - *			the pmd_entry or pte_entry callbacks will be used for
> - *			regular PUDs.
> - * @pmd_entry:		if set, called for each non-empty PMD (3rd-level) entry
> + * @pgd_entry:		if set, called for each non-empty PGD (top-level) entry
> + * @p4d_entry:		if set, called for each non-empty P4D entry
> + * @pud_entry:		if set, called for each non-empty PUD entry
> + * @pmd_entry:		if set, called for each non-empty PMD entry
>    *			this handler is required to be able to handle
>    *			pmd_trans_huge() pmds.  They may simply choose to
>    *			split_huge_page() instead of handling it explicitly.
> - * @pte_entry:		if set, called for each non-empty PTE (4th-level) entry
> + * @pte_entry:		if set, called for each non-empty PTE (lowest-level)
> + *			entry
>    * @pte_hole:		if set, called for each hole at all levels
>    * @hugetlb_entry:	if set, called for each hugetlb entry
>    * @test_walk:		caller specific callback function to determine whether
> @@ -27,8 +27,15 @@ struct mm_walk;
>    * @pre_vma:            if set, called before starting walk on a non-null vma.
>    * @post_vma:           if set, called after a walk on a non-null vma, provided
>    *                      that @pre_vma and the vma walk succeeded.
> + *
> + * p?d_entry callbacks are called even if those levels are folded on a
> + * particular architecture/configuration.
>    */
>   struct mm_walk_ops {
> +	int (*pgd_entry)(pgd_t *pgd, unsigned long addr,
> +			 unsigned long next, struct mm_walk *walk);
> +	int (*p4d_entry)(p4d_t *p4d, unsigned long addr,
> +			 unsigned long next, struct mm_walk *walk);
>   	int (*pud_entry)(pud_t *pud, unsigned long addr,
>   			 unsigned long next, struct mm_walk *walk);
>   	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index ea0b9e606ad1..c089786e7a7f 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -94,15 +94,9 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>   		}
>   
>   		if (ops->pud_entry) {
> -			spinlock_t *ptl = pud_trans_huge_lock(pud, walk->vma);
> -
> -			if (ptl) {
> -				err = ops->pud_entry(pud, addr, next, walk);
> -				spin_unlock(ptl);
> -				if (err)
> -					break;
> -				continue;
> -			}
> +			err = ops->pud_entry(pud, addr, next, walk);
> +			if (err)
> +				break;

Actually, there are two current users of pud_entry(): hmm.c and, since
5.5-rc1, mapping_dirty_helpers.c. The latter is unproblematic and
requires no attention, but the one in hmm.c is probably largely untested
and seems to assume it is called without the spinlock held.
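
For reference, with this patch pud_entry() is invoked for every PUD and
without pud_trans_huge_lock() held, so a callback that only cares about
huge PUDs has to take the lock itself and return 0 for regular entries.
The user-space model below sketches that pattern. toy_pud_trans_huge_lock()
only mimics the kernel helper's contract (return the lock, held, for a
huge entry, else NULL); all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page-table lock; tracks whether it is held. */
struct toy_ptl { int held; };

static void toy_lock(struct toy_ptl *l)   { assert(!l->held); l->held = 1; }
static void toy_unlock(struct toy_ptl *l) { assert(l->held);  l->held = 0; }

struct toy_pud {
	bool huge;            /* models pud_trans_huge() */
	struct toy_ptl *ptl;  /* lock protecting this entry */
};

/* Models pud_trans_huge_lock(): lock and return ptl only for a huge entry.
 * (The kernel helper re-checks the entry after taking the lock; omitted.) */
static struct toy_ptl *toy_pud_trans_huge_lock(struct toy_pud *pud)
{
	if (!pud->huge)
		return NULL;
	toy_lock(pud->ptl);
	return pud->ptl;
}

/* pud_entry-style callback for the new semantics: called for every PUD
 * with no lock held, so it locks huge entries itself and skips the rest. */
static int huge_only_pud_entry(struct toy_pud *pud, int *huge_seen)
{
	struct toy_ptl *ptl = toy_pud_trans_huge_lock(pud);

	if (!ptl)
		return 0;        /* regular PUD: leave it to pmd_entry/pte_entry */
	(*huge_seen)++;      /* ... inspect the huge entry under the lock ... */
	toy_unlock(ptl);
	return 0;
}

/* Walker with the patched semantics: pud_entry for every entry, unlocked. */
static int toy_walk_puds(struct toy_pud *puds, size_t n, int *huge_seen)
{
	for (size_t i = 0; i < n; i++) {
		int err = huge_only_pud_entry(&puds[i], huge_seen);

		if (err)
			return err;
	}
	return 0;
}
```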

The problem with the current patch is that the hmm pud_entry() also
traverses the PMDs itself, so with this change they will be traversed
twice.
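
A user-space model of the double traversal, assuming a pud_entry()
callback that walks the PMDs below its PUD by itself (as described for
hmm.c above) and a walker that, as in the patch, still descends to the
PMD level afterwards. All names are hypothetical; the point is only that
every PMD ends up being visited twice.

```c
#include <assert.h>

#define NR_PUDS      2
#define PMDS_PER_PUD 4

struct toy_counts {
	int pmd_visits[NR_PUDS][PMDS_PER_PUD];  /* visits per PMD */
};

/* hmm-style pud_entry: walks the PMDs under this PUD by itself. */
static int self_walking_pud_entry(int pud, struct toy_counts *c)
{
	for (int pmd = 0; pmd < PMDS_PER_PUD; pmd++)
		c->pmd_visits[pud][pmd]++;
	return 0;
}

/* Ordinary pmd_entry callback reached by the generic walker. */
static int toy_pmd_entry(int pud, int pmd, struct toy_counts *c)
{
	c->pmd_visits[pud][pmd]++;
	return 0;
}

/* Patched walker: pud_entry for every PUD, then unconditionally descend. */
static int toy_walk(struct toy_counts *c)
{
	for (int pud = 0; pud < NR_PUDS; pud++) {
		int err = self_walking_pud_entry(pud, c);

		if (err)
			return err;
		for (int pmd = 0; pmd < PMDS_PER_PUD; pmd++) {
			err = toy_pmd_entry(pud, pmd, c);
			if (err)
				return err;
		}
	}
	return 0;
}
```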

/Thomas



Thread overview: 37+ messages
2019-12-06 13:52 [PATCH v16 00/25] Generic page walk and ptdump Steven Price
2019-12-06 13:52 ` [PATCH v16 01/25] mm: Add generic p?d_leaf() macros Steven Price
2019-12-06 13:52 ` [PATCH v16 02/25] arc: mm: Add p?d_leaf() definitions Steven Price
2019-12-06 13:52 ` [PATCH v16 03/25] arm: " Steven Price
2019-12-06 13:52 ` [PATCH v16 04/25] arm64: " Steven Price
2019-12-06 13:52 ` [PATCH v16 05/25] mips: " Steven Price
2019-12-06 13:52 ` [PATCH v16 06/25] powerpc: " Steven Price
2019-12-09 11:08   ` Michael Ellerman
2019-12-09 13:06     ` Steven Price
2019-12-06 13:52 ` [PATCH v16 07/25] riscv: " Steven Price
2019-12-06 13:52 ` [PATCH v16 08/25] s390: " Steven Price
2019-12-06 13:53 ` [PATCH v16 09/25] sparc: " Steven Price
2019-12-06 13:53 ` [PATCH v16 10/25] x86: " Steven Price
2019-12-06 13:53 ` [PATCH v16 11/25] mm: pagewalk: Add p4d_entry() and pgd_entry() Steven Price
2019-12-12 11:23   ` Thomas Hellström (VMware) [this message]
2019-12-12 11:33     ` Thomas Hellström (VMware)
2019-12-12 13:15       ` Steven Price
2019-12-12 14:04         ` Thomas Hellström (VMware)
2019-12-12 15:18           ` Steven Price
2019-12-06 13:53 ` [PATCH v16 12/25] mm: pagewalk: Allow walking without vma Steven Price
2019-12-06 13:53 ` [PATCH v16 13/25] mm: pagewalk: Don't lock PTEs for walk_page_range_novma() Steven Price
2019-12-10 11:23   ` kbuild test robot
2019-12-11 15:54     ` Steven Price
2019-12-11 17:12       ` Luc Van Oostenryck
2019-12-11 17:19       ` Qian Cai
2019-12-06 13:53 ` [PATCH v16 14/25] mm: pagewalk: fix termination condition in walk_pte_range() Steven Price
2019-12-06 13:53 ` [PATCH v16 15/25] mm: pagewalk: Add test_p?d callbacks Steven Price
2019-12-06 13:53 ` [PATCH v16 16/25] mm: pagewalk: Add 'depth' parameter to pte_hole Steven Price
2019-12-06 13:53 ` [PATCH v16 17/25] x86: mm: Point to struct seq_file from struct pg_state Steven Price
2019-12-06 13:53 ` [PATCH v16 18/25] x86: mm+efi: Convert ptdump_walk_pgd_level() to take a mm_struct Steven Price
2019-12-06 13:53 ` [PATCH v16 19/25] x86: mm: Convert ptdump_walk_pgd_level_debugfs() to take an mm_struct Steven Price
2019-12-06 13:53 ` [PATCH v16 20/25] x86: mm: Convert ptdump_walk_pgd_level_core() " Steven Price
2019-12-06 13:53 ` [PATCH v16 21/25] mm: Add generic ptdump Steven Price
2019-12-06 13:53 ` [PATCH v16 22/25] x86: mm: Convert dump_pagetables to use walk_page_range Steven Price
2019-12-06 13:53 ` [PATCH v16 23/25] arm64: mm: Convert mm/dump.c to use walk_page_range() Steven Price
2019-12-06 13:53 ` [PATCH v16 24/25] arm64: mm: Display non-present entries in ptdump Steven Price
2019-12-06 13:53 ` [PATCH v16 25/25] mm: ptdump: Reduce level numbers by 1 in note_page() Steven Price
