linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Song Liu <songliubraving@fb.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@fb.com, stable@vger.kernel.org,
	Joerg Roedel <jroedel@suse.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [PATCH v2] x86/mm/pti: in pti_clone_pgtable(), increase addr properly
Date: Wed, 21 Aug 2019 12:10:08 +0200	[thread overview]
Message-ID: <20190821101008.GX2349@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20190820202314.1083149-1-songliubraving@fb.com>

On Tue, Aug 20, 2019 at 01:23:14PM -0700, Song Liu wrote:
> Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
> This behavior changes after the 32-bit support:  pti_clone_pgtable()
> increases addr by PUD_SIZE for pud_none(*pud) case, and increases addr by
> PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate because
> addr may not be PUD_SIZE/PMD_SIZE aligned.
> 
> Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
> in these two cases.

So the patch is fine, ACK on that, but that still leaves us with the
puzzle of why this didn't explode mightily and the story needs a little
more work.

> The following explains how we debugged this issue:
> 
> We use huge page for hot text and thus reduces iTLB misses. As we
> benchmark 5.2 based kernel (vs. 4.16 based), we found ~2.5x more
> iTLB misses.
> 
> To figure out the issue, I use a debug patch that dumps page table for
> a pid. The following are information from the workload pid.
> 
> For the 4.16 based kernel:
> 
> host-4.16 # grep "x  pmd" /sys/kernel/debug/page_tables/dump_pid
> 0x0000000000600000-0x0000000000e00000           8M USR ro         PSE         x  pmd
> 0xffffffff81a00000-0xffffffff81c00000           2M     ro         PSE         x  pmd
> 
> For the 5.2 based kernel before this patch:
> 
> host-5.2-before # grep "x  pmd" /sys/kernel/debug/page_tables/dump_pid
> 0x0000000000600000-0x0000000000e00000           8M USR ro         PSE         x  pmd
> 
> The 8MB text in pmd is from user space. 4.16 kernel has 1 pmd for the
> irq entry table; while 4.16 kernel doesn't have it.
> 
> For the 5.2 based kernel after this patch:
> 
> host-5.2-after # grep "x  pmd" /sys/kernel/debug/page_tables/dump_pid
> 0x0000000000600000-0x0000000000e00000           8M USR ro         PSE         x  pmd
> 0xffffffff81000000-0xffffffff81e00000          14M     ro         PSE     GLB x  pmd
> 
> So after this patch, the 5.2 based kernel has 7 PMDs instead of 1 PMD
> in 4.16 kernel.

This basically gives rise to more questions than it provides answers.
You seem to have 'forgotten' to provide the equivalent mappings on the
two older kernels. The fact that they're not PMD is evident, but it
would be very good to know what is mapped, and what -- if anything --
lives in the holes we've (accidentally) created.

Can you please provide more complete mappings? Basically provide the
whole cpu_entry_area mapping.

> This further reduces iTLB miss rate

What you're saying is that by using PMDs, we reduce 4K iTLB usage. But
we increase 2M iTLB usage, but for your workload this works out
favourably (a quick look at the PMU event tables for SKL didn't show me
separate 4K/2M iTLB counters :/).

> Cc: stable@vger.kernel.org # v4.19+
> Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
> Reviewed-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/x86/mm/pti.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index b196524759ec..1337494e22ef 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -330,13 +330,13 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  
>  		pud = pud_offset(p4d, addr);
>  		if (pud_none(*pud)) {
> -			addr += PUD_SIZE;
> +			addr = round_up(addr + 1, PUD_SIZE);
>  			continue;
>  		}
>  
>  		pmd = pmd_offset(pud, addr);
>  		if (pmd_none(*pmd)) {
> -			addr += PMD_SIZE;
> +			addr = round_up(addr + 1, PMD_SIZE);
>  			continue;
>  		}
>  
> -- 
> 2.17.1
> 

  parent reply	other threads:[~2019-08-21 10:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-20 20:23 [PATCH v2] x86/mm/pti: in pti_clone_pgtable(), increase addr properly Song Liu
2019-08-21  9:41 ` Song Liu
2019-08-21 10:17   ` Thomas Gleixner
2019-08-24  0:59     ` Thomas Gleixner
2019-08-24  2:10       ` Song Liu
2019-08-21 10:10 ` Peter Zijlstra [this message]
2019-08-21 10:30   ` Peter Zijlstra
2019-08-24  2:12     ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190821101008.GX2349@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=jroedel@suse.de \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).