linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>,
	Will Deacon <will@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	mingo@redhat.com, acme@kernel.org, linux-kernel@vger.kernel.org,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@redhat.com, eranian@google.com, ak@linux.intel.com,
	dave.hansen@intel.com, kirill.shutemov@linux.intel.com,
	benh@kernel.crashing.org, paulus@samba.org,
	David Miller <davem@davemloft.net>,
	vbabka@suse.cz
Subject: Re: [PATCH V9 1/4] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE
Date: Wed, 11 Nov 2020 22:33:44 +0000	[thread overview]
Message-ID: <20201111223344.GX17076@casper.infradead.org> (raw)
In-Reply-To: <20201111200000.GL2628@hirez.programming.kicks-ass.net>

On Wed, Nov 11, 2020 at 09:00:00PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2020 at 06:26:20PM +0000, Matthew Wilcox wrote:
> > On Wed, Nov 11, 2020 at 06:22:53PM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 11, 2020 at 04:38:48PM +0000, Matthew Wilcox wrote:
> > > > 	if (pud_leaf(pud))
> > > > 		return PUD_SIZE;
> > > 
> > > But that doesn't handle non-pagetable aligned hugetlb sizes. Granted,
> > > that's unlikely at the PUD level, but why be inconsistent..
> > > 
> > > So we really want:
> > > 
> > > 	if (p*d_leaf(p*d)) {
> > > 		if (!'special') {
> > > 			page = p*d_page(p*d);
> > > 			if (PageHuge(page))
> > > 				return page_size(compound_head(page));
> > > 		}
> > > 		return P*D_SIZE;
> > > 	}
> > 
> > Still doesn't work because pages can be mapped at funny offsets.
> 
> Wait, what?! Is there hardware that has unaligned TLB page-sizes?

No, you can force a 2MB page to be mapped at an address which isn't
2MB aligned.

> Can you start a 64K page at an 8k offset? I don't think I've ever seen
> that. Still even with that, how would the above go wrong there? It would
> find the compound page covering @addr, PageHuge() (and possibly some
> addition arch specific condition) returns true and we get the compound
> size to find the hardware page size used.

On any architecture I can think of, that 2MB page will be mapped with 4kB
TLB entries.

> > What we really want is for a weak definition of
> > 
> > unsigned long tlb_size(struct mm_struct *mm, unsigned long addr)
> > {
> > 	if (p*d_leaf(p*d))
> > 		return p*d_size(p*d);
> > }
> > 
> > then ARM can look at its special bit in the page table to determine
> > whether this is a singleton or part of a brace of pages.
> 
> That's basically what we provide. but really the only thing that's
> missing from this generic page walker is the ability to detect if a
> !PageHuge compound page is actually still a hardware page.
> 
> > > Now, when you add !PMD THP sizes (presumably for architectures that have
> > > 'funny' sizes, otherwise what's the point), then you get to add '||
> > 
> > This is the problem with all the huge page support in Linux today.
> > It's written by people who work for hardware companies who think only
> > about exploiting the hardware features they sell.  You all ignore the
> > very real software overhedas of trying to manage millions of pages.
> > I see a 6% reduction in kernel overhead when running kernbench using
> > THPs that may go as large as 256kB.  On x86.  Intel x86, at that.
> 
> That's a really nice improvement. However then this code doesn't care
> about it. Please make it possible to distinguish between THP on hardware
> pages vs software pages.

That can and should be done just by looking at the page table entries.
There's no need to convert it into a struct page.  The CPU obviously
decides what TLB entry size to use based solely on the page tables,
so we can too.

  reply	other threads:[~2020-11-12  1:55 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-01 13:57 [PATCH V9 0/4] Add the page size in the perf record (kernel) kan.liang
2020-10-01 13:57 ` [PATCH V9 1/4] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE kan.liang
2020-10-09  9:09   ` Peter Zijlstra
2020-10-09  9:16     ` Peter Zijlstra
2020-10-09  9:37     ` Will Deacon
2020-10-09  9:53       ` Peter Zijlstra
2020-10-20  2:49         ` Leo Yan
2020-10-20  7:19           ` Peter Zijlstra
2020-10-20  8:16             ` Leo Yan
2020-10-09 12:29     ` Liang, Kan
2020-10-09 12:57       ` Peter Zijlstra
2020-10-09 13:28     ` Michael Ellerman
2020-10-12  8:48       ` Will Deacon
2020-10-13 14:57         ` Liang, Kan
2020-10-13 15:46           ` Peter Zijlstra
2020-10-13 16:34             ` Peter Zijlstra
2020-11-04 17:11               ` Liang, Kan
2020-11-10 15:20                 ` Liang, Kan
2020-11-11  9:57                 ` Peter Zijlstra
2020-11-11 11:22                   ` Peter Zijlstra
2020-11-11 12:43                     ` Peter Zijlstra
2020-11-11 15:30                       ` Matthew Wilcox
2020-11-11 15:52                         ` Peter Zijlstra
2020-11-11 15:57                         ` Peter Zijlstra
2020-11-11 16:38                           ` Matthew Wilcox
2020-11-11 17:22                             ` Peter Zijlstra
2020-11-11 18:26                               ` Matthew Wilcox
2020-11-11 20:00                                 ` Peter Zijlstra
2020-11-11 22:33                                   ` Matthew Wilcox [this message]
2020-11-12  9:53                                     ` Peter Zijlstra
2020-11-12 11:36                                       ` Peter Zijlstra
2020-11-12 14:01                                         ` Matthew Wilcox
2020-10-29 10:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-10-01 13:57 ` [PATCH V9 2/4] perf/x86/intel: Support PERF_SAMPLE_DATA_PAGE_SIZE kan.liang
2020-10-29 10:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-10-01 13:57 ` [PATCH V9 3/4] powerpc/perf: " kan.liang
2020-10-29 10:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-10-01 13:57 ` [PATCH V9 4/4] perf/core: Add support for PERF_SAMPLE_CODE_PAGE_SIZE kan.liang
2020-10-29 10:51   ` [tip: perf/core] " tip-bot2 for Stephane Eranian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201111223344.GX17076@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=benh@kernel.crashing.org \
    --cc=dave.hansen@intel.com \
    --cc=davem@davemloft.net \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=kan.liang@linux.intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).