All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Peter Xu <peterx@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"x86@kernel.org" <x86@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Muchun Song <muchun.song@linux.dev>,
	"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Subject: Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
Date: Tue, 19 Mar 2024 20:26:56 -0300	[thread overview]
Message-ID: <20240319232656.GC159172@nvidia.com> (raw)
In-Reply-To: <e0417c2a-2ef1-4435-b5a7-aadfe90ff8f1@csgroup.eu>

On Tue, Mar 19, 2024 at 11:07:08PM +0000, Christophe Leroy wrote:
> 
> 
> Le 18/03/2024 à 17:15, Jason Gunthorpe a écrit :
> > On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
> >>
> >>
> >> Le 14/03/2024 à 13:53, Peter Xu a écrit :
> >>> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> Le 13/03/2024 à 22:47, peterx@redhat.com a écrit :
> >>>>> From: Peter Xu <peterx@redhat.com>
> >>>>>
> >>>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> >>>>> constantly returns 0 for hash MMUs.  As Michael Ellerman pointed out [1],
> >>>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> >>>>> it will keep returning false.
> >>>>>
> >>>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> >>>>> such huge mappings for 4K hash MMUs.  Meanwhile, the major powerpc hugetlb
> >>>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> >>>>> mappings.
> >>>>>
> >>>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
> >>>>> of huge mappings.  AFAICT we need to use the pXd_leaf() impl (rather than
> >>>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
> >>>>
> >>>> All kinds of huge mappings ?
> >>>>
> >>>> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are
> >>>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
> >>>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
> >>>> those huge pages.
> >>>
> >>> Ah yes, I should always mention this is in the context of leaf huge pages
> >>> only.  Are the examples you provided all fall into hugepd category?  If so
> >>> I can reword the commit message, as:
> >>
> >> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
> >>
> >> The 512k hugepages are at PTE level, they are handled more or less like
> >> CONT_PTE on ARM. see function set_huge_pte_at() for more context.
> >>
> >> You can also look at pte_leaf_size() and pgd_leaf_size().
> > 
> > IMHO leaf should return false if the thing is pointing to a next level
> > page table, even if that next level is fully populated with contiguous
> > pages.
> > 
> > This seems more aligned with the contig page direction that hugepd
> > should be moved over to..
> 
> Should hugepd be moved to the contig page direction, really ?

Sure? Is there any downside for the reading side to do so?

> Would it be acceptable that a 8M hugepage requires 2048 contig entries
> in 2 page tables, when the hugepd allows a single entry ? 

? I thought we agreed the only difference would be that something new
is needed to merge the two identical sibling page tables into one, ie
you pay 2x the page table memory if that isn't fixed. That is write
side only change and I imagine it could be done with a single PPC
special API.

Honestly not totally sure that is a big deal, it is already really
memory inefficient compared to every other arch's huge page by needing
the child page table in the first place.

> Would it be acceptable performancewise ?

Isn't this particular PPC sub platform ancient? Are there current real
users that are going to have hugetlbfs special code and care about
this performance detail on a 6.20 era kernel?

In today's world wouldn't it be performance better if these platforms
could support THP by aligning to the contig API instead of being
special?

Am I wrong to question why we are polluting the core code for this
special optimization?

Jason

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Peter Xu <peterx@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Matthew Wilcox <willy@infradead.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"x86@kernel.org" <x86@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Muchun Song <muchun.song@linux.dev>,
	"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Subject: Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
Date: Tue, 19 Mar 2024 20:26:56 -0300	[thread overview]
Message-ID: <20240319232656.GC159172@nvidia.com> (raw)
In-Reply-To: <e0417c2a-2ef1-4435-b5a7-aadfe90ff8f1@csgroup.eu>

On Tue, Mar 19, 2024 at 11:07:08PM +0000, Christophe Leroy wrote:
> 
> 
> Le 18/03/2024 à 17:15, Jason Gunthorpe a écrit :
> > On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
> >>
> >>
> >> Le 14/03/2024 à 13:53, Peter Xu a écrit :
> >>> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> Le 13/03/2024 à 22:47, peterx@redhat.com a écrit :
> >>>>> From: Peter Xu <peterx@redhat.com>
> >>>>>
> >>>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> >>>>> constantly returns 0 for hash MMUs.  As Michael Ellerman pointed out [1],
> >>>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> >>>>> it will keep returning false.
> >>>>>
> >>>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> >>>>> such huge mappings for 4K hash MMUs.  Meanwhile, the major powerpc hugetlb
> >>>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> >>>>> mappings.
> >>>>>
> >>>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
> >>>>> of huge mappings.  AFAICT we need to use the pXd_leaf() impl (rather than
> >>>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
> >>>>
> >>>> All kinds of huge mappings ?
> >>>>
> >>>> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are
> >>>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
> >>>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
> >>>> those huge pages.
> >>>
> >>> Ah yes, I should always mention this is in the context of leaf huge pages
> >>> only.  Are the examples you provided all fall into hugepd category?  If so
> >>> I can reword the commit message, as:
> >>
> >> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
> >>
> >> The 512k hugepages are at PTE level, they are handled more or less like
> >> CONT_PTE on ARM. see function set_huge_pte_at() for more context.
> >>
> >> You can also look at pte_leaf_size() and pgd_leaf_size().
> > 
> > IMHO leaf should return false if the thing is pointing to a next level
> > page table, even if that next level is fully populated with contiguous
> > pages.
> > 
> > This seems more aligned with the contig page direction that hugepd
> > should be moved over to..
> 
> Should hugepd be moved to the contig page direction, really ?

Sure? Is there any downside for the reading side to do so?

> Would it be acceptable that a 8M hugepage requires 2048 contig entries
> in 2 page tables, when the hugepd allows a single entry ? 

? I thought we agreed the only difference would be that something new
is needed to merge the two identical sibling page tables into one, ie
you pay 2x the page table memory if that isn't fixed. That is write
side only change and I imagine it could be done with a single PPC
special API.

Honestly not totally sure that is a big deal, it is already really
memory inefficient compared to every other arch's huge page by needing
the child page table in the first place.

> Would it be acceptable performancewise ?

Isn't this particular PPC sub platform ancient? Are there current real
users that are going to have hugetlbfs special code and care about
this performance detail on a 6.20 era kernel?

In today's world wouldn't it be performance better if these platforms
could support THP by aligning to the contig API instead of being
special?

Am I wrong to question why we are polluting the core code for this
special optimization?

Jason

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgg@nvidia.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Muchun Song <muchun.song@linux.dev>,
	"x86@kernel.org" <x86@kernel.org>,
	Nicholas Piggin <npiggin@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@kernel.org>,
	"sparclinux@vger.kernel.org" <sparclinux@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf()
Date: Tue, 19 Mar 2024 20:26:56 -0300	[thread overview]
Message-ID: <20240319232656.GC159172@nvidia.com> (raw)
In-Reply-To: <e0417c2a-2ef1-4435-b5a7-aadfe90ff8f1@csgroup.eu>

On Tue, Mar 19, 2024 at 11:07:08PM +0000, Christophe Leroy wrote:
> 
> 
> Le 18/03/2024 à 17:15, Jason Gunthorpe a écrit :
> > On Thu, Mar 14, 2024 at 01:11:59PM +0000, Christophe Leroy wrote:
> >>
> >>
> >> Le 14/03/2024 à 13:53, Peter Xu a écrit :
> >>> On Thu, Mar 14, 2024 at 08:45:34AM +0000, Christophe Leroy wrote:
> >>>>
> >>>>
> >>>> Le 13/03/2024 à 22:47, peterx@redhat.com a écrit :
> >>>>> From: Peter Xu <peterx@redhat.com>
> >>>>>
> >>>>> PowerPC book3s 4K mostly has the same definition on both, except pXd_huge()
> >>>>> constantly returns 0 for hash MMUs.  As Michael Ellerman pointed out [1],
> >>>>> it is safe to check _PAGE_PTE on hash MMUs, as the bit will never be set so
> >>>>> it will keep returning false.
> >>>>>
> >>>>> As a reference, __p[mu]d_mkhuge() will trigger a BUG_ON trying to create
> >>>>> such huge mappings for 4K hash MMUs.  Meanwhile, the major powerpc hugetlb
> >>>>> pgtable walker __find_linux_pte() already used pXd_leaf() to check hugetlb
> >>>>> mappings.
> >>>>>
> >>>>> The goal should be that we will have one API pXd_leaf() to detect all kinds
> >>>>> of huge mappings.  AFAICT we need to use the pXd_leaf() impl (rather than
> >>>>> pXd_huge() ones) to make sure ie. THPs on hash MMU will also return true.
> >>>>
> >>>> All kinds of huge mappings ?
> >>>>
> >>>> pXd_leaf() will detect only leaf mappings (like pXd_huge() ). There are
> >>>> also huge mappings through hugepd. On powerpc 8xx we have 8M huge pages
> >>>> and 512k huge pages. A PGD entry covers 4M so pgd_leaf() won't report
> >>>> those huge pages.
> >>>
> >>> Ah yes, I should always mention this is in the context of leaf huge pages
> >>> only.  Are the examples you provided all fall into hugepd category?  If so
> >>> I can reword the commit message, as:
> >>
> >> On powerpc 8xx, only the 8M huge pages fall into the hugepd case.
> >>
> >> The 512k hugepages are at PTE level, they are handled more or less like
> >> CONT_PTE on ARM. see function set_huge_pte_at() for more context.
> >>
> >> You can also look at pte_leaf_size() and pgd_leaf_size().
> > 
> > IMHO leaf should return false if the thing is pointing to a next level
> > page table, even if that next level is fully populated with contiguous
> > pages.
> > 
> > This seems more aligned with the contig page direction that hugepd
> > should be moved over to..
> 
> Should hugepd be moved to the contig page direction, really ?

Sure? Is there any downside for the reading side to do so?

> Would it be acceptable that a 8M hugepage requires 2048 contig entries
> in 2 page tables, when the hugepd allows a single entry ? 

? I thought we agreed the only difference would be that something new
is needed to merge the two identical sibling page tables into one, ie
you pay 2x the page table memory if that isn't fixed. That is write
side only change and I imagine it could be done with a single PPC
special API.

Honestly not totally sure that is a big deal, it is already really
memory inefficient compared to every other arch's huge page by needing
the child page table in the first place.

> Would it be acceptable performancewise ?

Isn't this particular PPC sub platform ancient? Are there current real
users that are going to have hugetlbfs special code and care about
this performance detail on a 6.20 era kernel?

In today's world wouldn't it be performance better if these platforms
could support THP by aligning to the contig API instead of being
special?

Am I wrong to question why we are polluting the core code for this
special optimization?

Jason

  reply	other threads:[~2024-03-19 23:27 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-13 21:47 [PATCH 00/13] mm/treewide: Remove pXd_huge() API peterx
2024-03-13 21:47 ` peterx
2024-03-13 21:47 ` peterx
2024-03-13 21:47 ` [PATCH 01/13] mm/hmm: Process pud swap entry without pud_huge() peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 02/13] mm/gup: Cache p4d in follow_p4d_mask() peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 03/13] mm/gup: Check p4d presence before going on peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 04/13] mm/x86: Change pXd_huge() behavior to exclude swap entries peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 05/13] mm/sparc: " peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 06/13] mm/arm: Use macros to define pmd/pud helpers peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 07/13] mm/arm: Redefine pmd_huge() with pmd_leaf() peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 08/13] mm/arm64: Merge pXd_huge() and pXd_leaf() definitions peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 09/13] mm/powerpc: Redefine pXd_huge() with pXd_leaf() peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-14  8:45   ` Christophe Leroy
2024-03-14  8:45     ` Christophe Leroy
2024-03-14  8:45     ` Christophe Leroy
2024-03-14 12:53     ` Peter Xu
2024-03-14 12:53       ` Peter Xu
2024-03-14 12:53       ` Peter Xu
2024-03-14 13:11       ` Christophe Leroy
2024-03-14 13:11         ` Christophe Leroy
2024-03-14 13:11         ` Christophe Leroy
2024-03-18 16:15         ` Jason Gunthorpe
2024-03-18 16:15           ` Jason Gunthorpe
2024-03-18 16:15           ` Jason Gunthorpe
2024-03-19 23:07           ` Christophe Leroy
2024-03-19 23:07             ` Christophe Leroy
2024-03-19 23:07             ` Christophe Leroy
2024-03-19 23:26             ` Jason Gunthorpe [this message]
2024-03-19 23:26               ` Jason Gunthorpe
2024-03-19 23:26               ` Jason Gunthorpe
2024-03-20  6:16               ` Christophe Leroy
2024-03-20  6:16                 ` Christophe Leroy
2024-03-20  6:16                 ` Christophe Leroy
2024-03-20 16:09                 ` Peter Xu
2024-03-20 16:09                   ` Peter Xu
2024-03-20 16:09                   ` Peter Xu
2024-03-20 17:40                   ` Christophe Leroy
2024-03-20 17:40                     ` Christophe Leroy
2024-03-20 17:40                     ` Christophe Leroy
2024-03-20 20:24                     ` Peter Xu
2024-03-20 20:24                       ` Peter Xu
2024-03-20 20:24                       ` Peter Xu
2024-03-13 21:47 ` [PATCH 10/13] mm/gup: Merge pXd huge mapping checks peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47 ` [PATCH 11/13] mm/treewide: Replace pXd_huge() with pXd_leaf() peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-14  8:50   ` Christophe Leroy
2024-03-14  8:50     ` Christophe Leroy
2024-03-14  8:50     ` Christophe Leroy
2024-03-14 12:59     ` Peter Xu
2024-03-14 12:59       ` Peter Xu
2024-03-14 12:59       ` Peter Xu
2024-03-18 16:16       ` Jason Gunthorpe
2024-03-18 16:16         ` Jason Gunthorpe
2024-03-18 16:16         ` Jason Gunthorpe
2024-03-13 21:47 ` [PATCH 12/13] mm/treewide: Remove pXd_huge() peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx
2024-03-14  8:56   ` Christophe Leroy
2024-03-14  8:56     ` Christophe Leroy
2024-03-14  8:56     ` Christophe Leroy
2024-03-14 14:08     ` Peter Xu
2024-03-14 14:08       ` Peter Xu
2024-03-14 14:08       ` Peter Xu
2024-03-13 21:47 ` [PATCH 13/13] mm: Document pXd_leaf() API peterx
2024-03-13 21:47   ` peterx
2024-03-13 21:47   ` peterx

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240319232656.GC159172@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=muchun.song@linux.dev \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=peterx@redhat.com \
    --cc=rppt@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.