All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
@ 2021-02-17 18:49 Mike Kravetz
  2021-02-17 18:49 ` [PATCH 2/2] hugetlb: fix copy_huge_page_from_user " Mike Kravetz
  2021-02-17 19:02 ` [PATCH 1/2] hugetlb: fix update_and_free_page " Andrew Morton
  0 siblings, 2 replies; 13+ messages in thread
From: Mike Kravetz @ 2021-02-17 18:49 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Zi Yan, Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Matthew Wilcox, Oscar Salvador, Joao Martins, Andrew Morton,
	Mike Kravetz, stable

page structs are not guaranteed to be contiguous for gigantic pages.  The
routine update_and_free_page can encounter a gigantic page, yet it assumes
page structs are contiguous when setting page flags in subpages.

If update_and_free_page encounters non-contiguous page structs, we can
see “BUG: Bad page state in process …” errors.

Non-contiguous page structs are generally not an issue.  However, they can
exist with a specific kernel configuration and hotplug operations.  For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP.  Then, hotplug add memory for the area where the
gigantic page will be allocated.
Zi Yan outlined steps to reproduce here [1].

[1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidia.com/

Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
---
 mm/hugetlb.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..94e9fa803294 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1312,14 +1312,16 @@ static inline void destroy_compound_gigantic_page(struct page *page,
 static void update_and_free_page(struct hstate *h, struct page *page)
 {
 	int i;
+	struct page *subpage = page;
 
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
 
 	h->nr_huge_pages--;
 	h->nr_huge_pages_node[page_to_nid(page)]--;
-	for (i = 0; i < pages_per_huge_page(h); i++) {
-		page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+	for (i = 0; i < pages_per_huge_page(h);
+	     i++, subpage = mem_map_next(subpage, page, i)) {
+		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
 				1 << PG_referenced | 1 << PG_dirty |
 				1 << PG_active | 1 << PG_private |
 				1 << PG_writeback);
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] hugetlb: fix copy_huge_page_from_user contig page struct assumption
  2021-02-17 18:49 [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption Mike Kravetz
@ 2021-02-17 18:49 ` Mike Kravetz
  2021-02-17 19:02 ` [PATCH 1/2] hugetlb: fix update_and_free_page " Andrew Morton
  1 sibling, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2021-02-17 18:49 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: Zi Yan, Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Matthew Wilcox, Oscar Salvador, Joao Martins, Andrew Morton,
	Mike Kravetz, stable

page structs are not guaranteed to be contiguous for gigantic pages.
The routine copy_huge_page_from_user can encounter gigantic pages, yet it
assumes page structs are contiguous when copying pages from user space.

Since page structs for the target gigantic page are not contiguous,
the data copied from user space could overwrite other pages not
associated with the gigantic page and cause data corruption.

Non-contiguous page structs are generally not an issue.  However, they can
exist with a specific kernel configuration and hotplug operations.  For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP.  Then, hotplug add memory for the area where the
gigantic page will be allocated.

Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
---
 mm/memory.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..241bec4199b5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5173,17 +5173,19 @@ long copy_huge_page_from_user(struct page *dst_page,
 	void *page_kaddr;
 	unsigned long i, rc = 0;
 	unsigned long ret_val = pages_per_huge_page * PAGE_SIZE;
+	struct page *subpage = dst_page;
 
-	for (i = 0; i < pages_per_huge_page; i++) {
+	for (i = 0; i < pages_per_huge_page;
+	     i++, subpage = mem_map_next(subpage, dst_page, i)) {
 		if (allow_pagefault)
-			page_kaddr = kmap(dst_page + i);
+			page_kaddr = kmap(subpage);
 		else
-			page_kaddr = kmap_atomic(dst_page + i);
+			page_kaddr = kmap_atomic(subpage);
 		rc = copy_from_user(page_kaddr,
 				(const void __user *)(src + i * PAGE_SIZE),
 				PAGE_SIZE);
 		if (allow_pagefault)
-			kunmap(dst_page + i);
+			kunmap(subpage);
 		else
 			kunmap_atomic(page_kaddr);
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-17 18:49 [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption Mike Kravetz
  2021-02-17 18:49 ` [PATCH 2/2] hugetlb: fix copy_huge_page_from_user " Mike Kravetz
@ 2021-02-17 19:02 ` Andrew Morton
  2021-02-17 19:38   ` Mike Kravetz
  2021-02-18 14:45   ` Matthew Wilcox
  1 sibling, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2021-02-17 19:02 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, Zi Yan, Davidlohr Bueso,
	Kirill A . Shutemov, Andrea Arcangeli, Matthew Wilcox,
	Oscar Salvador, Joao Martins, stable

On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> page structs are not guaranteed to be contiguous for gigantic pages.  The
> routine update_and_free_page can encounter a gigantic page, yet it assumes
> page structs are contiguous when setting page flags in subpages.
> 
> If update_and_free_page encounters non-contiguous page structs, we can
> see “BUG: Bad page state in process …” errors.
> 
> Non-contiguous page structs are generally not an issue.  However, they can
> exist with a specific kernel configuration and hotplug operations.  For
> example: Configure the kernel with CONFIG_SPARSEMEM and
> !CONFIG_SPARSEMEM_VMEMMAP.  Then, hotplug add memory for the area where the
> gigantic page will be allocated.
> Zi Yan outlined steps to reproduce here [1].
> 
> [1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidia.com/
> 
> Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")

June 2014.  That's a long lurk time for a bug.  I wonder if some later
commit revealed it.

I guess it doesn't matter a lot, but some -stable kernel maintainers
might wonder if they really need this fix...



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-17 19:02 ` [PATCH 1/2] hugetlb: fix update_and_free_page " Andrew Morton
@ 2021-02-17 19:38   ` Mike Kravetz
  2021-02-18 14:45   ` Matthew Wilcox
  1 sibling, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2021-02-17 19:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Zi Yan, Davidlohr Bueso,
	Kirill A . Shutemov, Andrea Arcangeli, Matthew Wilcox,
	Oscar Salvador, Joao Martins, stable

On 2/17/21 11:02 AM, Andrew Morton wrote:
> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
>> page structs are not guaranteed to be contiguous for gigantic pages.  The
>> routine update_and_free_page can encounter a gigantic page, yet it assumes
>> page structs are contiguous when setting page flags in subpages.
>>
>> If update_and_free_page encounters non-contiguous page structs, we can
>> see “BUG: Bad page state in process …” errors.
>>
>> Non-contiguous page structs are generally not an issue.  However, they can
>> exist with a specific kernel configuration and hotplug operations.  For
>> example: Configure the kernel with CONFIG_SPARSEMEM and
>> !CONFIG_SPARSEMEM_VMEMMAP.  Then, hotplug add memory for the area where the
>> gigantic page will be allocated.
>> Zi Yan outlined steps to reproduce here [1].
>>
>> [1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidia.com/
>>
>> Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
> 
> June 2014.  That's a long lurk time for a bug.  I wonder if some later
> commit revealed it.
> 
> I guess it doesn't matter a lot, but some -stable kernel maintainers
> might wonder if they really need this fix...

I am not sure how common a CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP
config is.  On the more popular architectures, this is not the default.
But, you can build a kernel with such options.  And, then you need to
hotplug memory add and allocate a gigantic page there.

It is unlikely to happen, but possible since Zi could force the BUG.

The copy_huge_page_from_user bug requires the same non-normal configuration
and is just as unlikely to occurr.  But, since it can overwrite somewhat
random pages I would feel better if it was fixed.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-17 19:02 ` [PATCH 1/2] hugetlb: fix update_and_free_page " Andrew Morton
  2021-02-17 19:38   ` Mike Kravetz
@ 2021-02-18 14:45   ` Matthew Wilcox
  2021-02-18 17:25     ` Jason Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: Matthew Wilcox @ 2021-02-18 14:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Kravetz, linux-kernel, linux-mm, Zi Yan, Davidlohr Bueso,
	Kirill A . Shutemov, Andrea Arcangeli, Oscar Salvador,
	Joao Martins, stable

On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > page structs are not guaranteed to be contiguous for gigantic pages.  The
>
> June 2014.  That's a long lurk time for a bug.  I wonder if some later
> commit revealed it.

I would suggest that gigantic pages have not seen much use.  Certainly
performance with Intel CPUs on benchmarks that I've been involved with
showed lower performance with 1GB pages than with 2MB pages until quite
recently.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 14:45   ` Matthew Wilcox
@ 2021-02-18 17:25     ` Jason Gunthorpe
  2021-02-18 17:27       ` Zi Yan
  2021-02-18 17:34       ` Mike Kravetz
  0 siblings, 2 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2021-02-18 17:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Mike Kravetz, linux-kernel, linux-mm, Zi Yan,
	Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Oscar Salvador, Joao Martins, stable

On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
> > On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > page structs are not guaranteed to be contiguous for gigantic pages.  The
> >
> > June 2014.  That's a long lurk time for a bug.  I wonder if some later
> > commit revealed it.
> 
> I would suggest that gigantic pages have not seen much use.  Certainly
> performance with Intel CPUs on benchmarks that I've been involved with
> showed lower performance with 1GB pages than with 2MB pages until quite
> recently.

I suggested in another thread that maybe it is time to consider
dropping this "feature"

If it has been slightly broken for 7 years it seems a good bet it
isn't actually being used.

The cost to fix GUP to be compatible with this will hurt normal
GUP performance - and again, that nobody has hit this bug in GUP
further suggests the feature isn't used..

Jason 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 17:25     ` Jason Gunthorpe
@ 2021-02-18 17:27       ` Zi Yan
  2021-02-18 17:32         ` Jason Gunthorpe
  2021-02-18 17:34       ` Mike Kravetz
  1 sibling, 1 reply; 13+ messages in thread
From: Zi Yan @ 2021-02-18 17:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Wilcox, Andrew Morton, Mike Kravetz, linux-kernel,
	linux-mm, Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Oscar Salvador, Joao Martins, stable

[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]

On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:

> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>> page structs are not guaranteed to be contiguous for gigantic pages.  The
>>>
>>> June 2014.  That's a long lurk time for a bug.  I wonder if some later
>>> commit revealed it.
>>
>> I would suggest that gigantic pages have not seen much use.  Certainly
>> performance with Intel CPUs on benchmarks that I've been involved with
>> showed lower performance with 1GB pages than with 2MB pages until quite
>> recently.
>
> I suggested in another thread that maybe it is time to consider
> dropping this "feature"

You mean dropping gigantic page support in hugetlb?

>
> If it has been slightly broken for 7 years it seems a good bet it
> isn't actually being used.
>
> The cost to fix GUP to be compatible with this will hurt normal
> GUP performance - and again, that nobody has hit this bug in GUP
> further suggests the feature isn't used..

A easy fix might be to make gigantic hugetlb page depends on
CONFIG_SPARSEMEM_VMEMMAP, which guarantee all struct pages are contiguous.


—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 17:27       ` Zi Yan
@ 2021-02-18 17:32         ` Jason Gunthorpe
  2021-02-18 17:40           ` Zi Yan
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2021-02-18 17:32 UTC (permalink / raw)
  To: Zi Yan
  Cc: Matthew Wilcox, Andrew Morton, Mike Kravetz, linux-kernel,
	linux-mm, Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Oscar Salvador, Joao Martins, stable

On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
> 
> > On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
> >> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
> >>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >>>> page structs are not guaranteed to be contiguous for gigantic pages.  The
> >>>
> >>> June 2014.  That's a long lurk time for a bug.  I wonder if some later
> >>> commit revealed it.
> >>
> >> I would suggest that gigantic pages have not seen much use.  Certainly
> >> performance with Intel CPUs on benchmarks that I've been involved with
> >> showed lower performance with 1GB pages than with 2MB pages until quite
> >> recently.
> >
> > I suggested in another thread that maybe it is time to consider
> > dropping this "feature"
>
> You mean dropping gigantic page support in hugetlb?

No, I mean dropping support for arches that want to do:

   tail_page != head_page + tail_page_nr

because they can't allocate the required page array either virtually
or physically contiguously.

It seems like quite a burden on the core mm for a very niche, and
maybe even non-existant, case. 

It was originally done for PPC, can these PPC systems use VMEMMAP now?

> > The cost to fix GUP to be compatible with this will hurt normal
> > GUP performance - and again, that nobody has hit this bug in GUP
> > further suggests the feature isn't used..
> 
> A easy fix might be to make gigantic hugetlb page depends on
> CONFIG_SPARSEMEM_VMEMMAP, which guarantee all struct pages are contiguous.

Yes, exactly.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 17:25     ` Jason Gunthorpe
  2021-02-18 17:27       ` Zi Yan
@ 2021-02-18 17:34       ` Mike Kravetz
  2021-02-18 21:43         ` Mike Kravetz
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2021-02-18 17:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Matthew Wilcox
  Cc: Andrew Morton, linux-kernel, linux-mm, Zi Yan, Davidlohr Bueso,
	Kirill A . Shutemov, Andrea Arcangeli, Oscar Salvador,
	Joao Martins, stable

On 2/18/21 9:25 AM, Jason Gunthorpe wrote:
> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>> page structs are not guaranteed to be contiguous for gigantic pages.  The
>>>
>>> June 2014.  That's a long lurk time for a bug.  I wonder if some later
>>> commit revealed it.
>>
>> I would suggest that gigantic pages have not seen much use.  Certainly
>> performance with Intel CPUs on benchmarks that I've been involved with
>> showed lower performance with 1GB pages than with 2MB pages until quite
>> recently.
> 
> I suggested in another thread that maybe it is time to consider
> dropping this "feature"
> 
> If it has been slightly broken for 7 years it seems a good bet it
> isn't actually being used.
> 
> The cost to fix GUP to be compatible with this will hurt normal
> GUP performance - and again, that nobody has hit this bug in GUP
> further suggests the feature isn't used..

I was thinking that we could detect these 'unusual' configurations and only
do the slower page struct walking in those cases.  However, we would need to
do some research to make sure we have taken into account all possible config
options which can produce non-contiguous page structs.  That should have zero
performance impact in the 'normal' cases.

I suppose we could prohibit gigantic pages in these 'unusual' configurations.
It would require some research to see if this 'may' impact someone.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 17:32         ` Jason Gunthorpe
@ 2021-02-18 17:40           ` Zi Yan
  2021-02-18 17:51             ` Mike Kravetz
  0 siblings, 1 reply; 13+ messages in thread
From: Zi Yan @ 2021-02-18 17:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Wilcox, Andrew Morton, Mike Kravetz, linux-kernel,
	linux-mm, Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Oscar Salvador, Joao Martins, stable

[-- Attachment #1: Type: text/plain, Size: 2228 bytes --]

On 18 Feb 2021, at 12:32, Jason Gunthorpe wrote:

> On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
>> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
>>
>>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>>>> page structs are not guaranteed to be contiguous for gigantic pages.  The
>>>>>
>>>>> June 2014.  That's a long lurk time for a bug.  I wonder if some later
>>>>> commit revealed it.
>>>>
>>>> I would suggest that gigantic pages have not seen much use.  Certainly
>>>> performance with Intel CPUs on benchmarks that I've been involved with
>>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>>> recently.
>>>
>>> I suggested in another thread that maybe it is time to consider
>>> dropping this "feature"
>>
>> You mean dropping gigantic page support in hugetlb?
>
> No, I mean dropping support for arches that want to do:
>
>    tail_page != head_page + tail_page_nr
>
> because they can't allocate the required page array either virtually
> or physically contiguously.
>
> It seems like quite a burden on the core mm for a very niche, and
> maybe even non-existant, case.
>
> It was originally done for PPC, can these PPC systems use VMEMMAP now?
>
>>> The cost to fix GUP to be compatible with this will hurt normal
>>> GUP performance - and again, that nobody has hit this bug in GUP
>>> further suggests the feature isn't used..
>>
>> A easy fix might be to make gigantic hugetlb page depends on
>> CONFIG_SPARSEMEM_VMEMMAP, which guarantee all struct pages are contiguous.
>
> Yes, exactly.

I actually have a question on CONFIG_SPARSEMEM_VMEMMAP. Can we assume
PFN_A - PFN_B == struct_page_A - struct_page_B, meaning all struct pages
are ordered based on physical addresses? I just wonder for two PFN ranges,
e.g., [0 - 128MB], [128MB - 256MB], if it is possible to first online
[128MB - 256MB] then [0 - 128MB] and the struct pages of [128MB - 256MB]
are in front of [0 - 128MB] in the vmemmap due to online ordering.


—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 17:40           ` Zi Yan
@ 2021-02-18 17:51             ` Mike Kravetz
  2021-02-18 18:50               ` Zi Yan
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Kravetz @ 2021-02-18 17:51 UTC (permalink / raw)
  To: Zi Yan, Jason Gunthorpe
  Cc: Matthew Wilcox, Andrew Morton, linux-kernel, linux-mm,
	Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Oscar Salvador, Joao Martins, stable

On 2/18/21 9:40 AM, Zi Yan wrote:
> On 18 Feb 2021, at 12:32, Jason Gunthorpe wrote:
> 
>> On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
>>> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
>>>
>>>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>>>>> page structs are not guaranteed to be contiguous for gigantic pages.  The
>>>>>>
>>>>>> June 2014.  That's a long lurk time for a bug.  I wonder if some later
>>>>>> commit revealed it.
>>>>>
>>>>> I would suggest that gigantic pages have not seen much use.  Certainly
>>>>> performance with Intel CPUs on benchmarks that I've been involved with
>>>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>>>> recently.
>>>>
>>>> I suggested in another thread that maybe it is time to consider
>>>> dropping this "feature"
>>>
>>> You mean dropping gigantic page support in hugetlb?
>>
>> No, I mean dropping support for arches that want to do:
>>
>>    tail_page != head_page + tail_page_nr
>>
>> because they can't allocate the required page array either virtually
>> or physically contiguously.
>>
>> It seems like quite a burden on the core mm for a very niche, and
>> maybe even non-existant, case.
>>
>> It was originally done for PPC, can these PPC systems use VMEMMAP now?
>>
>>>> The cost to fix GUP to be compatible with this will hurt normal
>>>> GUP performance - and again, that nobody has hit this bug in GUP
>>>> further suggests the feature isn't used..
>>>
>>> A easy fix might be to make gigantic hugetlb page depends on
>>> CONFIG_SPARSEMEM_VMEMMAP, which guarantee all struct pages are contiguous.
>>
>> Yes, exactly.
> 
> I actually have a question on CONFIG_SPARSEMEM_VMEMMAP. Can we assume
> PFN_A - PFN_B == struct_page_A - struct_page_B, meaning all struct pages
> are ordered based on physical addresses? I just wonder for two PFN ranges,
> e.g., [0 - 128MB], [128MB - 256MB], if it is possible to first online
> [128MB - 256MB] then [0 - 128MB] and the struct pages of [128MB - 256MB]
> are in front of [0 - 128MB] in the vmemmap due to online ordering.

I have not looked at the code which does the onlining and vmemmap setup.
But, these definitions make me believe it is true:

#elif defined(CONFIG_SPARSEMEM_VMEMMAP)

/* memmap is virtually contiguous.  */
#define __pfn_to_page(pfn)      (vmemmap + (pfn))
#define __page_to_pfn(page)     (unsigned long)((page) - vmemmap)

-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 17:51             ` Mike Kravetz
@ 2021-02-18 18:50               ` Zi Yan
  0 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2021-02-18 18:50 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Jason Gunthorpe, Matthew Wilcox, Andrew Morton, linux-kernel,
	linux-mm, Davidlohr Bueso, Kirill A . Shutemov, Andrea Arcangeli,
	Oscar Salvador, Joao Martins, stable

[-- Attachment #1: Type: text/plain, Size: 3016 bytes --]

On 18 Feb 2021, at 12:51, Mike Kravetz wrote:

> On 2/18/21 9:40 AM, Zi Yan wrote:
>> On 18 Feb 2021, at 12:32, Jason Gunthorpe wrote:
>>
>>> On Thu, Feb 18, 2021 at 12:27:58PM -0500, Zi Yan wrote:
>>>> On 18 Feb 2021, at 12:25, Jason Gunthorpe wrote:
>>>>
>>>>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>>>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>>>>>> page structs are not guaranteed to be contiguous for gigantic pages.  The
>>>>>>>
>>>>>>> June 2014.  That's a long lurk time for a bug.  I wonder if some later
>>>>>>> commit revealed it.
>>>>>>
>>>>>> I would suggest that gigantic pages have not seen much use.  Certainly
>>>>>> performance with Intel CPUs on benchmarks that I've been involved with
>>>>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>>>>> recently.
>>>>>
>>>>> I suggested in another thread that maybe it is time to consider
>>>>> dropping this "feature"
>>>>
>>>> You mean dropping gigantic page support in hugetlb?
>>>
>>> No, I mean dropping support for arches that want to do:
>>>
>>>    tail_page != head_page + tail_page_nr
>>>
>>> because they can't allocate the required page array either virtually
>>> or physically contiguously.
>>>
>>> It seems like quite a burden on the core mm for a very niche, and
>>> maybe even non-existant, case.
>>>
>>> It was originally done for PPC, can these PPC systems use VMEMMAP now?
>>>
>>>>> The cost to fix GUP to be compatible with this will hurt normal
>>>>> GUP performance - and again, that nobody has hit this bug in GUP
>>>>> further suggests the feature isn't used..
>>>>
>>>> A easy fix might be to make gigantic hugetlb page depends on
>>>> CONFIG_SPARSEMEM_VMEMMAP, which guarantee all struct pages are contiguous.
>>>
>>> Yes, exactly.
>>
>> I actually have a question on CONFIG_SPARSEMEM_VMEMMAP. Can we assume
>> PFN_A - PFN_B == struct_page_A - struct_page_B, meaning all struct pages
>> are ordered based on physical addresses? I just wonder for two PFN ranges,
>> e.g., [0 - 128MB], [128MB - 256MB], if it is possible to first online
>> [128MB - 256MB] then [0 - 128MB] and the struct pages of [128MB - 256MB]
>> are in front of [0 - 128MB] in the vmemmap due to online ordering.
>
> I have not looked at the code which does the onlining and vmemmap setup.
> But, these definitions make me believe it is true:
>
> #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
>
> /* memmap is virtually contiguous.  */
> #define __pfn_to_page(pfn)      (vmemmap + (pfn))
> #define __page_to_pfn(page)     (unsigned long)((page) - vmemmap)

Makes sense. Thank you for checking.

I guess making gigantic page depends on CONFIG_SPARSEMEM_VMEMMAP might
be a good way of simplifying code and avoiding future bugs unless
there is an arch really needs gigantic page and cannot have VMEMMAP.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption
  2021-02-18 17:34       ` Mike Kravetz
@ 2021-02-18 21:43         ` Mike Kravetz
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Kravetz @ 2021-02-18 21:43 UTC (permalink / raw)
  To: Jason Gunthorpe, Matthew Wilcox
  Cc: Andrew Morton, linux-kernel, linux-mm, Zi Yan, Davidlohr Bueso,
	Kirill A . Shutemov, Andrea Arcangeli, Oscar Salvador,
	Joao Martins, stable

On 2/18/21 9:34 AM, Mike Kravetz wrote:
> On 2/18/21 9:25 AM, Jason Gunthorpe wrote:
>> On Thu, Feb 18, 2021 at 02:45:54PM +0000, Matthew Wilcox wrote:
>>> On Wed, Feb 17, 2021 at 11:02:52AM -0800, Andrew Morton wrote:
>>>> On Wed, 17 Feb 2021 10:49:25 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>>>> page structs are not guaranteed to be contiguous for gigantic pages.  The
>>>>
>>>> June 2014.  That's a long lurk time for a bug.  I wonder if some later
>>>> commit revealed it.
>>>
>>> I would suggest that gigantic pages have not seen much use.  Certainly
>>> performance with Intel CPUs on benchmarks that I've been involved with
>>> showed lower performance with 1GB pages than with 2MB pages until quite
>>> recently.
>>
>> I suggested in another thread that maybe it is time to consider
>> dropping this "feature"
>>
>> If it has been slightly broken for 7 years it seems a good bet it
>> isn't actually being used.
>>
>> The cost to fix GUP to be compatible with this will hurt normal
>> GUP performance - and again, that nobody has hit this bug in GUP
>> further suggests the feature isn't used..
> 
> I was thinking that we could detect these 'unusual' configurations and only
> do the slower page struct walking in those cases.  However, we would need to
> do some research to make sure we have taken into account all possible config
> options which can produce non-contiguous page structs.  That should have zero
> performance impact in the 'normal' cases.

What about something like the following patch, and making all code that
wants to scan gigantic page subpages use mem_map_next()?

From 95b0384bd5d7f0435546bdd3c01c478724ae0166 Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Thu, 18 Feb 2021 13:35:02 -0800
Subject: [PATCH] mm: define PFN_PAGE_MAP_LINEAR to optimize gigantic page
 scans

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 arch/ia64/include/asm/page.h       | 1 +
 arch/m68k/include/asm/page_no.h    | 1 +
 include/asm-generic/memory_model.h | 2 ++
 mm/internal.h                      | 2 ++
 4 files changed, 6 insertions(+)

diff --git a/arch/ia64/include/asm/page.h b/arch/ia64/include/asm/page.h
index b69a5499d75b..8f4288862ec8 100644
--- a/arch/ia64/include/asm/page.h
+++ b/arch/ia64/include/asm/page.h
@@ -106,6 +106,7 @@ extern struct page *vmem_map;
 #ifdef CONFIG_DISCONTIGMEM
 # define page_to_pfn(page)	((unsigned long) (page - vmem_map))
 # define pfn_to_page(pfn)	(vmem_map + (pfn))
+# define PFN_PAGE_MAP_LINEAR
 # define __pfn_to_phys(pfn)	PFN_PHYS(pfn)
 #else
 # include <asm-generic/memory_model.h>
diff --git a/arch/m68k/include/asm/page_no.h b/arch/m68k/include/asm/page_no.h
index 6bbe52025de3..cafc0731a42c 100644
--- a/arch/m68k/include/asm/page_no.h
+++ b/arch/m68k/include/asm/page_no.h
@@ -28,6 +28,7 @@ extern unsigned long memory_end;
 
 #define pfn_to_page(pfn)	virt_to_page(pfn_to_virt(pfn))
 #define page_to_pfn(page)	virt_to_pfn(page_to_virt(page))
+#define PFN_PAGE_MAP_LINEAR
 #define pfn_valid(pfn)	        ((pfn) < max_mapnr)
 
 #define	virt_addr_valid(kaddr)	(((void *)(kaddr) >= (void *)PAGE_OFFSET) && \
diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h
index 7637fb46ba4f..8ac4c48dbf22 100644
--- a/include/asm-generic/memory_model.h
+++ b/include/asm-generic/memory_model.h
@@ -33,6 +33,7 @@
 #define __pfn_to_page(pfn)	(mem_map + ((pfn) - ARCH_PFN_OFFSET))
 #define __page_to_pfn(page)	((unsigned long)((page) - mem_map) + \
 				 ARCH_PFN_OFFSET)
+#define PFN_PAGE_MAP_LINEAR
 #elif defined(CONFIG_DISCONTIGMEM)
 
 #define __pfn_to_page(pfn)			\
@@ -53,6 +54,7 @@
 /* memmap is virtually contiguous.  */
 #define __pfn_to_page(pfn)	(vmemmap + (pfn))
 #define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)
+#define PFN_PAGE_MAP_LINEAR
 
 #elif defined(CONFIG_SPARSEMEM)
 /*
diff --git a/mm/internal.h b/mm/internal.h
index 25d2b2439f19..64cc5069047c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -454,12 +454,14 @@ static inline struct page *mem_map_offset(struct page *base, int offset)
 static inline struct page *mem_map_next(struct page *iter,
 						struct page *base, int offset)
 {
+#ifndef PFN_PAGE_MAP_LINEAR
 	if (unlikely((offset & (MAX_ORDER_NR_PAGES - 1)) == 0)) {
 		unsigned long pfn = page_to_pfn(base) + offset;
 		if (!pfn_valid(pfn))
 			return NULL;
 		return pfn_to_page(pfn);
 	}
+#endif
 	return iter + 1;
 }
 
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-02-18 21:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-17 18:49 [PATCH 1/2] hugetlb: fix update_and_free_page contig page struct assumption Mike Kravetz
2021-02-17 18:49 ` [PATCH 2/2] hugetlb: fix copy_huge_page_from_user " Mike Kravetz
2021-02-17 19:02 ` [PATCH 1/2] hugetlb: fix update_and_free_page " Andrew Morton
2021-02-17 19:38   ` Mike Kravetz
2021-02-18 14:45   ` Matthew Wilcox
2021-02-18 17:25     ` Jason Gunthorpe
2021-02-18 17:27       ` Zi Yan
2021-02-18 17:32         ` Jason Gunthorpe
2021-02-18 17:40           ` Zi Yan
2021-02-18 17:51             ` Mike Kravetz
2021-02-18 18:50               ` Zi Yan
2021-02-18 17:34       ` Mike Kravetz
2021-02-18 21:43         ` Mike Kravetz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.