stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
@ 2022-06-13 13:10 Xianting Tian
  2022-06-13 15:22 ` Zi Yan
  2022-06-13 16:08 ` Guo Ren
  0 siblings, 2 replies; 19+ messages in thread
From: Xianting Tian @ 2022-06-13 13:10 UTC (permalink / raw)
  To: akpm, vbabka, ziy
  Cc: linux-mm, linux-kernel, stable, huanyi.xj, guoren, zjb194813,
	tianhu.hh, Xianting Tian

Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
added buddy check code. But unfortunately, this fix isn't backported to
linux-5.17.y and the former stable branches. The reason is it added wrong
fixes message:
     Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
			   pageblocks with others")
Actually, this issue is involved by commit:
     commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")

For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
but it got buddy PFN 0 for PFN 0x2000:
     0 = 0x2000 ^ (1 << 12)
With the illegal buddy PFN 0, it got an illegal buddy page, which caused
crash in __get_pfnblock_flags_mask().

With the patch, it can avoid the calling of get_pageblock_migratetype() if
it isn't buddy page.

Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
Cc: stable@vger.kernel.org
Reported-by: zjb194813@alibaba-inc.com
Reported-by: tianhu.hh@alibaba-inc.com
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b1caa1c6c887..5b423caa68fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
 
 			buddy_pfn = __find_buddy_pfn(pfn, order);
 			buddy = page + (buddy_pfn - pfn);
+
+			if (!page_is_buddy(page, buddy, order))
+				goto done_merging;
 			buddy_mt = get_pageblock_migratetype(buddy);
 
 			if (migratetype != buddy_mt
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 13:10 [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype Xianting Tian
@ 2022-06-13 15:22 ` Zi Yan
  2022-06-13 16:32   ` Guo Ren
  2022-06-13 16:08 ` Guo Ren
  1 sibling, 1 reply; 19+ messages in thread
From: Zi Yan @ 2022-06-13 15:22 UTC (permalink / raw)
  To: Xianting Tian
  Cc: akpm, vbabka, linux-mm, linux-kernel, stable, huanyi.xj, guoren,
	zjb194813, tianhu.hh

[-- Attachment #1: Type: text/plain, Size: 2526 bytes --]

Hi Xianting,

Thanks for your patch.

On 13 Jun 2022, at 9:10, Xianting Tian wrote:

> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> added buddy check code. But unfortunately, this fix isn't backported to
> linux-5.17.y and the former stable branches. The reason is it added wrong
> fixes message:
>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> 			   pageblocks with others")

No, the Fixes tag is right. The commit above does need to validate buddy.

> Actually, this issue is involved by commit:
>      commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>
> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
> but it got buddy PFN 0 for PFN 0x2000:
>      0 = 0x2000 ^ (1 << 12)
> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
> crash in __get_pfnblock_flags_mask().

It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
Basically, this bug will only happen when PFN=0x2000 is merging up and
there are some isolated pageblocks.

BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
set to PageReserved?

>
> With the patch, it can avoid the calling of get_pageblock_migratetype() if
> it isn't buddy page.

You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
from mm/page_isolation.c, if you are talking about linux-5.17.y and former
version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
is called, which calls get_pageblock_migratetype() too.

>
> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> Cc: stable@vger.kernel.org
> Reported-by: zjb194813@alibaba-inc.com
> Reported-by: tianhu.hh@alibaba-inc.com
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b1caa1c6c887..5b423caa68fd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>
>  			buddy_pfn = __find_buddy_pfn(pfn, order);
>  			buddy = page + (buddy_pfn - pfn);
> +
> +			if (!page_is_buddy(page, buddy, order))
> +				goto done_merging;
>  			buddy_mt = get_pageblock_migratetype(buddy);
>
>  			if (migratetype != buddy_mt
> -- 
> 2.17.1

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 13:10 [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype Xianting Tian
  2022-06-13 15:22 ` Zi Yan
@ 2022-06-13 16:08 ` Guo Ren
  2022-06-14  1:26   ` Xianting Tian
  2022-06-14  2:04   ` Xianting Tian
  1 sibling, 2 replies; 19+ messages in thread
From: Guo Ren @ 2022-06-13 16:08 UTC (permalink / raw)
  To: Xianting Tian
  Cc: Andrew Morton, Vlastimil Babka, ziy, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh

On Mon, Jun 13, 2022 at 9:11 PM Xianting Tian
<xianting.tian@linux.alibaba.com> wrote:
>
> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> added buddy check code. But unfortunately, this fix isn't backported to
> linux-5.17.y and the former stable branches. The reason is it added wrong
> fixes message:
>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>                            pageblocks with others")
> Actually, this issue is involved by commit:
>      commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>
> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
> but it got buddy PFN 0 for PFN 0x2000:
>      0 = 0x2000 ^ (1 << 12)
How did we get 0? (Try it in gdb)
(gdb) p /x (0x2000 ^ (1<<12))
$3 = 0x3000

I think it got buddy PFN 0 for PFN 0x1000, right?
(gdb) p /x (0x1000 ^ (1<<12))
$4 = 0x0

> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
> crash in __get_pfnblock_flags_mask().
>
> With the patch, it can avoid the calling of get_pageblock_migratetype() if
> it isn't buddy page.
>
> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> Cc: stable@vger.kernel.org
> Reported-by: zjb194813@alibaba-inc.com
> Reported-by: tianhu.hh@alibaba-inc.com
> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b1caa1c6c887..5b423caa68fd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>
>                         buddy_pfn = __find_buddy_pfn(pfn, order);
>                         buddy = page + (buddy_pfn - pfn);
> +
> +                       if (!page_is_buddy(page, buddy, order))
Right, we need to check the buddy_pfn valid, because some SoCs would
start dram address with an offset that has an order smaller than
MAX_ORDER.

> +                               goto done_merging;
>                         buddy_mt = get_pageblock_migratetype(buddy);
>
>                         if (migratetype != buddy_mt
> --
> 2.17.1
>

Fixup the comment and

Reviewed-by: Guo Ren <guoren@kernel.org>

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 15:22 ` Zi Yan
@ 2022-06-13 16:32   ` Guo Ren
  2022-06-13 19:49     ` Zi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Guo Ren @ 2022-06-13 16:32 UTC (permalink / raw)
  To: Zi Yan
  Cc: Xianting Tian, Andrew Morton, Vlastimil Babka, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott

On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>
> Hi Xianting,
>
> Thanks for your patch.
>
> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>
> > Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> > added buddy check code. But unfortunately, this fix isn't backported to
> > linux-5.17.y and the former stable branches. The reason is it added wrong
> > fixes message:
> >      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> >                          pageblocks with others")
>
> No, the Fixes tag is right. The commit above does need to validate buddy.
I think Xianting is right. The “Fixes:" tag is not accurate and the
page_is_buddy() is necessary here.

This patch could be applied to the early version of the stable tree
(eg: Linux-5.10.y, not the master tree)

>
> > Actually, this issue is involved by commit:
> >      commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> >
> > For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
> > but it got buddy PFN 0 for PFN 0x2000:
> >      0 = 0x2000 ^ (1 << 12)
> > With the illegal buddy PFN 0, it got an illegal buddy page, which caused
> > crash in __get_pfnblock_flags_mask().
>
> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
> Basically, this bug will only happen when PFN=0x2000 is merging up and
> there are some isolated pageblocks.
Not PFN=0x2000, it's PFN=0x1000, I guess.

RISC-V's first 2MB RAM could reserve for opensbi, so it would have
riscv_pfn_base=512 and mem_map began with 512th PFN when
CONFIG_FLATMEM=y.
(Also, csky has the same issue: a non-zero pfn_base in some scenarios.)

But __find_buddy_pfn algorithm thinks the start address is 0, it could
get 0 pfn or less than the pfn_base value. We need another check to
prevent that.

>
> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
> set to PageReserved?
>
> >
> > With the patch, it can avoid the calling of get_pageblock_migratetype() if
> > it isn't buddy page.
>
> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
> is called, which calls get_pageblock_migratetype() too.
>
> >
> > Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> > Cc: stable@vger.kernel.org
> > Reported-by: zjb194813@alibaba-inc.com
> > Reported-by: tianhu.hh@alibaba-inc.com
> > Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> > ---
> >  mm/page_alloc.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index b1caa1c6c887..5b423caa68fd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
> >
> >                       buddy_pfn = __find_buddy_pfn(pfn, order);
> >                       buddy = page + (buddy_pfn - pfn);
> > +
> > +                     if (!page_is_buddy(page, buddy, order))
> > +                             goto done_merging;
> >                       buddy_mt = get_pageblock_migratetype(buddy);
> >
> >                       if (migratetype != buddy_mt
> > --
> > 2.17.1
>
> --
> Best Regards,
> Yan, Zi



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 16:32   ` Guo Ren
@ 2022-06-13 19:49     ` Zi Yan
  2022-06-13 23:47       ` Guo Ren
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2022-06-13 19:49 UTC (permalink / raw)
  To: Guo Ren
  Cc: Xianting Tian, Andrew Morton, Vlastimil Babka, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott

[-- Attachment #1: Type: text/plain, Size: 4615 bytes --]

On 13 Jun 2022, at 12:32, Guo Ren wrote:

> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>
>> Hi Xianting,
>>
>> Thanks for your patch.
>>
>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>
>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>> added buddy check code. But unfortunately, this fix isn't backported to
>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>> fixes message:
>>>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>                          pageblocks with others")
>>
>> No, the Fixes tag is right. The commit above does need to validate buddy.
> I think Xianting is right. The “Fixes:" tag is not accurate and the
> page_is_buddy() is necessary here.
>
> This patch could be applied to the early version of the stable tree
> (eg: Linux-5.10.y, not the master tree)

This is quite misleading. Commit 787af64d05cd applies does not mean it is
intended to fix the preexisting bug. Also it does not apply cleanly
to commit d9dddbf55667, there is a clear indentation mismatch. At best,
you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
There is no way you can apply 787af64d05cd to earlier trees and call it a day.

You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
the fixes message is wrong just misleads people, making them think there is
no bug in 1dd214b8f21c. We need to be clear about this.

Also, you will need to fix the mm/page_isolation.c code too to make this patch
complete, unless you can show that PFN=0x1000 is never going to be encountered
in the mm/page_isolation.c code I mentioned below.

>
>>
>>> Actually, this issue is involved by commit:
>>>      commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>
>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>>> but it got buddy PFN 0 for PFN 0x2000:
>>>      0 = 0x2000 ^ (1 << 12)
>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>>> crash in __get_pfnblock_flags_mask().
>>
>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
>> Basically, this bug will only happen when PFN=0x2000 is merging up and
>> there are some isolated pageblocks.
> Not PFN=0x2000, it's PFN=0x1000, I guess.
>
> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
> riscv_pfn_base=512 and mem_map began with 512th PFN when
> CONFIG_FLATMEM=y.
> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
>
> But __find_buddy_pfn algorithm thinks the start address is 0, it could
> get 0 pfn or less than the pfn_base value. We need another check to
> prevent that.
>
>>
>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
>> set to PageReserved?
>>
>>>
>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>>> it isn't buddy page.
>>
>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
>> is called, which calls get_pageblock_migratetype() too.
>>
>>>
>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>> Cc: stable@vger.kernel.org
>>> Reported-by: zjb194813@alibaba-inc.com
>>> Reported-by: tianhu.hh@alibaba-inc.com
>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>> ---
>>>  mm/page_alloc.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index b1caa1c6c887..5b423caa68fd 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>>
>>>                       buddy_pfn = __find_buddy_pfn(pfn, order);
>>>                       buddy = page + (buddy_pfn - pfn);
>>> +
>>> +                     if (!page_is_buddy(page, buddy, order))
>>> +                             goto done_merging;
>>>                       buddy_mt = get_pageblock_migratetype(buddy);
>>>
>>>                       if (migratetype != buddy_mt
>>> --
>>> 2.17.1
>>
>> --
>> Best Regards,
>> Yan, Zi
>
>
>
> -- 
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 19:49     ` Zi Yan
@ 2022-06-13 23:47       ` Guo Ren
  2022-06-14  0:14         ` Zi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Guo Ren @ 2022-06-13 23:47 UTC (permalink / raw)
  To: Zi Yan, Greg Kroah-Hartman
  Cc: Xianting Tian, Andrew Morton, Vlastimil Babka, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott

On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>
> > On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> Hi Xianting,
> >>
> >> Thanks for your patch.
> >>
> >> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
> >>
> >>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> >>> added buddy check code. But unfortunately, this fix isn't backported to
> >>> linux-5.17.y and the former stable branches. The reason is it added wrong
> >>> fixes message:
> >>>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> >>>                          pageblocks with others")
> >>
> >> No, the Fixes tag is right. The commit above does need to validate buddy.
> > I think Xianting is right. The “Fixes:" tag is not accurate and the
> > page_is_buddy() is necessary here.
> >
> > This patch could be applied to the early version of the stable tree
> > (eg: Linux-5.10.y, not the master tree)
>
> This is quite misleading. Commit 787af64d05cd applies does not mean it is
> intended to fix the preexisting bug. Also it does not apply cleanly
> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>
> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
> the fixes message is wrong just misleads people, making them think there is
> no bug in 1dd214b8f21c. We need to be clear about this.
First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
origin fixes could cover the Linux-5.0.y tree if they give the
accurate commit number and that is the cause we want to point out.

Second, if the patch is for d9dddbf55667 then it could cover any tree
in the stable repo. Actually, we only know Linux-5.10.y has the
problem.

Maybe, Gregkh could help to direct us on how to deal with the issue:
(Fixup a bug which only belongs to the former stable branch.)

>
> Also, you will need to fix the mm/page_isolation.c code too to make this patch
> complete, unless you can show that PFN=0x1000 is never going to be encountered
> in the mm/page_isolation.c code I mentioned below.
No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
buddy_pfn=0.
The root cause comes from __find_buddy_pfn():
return page_pfn ^ (1 << order);

When page_pfn is the same as the order size, it will return the
previous buddy not the next. That is the only exception for this
algorithm, right?




In fact, the bug is a very long time to reproduce and is not easy to
debug, so we want to contribute it to the community to prevent other
guys from wasting time. Although there is no new patch at all.

>
> >
> >>
> >>> Actually, this issue is involved by commit:
> >>>      commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> >>>
> >>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
> >>> but it got buddy PFN 0 for PFN 0x2000:
> >>>      0 = 0x2000 ^ (1 << 12)
> >>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
> >>> crash in __get_pfnblock_flags_mask().
> >>
> >> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
> >> Basically, this bug will only happen when PFN=0x2000 is merging up and
> >> there are some isolated pageblocks.
> > Not PFN=0x2000, it's PFN=0x1000, I guess.
> >
> > RISC-V's first 2MB RAM could reserve for opensbi, so it would have
> > riscv_pfn_base=512 and mem_map began with 512th PFN when
> > CONFIG_FLATMEM=y.
> > (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
> >
> > But __find_buddy_pfn algorithm thinks the start address is 0, it could
> > get 0 pfn or less than the pfn_base value. We need another check to
> > prevent that.
> >
> >>
> >> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
> >> set to PageReserved?
> >>
> >>>
> >>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
> >>> it isn't buddy page.
> >>
> >> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
> >> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
> >> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
> >> is called, which calls get_pageblock_migratetype() too.
> >>
> >>>
> >>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> >>> Cc: stable@vger.kernel.org
> >>> Reported-by: zjb194813@alibaba-inc.com
> >>> Reported-by: tianhu.hh@alibaba-inc.com
> >>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> >>> ---
> >>>  mm/page_alloc.c | 3 +++
> >>>  1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index b1caa1c6c887..5b423caa68fd 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
> >>>
> >>>                       buddy_pfn = __find_buddy_pfn(pfn, order);
> >>>                       buddy = page + (buddy_pfn - pfn);
> >>> +
> >>> +                     if (!page_is_buddy(page, buddy, order))
> >>> +                             goto done_merging;
> >>>                       buddy_mt = get_pageblock_migratetype(buddy);
> >>>
> >>>                       if (migratetype != buddy_mt
> >>> --
> >>> 2.17.1
> >>
> >> --
> >> Best Regards,
> >> Yan, Zi
> >
> >
> >
> > --
> > Best Regards
> >  Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/
>
> --
> Best Regards,
> Yan, Zi



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 23:47       ` Guo Ren
@ 2022-06-14  0:14         ` Zi Yan
  2022-06-14  1:19           ` Guo Ren
  2022-06-15  6:47           ` Xianting Tian
  0 siblings, 2 replies; 19+ messages in thread
From: Zi Yan @ 2022-06-14  0:14 UTC (permalink / raw)
  To: Guo Ren
  Cc: Greg Kroah-Hartman, Xianting Tian, Andrew Morton,
	Vlastimil Babka, Linux-MM, Linux Kernel Mailing List, stable,
	huanyi.xj, zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim,
	Laura Abbott

[-- Attachment #1: Type: text/plain, Size: 7878 bytes --]

On 13 Jun 2022, at 19:47, Guo Ren wrote:

> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>
>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>
>>>> Hi Xianting,
>>>>
>>>> Thanks for your patch.
>>>>
>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>
>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>> fixes message:
>>>>>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>                          pageblocks with others")
>>>>
>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>> page_is_buddy() is necessary here.
>>>
>>> This patch could be applied to the early version of the stable tree
>>> (eg: Linux-5.10.y, not the master tree)
>>
>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>> intended to fix the preexisting bug. Also it does not apply cleanly
>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>
>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>> the fixes message is wrong just misleads people, making them think there is
>> no bug in 1dd214b8f21c. We need to be clear about this.
> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
> origin fixes could cover the Linux-5.0.y tree if they give the
> accurate commit number and that is the cause we want to point out.

Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
is not intended to fix d9dddbf55667 and saying it has a wrong fixes
message is misleading. This is the point I want to make.

>
> Second, if the patch is for d9dddbf55667 then it could cover any tree
> in the stable repo. Actually, we only know Linux-5.10.y has the
> problem.

But it is not and does not apply to d9dddbf55667 cleanly.

>
> Maybe, Gregkh could help to direct us on how to deal with the issue:
> (Fixup a bug which only belongs to the former stable branch.)
>

I think you just need to send this patch without saying “commit
787af64d05cd fixes message is wrong” would be a good start. You also
need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
(inclusive). So there will need to be two patches:

1) your patch to stable tree prior to 5.15 and

2) your patch with an additional mm/page_isolation.c fix to stable tree
between 5.15 and 5.17.

>>
>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>> in the mm/page_isolation.c code I mentioned below.
> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
> buddy_pfn=0.
> The root cause comes from __find_buddy_pfn():
> return page_pfn ^ (1 << order);

Right. But pfn_valid_within() was removed since 5.15. So your fix is
required for kernels between 5.15 and 5.17 (inclusive).

>
> When page_pfn is the same as the order size, it will return the
> previous buddy not the next. That is the only exception for this
> algorithm, right?
>
>
>
>
> In fact, the bug is a very long time to reproduce and is not easy to
> debug, so we want to contribute it to the community to prevent other
> guys from wasting time. Although there is no new patch at all.

Thanks for your reporting and sending out the patch. I really
appreciate it. We definitely need your inputs. Throughout the email
thread, I am trying to help you clarify the bug and how to fix it
properly:

1. The commit 787af64d05cd does not apply cleanly to commits
d9dddbf55667, meaning you cannot just cherry-pick that commit to
fix the issue. That is why we need your patch to fix the issue.
And saying it has a wrong fixes message in this patch’s git log is
misleading.

2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
to mm/page_isolation.c is also needed, since pfn_valid_within() was
removed since 5.15 and the issue can appear during page isolation.

3. For kernels before 5.15, this patch will apply.

>
>>
>>>
>>>>
>>>>> Actually, this issue is involved by commit:
>>>>>      commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>
>>>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>>>>> but it got buddy PFN 0 for PFN 0x2000:
>>>>>      0 = 0x2000 ^ (1 << 12)
>>>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>>>>> crash in __get_pfnblock_flags_mask().
>>>>
>>>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
>>>> Basically, this bug will only happen when PFN=0x2000 is merging up and
>>>> there are some isolated pageblocks.
>>> Not PFN=0x2000, it's PFN=0x1000, I guess.
>>>
>>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
>>> riscv_pfn_base=512 and mem_map began with 512th PFN when
>>> CONFIG_FLATMEM=y.
>>> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
>>>
>>> But __find_buddy_pfn algorithm thinks the start address is 0, it could
>>> get 0 pfn or less than the pfn_base value. We need another check to
>>> prevent that.
>>>
>>>>
>>>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
>>>> set to PageReserved?
>>>>
>>>>>
>>>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>>>>> it isn't buddy page.
>>>>
>>>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
>>>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
>>>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
>>>> is called, which calls get_pageblock_migratetype() too.
>>>>
>>>>>
>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>> Cc: stable@vger.kernel.org
>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>> ---
>>>>>  mm/page_alloc.c | 3 +++
>>>>>  1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index b1caa1c6c887..5b423caa68fd 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>>>>
>>>>>                       buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>>                       buddy = page + (buddy_pfn - pfn);
>>>>> +
>>>>> +                     if (!page_is_buddy(page, buddy, order))
>>>>> +                             goto done_merging;
>>>>>                       buddy_mt = get_pageblock_migratetype(buddy);
>>>>>
>>>>>                       if (migratetype != buddy_mt
>>>>> --
>>>>> 2.17.1
>>>>
>>>> --
>>>> Best Regards,
>>>> Yan, Zi
>>>
>>>
>>>
>>> --
>>> Best Regards
>>>  Guo Ren
>>>
>>> ML: https://lore.kernel.org/linux-csky/
>>
>> --
>> Best Regards,
>> Yan, Zi
>
>
>
> -- 
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-14  0:14         ` Zi Yan
@ 2022-06-14  1:19           ` Guo Ren
  2022-06-14  3:26             ` Xianting Tian
  2022-06-15  6:47           ` Xianting Tian
  1 sibling, 1 reply; 19+ messages in thread
From: Guo Ren @ 2022-06-14  1:19 UTC (permalink / raw)
  To: Zi Yan
  Cc: Greg Kroah-Hartman, Xianting Tian, Andrew Morton,
	Vlastimil Babka, Linux-MM, Linux Kernel Mailing List, stable,
	huanyi.xj, zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim,
	Laura Abbott

On Tue, Jun 14, 2022 at 8:14 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>
> > On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> On 13 Jun 2022, at 12:32, Guo Ren wrote:
> >>
> >>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
> >>>>
> >>>> Hi Xianting,
> >>>>
> >>>> Thanks for your patch.
> >>>>
> >>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
> >>>>
> >>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> >>>>> added buddy check code. But unfortunately, this fix isn't backported to
> >>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
> >>>>> fixes message:
> >>>>>      Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> >>>>>                          pageblocks with others")
> >>>>
> >>>> No, the Fixes tag is right. The commit above does need to validate buddy.
> >>> I think Xianting is right. The “Fixes:" tag is not accurate and the
> >>> page_is_buddy() is necessary here.
> >>>
> >>> This patch could be applied to the early version of the stable tree
> >>> (eg: Linux-5.10.y, not the master tree)
> >>
> >> This is quite misleading. Commit 787af64d05cd applies does not mean it is
> >> intended to fix the preexisting bug. Also it does not apply cleanly
> >> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
> >> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
> >> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
> >>
> >> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
> >> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
> >> the fixes message is wrong just misleads people, making them think there is
> >> no bug in 1dd214b8f21c. We need to be clear about this.
> > First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
> > origin fixes could cover the Linux-5.0.y tree if they give the
> > accurate commit number and that is the cause we want to point out.
>
> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
> message is misleading. This is the point I want to make.
>
> >
> > Second, if the patch is for d9dddbf55667 then it could cover any tree
> > in the stable repo. Actually, we only know Linux-5.10.y has the
> > problem.
>
> But it is not and does not apply to d9dddbf55667 cleanly.
>
> >
> > Maybe, Gregkh could help to direct us on how to deal with the issue:
> > (Fixup a bug which only belongs to the former stable branch.)
> >
>
> I think you just need to send this patch without saying “commit
> 787af64d05cd fixes message is wrong” would be a good start. You also
> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
> (inclusive). So there will need to be two patches:
>
> 1) your patch to stable tree prior to 5.15 and
>
> 2) your patch with an additional mm/page_isolation.c fix to stable tree
> between 5.15 and 5.17.
>
> >>
> >> Also, you will need to fix the mm/page_isolation.c code too to make this patch
> >> complete, unless you can show that PFN=0x1000 is never going to be encountered
> >> in the mm/page_isolation.c code I mentioned below.
> > No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
> > pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
> > buddy_pfn=0.
> > The root cause comes from __find_buddy_pfn():
> > return page_pfn ^ (1 << order);
>
> Right. But pfn_valid_within() was removed since 5.15. So your fix is
> required for kernels between 5.15 and 5.17 (inclusive).
>
> >
> > When page_pfn is the same as the order size, it will return the
> > previous buddy not the next. That is the only exception for this
> > algorithm, right?
> >
> >
> >
> >
> > In fact, the bug is a very long time to reproduce and is not easy to
> > debug, so we want to contribute it to the community to prevent other
> > guys from wasting time. Although there is no new patch at all.
>
> Thanks for your reporting and sending out the patch. I really
> appreciate it. We definitely need your inputs. Throughout the email
> thread, I am trying to help you clarify the bug and how to fix it
> properly:
>
> 1. The commit 787af64d05cd does not apply cleanly to commits
> d9dddbf55667, meaning you cannot just cherry-pick that commit to
> fix the issue. That is why we need your patch to fix the issue.
> And saying it has a wrong fixes message in this patch’s git log is
> misleading.
Okay, seems we need to send some patches for the different stable
branches separately.

>
> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
> to mm/page_isolation.c is also needed, since pfn_valid_within() was
> removed since 5.15 and the issue can appear during page isolation.
Good point and we would take care of that.

>
> 3. For kernels before 5.15, this patch will apply.
Thx

>
> >
> >>
> >>>
> >>>>
> >>>>> Actually, this issue is involved by commit:
> >>>>>      commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> >>>>>
> >>>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
> >>>>> but it got buddy PFN 0 for PFN 0x2000:
> >>>>>      0 = 0x2000 ^ (1 << 12)
> >>>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
> >>>>> crash in __get_pfnblock_flags_mask().
> >>>>
> >>>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
> >>>> Basically, this bug will only happen when PFN=0x2000 is merging up and
> >>>> there are some isolated pageblocks.
> >>> Not PFN=0x2000, it's PFN=0x1000, I guess.
> >>>
> >>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
> >>> riscv_pfn_base=512 and mem_map began with 512th PFN when
> >>> CONFIG_FLATMEM=y.
> >>> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
> >>>
> >>> But __find_buddy_pfn algorithm thinks the start address is 0, it could
> >>> get 0 pfn or less than the pfn_base value. We need another check to
> >>> prevent that.
> >>>
> >>>>
> >>>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
> >>>> set to PageReserved?
> >>>>
> >>>>>
> >>>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
> >>>>> it isn't buddy page.
> >>>>
> >>>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
> >>>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
> >>>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
> >>>> is called, which calls get_pageblock_migratetype() too.
> >>>>
> >>>>>
> >>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
> >>>>> Cc: stable@vger.kernel.org
> >>>>> Reported-by: zjb194813@alibaba-inc.com
> >>>>> Reported-by: tianhu.hh@alibaba-inc.com
> >>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
> >>>>> ---
> >>>>>  mm/page_alloc.c | 3 +++
> >>>>>  1 file changed, 3 insertions(+)
> >>>>>
> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>>> index b1caa1c6c887..5b423caa68fd 100644
> >>>>> --- a/mm/page_alloc.c
> >>>>> +++ b/mm/page_alloc.c
> >>>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
> >>>>>
> >>>>>                       buddy_pfn = __find_buddy_pfn(pfn, order);
> >>>>>                       buddy = page + (buddy_pfn - pfn);
> >>>>> +
> >>>>> +                     if (!page_is_buddy(page, buddy, order))
> >>>>> +                             goto done_merging;
> >>>>>                       buddy_mt = get_pageblock_migratetype(buddy);
> >>>>>
> >>>>>                       if (migratetype != buddy_mt
> >>>>> --
> >>>>> 2.17.1
> >>>>
> >>>> --
> >>>> Best Regards,
> >>>> Yan, Zi
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards
> >>>  Guo Ren
> >>>
> >>> ML: https://lore.kernel.org/linux-csky/
> >>
> >> --
> >> Best Regards,
> >> Yan, Zi
> >
> >
> >
> > --
> > Best Regards
> >  Guo Ren
> >
> > ML: https://lore.kernel.org/linux-csky/
>
> --
> Best Regards,
> Yan, Zi



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 16:08 ` Guo Ren
@ 2022-06-14  1:26   ` Xianting Tian
  2022-06-14  2:04   ` Xianting Tian
  1 sibling, 0 replies; 19+ messages in thread
From: Xianting Tian @ 2022-06-14  1:26 UTC (permalink / raw)
  To: Guo Ren
  Cc: Andrew Morton, Vlastimil Babka, ziy, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh


在 2022/6/14 上午12:08, Guo Ren 写道:
> On Mon, Jun 13, 2022 at 9:11 PM Xianting Tian
> <xianting.tian@linux.alibaba.com> wrote:
>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>> added buddy check code. But unfortunately, this fix isn't backported to
>> linux-5.17.y and the former stable branches. The reason is it added wrong
>> fixes message:
>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>                             pageblocks with others")
>> Actually, this issue is involved by commit:
>>       commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>
>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>> but it got buddy PFN 0 for PFN 0x2000:
>>       0 = 0x2000 ^ (1 << 12)
> How did we get 0? (Try it in gdb)
> (gdb) p /x (0x2000 ^ (1<<12))
> $3 = 0x3000
> Sorry, the order is 0xd = 13, not 12, it is a typo.
> I think it got buddy PFN 0 for PFN 0x1000, right?
> (gdb) p /x (0x1000 ^ (1<<12))
> $4 = 0x0
>
>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>> crash in __get_pfnblock_flags_mask().
>>
>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>> it isn't buddy page.
>>
>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>> Cc: stable@vger.kernel.org
>> Reported-by: zjb194813@alibaba-inc.com
>> Reported-by: tianhu.hh@alibaba-inc.com
>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>> ---
>>   mm/page_alloc.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b1caa1c6c887..5b423caa68fd 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>
>>                          buddy_pfn = __find_buddy_pfn(pfn, order);
>>                          buddy = page + (buddy_pfn - pfn);
>> +
>> +                       if (!page_is_buddy(page, buddy, order))
> Right, we need to check the buddy_pfn valid, because some SoCs would
> start dram address with an offset that has an order smaller than
> MAX_ORDER.
>
>> +                               goto done_merging;
>>                          buddy_mt = get_pageblock_migratetype(buddy);
>>
>>                          if (migratetype != buddy_mt
>> --
>> 2.17.1
>>
> Fixup the comment and
>
> Reviewed-by: Guo Ren <guoren@kernel.org>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-13 16:08 ` Guo Ren
  2022-06-14  1:26   ` Xianting Tian
@ 2022-06-14  2:04   ` Xianting Tian
  1 sibling, 0 replies; 19+ messages in thread
From: Xianting Tian @ 2022-06-14  2:04 UTC (permalink / raw)
  To: Guo Ren
  Cc: Andrew Morton, Vlastimil Babka, ziy, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh


在 2022/6/14 上午12:08, Guo Ren 写道:
> On Mon, Jun 13, 2022 at 9:11 PM Xianting Tian
> <xianting.tian@linux.alibaba.com> wrote:
>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>> added buddy check code. But unfortunately, this fix isn't backported to
>> linux-5.17.y and the former stable branches. The reason is it added wrong
>> fixes message:
>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>                             pageblocks with others")
>> Actually, this issue is involved by commit:
>>       commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>
>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>> but it got buddy PFN 0 for PFN 0x2000:
>>       0 = 0x2000 ^ (1 << 12)
> How did we get 0? (Try it in gdb)
> (gdb) p /x (0x2000 ^ (1<<12))
> $3 = 0x3000
>
> I think it got buddy PFN 0 for PFN 0x1000, right?
> (gdb) p /x (0x1000 ^ (1<<12))
> $4 = 0x0
Sorry, it is a typo, the order is 0xd = 13, not 12.
>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>> crash in __get_pfnblock_flags_mask().
>>
>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>> it isn't buddy page.
>>
>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>> Cc: stable@vger.kernel.org
>> Reported-by: zjb194813@alibaba-inc.com
>> Reported-by: tianhu.hh@alibaba-inc.com
>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>> ---
>>   mm/page_alloc.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b1caa1c6c887..5b423caa68fd 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>
>>                          buddy_pfn = __find_buddy_pfn(pfn, order);
>>                          buddy = page + (buddy_pfn - pfn);
>> +
>> +                       if (!page_is_buddy(page, buddy, order))
> Right, we need to check the buddy_pfn valid, because some SoCs would
> start dram address with an offset that has an order smaller than
> MAX_ORDER.
>
>> +                               goto done_merging;
>>                          buddy_mt = get_pageblock_migratetype(buddy);
>>
>>                          if (migratetype != buddy_mt
>> --
>> 2.17.1
>>
> Fixup the comment and
>
> Reviewed-by: Guo Ren <guoren@kernel.org>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-14  1:19           ` Guo Ren
@ 2022-06-14  3:26             ` Xianting Tian
  0 siblings, 0 replies; 19+ messages in thread
From: Xianting Tian @ 2022-06-14  3:26 UTC (permalink / raw)
  To: Guo Ren, Zi Yan
  Cc: Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott

Thanks Zi Yan, Guo Ren for the detailed discussion.

The commit message need to be improved, and I will send the patches soon.

在 2022/6/14 上午9:19, Guo Ren 写道:
> On Tue, Jun 14, 2022 at 8:14 AM Zi Yan <ziy@nvidia.com> wrote:
>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>>
>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>>
>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>> Hi Xianting,
>>>>>>
>>>>>> Thanks for your patch.
>>>>>>
>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>>
>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>>>> fixes message:
>>>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>                           pageblocks with others")
>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>>> page_is_buddy() is necessary here.
>>>>>
>>>>> This patch could be applied to the early version of the stable tree
>>>>> (eg: Linux-5.10.y, not the master tree)
>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>>>
>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>>>> the fixes message is wrong just misleads people, making them think there is
>>>> no bug in 1dd214b8f21c. We need to be clear about this.
>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>>> origin fixes could cover the Linux-5.0.y tree if they give the
>>> accurate commit number and that is the cause we want to point out.
>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
>> message is misleading. This is the point I want to make.
>>
>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>>> in the stable repo. Actually, we only know Linux-5.10.y has the
>>> problem.
>> But it is not and does not apply to d9dddbf55667 cleanly.
>>
>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>>> (Fixup a bug which only belongs to the former stable branch.)
>>>
>> I think you just need to send this patch without saying “commit
>> 787af64d05cd fixes message is wrong” would be a good start. You also
>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
>> (inclusive). So there will need to be two patches:
>>
>> 1) your patch to stable tree prior to 5.15 and
>>
>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
>> between 5.15 and 5.17.
>>
>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>>>> in the mm/page_isolation.c code I mentioned below.
>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>>> buddy_pfn=0.
>>> The root cause comes from __find_buddy_pfn():
>>> return page_pfn ^ (1 << order);
>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>> required for kernels between 5.15 and 5.17 (inclusive).
>>
>>> When page_pfn is the same as the order size, it will return the
>>> previous buddy not the next. That is the only exception for this
>>> algorithm, right?
>>>
>>>
>>>
>>>
>>> In fact, the bug is a very long time to reproduce and is not easy to
>>> debug, so we want to contribute it to the community to prevent other
>>> guys from wasting time. Although there is no new patch at all.
>> Thanks for your reporting and sending out the patch. I really
>> appreciate it. We definitely need your inputs. Throughout the email
>> thread, I am trying to help you clarify the bug and how to fix it
>> properly:
>>
>> 1. The commit 787af64d05cd does not apply cleanly to commits
>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
>> fix the issue. That is why we need your patch to fix the issue.
>> And saying it has a wrong fixes message in this patch’s git log is
>> misleading.
> Okay, seems we need to send some patches for the different stable
> branches separately.
>
>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
>> removed since 5.15 and the issue can appear during page isolation.
> Good point and we would take care of that.
>
>> 3. For kernels before 5.15, this patch will apply.
> Thx
>
>>>>>>> Actually, this issue is involved by commit:
>>>>>>>       commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>
>>>>>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>>>>>>> but it got buddy PFN 0 for PFN 0x2000:
>>>>>>>       0 = 0x2000 ^ (1 << 12)
>>>>>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>>>>>>> crash in __get_pfnblock_flags_mask().
>>>>>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
>>>>>> Basically, this bug will only happen when PFN=0x2000 is merging up and
>>>>>> there are some isolated pageblocks.
>>>>> Not PFN=0x2000, it's PFN=0x1000, I guess.
>>>>>
>>>>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
>>>>> riscv_pfn_base=512 and mem_map began with 512th PFN when
>>>>> CONFIG_FLATMEM=y.
>>>>> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
>>>>>
>>>>> But __find_buddy_pfn algorithm thinks the start address is 0, it could
>>>>> get 0 pfn or less than the pfn_base value. We need another check to
>>>>> prevent that.
>>>>>
>>>>>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
>>>>>> set to PageReserved?
>>>>>>
>>>>>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>>>>>>> it isn't buddy page.
>>>>>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
>>>>>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
>>>>>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
>>>>>> is called, which calls get_pageblock_migratetype() too.
>>>>>>
>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>> Cc: stable@vger.kernel.org
>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>> ---
>>>>>>>   mm/page_alloc.c | 3 +++
>>>>>>>   1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>> index b1caa1c6c887..5b423caa68fd 100644
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>>>>>>
>>>>>>>                        buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>>>>                        buddy = page + (buddy_pfn - pfn);
>>>>>>> +
>>>>>>> +                     if (!page_is_buddy(page, buddy, order))
>>>>>>> +                             goto done_merging;
>>>>>>>                        buddy_mt = get_pageblock_migratetype(buddy);
>>>>>>>
>>>>>>>                        if (migratetype != buddy_mt
>>>>>>> --
>>>>>>> 2.17.1
>>>>>> --
>>>>>> Best Regards,
>>>>>> Yan, Zi
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>   Guo Ren
>>>>>
>>>>> ML: https://lore.kernel.org/linux-csky/
>>>> --
>>>> Best Regards,
>>>> Yan, Zi
>>>
>>>
>>> --
>>> Best Regards
>>>   Guo Ren
>>>
>>> ML: https://lore.kernel.org/linux-csky/
>> --
>> Best Regards,
>> Yan, Zi
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-14  0:14         ` Zi Yan
  2022-06-14  1:19           ` Guo Ren
@ 2022-06-15  6:47           ` Xianting Tian
  2022-06-15 13:55             ` Zi Yan
  1 sibling, 1 reply; 19+ messages in thread
From: Xianting Tian @ 2022-06-15  6:47 UTC (permalink / raw)
  To: Zi Yan, Guo Ren
  Cc: Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka, Linux-MM,
	Linux Kernel Mailing List, stable, huanyi.xj, zjb194813,
	tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott


在 2022/6/14 上午8:14, Zi Yan 写道:
> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>
>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>
>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>> Hi Xianting,
>>>>>
>>>>> Thanks for your patch.
>>>>>
>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>
>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>>> fixes message:
>>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>                           pageblocks with others")
>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>> page_is_buddy() is necessary here.
>>>>
>>>> This patch could be applied to the early version of the stable tree
>>>> (eg: Linux-5.10.y, not the master tree)
>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>>
>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>>> the fixes message is wrong just misleads people, making them think there is
>>> no bug in 1dd214b8f21c. We need to be clear about this.
>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>> origin fixes could cover the Linux-5.0.y tree if they give the
>> accurate commit number and that is the cause we want to point out.
> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
> message is misleading. This is the point I want to make.
>
>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>> in the stable repo. Actually, we only know Linux-5.10.y has the
>> problem.
> But it is not and does not apply to d9dddbf55667 cleanly.
>
>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>> (Fixup a bug which only belongs to the former stable branch.)
>>
> I think you just need to send this patch without saying “commit
> 787af64d05cd fixes message is wrong” would be a good start. You also
> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
> (inclusive). So there will need to be two patches:
>
> 1) your patch to stable tree prior to 5.15 and
>
> 2) your patch with an additional mm/page_isolation.c fix to stable tree
> between 5.15 and 5.17.
>
>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>>> in the mm/page_isolation.c code I mentioned below.
>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>> buddy_pfn=0.
>> The root cause comes from __find_buddy_pfn():
>> return page_pfn ^ (1 << order);
> Right. But pfn_valid_within() was removed since 5.15. So your fix is
> required for kernels between 5.15 and 5.17 (inclusive).
>
>> When page_pfn is the same as the order size, it will return the
>> previous buddy not the next. That is the only exception for this
>> algorithm, right?
>>
>>
>>
>>
>> In fact, the bug is a very long time to reproduce and is not easy to
>> debug, so we want to contribute it to the community to prevent other
>> guys from wasting time. Although there is no new patch at all.
> Thanks for your reporting and sending out the patch. I really
> appreciate it. We definitely need your inputs. Throughout the email
> thread, I am trying to help you clarify the bug and how to fix it
> properly:
>
> 1. The commit 787af64d05cd does not apply cleanly to commits
> d9dddbf55667, meaning you cannot just cherry-pick that commit to
> fix the issue. That is why we need your patch to fix the issue.
> And saying it has a wrong fixes message in this patch’s git log is
> misleading.
>
> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
> to mm/page_isolation.c is also needed, since pfn_valid_within() was
> removed since 5.15 and the issue can appear during page isolation.
>
> 3. For kernels before 5.15, this patch will apply.

Zi Yan, Guo Ren,

I think we still need some imporvemnt for MASTER branch, as we discussed 
above, we will get an illegal buddy page if buddy_pfn is 0,

within page_is_buddy(), it still use the illegal buddy page to do the 
check. I think in most of cases, page_is_buddy() can return false,  but 
it still may return true with very low probablity.

I think we need to add some code to verify buddy_pfn in the first place.

Could you give some suggestions for this idea?

>
>>>>>> Actually, this issue is involved by commit:
>>>>>>       commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>
>>>>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>>>>>> but it got buddy PFN 0 for PFN 0x2000:
>>>>>>       0 = 0x2000 ^ (1 << 12)
>>>>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>>>>>> crash in __get_pfnblock_flags_mask().
>>>>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
>>>>> Basically, this bug will only happen when PFN=0x2000 is merging up and
>>>>> there are some isolated pageblocks.
>>>> Not PFN=0x2000, it's PFN=0x1000, I guess.
>>>>
>>>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
>>>> riscv_pfn_base=512 and mem_map began with 512th PFN when
>>>> CONFIG_FLATMEM=y.
>>>> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
>>>>
>>>> But __find_buddy_pfn algorithm thinks the start address is 0, it could
>>>> get 0 pfn or less than the pfn_base value. We need another check to
>>>> prevent that.
>>>>
>>>>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
>>>>> set to PageReserved?
>>>>>
>>>>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>>>>>> it isn't buddy page.
>>>>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
>>>>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
>>>>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
>>>>> is called, which calls get_pageblock_migratetype() too.
>>>>>
>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>> ---
>>>>>>   mm/page_alloc.c | 3 +++
>>>>>>   1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>> index b1caa1c6c887..5b423caa68fd 100644
>>>>>> --- a/mm/page_alloc.c
>>>>>> +++ b/mm/page_alloc.c
>>>>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>>>>>
>>>>>>                        buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>>>                        buddy = page + (buddy_pfn - pfn);
>>>>>> +
>>>>>> +                     if (!page_is_buddy(page, buddy, order))
>>>>>> +                             goto done_merging;
>>>>>>                        buddy_mt = get_pageblock_migratetype(buddy);
>>>>>>
>>>>>>                        if (migratetype != buddy_mt
>>>>>> --
>>>>>> 2.17.1
>>>>> --
>>>>> Best Regards,
>>>>> Yan, Zi
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>   Guo Ren
>>>>
>>>> ML: https://lore.kernel.org/linux-csky/
>>> --
>>> Best Regards,
>>> Yan, Zi
>>
>>
>> -- 
>> Best Regards
>>   Guo Ren
>>
>> ML: https://lore.kernel.org/linux-csky/
> --
> Best Regards,
> Yan, Zi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-15  6:47           ` Xianting Tian
@ 2022-06-15 13:55             ` Zi Yan
  2022-06-15 16:15               ` Xianting Tian
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2022-06-15 13:55 UTC (permalink / raw)
  To: Xianting Tian
  Cc: Guo Ren, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka,
	Linux-MM, Linux Kernel Mailing List, stable, huanyi.xj,
	zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott

[-- Attachment #1: Type: text/plain, Size: 9077 bytes --]

On 15 Jun 2022, at 2:47, Xianting Tian wrote:

> 在 2022/6/14 上午8:14, Zi Yan 写道:
>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>>
>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>>
>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>> Hi Xianting,
>>>>>>
>>>>>> Thanks for your patch.
>>>>>>
>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>>
>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>>>> fixes message:
>>>>>>>       Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>                           pageblocks with others")
>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>>> page_is_buddy() is necessary here.
>>>>>
>>>>> This patch could be applied to the early version of the stable tree
>>>>> (eg: Linux-5.10.y, not the master tree)
>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>>>
>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>>>> the fixes message is wrong just misleads people, making them think there is
>>>> no bug in 1dd214b8f21c. We need to be clear about this.
>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>>> origin fixes could cover the Linux-5.0.y tree if they give the
>>> accurate commit number and that is the cause we want to point out.
>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
>> message is misleading. This is the point I want to make.
>>
>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>>> in the stable repo. Actually, we only know Linux-5.10.y has the
>>> problem.
>> But it is not and does not apply to d9dddbf55667 cleanly.
>>
>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>>> (Fixup a bug which only belongs to the former stable branch.)
>>>
>> I think you just need to send this patch without saying “commit
>> 787af64d05cd fixes message is wrong” would be a good start. You also
>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
>> (inclusive). So there will need to be two patches:
>>
>> 1) your patch to stable tree prior to 5.15 and
>>
>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
>> between 5.15 and 5.17.
>>
>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>>>> in the mm/page_isolation.c code I mentioned below.
>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>>> buddy_pfn=0.
>>> The root cause comes from __find_buddy_pfn():
>>> return page_pfn ^ (1 << order);
>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>> required for kernels between 5.15 and 5.17 (inclusive).
>>
>>> When page_pfn is the same as the order size, it will return the
>>> previous buddy not the next. That is the only exception for this
>>> algorithm, right?
>>>
>>>
>>>
>>>
>>> In fact, the bug is a very long time to reproduce and is not easy to
>>> debug, so we want to contribute it to the community to prevent other
>>> guys from wasting time. Although there is no new patch at all.
>> Thanks for your reporting and sending out the patch. I really
>> appreciate it. We definitely need your inputs. Throughout the email
>> thread, I am trying to help you clarify the bug and how to fix it
>> properly:
>>
>> 1. The commit 787af64d05cd does not apply cleanly to commits
>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
>> fix the issue. That is why we need your patch to fix the issue.
>> And saying it has a wrong fixes message in this patch’s git log is
>> misleading.
>>
>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
>> removed since 5.15 and the issue can appear during page isolation.
>>
>> 3. For kernels before 5.15, this patch will apply.
>
> Zi Yan, Guo Ren,
>
> I think we still need some imporvemnt for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,
>
> within page_is_buddy(), it still use the illegal buddy page to do the check. I think in most of cases, page_is_buddy() can return false,  but it still may return true with very low probablity.

Can you elaborate more on this? What kind of page can lead to page_is_buddy()
returning true? You said it is buddy_pfn is 0, but if the page is reserved,
if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
Maybe show us the dump_page() that offending page.

Thanks.

>
> I think we need to add some code to verify buddy_pfn in the first place.
>
> Could you give some suggestions for this idea?
>
>>
>>>>>>> Actually, this issue is involved by commit:
>>>>>>>       commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>
>>>>>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>>>>>>> but it got buddy PFN 0 for PFN 0x2000:
>>>>>>>       0 = 0x2000 ^ (1 << 12)
>>>>>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>>>>>>> crash in __get_pfnblock_flags_mask().
>>>>>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
>>>>>> Basically, this bug will only happen when PFN=0x2000 is merging up and
>>>>>> there are some isolated pageblocks.
>>>>> Not PFN=0x2000, it's PFN=0x1000, I guess.
>>>>>
>>>>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
>>>>> riscv_pfn_base=512 and mem_map began with 512th PFN when
>>>>> CONFIG_FLATMEM=y.
>>>>> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
>>>>>
>>>>> But __find_buddy_pfn algorithm thinks the start address is 0, it could
>>>>> get 0 pfn or less than the pfn_base value. We need another check to
>>>>> prevent that.
>>>>>
>>>>>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
>>>>>> set to PageReserved?
>>>>>>
>>>>>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>>>>>>> it isn't buddy page.
>>>>>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
>>>>>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
>>>>>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
>>>>>> is called, which calls get_pageblock_migratetype() too.
>>>>>>
>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>> Cc: stable@vger.kernel.org
>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>> ---
>>>>>>>   mm/page_alloc.c | 3 +++
>>>>>>>   1 file changed, 3 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>> index b1caa1c6c887..5b423caa68fd 100644
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>>>>>>
>>>>>>>                        buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>>>>                        buddy = page + (buddy_pfn - pfn);
>>>>>>> +
>>>>>>> +                     if (!page_is_buddy(page, buddy, order))
>>>>>>> +                             goto done_merging;
>>>>>>>                        buddy_mt = get_pageblock_migratetype(buddy);
>>>>>>>
>>>>>>>                        if (migratetype != buddy_mt
>>>>>>> --
>>>>>>> 2.17.1
>>>>>> --
>>>>>> Best Regards,
>>>>>> Yan, Zi
>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>   Guo Ren
>>>>>
>>>>> ML: https://lore.kernel.org/linux-csky/
>>>> --
>>>> Best Regards,
>>>> Yan, Zi
>>>
>>>
>>> -- 
>>> Best Regards
>>>   Guo Ren
>>>
>>> ML: https://lore.kernel.org/linux-csky/
>> --
>> Best Regards,
>> Yan, Zi

--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-15 13:55             ` Zi Yan
@ 2022-06-15 16:15               ` Xianting Tian
  2022-06-16  1:51                 ` Xianting Tian
  2022-06-16 14:01                 ` Zi Yan
  0 siblings, 2 replies; 19+ messages in thread
From: Xianting Tian @ 2022-06-15 16:15 UTC (permalink / raw)
  To: Zi Yan
  Cc: Guo Ren, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka,
	Linux-MM, Linux Kernel Mailing List, stable, huanyi.xj,
	zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott


在 2022/6/15 下午9:55, Zi Yan 写道:
> On 15 Jun 2022, at 2:47, Xianting Tian wrote:
>
>> 在 2022/6/14 上午8:14, Zi Yan 写道:
>>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>>>
>>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>>>
>>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>> Hi Xianting,
>>>>>>>
>>>>>>> Thanks for your patch.
>>>>>>>
>>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>>>
>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>>>>> fixes message:
>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>>                            pageblocks with others")
>>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>>>> page_is_buddy() is necessary here.
>>>>>>
>>>>>> This patch could be applied to the early version of the stable tree
>>>>>> (eg: Linux-5.10.y, not the master tree)
>>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>>>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>>>>
>>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>>>>> the fixes message is wrong just misleads people, making them think there is
>>>>> no bug in 1dd214b8f21c. We need to be clear about this.
>>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>>>> origin fixes could cover the Linux-5.0.y tree if they give the
>>>> accurate commit number and that is the cause we want to point out.
>>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
>>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
>>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
>>> message is misleading. This is the point I want to make.
>>>
>>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>>>> in the stable repo. Actually, we only know Linux-5.10.y has the
>>>> problem.
>>> But it is not and does not apply to d9dddbf55667 cleanly.
>>>
>>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>>>> (Fixup a bug which only belongs to the former stable branch.)
>>>>
>>> I think you just need to send this patch without saying “commit
>>> 787af64d05cd fixes message is wrong” would be a good start. You also
>>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
>>> (inclusive). So there will need to be two patches:
>>>
>>> 1) your patch to stable tree prior to 5.15 and
>>>
>>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
>>> between 5.15 and 5.17.
>>>
>>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>>>>> in the mm/page_isolation.c code I mentioned below.
>>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
>>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>>>> buddy_pfn=0.
>>>> The root cause comes from __find_buddy_pfn():
>>>> return page_pfn ^ (1 << order);
>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>
>>>> When page_pfn is the same as the order size, it will return the
>>>> previous buddy not the next. That is the only exception for this
>>>> algorithm, right?
>>>>
>>>>
>>>>
>>>>
>>>> In fact, the bug is a very long time to reproduce and is not easy to
>>>> debug, so we want to contribute it to the community to prevent other
>>>> guys from wasting time. Although there is no new patch at all.
>>> Thanks for your reporting and sending out the patch. I really
>>> appreciate it. We definitely need your inputs. Throughout the email
>>> thread, I am trying to help you clarify the bug and how to fix it
>>> properly:
>>>
>>> 1. The commit 787af64d05cd does not apply cleanly to commits
>>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
>>> fix the issue. That is why we need your patch to fix the issue.
>>> And saying it has a wrong fixes message in this patch’s git log is
>>> misleading.
>>>
>>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
>>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
>>> removed since 5.15 and the issue can appear during page isolation.
>>>
>>> 3. For kernels before 5.15, this patch will apply.
>> Zi Yan, Guo Ren,
>>
>> I think we still need some imporvemnt for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,
>>
>> within page_is_buddy(), it still use the illegal buddy page to do the check. I think in most of cases, page_is_buddy() can return false,  but it still may return true with very low probablity.
> Can you elaborate more on this? What kind of page can lead to page_is_buddy()
> returning true? You said it is buddy_pfn is 0, but if the page is reserved,
> if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
> Maybe show us the dump_page() that offending page.
>
> Thanks.

Let‘s take the issue we met on RISC-V arch for example,

pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 
0xffffffe07e205000, which is the page address of PFN 512.

__find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.
We know PFN 0 is not a valid pfn for buddy system, because 512 is the first PFN for buddy system.

Then it use below code to get buddy page with buddy_pfn 0:
buddy = page + (buddy_pfn - pfn);
So buddy page address is:
0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)

we can know this buddy page's address is less than mem_map(0xffffffe07e1fe000 < 0xffffffe07e205000),
actually 0xffffffe07e1fe000 is not a valid page's address. If we use 0xffffffe07e1fe000
as the page's address to extract the value of a member in 'struct page', we may get an uncertain value.
That's why I say page_is_buddy() may return true with very low probablity.

So I think we need to add the code the verify buddy_pfn in the first place:
	pfn_valid(buddy_pfn)

>> I think we need to add some code to verify buddy_pfn in the first place.
>>
>> Could you give some suggestions for this idea?
>>
>>>>>>>> Actually, this issue is involved by commit:
>>>>>>>>        commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>>
>>>>>>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>>>>>>>> but it got buddy PFN 0 for PFN 0x2000:
>>>>>>>>        0 = 0x2000 ^ (1 << 12)
>>>>>>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>>>>>>>> crash in __get_pfnblock_flags_mask().
>>>>>>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
>>>>>>> Basically, this bug will only happen when PFN=0x2000 is merging up and
>>>>>>> there are some isolated pageblocks.
>>>>>> Not PFN=0x2000, it's PFN=0x1000, I guess.
>>>>>>
>>>>>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
>>>>>> riscv_pfn_base=512 and mem_map began with 512th PFN when
>>>>>> CONFIG_FLATMEM=y.
>>>>>> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
>>>>>>
>>>>>> But __find_buddy_pfn algorithm thinks the start address is 0, it could
>>>>>> get 0 pfn or less than the pfn_base value. We need another check to
>>>>>> prevent that.
>>>>>>
>>>>>>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
>>>>>>> set to PageReserved?
>>>>>>>
>>>>>>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>>>>>>>> it isn't buddy page.
>>>>>>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
>>>>>>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
>>>>>>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
>>>>>>> is called, which calls get_pageblock_migratetype() too.
>>>>>>>
>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>> ---
>>>>>>>>    mm/page_alloc.c | 3 +++
>>>>>>>>    1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>> index b1caa1c6c887..5b423caa68fd 100644
>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>>>>>>>
>>>>>>>>                         buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>>>>>                         buddy = page + (buddy_pfn - pfn);
>>>>>>>> +
>>>>>>>> +                     if (!page_is_buddy(page, buddy, order))
>>>>>>>> +                             goto done_merging;
>>>>>>>>                         buddy_mt = get_pageblock_migratetype(buddy);
>>>>>>>>
>>>>>>>>                         if (migratetype != buddy_mt
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Yan, Zi
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>    Guo Ren
>>>>>>
>>>>>> ML: https://lore.kernel.org/linux-csky/
>>>>> --
>>>>> Best Regards,
>>>>> Yan, Zi
>>>>
>>>> -- 
>>>> Best Regards
>>>>    Guo Ren
>>>>
>>>> ML: https://lore.kernel.org/linux-csky/
>>> --
>>> Best Regards,
>>> Yan, Zi
> --
> Best Regards,
> Yan, Zi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-15 16:15               ` Xianting Tian
@ 2022-06-16  1:51                 ` Xianting Tian
  2022-06-16 14:01                 ` Zi Yan
  1 sibling, 0 replies; 19+ messages in thread
From: Xianting Tian @ 2022-06-16  1:51 UTC (permalink / raw)
  To: Zi Yan
  Cc: Guo Ren, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka,
	Linux-MM, Linux Kernel Mailing List, stable, huanyi.xj,
	zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott


在 2022/6/16 上午12:15, Xianting Tian 写道:
>
> 在 2022/6/15 下午9:55, Zi Yan 写道:
>> On 15 Jun 2022, at 2:47, Xianting Tian wrote:
>>
>>> 在 2022/6/14 上午8:14, Zi Yan 写道:
>>>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>>>>
>>>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>>>>
>>>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>>> Hi Xianting,
>>>>>>>>
>>>>>>>> Thanks for your patch.
>>>>>>>>
>>>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>>>>
>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before 
>>>>>>>>> check its migratetype.")
>>>>>>>>> added buddy check code. But unfortunately, this fix isn't 
>>>>>>>>> backported to
>>>>>>>>> linux-5.17.y and the former stable branches. The reason is it 
>>>>>>>>> added wrong
>>>>>>>>> fixes message:
>>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging 
>>>>>>>>> non-fallbackable
>>>>>>>>>                            pageblocks with others")
>>>>>>>> No, the Fixes tag is right. The commit above does need to 
>>>>>>>> validate buddy.
>>>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>>>>> page_is_buddy() is necessary here.
>>>>>>>
>>>>>>> This patch could be applied to the early version of the stable tree
>>>>>>> (eg: Linux-5.10.y, not the master tree)
>>>>>> This is quite misleading. Commit 787af64d05cd applies does not 
>>>>>> mean it is
>>>>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At 
>>>>>> best,
>>>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also 
>>>>>> fixes d9dddbf55667.
>>>>>> There is no way you can apply 787af64d05cd to earlier trees and 
>>>>>> call it a day.
>>>>>>
>>>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c 
>>>>>> and there is
>>>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way 
>>>>>> too. Saying
>>>>>> the fixes message is wrong just misleads people, making them 
>>>>>> think there is
>>>>>> no bug in 1dd214b8f21c. We need to be clear about this.
>>>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>>>>> origin fixes could cover the Linux-5.0.y tree if they give the
>>>>> accurate commit number and that is the cause we want to point out.
>>>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
>>>> the issue introduced by d9dddbf55667. But my point is that 
>>>> 787af64d05cd
>>>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
>>>> message is misleading. This is the point I want to make.
>>>>
>>>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>>>>> in the stable repo. Actually, we only know Linux-5.10.y has the
>>>>> problem.
>>>> But it is not and does not apply to d9dddbf55667 cleanly.
>>>>
>>>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>>>>> (Fixup a bug which only belongs to the former stable branch.)
>>>>>
>>>> I think you just need to send this patch without saying “commit
>>>> 787af64d05cd fixes message is wrong” would be a good start. You also
>>>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 
>>>> 5.17
>>>> (inclusive). So there will need to be two patches:
>>>>
>>>> 1) your patch to stable tree prior to 5.15 and
>>>>
>>>> 2) your patch with an additional mm/page_isolation.c fix to stable 
>>>> tree
>>>> between 5.15 and 5.17.
>>>>
>>>>>> Also, you will need to fix the mm/page_isolation.c code too to 
>>>>>> make this patch
>>>>>> complete, unless you can show that PFN=0x1000 is never going to 
>>>>>> be encountered
>>>>>> in the mm/page_isolation.c code I mentioned below.
>>>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it 
>>>>> had
>>>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>>>>> buddy_pfn=0.
>>>>> The root cause comes from __find_buddy_pfn():
>>>>> return page_pfn ^ (1 << order);
>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>
>>>>> When page_pfn is the same as the order size, it will return the
>>>>> previous buddy not the next. That is the only exception for this
>>>>> algorithm, right?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> In fact, the bug is a very long time to reproduce and is not easy to
>>>>> debug, so we want to contribute it to the community to prevent other
>>>>> guys from wasting time. Although there is no new patch at all.
>>>> Thanks for your reporting and sending out the patch. I really
>>>> appreciate it. We definitely need your inputs. Throughout the email
>>>> thread, I am trying to help you clarify the bug and how to fix it
>>>> properly:
>>>>
>>>> 1. The commit 787af64d05cd does not apply cleanly to commits
>>>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
>>>> fix the issue. That is why we need your patch to fix the issue.
>>>> And saying it has a wrong fixes message in this patch’s git log is
>>>> misleading.
>>>>
>>>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
>>>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
>>>> removed since 5.15 and the issue can appear during page isolation.
>>>>
>>>> 3. For kernels before 5.15, this patch will apply.
>>> Zi Yan, Guo Ren,
>>>
>>> I think we still need some imporvemnt for MASTER branch, as we 
>>> discussed above, we will get an illegal buddy page if buddy_pfn is 0,
>>>
>>> within page_is_buddy(), it still use the illegal buddy page to do 
>>> the check. I think in most of cases, page_is_buddy() can return 
>>> false,  but it still may return true with very low probablity.
>> Can you elaborate more on this? What kind of page can lead to 
>> page_is_buddy()
>> returning true? You said it is buddy_pfn is 0, but if the page is 
>> reserved,
>> if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
>> Maybe show us the dump_page() that offending page.
>>
>> Thanks.
>
> Let‘s take the issue we met on RISC-V arch for example,
>
> pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 
> 0xffffffe07e205000, which is the page address of PFN 512.
>
> __find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.
> We know PFN 0 is not a valid pfn for buddy system, because 512 is the 
> first PFN for buddy system.
>
> Then it use below code to get buddy page with buddy_pfn 0:
> buddy = page + (buddy_pfn - pfn);
> So buddy page address is:
> 0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)
>
> we can know this buddy page's address is less than 
> mem_map(0xffffffe07e1fe000 < 0xffffffe07e205000),
> actually 0xffffffe07e1fe000 is not a valid page's address. If we use 
> 0xffffffe07e1fe000
> as the page's address to extract the value of a member in 'struct 
> page', we may get an uncertain value.
> That's why I say page_is_buddy() may return true with very low 
> probablity.
>
> So I think we need to add the code the verify buddy_pfn in the first 
> place:
>     pfn_valid(buddy_pfn)
>
The contents of the invalid page as below, all 0 in contents, but it may 
diff next time.

(gdb) p /x *(struct page *)0xffffffe07e1fe000

$15 = {flags = 0x0, {{lru = {next = 0x0, prev = 0x0}, mapping = 0x0, 
index = 0x0, private = 0x0}, {dma_addr = {0x0, 0x0}}, {{slab_list = 
{next = 0x0, prev = 0
           x0}, {next = 0x0, pages = 0x0, pobjects = 0x0}}, slab_cache = 
0x0, freelist = 0x0, {s_mem = 0x0, counters = 0x0, {inuse = 0x0, objects 
= 0x0
           , frozen = 0x0}}}, {compound_head = 0x0, compound_dtor = 0x0, 
compound_order = 0x0, compound_mapcount = {counter = 0x0}, compound_nr =
       0x0}, {_compound_pad_1 = 0x0, hpage_pinned_refcount = {counter = 
0x0}, deferred_list = {next = 0x0, prev = 0x0}}, {_pt_pad_1 = 0x0, pmd_hu
       ge_pte = 0x0, _pt_pad_2 = 0x0, {pt_mm = 0x0, pt_frag_refcount = 
{counter = 0x0}}, ptl = {{rlock = {raw_lock = {lock = 0x0}}}}}, {pgmap = 
0x0, zone
       _device_data = 0x0}, callback_head = {next = 0x0, func = 0x0}}, 
{_mapcount = {counter = 0x0}, page_type = 0x0, active = 0x0, units = 
0x0}, _refcount

    = {counter = 0x0}}

This is the first normal page in mem_map,

(gdb) p /x *(struct page *)0xffffffe07e26e000
$16 = {flags = 0x0, {{lru = {next = 0x100, prev = 0x122}, mapping = 0x0, 
index = 0x0, private = 0x0}, {dma_addr = {0x100, 0x122}}, {{slab_list = 
{next = 0x100
           , prev = 0x122}, {next = 0x100, pages = 0x122, pobjects = 
0x0}}, slab_cache = 0x0, freelist = 0x0, {s_mem = 0x0, counters = 0x0, 
{inuse
           = 0x0, objects = 0x0, frozen = 0x0}}}, {compound_head = 
0x100, compound_dtor = 0x22, compound_order = 0x1, compound_mapcount = 
{counter = 0x0}
         , compound_nr = 0x0}, {_compound_pad_1 = 0x100, 
hpage_pinned_refcount = {counter = 0x122}, deferred_list = {next = 0x0, 
prev = 0x0}}, {_
       pt_pad_1 = 0x100, pmd_huge_pte = 0x122, _pt_pad_2 = 0x0, {pt_mm = 
0x0, pt_frag_refcount = {counter = 0x0}}, ptl = {{rlock = {raw_lock = 
{lock = 0x0}}}}
               }, {pgmap = 0x100, zone_device_data = 0x122}, 
callback_head = {next = 0x100, func = 0x122}}, {_mapcount = {counter = 
0xffffffff},
       page_type = 0xffffffff, active = 0xffffffff, units = 0xffffffff}, 
_refcount = {counter = 0x0}}

>>> I think we need to add some code to verify buddy_pfn in the first 
>>> place.
>>>
>>> Could you give some suggestions for this idea?
>>>
>>>>>>>>> Actually, this issue is involved by commit:
>>>>>>>>>        commit d9dddbf55667 ("mm/page_alloc: prevent merging 
>>>>>>>>> between isolated and other pageblocks")
>>>>>>>>>
>>>>>>>>> For RISC-V arch, the first 2M is reserved for sbi, so the 
>>>>>>>>> start PFN is 512,
>>>>>>>>> but it got buddy PFN 0 for PFN 0x2000:
>>>>>>>>>        0 = 0x2000 ^ (1 << 12)
>>>>>>>>> With the illegal buddy PFN 0, it got an illegal buddy page, 
>>>>>>>>> which caused
>>>>>>>>> crash in __get_pfnblock_flags_mask().
>>>>>>>> It seems that the RISC-V arch reveals a similar bug from 
>>>>>>>> d9dddbf55667.
>>>>>>>> Basically, this bug will only happen when PFN=0x2000 is merging 
>>>>>>>> up and
>>>>>>>> there are some isolated pageblocks.
>>>>>>> Not PFN=0x2000, it's PFN=0x1000, I guess.
>>>>>>>
>>>>>>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
>>>>>>> riscv_pfn_base=512 and mem_map began with 512th PFN when
>>>>>>> CONFIG_FLATMEM=y.
>>>>>>> (Also, csky has the same issue: a non-zero pfn_base in some 
>>>>>>> scenarios.)
>>>>>>>
>>>>>>> But __find_buddy_pfn algorithm thinks the start address is 0, it 
>>>>>>> could
>>>>>>> get 0 pfn or less than the pfn_base value. We need another check to
>>>>>>> prevent that.
>>>>>>>
>>>>>>>> BTW, what does first reserved 2MB imply? All 4KB pages from 
>>>>>>>> first 2MB are
>>>>>>>> set to PageReserved?
>>>>>>>>
>>>>>>>>> With the patch, it can avoid the calling of 
>>>>>>>>> get_pageblock_migratetype() if
>>>>>>>>> it isn't buddy page.
>>>>>>>> You might miss the __find_buddy_pfn() caller in 
>>>>>>>> unset_migratetype_isolate()
>>>>>>>> from mm/page_isolation.c, if you are talking about linux-5.17.y 
>>>>>>>> and former
>>>>>>>> version. There, page_is_buddy() is also not called and 
>>>>>>>> is_migrate_isolate_page()
>>>>>>>> is called, which calls get_pageblock_migratetype() too.
>>>>>>>>
>>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between 
>>>>>>>>> isolated and other pageblocks")
>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>>> ---
>>>>>>>>>    mm/page_alloc.c | 3 +++
>>>>>>>>>    1 file changed, 3 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>>> index b1caa1c6c887..5b423caa68fd 100644
>>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>>> @@ -1129,6 +1129,9 @@ static inline void 
>>>>>>>>> __free_one_page(struct page *page,
>>>>>>>>>
>>>>>>>>>                         buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>>>>>>                         buddy = page + (buddy_pfn - pfn);
>>>>>>>>> +
>>>>>>>>> +                     if (!page_is_buddy(page, buddy, order))
>>>>>>>>> +                             goto done_merging;
>>>>>>>>>                         buddy_mt = 
>>>>>>>>> get_pageblock_migratetype(buddy);
>>>>>>>>>
>>>>>>>>>                         if (migratetype != buddy_mt
>>>>>>>>> -- 
>>>>>>>>> 2.17.1
>>>>>>>> -- 
>>>>>>>> Best Regards,
>>>>>>>> Yan, Zi
>>>>>>>
>>>>>>> -- 
>>>>>>> Best Regards
>>>>>>>    Guo Ren
>>>>>>>
>>>>>>> ML: https://lore.kernel.org/linux-csky/
>>>>>> -- 
>>>>>> Best Regards,
>>>>>> Yan, Zi
>>>>>
>>>>> -- 
>>>>> Best Regards
>>>>>    Guo Ren
>>>>>
>>>>> ML: https://lore.kernel.org/linux-csky/
>>>> -- 
>>>> Best Regards,
>>>> Yan, Zi
>> -- 
>> Best Regards,
>> Yan, Zi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-15 16:15               ` Xianting Tian
  2022-06-16  1:51                 ` Xianting Tian
@ 2022-06-16 14:01                 ` Zi Yan
  2022-06-16 15:04                   ` David Hildenbrand
  1 sibling, 1 reply; 19+ messages in thread
From: Zi Yan @ 2022-06-16 14:01 UTC (permalink / raw)
  To: Xianting Tian
  Cc: Guo Ren, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka,
	Linux-MM, Linux Kernel Mailing List, stable, huanyi.xj,
	zjb194813, tianhu.hh, Hanjun Guo, David Hildenbrand, Joonsoo Kim,
	Laura Abbott

[-- Attachment #1: Type: text/plain, Size: 11329 bytes --]

On 15 Jun 2022, at 12:15, Xianting Tian wrote:

> 在 2022/6/15 下午9:55, Zi Yan 写道:
>> On 15 Jun 2022, at 2:47, Xianting Tian wrote:
>>
>>> 在 2022/6/14 上午8:14, Zi Yan 写道:
>>>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>>>>
>>>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>>>>
>>>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>>> Hi Xianting,
>>>>>>>>
>>>>>>>> Thanks for your patch.
>>>>>>>>
>>>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>>>>
>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>>>>>> fixes message:
>>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>>>                            pageblocks with others")
>>>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>>>>> page_is_buddy() is necessary here.
>>>>>>>
>>>>>>> This patch could be applied to the early version of the stable tree
>>>>>>> (eg: Linux-5.10.y, not the master tree)
>>>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>>>>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>>>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>>>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>>>>>
>>>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>>>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>>>>>> the fixes message is wrong just misleads people, making them think there is
>>>>>> no bug in 1dd214b8f21c. We need to be clear about this.
>>>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>>>>> origin fixes could cover the Linux-5.0.y tree if they give the
>>>>> accurate commit number and that is the cause we want to point out.
>>>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
>>>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
>>>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
>>>> message is misleading. This is the point I want to make.
>>>>
>>>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>>>>> in the stable repo. Actually, we only know Linux-5.10.y has the
>>>>> problem.
>>>> But it is not and does not apply to d9dddbf55667 cleanly.
>>>>
>>>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>>>>> (Fixup a bug which only belongs to the former stable branch.)
>>>>>
>>>> I think you just need to send this patch without saying “commit
>>>> 787af64d05cd fixes message is wrong” would be a good start. You also
>>>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
>>>> (inclusive). So there will need to be two patches:
>>>>
>>>> 1) your patch to stable tree prior to 5.15 and
>>>>
>>>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
>>>> between 5.15 and 5.17.
>>>>
>>>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>>>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>>>>>> in the mm/page_isolation.c code I mentioned below.
>>>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
>>>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>>>>> buddy_pfn=0.
>>>>> The root cause comes from __find_buddy_pfn():
>>>>> return page_pfn ^ (1 << order);
>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>
>>>>> When page_pfn is the same as the order size, it will return the
>>>>> previous buddy not the next. That is the only exception for this
>>>>> algorithm, right?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> In fact, the bug is a very long time to reproduce and is not easy to
>>>>> debug, so we want to contribute it to the community to prevent other
>>>>> guys from wasting time. Although there is no new patch at all.
>>>> Thanks for your reporting and sending out the patch. I really
>>>> appreciate it. We definitely need your inputs. Throughout the email
>>>> thread, I am trying to help you clarify the bug and how to fix it
>>>> properly:
>>>>
>>>> 1. The commit 787af64d05cd does not apply cleanly to commits
>>>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
>>>> fix the issue. That is why we need your patch to fix the issue.
>>>> And saying it has a wrong fixes message in this patch’s git log is
>>>> misleading.
>>>>
>>>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
>>>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
>>>> removed since 5.15 and the issue can appear during page isolation.
>>>>
>>>> 3. For kernels before 5.15, this patch will apply.
>>> Zi Yan, Guo Ren,
>>>
>>> I think we still need some imporvemnt for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,
>>>
>>> within page_is_buddy(), it still use the illegal buddy page to do the check. I think in most of cases, page_is_buddy() can return false,  but it still may return true with very low probablity.
>> Can you elaborate more on this? What kind of page can lead to page_is_buddy()
>> returning true? You said it is buddy_pfn is 0, but if the page is reserved,
>> if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
>> Maybe show us the dump_page() that offending page.
>>
>> Thanks.
>
> Let‘s take the issue we met on RISC-V arch for example,
>
> pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 0xffffffe07e205000, which is the page address of PFN 512.
>
> __find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.
> We know PFN 0 is not a valid pfn for buddy system, because 512 is the first PFN for buddy system.
>
> Then it use below code to get buddy page with buddy_pfn 0:
> buddy = page + (buddy_pfn - pfn);
> So buddy page address is:
> 0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)
>
> we can know this buddy page's address is less than mem_map(0xffffffe07e1fe000 < 0xffffffe07e205000),
> actually 0xffffffe07e1fe000 is not a valid page's address. If we use 0xffffffe07e1fe000
> as the page's address to extract the value of a member in 'struct page', we may get an uncertain value.
> That's why I say page_is_buddy() may return true with very low probablity.
>
> So I think we need to add the code the verify buddy_pfn in the first place:
> 	pfn_valid(buddy_pfn)
>

+DavidH on how memory section works.

This 2MB RAM reservation does not sound right to me. How does it work in sparsemem?
RISC-V has SECTION_SIZE_BITS=27, i.e., 128MB a section. All pages within
a section should have their corresponding struct page (mem_map). So in this case,
the first 2MB pages should have mem_map and can be marked as PageReserved. As a
result, page_is_buddy() will return false.

For flatmem, IIRC, the valid addresses should be aligned to MAX_ORDER_NR_PAGES.
This means first 4MB (assuming MAX_ORDER is 11 on RISC-V) should not be on
buddy allocator and hence this issue does not happen in the first place.
But correct me if I am wrong.



>>> I think we need to add some code to verify buddy_pfn in the first place.
>>>
>>> Could you give some suggestions for this idea?
>>>
>>>>>>>>> Actually, this issue is involved by commit:
>>>>>>>>>        commit d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>>>
>>>>>>>>> For RISC-V arch, the first 2M is reserved for sbi, so the start PFN is 512,
>>>>>>>>> but it got buddy PFN 0 for PFN 0x2000:
>>>>>>>>>        0 = 0x2000 ^ (1 << 12)
>>>>>>>>> With the illegal buddy PFN 0, it got an illegal buddy page, which caused
>>>>>>>>> crash in __get_pfnblock_flags_mask().
>>>>>>>> It seems that the RISC-V arch reveals a similar bug from d9dddbf55667.
>>>>>>>> Basically, this bug will only happen when PFN=0x2000 is merging up and
>>>>>>>> there are some isolated pageblocks.
>>>>>>> Not PFN=0x2000, it's PFN=0x1000, I guess.
>>>>>>>
>>>>>>> RISC-V's first 2MB RAM could reserve for opensbi, so it would have
>>>>>>> riscv_pfn_base=512 and mem_map began with 512th PFN when
>>>>>>> CONFIG_FLATMEM=y.
>>>>>>> (Also, csky has the same issue: a non-zero pfn_base in some scenarios.)
>>>>>>>
>>>>>>> But __find_buddy_pfn algorithm thinks the start address is 0, it could
>>>>>>> get 0 pfn or less than the pfn_base value. We need another check to
>>>>>>> prevent that.
>>>>>>>
>>>>>>>> BTW, what does first reserved 2MB imply? All 4KB pages from first 2MB are
>>>>>>>> set to PageReserved?
>>>>>>>>
>>>>>>>>> With the patch, it can avoid the calling of get_pageblock_migratetype() if
>>>>>>>>> it isn't buddy page.
>>>>>>>> You might miss the __find_buddy_pfn() caller in unset_migratetype_isolate()
>>>>>>>> from mm/page_isolation.c, if you are talking about linux-5.17.y and former
>>>>>>>> version. There, page_is_buddy() is also not called and is_migrate_isolate_page()
>>>>>>>> is called, which calls get_pageblock_migratetype() too.
>>>>>>>>
>>>>>>>>> Fixes: d9dddbf55667 ("mm/page_alloc: prevent merging between isolated and other pageblocks")
>>>>>>>>> Cc: stable@vger.kernel.org
>>>>>>>>> Reported-by: zjb194813@alibaba-inc.com
>>>>>>>>> Reported-by: tianhu.hh@alibaba-inc.com
>>>>>>>>> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
>>>>>>>>> ---
>>>>>>>>>    mm/page_alloc.c | 3 +++
>>>>>>>>>    1 file changed, 3 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>>>> index b1caa1c6c887..5b423caa68fd 100644
>>>>>>>>> --- a/mm/page_alloc.c
>>>>>>>>> +++ b/mm/page_alloc.c
>>>>>>>>> @@ -1129,6 +1129,9 @@ static inline void __free_one_page(struct page *page,
>>>>>>>>>
>>>>>>>>>                         buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>>>>>>                         buddy = page + (buddy_pfn - pfn);
>>>>>>>>> +
>>>>>>>>> +                     if (!page_is_buddy(page, buddy, order))
>>>>>>>>> +                             goto done_merging;
>>>>>>>>>                         buddy_mt = get_pageblock_migratetype(buddy);
>>>>>>>>>
>>>>>>>>>                         if (migratetype != buddy_mt
>>>>>>>>> --
>>>>>>>>> 2.17.1
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Yan, Zi
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>    Guo Ren
>>>>>>>
>>>>>>> ML: https://lore.kernel.org/linux-csky/
>>>>>> --
>>>>>> Best Regards,
>>>>>> Yan, Zi
>>>>>
>>>>> -- 
>>>>> Best Regards
>>>>>    Guo Ren
>>>>>
>>>>> ML: https://lore.kernel.org/linux-csky/
>>>> --
>>>> Best Regards,
>>>> Yan, Zi
>> --
>> Best Regards,
>> Yan, Zi


--
Best Regards,
Yan, Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-16 14:01                 ` Zi Yan
@ 2022-06-16 15:04                   ` David Hildenbrand
  2022-06-16 15:16                     ` Xianting Tian
  2022-06-17  3:17                     ` Guo Ren
  0 siblings, 2 replies; 19+ messages in thread
From: David Hildenbrand @ 2022-06-16 15:04 UTC (permalink / raw)
  To: Zi Yan, Xianting Tian
  Cc: Guo Ren, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka,
	Linux-MM, Linux Kernel Mailing List, stable, huanyi.xj,
	zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott

On 16.06.22 16:01, Zi Yan wrote:
> On 15 Jun 2022, at 12:15, Xianting Tian wrote:
> 
>> 在 2022/6/15 下午9:55, Zi Yan 写道:
>>> On 15 Jun 2022, at 2:47, Xianting Tian wrote:
>>>
>>>> 在 2022/6/14 上午8:14, Zi Yan 写道:
>>>>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>>>>>
>>>>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>>>>>
>>>>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>>>> Hi Xianting,
>>>>>>>>>
>>>>>>>>> Thanks for your patch.
>>>>>>>>>
>>>>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>>>>>
>>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>>>>>>> fixes message:
>>>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>>>>                            pageblocks with others")
>>>>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>>>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>>>>>> page_is_buddy() is necessary here.
>>>>>>>>
>>>>>>>> This patch could be applied to the early version of the stable tree
>>>>>>>> (eg: Linux-5.10.y, not the master tree)
>>>>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>>>>>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>>>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>>>>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>>>>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>>>>>>
>>>>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>>>>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>>>>>>> the fixes message is wrong just misleads people, making them think there is
>>>>>>> no bug in 1dd214b8f21c. We need to be clear about this.
>>>>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>>>>>> origin fixes could cover the Linux-5.0.y tree if they give the
>>>>>> accurate commit number and that is the cause we want to point out.
>>>>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
>>>>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
>>>>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
>>>>> message is misleading. This is the point I want to make.
>>>>>
>>>>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>>>>>> in the stable repo. Actually, we only know Linux-5.10.y has the
>>>>>> problem.
>>>>> But it is not and does not apply to d9dddbf55667 cleanly.
>>>>>
>>>>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>>>>>> (Fixup a bug which only belongs to the former stable branch.)
>>>>>>
>>>>> I think you just need to send this patch without saying “commit
>>>>> 787af64d05cd fixes message is wrong” would be a good start. You also
>>>>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
>>>>> (inclusive). So there will need to be two patches:
>>>>>
>>>>> 1) your patch to stable tree prior to 5.15 and
>>>>>
>>>>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
>>>>> between 5.15 and 5.17.
>>>>>
>>>>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>>>>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>>>>>>> in the mm/page_isolation.c code I mentioned below.
>>>>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
>>>>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>>>>>> buddy_pfn=0.
>>>>>> The root cause comes from __find_buddy_pfn():
>>>>>> return page_pfn ^ (1 << order);
>>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>>
>>>>>> When page_pfn is the same as the order size, it will return the
>>>>>> previous buddy not the next. That is the only exception for this
>>>>>> algorithm, right?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> In fact, the bug is a very long time to reproduce and is not easy to
>>>>>> debug, so we want to contribute it to the community to prevent other
>>>>>> guys from wasting time. Although there is no new patch at all.
>>>>> Thanks for your reporting and sending out the patch. I really
>>>>> appreciate it. We definitely need your inputs. Throughout the email
>>>>> thread, I am trying to help you clarify the bug and how to fix it
>>>>> properly:
>>>>>
>>>>> 1. The commit 787af64d05cd does not apply cleanly to commits
>>>>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
>>>>> fix the issue. That is why we need your patch to fix the issue.
>>>>> And saying it has a wrong fixes message in this patch’s git log is
>>>>> misleading.
>>>>>
>>>>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
>>>>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
>>>>> removed since 5.15 and the issue can appear during page isolation.
>>>>>
>>>>> 3. For kernels before 5.15, this patch will apply.
>>>> Zi Yan, Guo Ren,
>>>>
>>>> I think we still need some imporvemnt for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,
>>>>
>>>> within page_is_buddy(), it still use the illegal buddy page to do the check. I think in most of cases, page_is_buddy() can return false,  but it still may return true with very low probablity.
>>> Can you elaborate more on this? What kind of page can lead to page_is_buddy()
>>> returning true? You said it is buddy_pfn is 0, but if the page is reserved,
>>> if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
>>> Maybe show us the dump_page() that offending page.
>>>
>>> Thanks.
>>
>> Let‘s take the issue we met on RISC-V arch for example,
>>
>> pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 0xffffffe07e205000, which is the page address of PFN 512.
>>
>> __find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.
>> We know PFN 0 is not a valid pfn for buddy system, because 512 is the first PFN for buddy system.
>>
>> Then it use below code to get buddy page with buddy_pfn 0:
>> buddy = page + (buddy_pfn - pfn);
>> So buddy page address is:
>> 0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)
>>
>> we can know this buddy page's address is less than mem_map(0xffffffe07e1fe000 < 0xffffffe07e205000),
>> actually 0xffffffe07e1fe000 is not a valid page's address. If we use 0xffffffe07e1fe000
>> as the page's address to extract the value of a member in 'struct page', we may get an uncertain value.
>> That's why I say page_is_buddy() may return true with very low probablity.
>>
>> So I think we need to add the code the verify buddy_pfn in the first place:
>> 	pfn_valid(buddy_pfn)
>>
> 
> +DavidH on how memory section works.
> 
> This 2MB RAM reservation does not sound right to me. How does it work in sparsemem?
> RISC-V has SECTION_SIZE_BITS=27, i.e., 128MB a section. All pages within
> a section should have their corresponding struct page (mem_map). So in this case,
> the first 2MB pages should have mem_map and can be marked as PageReserved. As a
> result, page_is_buddy() will return false.

Yes. Unless there is a BUG :)

init_unavailable_range() is supposed to initialize the memap of
unavailable ranges and mark it reserved.

I wonder if we're missing a case in memmap_init(), to also initialize
holes at the beginning of a section, before RAM (we do handle sections
in a special way if the end of RAM falls in the middle of a section).

If it's not initialized, it might contain garbage.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-16 15:04                   ` David Hildenbrand
@ 2022-06-16 15:16                     ` Xianting Tian
  2022-06-17  3:17                     ` Guo Ren
  1 sibling, 0 replies; 19+ messages in thread
From: Xianting Tian @ 2022-06-16 15:16 UTC (permalink / raw)
  To: David Hildenbrand, Zi Yan
  Cc: Guo Ren, Greg Kroah-Hartman, Andrew Morton, Vlastimil Babka,
	Linux-MM, Linux Kernel Mailing List, stable, huanyi.xj,
	zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim, Laura Abbott


在 2022/6/16 下午11:04, David Hildenbrand 写道:
> On 16.06.22 16:01, Zi Yan wrote:
>> On 15 Jun 2022, at 12:15, Xianting Tian wrote:
>>
>>> 在 2022/6/15 下午9:55, Zi Yan 写道:
>>>> On 15 Jun 2022, at 2:47, Xianting Tian wrote:
>>>>
>>>>> 在 2022/6/14 上午8:14, Zi Yan 写道:
>>>>>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
>>>>>>
>>>>>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
>>>>>>>>
>>>>>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
>>>>>>>>>> Hi Xianting,
>>>>>>>>>>
>>>>>>>>>> Thanks for your patch.
>>>>>>>>>>
>>>>>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
>>>>>>>>>>
>>>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
>>>>>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
>>>>>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
>>>>>>>>>>> fixes message:
>>>>>>>>>>>         Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
>>>>>>>>>>>                             pageblocks with others")
>>>>>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
>>>>>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
>>>>>>>>> page_is_buddy() is necessary here.
>>>>>>>>>
>>>>>>>>> This patch could be applied to the early version of the stable tree
>>>>>>>>> (eg: Linux-5.10.y, not the master tree)
>>>>>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
>>>>>>>> intended to fix the preexisting bug. Also it does not apply cleanly
>>>>>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
>>>>>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
>>>>>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
>>>>>>>>
>>>>>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
>>>>>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
>>>>>>>> the fixes message is wrong just misleads people, making them think there is
>>>>>>>> no bug in 1dd214b8f21c. We need to be clear about this.
>>>>>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
>>>>>>> origin fixes could cover the Linux-5.0.y tree if they give the
>>>>>>> accurate commit number and that is the cause we want to point out.
>>>>>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
>>>>>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
>>>>>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
>>>>>> message is misleading. This is the point I want to make.
>>>>>>
>>>>>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
>>>>>>> in the stable repo. Actually, we only know Linux-5.10.y has the
>>>>>>> problem.
>>>>>> But it is not and does not apply to d9dddbf55667 cleanly.
>>>>>>
>>>>>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
>>>>>>> (Fixup a bug which only belongs to the former stable branch.)
>>>>>>>
>>>>>> I think you just need to send this patch without saying “commit
>>>>>> 787af64d05cd fixes message is wrong” would be a good start. You also
>>>>>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
>>>>>> (inclusive). So there will need to be two patches:
>>>>>>
>>>>>> 1) your patch to stable tree prior to 5.15 and
>>>>>>
>>>>>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
>>>>>> between 5.15 and 5.17.
>>>>>>
>>>>>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
>>>>>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
>>>>>>>> in the mm/page_isolation.c code I mentioned below.
>>>>>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
>>>>>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
>>>>>>> buddy_pfn=0.
>>>>>>> The root cause comes from __find_buddy_pfn():
>>>>>>> return page_pfn ^ (1 << order);
>>>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
>>>>>> required for kernels between 5.15 and 5.17 (inclusive).
>>>>>>
>>>>>>> When page_pfn is the same as the order size, it will return the
>>>>>>> previous buddy not the next. That is the only exception for this
>>>>>>> algorithm, right?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In fact, the bug is a very long time to reproduce and is not easy to
>>>>>>> debug, so we want to contribute it to the community to prevent other
>>>>>>> guys from wasting time. Although there is no new patch at all.
>>>>>> Thanks for your reporting and sending out the patch. I really
>>>>>> appreciate it. We definitely need your inputs. Throughout the email
>>>>>> thread, I am trying to help you clarify the bug and how to fix it
>>>>>> properly:
>>>>>>
>>>>>> 1. The commit 787af64d05cd does not apply cleanly to commits
>>>>>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
>>>>>> fix the issue. That is why we need your patch to fix the issue.
>>>>>> And saying it has a wrong fixes message in this patch’s git log is
>>>>>> misleading.
>>>>>>
>>>>>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
>>>>>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
>>>>>> removed since 5.15 and the issue can appear during page isolation.
>>>>>>
>>>>>> 3. For kernels before 5.15, this patch will apply.
>>>>> Zi Yan, Guo Ren,
>>>>>
>>>>> I think we still need some imporvemnt for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,
>>>>>
>>>>> within page_is_buddy(), it still use the illegal buddy page to do the check. I think in most of cases, page_is_buddy() can return false,  but it still may return true with very low probablity.
>>>> Can you elaborate more on this? What kind of page can lead to page_is_buddy()
>>>> returning true? You said it is buddy_pfn is 0, but if the page is reserved,
>>>> if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
>>>> Maybe show us the dump_page() that offending page.
>>>>
>>>> Thanks.
>>> Let‘s take the issue we met on RISC-V arch for example,
>>>
>>> pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 0xffffffe07e205000, which is the page address of PFN 512.
>>>
>>> __find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.
>>> We know PFN 0 is not a valid pfn for buddy system, because 512 is the first PFN for buddy system.
>>>
>>> Then it use below code to get buddy page with buddy_pfn 0:
>>> buddy = page + (buddy_pfn - pfn);
>>> So buddy page address is:
>>> 0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)
>>>
>>> we can know this buddy page's address is less than mem_map(0xffffffe07e1fe000 < 0xffffffe07e205000),
>>> actually 0xffffffe07e1fe000 is not a valid page's address. If we use 0xffffffe07e1fe000
>>> as the page's address to extract the value of a member in 'struct page', we may get an uncertain value.
>>> That's why I say page_is_buddy() may return true with very low probablity.
>>>
>>> So I think we need to add the code the verify buddy_pfn in the first place:
>>> 	pfn_valid(buddy_pfn)
>>>
>> +DavidH on how memory section works.
>>
>> This 2MB RAM reservation does not sound right to me. How does it work in sparsemem?
>> RISC-V has SECTION_SIZE_BITS=27, i.e., 128MB a section. All pages within
>> a section should have their corresponding struct page (mem_map). So in this case,
>> the first 2MB pages should have mem_map and can be marked as PageReserved. As a
>> result, page_is_buddy() will return false.
> Yes. Unless there is a BUG :)
>
> init_unavailable_range() is supposed to initialize the memap of
> unavailable ranges and mark it reserved.
>
> I wonder if we're missing a case in memmap_init(), to also initialize
> holes at the beginning of a section, before RAM (we do handle sections
> in a special way if the end of RAM falls in the middle of a section).
>
> If it's not initialized, it might contain garbage.
Thanks for the comments, I will check it for RISC-V arch.
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype
  2022-06-16 15:04                   ` David Hildenbrand
  2022-06-16 15:16                     ` Xianting Tian
@ 2022-06-17  3:17                     ` Guo Ren
  1 sibling, 0 replies; 19+ messages in thread
From: Guo Ren @ 2022-06-17  3:17 UTC (permalink / raw)
  To: David Hildenbrand, Zi Yan
  Cc: Xianting Tian, Greg Kroah-Hartman, Andrew Morton,
	Vlastimil Babka, Linux-MM, Linux Kernel Mailing List, stable,
	huanyi.xj, zjb194813, tianhu.hh, Hanjun Guo, Joonsoo Kim,
	Laura Abbott

Hi David & Zi Yan


On Thu, Jun 16, 2022 at 11:04 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.06.22 16:01, Zi Yan wrote:
> > On 15 Jun 2022, at 12:15, Xianting Tian wrote:
> >
> >> 在 2022/6/15 下午9:55, Zi Yan 写道:
> >>> On 15 Jun 2022, at 2:47, Xianting Tian wrote:
> >>>
> >>>> 在 2022/6/14 上午8:14, Zi Yan 写道:
> >>>>> On 13 Jun 2022, at 19:47, Guo Ren wrote:
> >>>>>
> >>>>>> On Tue, Jun 14, 2022 at 3:49 AM Zi Yan <ziy@nvidia.com> wrote:
> >>>>>>> On 13 Jun 2022, at 12:32, Guo Ren wrote:
> >>>>>>>
> >>>>>>>> On Mon, Jun 13, 2022 at 11:23 PM Zi Yan <ziy@nvidia.com> wrote:
> >>>>>>>>> Hi Xianting,
> >>>>>>>>>
> >>>>>>>>> Thanks for your patch.
> >>>>>>>>>
> >>>>>>>>> On 13 Jun 2022, at 9:10, Xianting Tian wrote:
> >>>>>>>>>
> >>>>>>>>>> Commit 787af64d05cd ("mm: page_alloc: validate buddy before check its migratetype.")
> >>>>>>>>>> added buddy check code. But unfortunately, this fix isn't backported to
> >>>>>>>>>> linux-5.17.y and the former stable branches. The reason is it added wrong
> >>>>>>>>>> fixes message:
> >>>>>>>>>>        Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable
> >>>>>>>>>>                            pageblocks with others")
> >>>>>>>>> No, the Fixes tag is right. The commit above does need to validate buddy.
> >>>>>>>> I think Xianting is right. The “Fixes:" tag is not accurate and the
> >>>>>>>> page_is_buddy() is necessary here.
> >>>>>>>>
> >>>>>>>> This patch could be applied to the early version of the stable tree
> >>>>>>>> (eg: Linux-5.10.y, not the master tree)
> >>>>>>> This is quite misleading. Commit 787af64d05cd applies does not mean it is
> >>>>>>> intended to fix the preexisting bug. Also it does not apply cleanly
> >>>>>>> to commit d9dddbf55667, there is a clear indentation mismatch. At best,
> >>>>>>> you can say the way of 787af64d05cd fixing 1dd214b8f21c also fixes d9dddbf55667.
> >>>>>>> There is no way you can apply 787af64d05cd to earlier trees and call it a day.
> >>>>>>>
> >>>>>>> You can mention 787af64d05cd that it fixes a bug in 1dd214b8f21c and there is
> >>>>>>> a similar bug in d9dddbf55667 that can be fixed in a similar way too. Saying
> >>>>>>> the fixes message is wrong just misleads people, making them think there is
> >>>>>>> no bug in 1dd214b8f21c. We need to be clear about this.
> >>>>>> First, d9dddbf55667 is earlier than 1dd214b8f21c in Linus tree. The
> >>>>>> origin fixes could cover the Linux-5.0.y tree if they give the
> >>>>>> accurate commit number and that is the cause we want to point out.
> >>>>> Yes, I got that d9dddbf55667 is earlier and commit 787af64d05cd fixes
> >>>>> the issue introduced by d9dddbf55667. But my point is that 787af64d05cd
> >>>>> is not intended to fix d9dddbf55667 and saying it has a wrong fixes
> >>>>> message is misleading. This is the point I want to make.
> >>>>>
> >>>>>> Second, if the patch is for d9dddbf55667 then it could cover any tree
> >>>>>> in the stable repo. Actually, we only know Linux-5.10.y has the
> >>>>>> problem.
> >>>>> But it is not and does not apply to d9dddbf55667 cleanly.
> >>>>>
> >>>>>> Maybe, Gregkh could help to direct us on how to deal with the issue:
> >>>>>> (Fixup a bug which only belongs to the former stable branch.)
> >>>>>>
> >>>>> I think you just need to send this patch without saying “commit
> >>>>> 787af64d05cd fixes message is wrong” would be a good start. You also
> >>>>> need extra fix to mm/page_isolation.c for kernels between 5.15 and 5.17
> >>>>> (inclusive). So there will need to be two patches:
> >>>>>
> >>>>> 1) your patch to stable tree prior to 5.15 and
> >>>>>
> >>>>> 2) your patch with an additional mm/page_isolation.c fix to stable tree
> >>>>> between 5.15 and 5.17.
> >>>>>
> >>>>>>> Also, you will need to fix the mm/page_isolation.c code too to make this patch
> >>>>>>> complete, unless you can show that PFN=0x1000 is never going to be encountered
> >>>>>>> in the mm/page_isolation.c code I mentioned below.
> >>>>>> No, we needn't fix mm/page_isolation.c in linux-5.10.y, because it had
> >>>>>> pfn_valid_within(buddy_pfn) check after __find_buddy_pfn() to prevent
> >>>>>> buddy_pfn=0.
> >>>>>> The root cause comes from __find_buddy_pfn():
> >>>>>> return page_pfn ^ (1 << order);
> >>>>> Right. But pfn_valid_within() was removed since 5.15. So your fix is
> >>>>> required for kernels between 5.15 and 5.17 (inclusive).
> >>>>>
> >>>>>> When page_pfn is the same as the order size, it will return the
> >>>>>> previous buddy not the next. That is the only exception for this
> >>>>>> algorithm, right?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> In fact, the bug is a very long time to reproduce and is not easy to
> >>>>>> debug, so we want to contribute it to the community to prevent other
> >>>>>> guys from wasting time. Although there is no new patch at all.
> >>>>> Thanks for your reporting and sending out the patch. I really
> >>>>> appreciate it. We definitely need your inputs. Throughout the email
> >>>>> thread, I am trying to help you clarify the bug and how to fix it
> >>>>> properly:
> >>>>>
> >>>>> 1. The commit 787af64d05cd does not apply cleanly to commits
> >>>>> d9dddbf55667, meaning you cannot just cherry-pick that commit to
> >>>>> fix the issue. That is why we need your patch to fix the issue.
> >>>>> And saying it has a wrong fixes message in this patch’s git log is
> >>>>> misleading.
> >>>>>
> >>>>> 2. For kernels between 5.15 and 5.17 (inclusive), an additional fix
> >>>>> to mm/page_isolation.c is also needed, since pfn_valid_within() was
> >>>>> removed since 5.15 and the issue can appear during page isolation.
> >>>>>
> >>>>> 3. For kernels before 5.15, this patch will apply.
> >>>> Zi Yan, Guo Ren,
> >>>>
> >>>> I think we still need some imporvemnt for MASTER branch, as we discussed above, we will get an illegal buddy page if buddy_pfn is 0,
> >>>>
> >>>> within page_is_buddy(), it still use the illegal buddy page to do the check. I think in most of cases, page_is_buddy() can return false,  but it still may return true with very low probablity.
> >>> Can you elaborate more on this? What kind of page can lead to page_is_buddy()
> >>> returning true? You said it is buddy_pfn is 0, but if the page is reserved,
> >>> if (!page_is_guard(buddy) && !PageBuddy(buddy)) should return false.
> >>> Maybe show us the dump_page() that offending page.
> >>>
> >>> Thanks.
> >>
> >> Let‘s take the issue we met on RISC-V arch for example,
> >>
> >> pfn_base is 512 as we reserved 2M RAM for opensbi, mem_map's value is 0xffffffe07e205000, which is the page address of PFN 512.
> >>
> >> __find_buddy_pfn() returned 0 for PFN 0x2000 with order 0xd.
> >> We know PFN 0 is not a valid pfn for buddy system, because 512 is the first PFN for buddy system.
> >>
> >> Then it use below code to get buddy page with buddy_pfn 0:
> >> buddy = page + (buddy_pfn - pfn);
> >> So buddy page address is:
> >> 0xffffffe07e1fe000 = (struct page*)0xffffffe07e26e000 + (0 - 0x2000)
> >>
> >> we can know this buddy page's address is less than mem_map(0xffffffe07e1fe000 < 0xffffffe07e205000),
> >> actually 0xffffffe07e1fe000 is not a valid page's address. If we use 0xffffffe07e1fe000
> >> as the page's address to extract the value of a member in 'struct page', we may get an uncertain value.
> >> That's why I say page_is_buddy() may return true with very low probablity.
> >>
> >> So I think we need to add the code the verify buddy_pfn in the first place:
> >>      pfn_valid(buddy_pfn)
> >>
> >
> > +DavidH on how memory section works.
> >
> > This 2MB RAM reservation does not sound right to me. How does it work in sparsemem?
> > RISC-V has SECTION_SIZE_BITS=27, i.e., 128MB a section. All pages within
> > a section should have their corresponding struct page (mem_map). So in this case,
> > the first 2MB pages should have mem_map and can be marked as PageReserved. As a
> > result, page_is_buddy() will return false.
Actually, we had a patch to fix that, have a look:
https://lore.kernel.org/linux-riscv/20211123015717.542631-2-guoren@kernel.org/
What do you think of the above patch?

A lot of arch maintainers do not recognize that the buddy system has
an implied limitation that the start of the phy ram address must align
with (1 << MAX_ORDER-1).

>
> Yes. Unless there is a BUG :)
>
> init_unavailable_range() is supposed to initialize the memap of
> unavailable ranges and mark it reserved.
>
> I wonder if we're missing a case in memmap_init(), to also initialize
> holes at the beginning of a section, before RAM (we do handle sections
> in a special way if the end of RAM falls in the middle of a section).
>
> If it's not initialized, it might contain garbage.
>
> --
> Thanks,
>
> David / dhildenb
>


--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2022-06-17  3:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-13 13:10 [RESEND PATCH] mm: page_alloc: validate buddy before check the migratetype Xianting Tian
2022-06-13 15:22 ` Zi Yan
2022-06-13 16:32   ` Guo Ren
2022-06-13 19:49     ` Zi Yan
2022-06-13 23:47       ` Guo Ren
2022-06-14  0:14         ` Zi Yan
2022-06-14  1:19           ` Guo Ren
2022-06-14  3:26             ` Xianting Tian
2022-06-15  6:47           ` Xianting Tian
2022-06-15 13:55             ` Zi Yan
2022-06-15 16:15               ` Xianting Tian
2022-06-16  1:51                 ` Xianting Tian
2022-06-16 14:01                 ` Zi Yan
2022-06-16 15:04                   ` David Hildenbrand
2022-06-16 15:16                     ` Xianting Tian
2022-06-17  3:17                     ` Guo Ren
2022-06-13 16:08 ` Guo Ren
2022-06-14  1:26   ` Xianting Tian
2022-06-14  2:04   ` Xianting Tian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).