* [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
@ 2023-10-09 14:56 Usama Arif
2023-10-10 1:23 ` Mike Kravetz
2023-10-19 2:38 ` Mike Kravetz
0 siblings, 2 replies; 14+ messages in thread
From: Usama Arif @ 2023-10-09 14:56 UTC (permalink / raw)
To: linux-mm
Cc: linux-kernel, akpm, muchun.song, mike.kravetz, songmuchun,
fam.zheng, liangma, punit.agrawal, Usama Arif
Calling prep_and_add_allocated_folios when allocating gigantic pages
at boot time causes the kernel to crash as folio_list is empty
and iterating it causes a NULL pointer dereference. Call this only
for non-gigantic pages when folio_list has entries.
Fixes: bfb41d6b2fe148 ("hugetlb: restructure pool allocations")
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
mm/hugetlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f3749fc125d4..b12f5fd295bb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3397,7 +3397,8 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
}
/* list will be empty if hstate_is_gigantic */
- prep_and_add_allocated_folios(h, &folio_list);
+ if (!hstate_is_gigantic(h))
+ prep_and_add_allocated_folios(h, &folio_list);
if (i < h->max_huge_pages) {
char buf[32];
--
2.25.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-09 14:56 [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages Usama Arif
@ 2023-10-10 1:23 ` Mike Kravetz
2023-10-10 17:01 ` [External] " Usama Arif
2023-10-12 0:03 ` Nathan Chancellor
2023-10-19 2:38 ` Mike Kravetz
1 sibling, 2 replies; 14+ messages in thread
From: Mike Kravetz @ 2023-10-10 1:23 UTC (permalink / raw)
To: Usama Arif
Cc: linux-mm, linux-kernel, akpm, muchun.song, songmuchun, fam.zheng,
liangma, punit.agrawal, Konrad Dybcio
On 10/09/23 15:56, Usama Arif wrote:
> Calling prep_and_add_allocated_folios when allocating gigantic pages
> at boot time causes the kernel to crash as folio_list is empty
> and iterating it causes a NULL pointer dereference. Call this only
> for non-gigantic pages when folio_list has entries.
Thanks!
However, are you sure the issue is the result of iterating through a
NULL list? For reference, the routine prep_and_add_allocated_folios is:
static void prep_and_add_allocated_folios(struct hstate *h,
struct list_head *folio_list)
{
struct folio *folio, *tmp_f;
/* Add all new pool pages to free lists in one lock cycle */
spin_lock_irq(&hugetlb_lock);
list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
__prep_account_new_huge_page(h, folio_nid(folio));
enqueue_hugetlb_folio(h, folio);
}
spin_unlock_irq(&hugetlb_lock);
}
If folio_list is empty, then the only code that should be executed is
acquiring the lock, notice the list is empty, release the lock.
In the case of gigantic pages addressed below, I do see the warning:
[ 0.055140] DEBUG_LOCKS_WARN_ON(early_boot_irqs_disabled)
[ 0.055149] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4345 lockdep_hardirqs_on_prepare+0x1a8/0x1b0
[ 0.055153] Modules linked in:
[ 0.055155] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc4+ #40
[ 0.055157] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
[ 0.055158] RIP: 0010:lockdep_hardirqs_on_prepare+0x1a8/0x1b0
[ 0.055160] Code: 00 85 c0 0f 84 5e ff ff ff 8b 0d a7 20 74 01 85 c9 0f 85 50 ff ff ff 48 c7 c6 48 25 42 82 48 c7 c7 70 7f 40 82 e8 18 10 f7 ff <0f> 0b 5b e9 e0 d8 af 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[ 0.055162] RSP: 0000:ffffffff82603d40 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[ 0.055164] RAX: 0000000000000000 RBX: ffffffff827911e0 RCX: 0000000000000000
[ 0.055165] RDX: 0000000000000004 RSI: ffffffff8246b3e1 RDI: 00000000ffffffff
[ 0.055166] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000000
[ 0.055166] R10: ffffffffffffffff R11: 284e4f5f4e524157 R12: 0000000000000001
[ 0.055167] R13: ffffffff82eb6316 R14: ffffffff82603d70 R15: ffffffff82ee5f70
[ 0.055169] FS: 0000000000000000(0000) GS:ffff888277c00000(0000) knlGS:0000000000000000
[ 0.055170] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.055171] CR2: ffff88847ffff000 CR3: 000000000263a000 CR4: 00000000000200b0
[ 0.055174] Call Trace:
[ 0.055174] <TASK>
[ 0.055175] ? lockdep_hardirqs_on_prepare+0x1a8/0x1b0
[ 0.055177] ? __warn+0x81/0x170
[ 0.055181] ? lockdep_hardirqs_on_prepare+0x1a8/0x1b0
[ 0.055182] ? report_bug+0x18d/0x1c0
[ 0.055186] ? early_fixup_exception+0x92/0xb0
[ 0.055189] ? early_idt_handler_common+0x2f/0x40
[ 0.055194] ? lockdep_hardirqs_on_prepare+0x1a8/0x1b0
[ 0.055196] trace_hardirqs_on+0x10/0xa0
[ 0.055198] _raw_spin_unlock_irq+0x24/0x50
[ 0.055201] hugetlb_hstate_alloc_pages+0x311/0x3e0
[ 0.055206] hugepages_setup+0x220/0x2c0
[ 0.055210] unknown_bootoption+0x98/0x1d0
[ 0.055213] parse_args+0x152/0x440
[ 0.055216] ? __pfx_unknown_bootoption+0x10/0x10
[ 0.055220] start_kernel+0x1af/0x6c0
[ 0.055222] ? __pfx_unknown_bootoption+0x10/0x10
[ 0.055225] x86_64_start_reservations+0x14/0x30
[ 0.055227] x86_64_start_kernel+0x74/0x80
[ 0.055229] secondary_startup_64_no_verify+0x166/0x16b
[ 0.055234] </TASK>
[ 0.055235] irq event stamp: 0
[ 0.055236] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[ 0.055238] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[ 0.055239] softirqs last enabled at (0): [<0000000000000000>] 0x0
[ 0.055240] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 0.055240] ---[ end trace 0000000000000000 ]---
This is because interrupts are not enabled this early in boot, and the
spin_unlock_irq() would incorrectly enable interrupts too early. I wonder
if this 'warning' could translate to a panic or NULL deref under certain
configurations?
Konrad, I am interested to see if this addresses your booting problem. But,
your stack trace is a bit different. My 'guess' is that this will not address
your issue. If it does not, can you try the following patch? This
applies to next-20231009.
--
Mike Kravetz
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f3749fc125d4..8346c98e5616 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2178,18 +2178,19 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
static void prep_and_add_allocated_folios(struct hstate *h,
struct list_head *folio_list)
{
+ unsigned long flags;
struct folio *folio, *tmp_f;
/* Send list for bulk vmemmap optimization processing */
hugetlb_vmemmap_optimize_folios(h, folio_list);
/* Add all new pool pages to free lists in one lock cycle */
- spin_lock_irq(&hugetlb_lock);
+ spin_lock_irqsave(&hugetlb_lock, flags);
list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
__prep_account_new_huge_page(h, folio_nid(folio));
enqueue_hugetlb_folio(h, folio);
}
- spin_unlock_irq(&hugetlb_lock);
+ spin_unlock_irqrestore(&hugetlb_lock, flags);
}
/*
@@ -3224,13 +3225,14 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
static void __init prep_and_add_bootmem_folios(struct hstate *h,
struct list_head *folio_list)
{
+ unsigned long flags;
struct folio *folio, *tmp_f;
/* Send list for bulk vmemmap optimization processing */
hugetlb_vmemmap_optimize_folios(h, folio_list);
/* Add all new pool pages to free lists in one lock cycle */
- spin_lock_irq(&hugetlb_lock);
+ spin_lock_irqsave(&hugetlb_lock, flags);
list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
/*
@@ -3246,7 +3248,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
__prep_account_new_huge_page(h, folio_nid(folio));
enqueue_hugetlb_folio(h, folio);
}
- spin_unlock_irq(&hugetlb_lock);
+ spin_unlock_irqrestore(&hugetlb_lock, flags);
}
/*
* Re: [External] Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-10 1:23 ` Mike Kravetz
@ 2023-10-10 17:01 ` Usama Arif
2023-10-10 21:30 ` Mike Kravetz
2023-10-12 0:03 ` Nathan Chancellor
1 sibling, 1 reply; 14+ messages in thread
From: Usama Arif @ 2023-10-10 17:01 UTC (permalink / raw)
To: Mike Kravetz
Cc: linux-mm, linux-kernel, akpm, muchun.song, songmuchun, fam.zheng,
liangma, punit.agrawal, Konrad Dybcio
On 10/10/2023 02:23, Mike Kravetz wrote:
> On 10/09/23 15:56, Usama Arif wrote:
>> Calling prep_and_add_allocated_folios when allocating gigantic pages
>> at boot time causes the kernel to crash as folio_list is empty
>> and iterating it causes a NULL pointer dereference. Call this only
>> for non-gigantic pages when folio_list has entries.
>
> Thanks!
>
> However, are you sure the issue is the result of iterating through a
> NULL list? For reference, the routine prep_and_add_allocated_folios is:
>
Yes, you are right, it wasn't an issue with the list, but the lock. If I
do the below diff it boots.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 73803d62066a..f428af13e98a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2178,18 +2178,19 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
static void prep_and_add_allocated_folios(struct hstate *h,
struct list_head *folio_list)
{
+ unsigned long flags;
struct folio *folio, *tmp_f;
/* Send list for bulk vmemmap optimization processing */
hugetlb_vmemmap_optimize_folios(h, folio_list);
/* Add all new pool pages to free lists in one lock cycle */
- spin_lock_irq(&hugetlb_lock);
+ spin_lock_irqsave(&hugetlb_lock, flags);
list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
__prep_account_new_huge_page(h, folio_nid(folio));
enqueue_hugetlb_folio(h, folio);
}
- spin_unlock_irq(&hugetlb_lock);
+ spin_unlock_irqrestore(&hugetlb_lock, flags);
}
/*
FYI, this was an x86 VM with kvm enabled.
Thanks,
Usama
> [ remainder of quoted message snipped ]
* Re: [External] Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-10 17:01 ` [External] " Usama Arif
@ 2023-10-10 21:30 ` Mike Kravetz
2023-10-10 21:31 ` Konrad Dybcio
0 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2023-10-10 21:30 UTC (permalink / raw)
To: Usama Arif
Cc: linux-mm, linux-kernel, akpm, muchun.song, songmuchun, fam.zheng,
liangma, punit.agrawal, Konrad Dybcio
On 10/10/23 18:01, Usama Arif wrote:
>
>
> On 10/10/2023 02:23, Mike Kravetz wrote:
> > On 10/09/23 15:56, Usama Arif wrote:
> > > Calling prep_and_add_allocated_folios when allocating gigantic pages
> > > at boot time causes the kernel to crash as folio_list is empty
> > > and iterating it causes a NULL pointer dereference. Call this only
> > > for non-gigantic pages when folio_list has entries.
> >
> > Thanks!
> >
> > However, are you sure the issue is the result of iterating through a
> > NULL list? For reference, the routine prep_and_add_allocated_folios is:
> >
>
> Yes, you are right, it wasn't an issue with the list, but the lock. If I do
> the below diff it boots.
Thanks!
I believe that may be the root cause of the boot issues with
this series. It is unfortunate that the failures were not consistent
and did not directly point at the root cause.
Hopefully, these changes will resolve the boot issues for Konrad as well.
I will create a new version of the "Batch hugetlb vmemmap modification
operations" series with these locking changes.
--
Mike Kravetz
* Re: [External] Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-10 21:30 ` Mike Kravetz
@ 2023-10-10 21:31 ` Konrad Dybcio
0 siblings, 0 replies; 14+ messages in thread
From: Konrad Dybcio @ 2023-10-10 21:31 UTC (permalink / raw)
To: Mike Kravetz, Usama Arif
Cc: linux-mm, linux-kernel, akpm, muchun.song, songmuchun, fam.zheng,
liangma, punit.agrawal
On 10/10/23 23:30, Mike Kravetz wrote:
> On 10/10/23 18:01, Usama Arif wrote:
>>
>>
>> On 10/10/2023 02:23, Mike Kravetz wrote:
>>> On 10/09/23 15:56, Usama Arif wrote:
>>>> Calling prep_and_add_allocated_folios when allocating gigantic pages
>>>> at boot time causes the kernel to crash as folio_list is empty
>>>> and iterating it causes a NULL pointer dereference. Call this only
>>>> for non-gigantic pages when folio_list has entries.
>>>
>>> Thanks!
>>>
>>> However, are you sure the issue is the result of iterating through a
>>> NULL list? For reference, the routine prep_and_add_allocated_folios is:
>>>
>>
>> Yes, you are right, it wasn't an issue with the list, but the lock. If I do
>> the below diff it boots.
>
> Thanks!
>
> I believe that may be the root cause of the boot issues with
> this series. It is unfortunate that the failures were not consistent
> and did not directly point at the root cause.
>
> Hopefully, these changes will resolve the boot issues for Konrad as well.
>
> I will create a new version of the "Batch hugetlb vmemmap modification
> operations" series with these locking changes.
We sent a reply at the same time :P [1]
Konrad
[1]
https://lore.kernel.org/all/6f381d4c-d908-4f00-89b3-ed3bcb26b143@linaro.org/
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-10 1:23 ` Mike Kravetz
2023-10-10 17:01 ` [External] " Usama Arif
@ 2023-10-12 0:03 ` Nathan Chancellor
2023-10-12 14:53 ` Mike Kravetz
1 sibling, 1 reply; 14+ messages in thread
From: Nathan Chancellor @ 2023-10-12 0:03 UTC (permalink / raw)
To: Mike Kravetz
Cc: Usama Arif, linux-mm, linux-kernel, akpm, muchun.song,
songmuchun, fam.zheng, liangma, punit.agrawal, Konrad Dybcio,
llvm
Hi Mike,
On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> On 10/09/23 15:56, Usama Arif wrote:
> > Calling prep_and_add_allocated_folios when allocating gigantic pages
> > at boot time causes the kernel to crash as folio_list is empty
> > and iterating it causes a NULL pointer dereference. Call this only
> > for non-gigantic pages when folio_list has entries.
>
> Thanks!
>
> However, are you sure the issue is the result of iterating through a
> NULL list? For reference, the routine prep_and_add_allocated_folios is:
>
> static void prep_and_add_allocated_folios(struct hstate *h,
> struct list_head *folio_list)
> {
> struct folio *folio, *tmp_f;
>
> /* Add all new pool pages to free lists in one lock cycle */
> spin_lock_irq(&hugetlb_lock);
> list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
> __prep_account_new_huge_page(h, folio_nid(folio));
> enqueue_hugetlb_folio(h, folio);
> }
> spin_unlock_irq(&hugetlb_lock);
> }
>
> If folio_list is empty, then the only code that should be executed is
> acquiring the lock, notice the list is empty, release the lock.
>
> In the case of gigantic pages addressed below, I do see the warning:
>
> [ lockdep warning snipped ]
>
> This is because interrupts are not enabled this early in boot, and the
> spin_unlock_irq() would incorrectly enable interrupts too early. I wonder
> if this 'warning' could translate to a panic or NULL deref under certain
> configurations?
>
> Konrad, I am interested to see if this addresses your booting problem. But,
> your stack trace is a bit different. My 'guess' is that this will not address
> your issue. If it does not, can you try the following patch? This
> applies to next-20231009.
> --
> Mike Kravetz
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f3749fc125d4..8346c98e5616 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2178,18 +2178,19 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
> static void prep_and_add_allocated_folios(struct hstate *h,
> struct list_head *folio_list)
> {
> + unsigned long flags;
> struct folio *folio, *tmp_f;
>
> /* Send list for bulk vmemmap optimization processing */
> hugetlb_vmemmap_optimize_folios(h, folio_list);
>
> /* Add all new pool pages to free lists in one lock cycle */
> - spin_lock_irq(&hugetlb_lock);
> + spin_lock_irqsave(&hugetlb_lock, flags);
> list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
> __prep_account_new_huge_page(h, folio_nid(folio));
> enqueue_hugetlb_folio(h, folio);
> }
> - spin_unlock_irq(&hugetlb_lock);
> + spin_unlock_irqrestore(&hugetlb_lock, flags);
> }
>
> /*
> @@ -3224,13 +3225,14 @@ static void __init hugetlb_folio_init_vmemmap(struct folio *folio,
> static void __init prep_and_add_bootmem_folios(struct hstate *h,
> struct list_head *folio_list)
> {
> + unsigned long flags;
> struct folio *folio, *tmp_f;
>
> /* Send list for bulk vmemmap optimization processing */
> hugetlb_vmemmap_optimize_folios(h, folio_list);
>
> /* Add all new pool pages to free lists in one lock cycle */
> - spin_lock_irq(&hugetlb_lock);
> + spin_lock_irqsave(&hugetlb_lock, flags);
> list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
> if (!folio_test_hugetlb_vmemmap_optimized(folio)) {
> /*
> @@ -3246,7 +3248,7 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
> __prep_account_new_huge_page(h, folio_nid(folio));
> enqueue_hugetlb_folio(h, folio);
> }
> - spin_unlock_irq(&hugetlb_lock);
> + spin_unlock_irqrestore(&hugetlb_lock, flags);
> }
>
> /*
I suspect the crash that our continuous integration spotted [1] is the
same issue that Konrad is seeing, as I have bisected that failure to
bfb41d6b2fe1 in next-20231009. However, neither the first half of your
diff (since the second half does not apply at bfb41d6b2fe1) nor the
original patch in this thread resolves the issue, so maybe it is
entirely different from Konrad's?
For what it's worth, this issue is only visible for me when building for
arm64 using LLVM with CONFIG_INIT_STACK_NONE=y, instead of the default
CONFIG_INIT_STACK_ALL_ZERO=y (which appears to hide the problem?),
making it seem like it could be something with uninitialized memory... I
have not been able to reproduce it with GCC, which could also mean
something.
Using LLVM 17.0.2 from kernel.org [2]:
$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 mrproper defconfig
$ scripts/config -d INIT_STACK_ALL_ZERO -e INIT_STACK_NONE
$ make -skj"$(nproc)" ARCH=arm64 LLVM=1 Image.gz
$ qemu-system-aarch64 \
-display none \
-nodefaults \
-cpu max,pauth-impdef=true \
-machine virt,gic-version=max,virtualization=true \
-append 'console=ttyAMA0 earlycon' \
-kernel arch/arm64/boot/Image.gz \
-initrd arm64-rootfs.cpio \
-m 512m \
-serial mon:stdio
...
[ 0.000000] Linux version 6.6.0-rc4-00317-gbfb41d6b2fe1 (nathan@dev-arch.thelio-3990X) (ClangBuiltLinux clang version 17.0.2 (https://github.com/llvm/llvm-project b2417f51dbbd7435eb3aaf203de24de6754da50e), ClangBuiltLinux LLD 17.0.2) #1 SMP PREEMPT Wed Oct 11 16:44:41 MST 2023
...
[ 0.304543] Unable to handle kernel paging request at virtual address ffffff602827f9f4
[ 0.304899] Mem abort info:
[ 0.305022] ESR = 0x0000000096000004
[ 0.305438] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.305668] SET = 0, FnV = 0
[ 0.305804] EA = 0, S1PTW = 0
[ 0.305949] FSC = 0x04: level 0 translation fault
[ 0.306156] Data abort info:
[ 0.306287] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 0.306500] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.306711] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.306976] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041cc3000
[ 0.307251] [ffffff602827f9f4] pgd=0000000000000000, p4d=0000000000000000
[ 0.308086] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 0.308428] Modules linked in:
[ 0.308722] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc4-00317-gbfb41d6b2fe1 #1
[ 0.309159] Hardware name: linux,dummy-virt (DT)
[ 0.309496] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 0.309987] pc : gather_bootmem_prealloc+0x80/0x1a8
[ 0.310673] lr : hugetlb_init+0x1c8/0x2ec
[ 0.310871] sp : ffff80008000ba10
[ 0.311038] x29: ffff80008000ba30 x28: 0000000000000000 x27: ffffd80a09fe7db8
[ 0.311417] x26: 0000000000000001 x25: ffffd80a09fe7db8 x24: 0000000000000100
[ 0.311702] x23: fffffc0000000000 x22: 0001000000000000 x21: ffff80008000ba18
[ 0.311987] x20: ffffff602827f9c0 x19: ffffd80a0a555b60 x18: 00000000fbf7386f
[ 0.312272] x17: 00000000bee83943 x16: 000000002ae32058 x15: 0000000000000000
[ 0.312557] x14: 0000000000000009 x13: ffffd80a0a556d28 x12: ffffffffffffee38
[ 0.312831] x11: ffffd80a0a556d28 x10: 0000000000000004 x9 : ffffd80a09fe7000
[ 0.313141] x8 : 0000000d80a09fe7 x7 : 0000000001e1f7fb x6 : 0000000000000008
[ 0.313425] x5 : ffffd80a09ef1454 x4 : ffff00001fed5630 x3 : 0000000000019e00
[ 0.313703] x2 : ffff000002407b80 x1 : 0000000000019d00 x0 : 0000000000000000
[ 0.314054] Call trace:
[ 0.314259] gather_bootmem_prealloc+0x80/0x1a8
[ 0.314536] hugetlb_init+0x1c8/0x2ec
[ 0.314743] do_one_initcall+0xac/0x220
[ 0.314928] do_initcall_level+0x8c/0xac
[ 0.315114] do_initcalls+0x54/0x94
[ 0.315276] do_basic_setup+0x1c/0x28
[ 0.315450] kernel_init_freeable+0x104/0x170
[ 0.315648] kernel_init+0x20/0x1a0
[ 0.315822] ret_from_fork+0x10/0x20
[ 0.316235] Code: 979e8c0d 8b160328 d34cfd08 8b081af4 (b9403688)
[ 0.316745] ---[ end trace 0000000000000000 ]---
[ 0.317463] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 0.318093] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
The rootfs is available at [3] in case it is relevant. I am more than
happy to provide any additional information or test any patches as
necessary.
[1]: https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/6469151768/job/17570882198
[2]: https://mirrors.edge.kernel.org/pub/tools/llvm/
[3]: https://github.com/ClangBuiltLinux/boot-utils/releases
Cheers,
Nathan
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-12 0:03 ` Nathan Chancellor
@ 2023-10-12 14:53 ` Mike Kravetz
2023-10-13 0:12 ` Mike Kravetz
0 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2023-10-12 14:53 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Usama Arif, linux-mm, linux-kernel, akpm, muchun.song,
songmuchun, fam.zheng, liangma, punit.agrawal, Konrad Dybcio,
llvm
On 10/11/23 17:03, Nathan Chancellor wrote:
> On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > On 10/09/23 15:56, Usama Arif wrote:
>
> I suspect the crash that our continuous integration spotted [1] is the
> same issue that Konrad is seeing, as I have bisected that failure to
> bfb41d6b2fe1 in next-20231009. However, neither the first half of your
> diff (since the second half does not apply at bfb41d6b2fe1) nor the
> original patch in this thread resolves the issue though, so maybe it is
> entirely different from Konrad's?
>
> For what it's worth, this issue is only visible for me when building for
> arm64 using LLVM with CONFIG_INIT_STACK_NONE=y, instead of the default
> CONFIG_INIT_STACK_ALL_ZERO=y (which appears to hide the problem?),
> making it seem like it could be something with uninitialized memory... I
> have not been able to reproduce it with GCC, which could also mean
> something.
Thank you Nathan! That is very helpful.
I will use this information to try and recreate. If I can recreate, I
should be able to get to root cause.
--
Mike Kravetz
> [ quoted build commands and boot log snipped ]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-12 14:53 ` Mike Kravetz
@ 2023-10-13 0:12 ` Mike Kravetz
2023-10-14 0:04 ` Mike Kravetz
0 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2023-10-13 0:12 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Usama Arif, linux-mm, linux-kernel, akpm, muchun.song,
songmuchun, fam.zheng, liangma, punit.agrawal, Konrad Dybcio,
llvm
[-- Attachment #1: Type: text/plain, Size: 2825 bytes --]
On 10/12/23 07:53, Mike Kravetz wrote:
> On 10/11/23 17:03, Nathan Chancellor wrote:
> > On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > > On 10/09/23 15:56, Usama Arif wrote:
> >
> > I suspect the crash that our continuous integration spotted [1] is the
> > same issue that Konrad is seeing, as I have bisected that failure to
> > bfb41d6b2fe1 in next-20231009. However, neither the first half of your
> > diff (since the second half does not apply at bfb41d6b2fe1) nor the
> > original patch in this thread resolves the issue, so maybe it is
> > entirely different from Konrad's?
> >
> > For what it's worth, this issue is only visible for me when building for
> > arm64 using LLVM with CONFIG_INIT_STACK_NONE=y, instead of the default
> > CONFIG_INIT_STACK_ALL_ZERO=y (which appears to hide the problem?),
> > making it seem like it could be something with uninitialized memory... I
> > have not been able to reproduce it with GCC, which could also mean
> > something.
>
> Thank you Nathan! That is very helpful.
>
> I will use this information to try and recreate. If I can recreate, I
> should be able to get to root cause.
I could easily recreate the issue using the provided instructions. First
thing I did was add a few printk's to check/verify state. The beginning
of gather_bootmem_prealloc looked like this:
static void __init gather_bootmem_prealloc(void)
{
	LIST_HEAD(folio_list);
	struct huge_bootmem_page *m;
	struct hstate *h, *prev_h = NULL;

	if (list_empty(&huge_boot_pages))
		printk("gather_bootmem_prealloc: huge_boot_pages list empty\n");

	list_for_each_entry(m, &huge_boot_pages, list) {
		struct page *page = virt_to_page(m);
		struct folio *folio = (void *)page;

		printk("gather_bootmem_prealloc: loop entry m %lx\n",
		       (unsigned long)m);
The STRANGE thing is that the printk after testing for list_empty would
print, and then we would enter the 'list_for_each_entry()' loop as if the
list were not empty. This is the cause of the addressing exception: m
pointed to the list head rather than to an entry on the list.
I have attached the disassembly of gather_bootmem_prealloc with
INIT_STACK_NONE and with INIT_STACK_ALL_ZERO. The disassembly listings are
for code without the printks.
This is the first time I have looked at arm assembly, so I may be missing
something. However, in the INIT_STACK_NONE case it looks like we get the
address of huge_boot_pages into a register but do not use it to determine
if we should execute the loop. Code generated with INIT_STACK_ALL_ZERO seems
to show code checking the list before entering the loop.
Can someone with more arm assembly experience take a quick look? Since
huge_boot_pages is a global variable rather than on the stack, I can't
see how INIT_STACK_ALL_ZERO/INIT_STACK_NONE could make a difference.
--
Mike Kravetz
[-- Attachment #2: disass_INIT_STACK_NONE --]
[-- Type: text/plain, Size: 9882 bytes --]
Dump of assembler code for function gather_bootmem_prealloc:
mm/hugetlb.c:
3292 {
0xffff800081ae0f08 <+0>: d503233f paciasp
0xffff800081ae0f0c <+4>: d10203ff sub sp, sp, #0x80
0xffff800081ae0f10 <+8>: a9027bfd stp x29, x30, [sp, #32]
0xffff800081ae0f14 <+12>: a9036ffc stp x28, x27, [sp, #48]
0xffff800081ae0f18 <+16>: a90467fa stp x26, x25, [sp, #64]
0xffff800081ae0f1c <+20>: a9055ff8 stp x24, x23, [sp, #80]
0xffff800081ae0f20 <+24>: a90657f6 stp x22, x21, [sp, #96]
0xffff800081ae0f24 <+28>: a9074ff4 stp x20, x19, [sp, #112]
0xffff800081ae0f28 <+32>: 910083fd add x29, sp, #0x20
0xffff800081ae0f2c <+36>: d5384108 mrs x8, sp_el0
3294 struct huge_bootmem_page *m;
3295 struct hstate *h, *prev_h = NULL;
3296
3297 list_for_each_entry(m, &huge_boot_pages, list) {
0xffff800081ae0f30 <+40>: f00007a9 adrp x9, 0xffff800081bd7000 <new_log_buf_len>
0xffff800081ae0f34 <+44>: f9423d08 ldr x8, [x8, #1144]
0xffff800081ae0f38 <+48>: aa1f03e0 mov x0, xzr
0xffff800081ae0f3c <+52>: 910023f5 add x21, sp, #0x8
0xffff800081ae0f40 <+56>: d2e00036 mov x22, #0x1000000000000 // #281474976710656
0xffff800081ae0f44 <+60>: b25657f7 mov x23, #0xfffffc0000000000 // #-4398046511104
0xffff800081ae0f48 <+64>: 52802018 mov w24, #0x100 // #256
0xffff800081ae0f4c <+68>: f81f83a8 stur x8, [x29, #-8]
0xffff800081ae0f50 <+72>: 5280003a mov w26, #0x1 // #1
0xffff800081ae0f54 <+76>: f946dd39 ldr x25, [x9, #3512]
0xffff800081ae0f58 <+80>: d503201f nop
0xffff800081ae0f5c <+84>: 107b72fb adr x27, 0xffff800081bd7db8 <huge_boot_pages>
3293 LIST_HEAD(folio_list);
0xffff800081ae0f60 <+88>: a900d7f5 stp x21, x21, [sp, #8]
3298 struct page *page = virt_to_page(m);
3299 struct folio *folio = (void *)page;
3300
3301 h = m->hstate;
0xffff800081ae0f64 <+92>: f9400b33 ldr x19, [x25, #16]
3302 /*
3303 * It is possible to have multiple huge page sizes (hstates)
3304 * in this list. If so, process each size separately.
3305 */
3306 if (h != prev_h && prev_h != NULL)
0xffff800081ae0f68 <+96>: b40000a0 cbz x0, 0xffff800081ae0f7c <gather_bootmem_prealloc+116>
0xffff800081ae0f6c <+100>: eb00027f cmp x19, x0
0xffff800081ae0f70 <+104>: 54000060 b.eq 0xffff800081ae0f7c <gather_bootmem_prealloc+116> // b.none
3307 prep_and_add_allocated_folios(prev_h, &folio_list);
0xffff800081ae0f74 <+108>: 910023e1 add x1, sp, #0x8
0xffff800081ae0f78 <+112>: 979ecd63 bl 0xffff800080294504 <prep_and_add_allocated_folios>
0xffff800081ae0f7c <+116>: 8b160328 add x8, x25, x22
0xffff800081ae0f80 <+120>: d34cfd08 lsr x8, x8, #12
0xffff800081ae0f84 <+124>: 8b081af4 add x20, x23, x8, lsl #6
./include/linux/atomic/atomic-arch-fallback.h:
444 return arch_atomic_read(v);
0xffff800081ae0f88 <+128>: b9403688 ldr w8, [x20, #52]
mm/hugetlb.c:
3311 WARN_ON(folio_ref_count(folio) != 1);
0xffff800081ae0f8c <+132>: 7100051f cmp w8, #0x1
0xffff800081ae0f90 <+136>: 54000581 b.ne 0xffff800081ae1040 <gather_bootmem_prealloc+312> // b.any
3312
3313 hugetlb_folio_init_vmemmap(folio, h,
0xffff800081ae0f94 <+140>: aa1403e0 mov x0, x20
0xffff800081ae0f98 <+144>: aa1303e1 mov x1, x19
0xffff800081ae0f9c <+148>: 940001a2 bl 0xffff800081ae1624 <hugetlb_folio_init_vmemmap>
./arch/arm64/include/asm/alternative-macros.h:
232 asm_volatile_goto(
0xffff800081ae0fa0 <+152>: 1400002a b 0xffff800081ae1048 <gather_bootmem_prealloc+320>
./arch/arm64/include/asm/atomic_lse.h:
132 ATOMIC64_OP(or, stset)
0xffff800081ae0fa4 <+156>: 91010288 add x8, x20, #0x40
0xffff800081ae0fa8 <+160>: f838311f stset x24, [x8]
mm/hugetlb.c:
1969 INIT_LIST_HEAD(&folio->lru);
0xffff800081ae0fac <+164>: 9100229c add x28, x20, #0x8
./include/linux/list.h:
37 WRITE_ONCE(list->next, list);
0xffff800081ae0fb0 <+168>: f900069c str x28, [x20, #8]
38 WRITE_ONCE(list->prev, list);
0xffff800081ae0fb4 <+172>: f9000a9c str x28, [x20, #16]
./include/linux/hugetlb.h:
753 folio->_hugetlb_subpool = subpool;
0xffff800081ae0fb8 <+176>: f9004a9f str xzr, [x20, #144]
./include/asm-generic/bitops/generic-non-atomic.h:
128 return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
0xffff800081ae0fbc <+180>: f9400288 ldr x8, [x20]
./include/linux/mm.h:
1070 if (!folio_test_large(folio))
0xffff800081ae0fc0 <+184>: 363000a8 tbz w8, #6, 0xffff800081ae0fd4 <gather_bootmem_prealloc+204>
./include/linux/hugetlb_cgroup.h:
94 if (folio_order(folio) < HUGETLB_CGROUP_MIN_ORDER)
0xffff800081ae0fc4 <+188>: 39410288 ldrb w8, [x20, #64]
0xffff800081ae0fc8 <+192>: 721f191f tst w8, #0xfe
0xffff800081ae0fcc <+196>: 54000040 b.eq 0xffff800081ae0fd4 <gather_bootmem_prealloc+204> // b.none
98 else
99 folio->_hugetlb_cgroup = h_cg;
0xffff800081ae0fd0 <+200>: f9004e9f str xzr, [x20, #152]
./include/asm-generic/bitops/generic-non-atomic.h:
128 return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
0xffff800081ae0fd4 <+204>: f9400288 ldr x8, [x20]
./include/linux/mm.h:
1070 if (!folio_test_large(folio))
0xffff800081ae0fd8 <+208>: 363000a8 tbz w8, #6, 0xffff800081ae0fec <gather_bootmem_prealloc+228>
./include/linux/hugetlb_cgroup.h:
94 if (folio_order(folio) < HUGETLB_CGROUP_MIN_ORDER)
0xffff800081ae0fdc <+212>: 39410288 ldrb w8, [x20, #64]
0xffff800081ae0fe0 <+216>: 721f191f tst w8, #0xfe
0xffff800081ae0fe4 <+220>: 54000040 b.eq 0xffff800081ae0fec <gather_bootmem_prealloc+228> // b.none
95 return;
96 if (rsvd)
97 folio->_hugetlb_cgroup_rsvd = h_cg;
0xffff800081ae0fe8 <+224>: f900529f str xzr, [x20, #160]
./include/asm-generic/bitops/generic-non-atomic.h:
128 return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
0xffff800081ae0fec <+228>: f9401688 ldr x8, [x20, #40]
mm/hugetlb.c:
3317 if (!HPageVmemmapOptimized(&folio->page))
0xffff800081ae0ff0 <+232>: 362001c8 tbz w8, #4, 0xffff800081ae1028 <gather_bootmem_prealloc+288>
./include/linux/list.h:
169 __list_add(new, head, head->next);
0xffff800081ae0ff4 <+236>: f94007e8 ldr x8, [sp, #8]
mm/hugetlb.c:
3328 adjust_managed_page_count(page, pages_per_huge_page(h));
0xffff800081ae0ff8 <+240>: aa1403e0 mov x0, x20
./include/linux/list.h:
153 next->prev = new;
0xffff800081ae0ffc <+244>: f900051c str x28, [x8, #8]
154 new->next = next;
0xffff800081ae1000 <+248>: a900d688 stp x8, x21, [x20, #8]
155 new->prev = prev;
156 WRITE_ONCE(prev->next, new);
0xffff800081ae1004 <+252>: f90007fc str x28, [sp, #8]
./include/linux/hugetlb.h:
808 return 1 << h->order;
0xffff800081ae1008 <+256>: b9402a68 ldr w8, [x19, #40]
0xffff800081ae100c <+260>: 9ac82341 lsl x1, x26, x8
mm/hugetlb.c:
3328 adjust_managed_page_count(page, pages_per_huge_page(h));
0xffff800081ae1010 <+264>: 979e566a bl 0xffff8000802769b8 <adjust_managed_page_count>
3297 list_for_each_entry(m, &huge_boot_pages, list) {
0xffff800081ae1014 <+268>: f9400339 ldr x25, [x25]
0xffff800081ae1018 <+272>: aa1303e0 mov x0, x19
0xffff800081ae101c <+276>: eb1b033f cmp x25, x27
0xffff800081ae1020 <+280>: 54fffa21 b.ne 0xffff800081ae0f64 <gather_bootmem_prealloc+92> // b.any
0xffff800081ae1024 <+284>: 14000011 b 0xffff800081ae1068 <gather_bootmem_prealloc+352>
./include/linux/hugetlb.h:
808 return 1 << h->order;
0xffff800081ae1028 <+288>: b9402a68 ldr w8, [x19, #40]
mm/hugetlb.c:
3318 hugetlb_folio_init_tail_vmemmap(folio,
0xffff800081ae102c <+292>: aa1403e0 mov x0, x20
0xffff800081ae1030 <+296>: 52800801 mov w1, #0x40 // #64
./include/linux/hugetlb.h:
808 return 1 << h->order;
0xffff800081ae1034 <+300>: 9ac82342 lsl x2, x26, x8
mm/hugetlb.c:
3318 hugetlb_folio_init_tail_vmemmap(folio,
0xffff800081ae1038 <+304>: 940001ae bl 0xffff800081ae16f0 <hugetlb_folio_init_tail_vmemmap>
0xffff800081ae103c <+308>: 17ffffee b 0xffff800081ae0ff4 <gather_bootmem_prealloc+236>
0xffff800081ae1040 <+312>: d4210000 brk #0x800
0xffff800081ae1044 <+316>: 17ffffd4 b 0xffff800081ae0f94 <gather_bootmem_prealloc+140>
./arch/arm64/include/asm/atomic_ll_sc.h:
203 ATOMIC64_OPS(or, orr, L)
0xffff800081ae1048 <+320>: d503249f bti j
0xffff800081ae104c <+324>: 91010288 add x8, x20, #0x40
0xffff800081ae1050 <+328>: f9800111 prfm pstl1strm, [x8]
0xffff800081ae1054 <+332>: c85f7d09 ldxr x9, [x8]
0xffff800081ae1058 <+336>: b2780129 orr x9, x9, #0x100
0xffff800081ae105c <+340>: c80a7d09 stxr w10, x9, [x8]
0xffff800081ae1060 <+344>: 35ffffaa cbnz w10, 0xffff800081ae1054 <gather_bootmem_prealloc+332>
0xffff800081ae1064 <+348>: 17ffffd2 b 0xffff800081ae0fac <gather_bootmem_prealloc+164>
mm/hugetlb.c:
3332 prep_and_add_allocated_folios(h, &folio_list);
0xffff800081ae1068 <+352>: 910023e1 add x1, sp, #0x8
0xffff800081ae106c <+356>: aa1303e0 mov x0, x19
0xffff800081ae1070 <+360>: 979ecd25 bl 0xffff800080294504 <prep_and_add_allocated_folios>
0xffff800081ae1074 <+364>: d5384108 mrs x8, sp_el0
0xffff800081ae1078 <+368>: f9423d08 ldr x8, [x8, #1144]
0xffff800081ae107c <+372>: f85f83a9 ldur x9, [x29, #-8]
0xffff800081ae1080 <+376>: eb09011f cmp x8, x9
0xffff800081ae1084 <+380>: 54000141 b.ne 0xffff800081ae10ac <gather_bootmem_prealloc+420> // b.any
3333 }
0xffff800081ae1088 <+384>: a9474ff4 ldp x20, x19, [sp, #112]
0xffff800081ae108c <+388>: a94657f6 ldp x22, x21, [sp, #96]
0xffff800081ae1090 <+392>: a9455ff8 ldp x24, x23, [sp, #80]
0xffff800081ae1094 <+396>: a94467fa ldp x26, x25, [sp, #64]
0xffff800081ae1098 <+400>: a9436ffc ldp x28, x27, [sp, #48]
0xffff800081ae109c <+404>: a9427bfd ldp x29, x30, [sp, #32]
0xffff800081ae10a0 <+408>: 910203ff add sp, sp, #0x80
0xffff800081ae10a4 <+412>: d50323bf autiasp
0xffff800081ae10a8 <+416>: d65f03c0 ret
0xffff800081ae10ac <+420>: 97d6228a bl 0xffff800081069ad4 <__stack_chk_fail>
End of assembler dump.
[-- Attachment #3: disass_INIT_STACK_ALL_ZERO --]
[-- Type: text/plain, Size: 10136 bytes --]
Dump of assembler code for function gather_bootmem_prealloc:
mm/hugetlb.c:
3292 {
0xffff800081b0111c <+0>: d503233f paciasp
0xffff800081b01120 <+4>: d10203ff sub sp, sp, #0x80
0xffff800081b01124 <+8>: a9027bfd stp x29, x30, [sp, #32]
0xffff800081b01128 <+12>: a9036ffc stp x28, x27, [sp, #48]
0xffff800081b0112c <+16>: a90467fa stp x26, x25, [sp, #64]
0xffff800081b01130 <+20>: a9055ff8 stp x24, x23, [sp, #80]
0xffff800081b01134 <+24>: a90657f6 stp x22, x21, [sp, #96]
0xffff800081b01138 <+28>: a9074ff4 stp x20, x19, [sp, #112]
0xffff800081b0113c <+32>: 910083fd add x29, sp, #0x20
0xffff800081b01140 <+36>: d5384108 mrs x8, sp_el0
3294 struct huge_bootmem_page *m;
3295 struct hstate *h, *prev_h = NULL;
3296
3297 list_for_each_entry(m, &huge_boot_pages, list) {
0xffff800081b01144 <+40>: d503201f nop
0xffff800081b01148 <+44>: 107b6395 adr x21, 0xffff800081bf7db8 <huge_boot_pages>
0xffff800081b0114c <+48>: f9423d08 ldr x8, [x8, #1144]
0xffff800081b01150 <+52>: 910023f6 add x22, sp, #0x8
0xffff800081b01154 <+56>: f81f83a8 stur x8, [x29, #-8]
0xffff800081b01158 <+60>: f94002b7 ldr x23, [x21]
3293 LIST_HEAD(folio_list);
0xffff800081b0115c <+64>: a900dbf6 stp x22, x22, [sp, #8]
3294 struct huge_bootmem_page *m;
3295 struct hstate *h, *prev_h = NULL;
3296
3297 list_for_each_entry(m, &huge_boot_pages, list) {
0xffff800081b01160 <+68>: eb1502ff cmp x23, x21
0xffff800081b01164 <+72>: 540008e0 b.eq 0xffff800081b01280 <gather_bootmem_prealloc+356> // b.none
0xffff800081b01168 <+76>: aa1f03e0 mov x0, xzr
0xffff800081b0116c <+80>: d2e00038 mov x24, #0x1000000000000 // #281474976710656
0xffff800081b01170 <+84>: b25657f9 mov x25, #0xfffffc0000000000 // #-4398046511104
0xffff800081b01174 <+88>: 5280201a mov w26, #0x100 // #256
0xffff800081b01178 <+92>: 5280003b mov w27, #0x1 // #1
3298 struct page *page = virt_to_page(m);
3299 struct folio *folio = (void *)page;
3300
3301 h = m->hstate;
0xffff800081b0117c <+96>: f9400af3 ldr x19, [x23, #16]
3302 /*
3303 * It is possible to have multiple huge page sizes (hstates)
3304 * in this list. If so, process each size separately.
3305 */
3306 if (h != prev_h && prev_h != NULL)
0xffff800081b01180 <+100>: b40000a0 cbz x0, 0xffff800081b01194 <gather_bootmem_prealloc+120>
0xffff800081b01184 <+104>: eb00027f cmp x19, x0
0xffff800081b01188 <+108>: 54000060 b.eq 0xffff800081b01194 <gather_bootmem_prealloc+120> // b.none
3307 prep_and_add_allocated_folios(prev_h, &folio_list);
0xffff800081b0118c <+112>: 910023e1 add x1, sp, #0x8
0xffff800081b01190 <+116>: 979e5a34 bl 0xffff800080297a60 <prep_and_add_allocated_folios>
0xffff800081b01194 <+120>: 8b1802e8 add x8, x23, x24
0xffff800081b01198 <+124>: d34cfd08 lsr x8, x8, #12
0xffff800081b0119c <+128>: 8b081b34 add x20, x25, x8, lsl #6
./include/linux/atomic/atomic-arch-fallback.h:
444 return arch_atomic_read(v);
0xffff800081b011a0 <+132>: b9403688 ldr w8, [x20, #52]
mm/hugetlb.c:
3311 WARN_ON(folio_ref_count(folio) != 1);
0xffff800081b011a4 <+136>: 7100051f cmp w8, #0x1
0xffff800081b011a8 <+140>: 54000581 b.ne 0xffff800081b01258 <gather_bootmem_prealloc+316> // b.any
3312
3313 hugetlb_folio_init_vmemmap(folio, h,
0xffff800081b011ac <+144>: aa1403e0 mov x0, x20
0xffff800081b011b0 <+148>: aa1303e1 mov x1, x19
0xffff800081b011b4 <+152>: 940001a9 bl 0xffff800081b01858 <hugetlb_folio_init_vmemmap>
./arch/arm64/include/asm/alternative-macros.h:
232 asm_volatile_goto(
0xffff800081b011b8 <+156>: 1400002a b 0xffff800081b01260 <gather_bootmem_prealloc+324>
./arch/arm64/include/asm/atomic_lse.h:
132 ATOMIC64_OP(or, stset)
0xffff800081b011bc <+160>: 91010288 add x8, x20, #0x40
0xffff800081b011c0 <+164>: f83a311f stset x26, [x8]
mm/hugetlb.c:
1969 INIT_LIST_HEAD(&folio->lru);
0xffff800081b011c4 <+168>: 9100229c add x28, x20, #0x8
./include/linux/list.h:
37 WRITE_ONCE(list->next, list);
0xffff800081b011c8 <+172>: f900069c str x28, [x20, #8]
38 WRITE_ONCE(list->prev, list);
0xffff800081b011cc <+176>: f9000a9c str x28, [x20, #16]
./include/linux/hugetlb.h:
753 folio->_hugetlb_subpool = subpool;
0xffff800081b011d0 <+180>: f9004a9f str xzr, [x20, #144]
./include/asm-generic/bitops/generic-non-atomic.h:
128 return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
0xffff800081b011d4 <+184>: f9400288 ldr x8, [x20]
./include/linux/mm.h:
1070 if (!folio_test_large(folio))
0xffff800081b011d8 <+188>: 363000a8 tbz w8, #6, 0xffff800081b011ec <gather_bootmem_prealloc+208>
./include/linux/hugetlb_cgroup.h:
94 if (folio_order(folio) < HUGETLB_CGROUP_MIN_ORDER)
0xffff800081b011dc <+192>: 39410288 ldrb w8, [x20, #64]
0xffff800081b011e0 <+196>: 721f191f tst w8, #0xfe
0xffff800081b011e4 <+200>: 54000040 b.eq 0xffff800081b011ec <gather_bootmem_prealloc+208> // b.none
98 else
99 folio->_hugetlb_cgroup = h_cg;
0xffff800081b011e8 <+204>: f9004e9f str xzr, [x20, #152]
./include/asm-generic/bitops/generic-non-atomic.h:
128 return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
0xffff800081b011ec <+208>: f9400288 ldr x8, [x20]
./include/linux/mm.h:
1070 if (!folio_test_large(folio))
0xffff800081b011f0 <+212>: 363000a8 tbz w8, #6, 0xffff800081b01204 <gather_bootmem_prealloc+232>
./include/linux/hugetlb_cgroup.h:
94 if (folio_order(folio) < HUGETLB_CGROUP_MIN_ORDER)
0xffff800081b011f4 <+216>: 39410288 ldrb w8, [x20, #64]
0xffff800081b011f8 <+220>: 721f191f tst w8, #0xfe
0xffff800081b011fc <+224>: 54000040 b.eq 0xffff800081b01204 <gather_bootmem_prealloc+232> // b.none
95 return;
96 if (rsvd)
97 folio->_hugetlb_cgroup_rsvd = h_cg;
0xffff800081b01200 <+228>: f900529f str xzr, [x20, #160]
./include/asm-generic/bitops/generic-non-atomic.h:
128 return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
0xffff800081b01204 <+232>: f9401688 ldr x8, [x20, #40]
mm/hugetlb.c:
3317 if (!HPageVmemmapOptimized(&folio->page))
0xffff800081b01208 <+236>: 362001c8 tbz w8, #4, 0xffff800081b01240 <gather_bootmem_prealloc+292>
./include/linux/list.h:
169 __list_add(new, head, head->next);
0xffff800081b0120c <+240>: f94007e8 ldr x8, [sp, #8]
mm/hugetlb.c:
3328 adjust_managed_page_count(page, pages_per_huge_page(h));
0xffff800081b01210 <+244>: aa1403e0 mov x0, x20
./include/linux/list.h:
153 next->prev = new;
0xffff800081b01214 <+248>: f900051c str x28, [x8, #8]
154 new->next = next;
0xffff800081b01218 <+252>: a900da88 stp x8, x22, [x20, #8]
155 new->prev = prev;
156 WRITE_ONCE(prev->next, new);
0xffff800081b0121c <+256>: f90007fc str x28, [sp, #8]
./include/linux/hugetlb.h:
808 return 1 << h->order;
0xffff800081b01220 <+260>: b9402a68 ldr w8, [x19, #40]
0xffff800081b01224 <+264>: 9ac82361 lsl x1, x27, x8
mm/hugetlb.c:
3328 adjust_managed_page_count(page, pages_per_huge_page(h));
0xffff800081b01228 <+268>: 979de2fd bl 0xffff800080279e1c <adjust_managed_page_count>
3297 list_for_each_entry(m, &huge_boot_pages, list) {
0xffff800081b0122c <+272>: f94002f7 ldr x23, [x23]
0xffff800081b01230 <+276>: aa1303e0 mov x0, x19
0xffff800081b01234 <+280>: eb1502ff cmp x23, x21
0xffff800081b01238 <+284>: 54fffa21 b.ne 0xffff800081b0117c <gather_bootmem_prealloc+96> // b.any
0xffff800081b0123c <+288>: 14000012 b 0xffff800081b01284 <gather_bootmem_prealloc+360>
./include/linux/hugetlb.h:
808 return 1 << h->order;
0xffff800081b01240 <+292>: b9402a68 ldr w8, [x19, #40]
mm/hugetlb.c:
3318 hugetlb_folio_init_tail_vmemmap(folio,
0xffff800081b01244 <+296>: aa1403e0 mov x0, x20
0xffff800081b01248 <+300>: 52800801 mov w1, #0x40 // #64
./include/linux/hugetlb.h:
808 return 1 << h->order;
0xffff800081b0124c <+304>: 9ac82362 lsl x2, x27, x8
mm/hugetlb.c:
3318 hugetlb_folio_init_tail_vmemmap(folio,
0xffff800081b01250 <+308>: 940001b5 bl 0xffff800081b01924 <hugetlb_folio_init_tail_vmemmap>
0xffff800081b01254 <+312>: 17ffffee b 0xffff800081b0120c <gather_bootmem_prealloc+240>
0xffff800081b01258 <+316>: d4210000 brk #0x800
0xffff800081b0125c <+320>: 17ffffd4 b 0xffff800081b011ac <gather_bootmem_prealloc+144>
./arch/arm64/include/asm/atomic_ll_sc.h:
203 ATOMIC64_OPS(or, orr, L)
0xffff800081b01260 <+324>: d503249f bti j
0xffff800081b01264 <+328>: 91010288 add x8, x20, #0x40
0xffff800081b01268 <+332>: f9800111 prfm pstl1strm, [x8]
0xffff800081b0126c <+336>: c85f7d09 ldxr x9, [x8]
0xffff800081b01270 <+340>: b2780129 orr x9, x9, #0x100
0xffff800081b01274 <+344>: c80a7d09 stxr w10, x9, [x8]
0xffff800081b01278 <+348>: 35ffffaa cbnz w10, 0xffff800081b0126c <gather_bootmem_prealloc+336>
0xffff800081b0127c <+352>: 17ffffd2 b 0xffff800081b011c4 <gather_bootmem_prealloc+168>
0xffff800081b01280 <+356>: aa1f03f3 mov x19, xzr
mm/hugetlb.c:
3332 prep_and_add_allocated_folios(h, &folio_list);
0xffff800081b01284 <+360>: 910023e1 add x1, sp, #0x8
0xffff800081b01288 <+364>: aa1303e0 mov x0, x19
0xffff800081b0128c <+368>: 979e59f5 bl 0xffff800080297a60 <prep_and_add_allocated_folios>
0xffff800081b01290 <+372>: d5384108 mrs x8, sp_el0
0xffff800081b01294 <+376>: f9423d08 ldr x8, [x8, #1144]
0xffff800081b01298 <+380>: f85f83a9 ldur x9, [x29, #-8]
0xffff800081b0129c <+384>: eb09011f cmp x8, x9
0xffff800081b012a0 <+388>: 54000141 b.ne 0xffff800081b012c8 <gather_bootmem_prealloc+428> // b.any
3333 }
0xffff800081b012a4 <+392>: a9474ff4 ldp x20, x19, [sp, #112]
0xffff800081b012a8 <+396>: a94657f6 ldp x22, x21, [sp, #96]
0xffff800081b012ac <+400>: a9455ff8 ldp x24, x23, [sp, #80]
0xffff800081b012b0 <+404>: a94467fa ldp x26, x25, [sp, #64]
0xffff800081b012b4 <+408>: a9436ffc ldp x28, x27, [sp, #48]
0xffff800081b012b8 <+412>: a9427bfd ldp x29, x30, [sp, #32]
0xffff800081b012bc <+416>: 910203ff add sp, sp, #0x80
0xffff800081b012c0 <+420>: d50323bf autiasp
0xffff800081b012c4 <+424>: d65f03c0 ret
0xffff800081b012c8 <+428>: 97d5f73b bl 0xffff80008107efb4 <__stack_chk_fail>
End of assembler dump.
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-13 0:12 ` Mike Kravetz
@ 2023-10-14 0:04 ` Mike Kravetz
2023-10-18 20:54 ` Nick Desaulniers
0 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2023-10-14 0:04 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Usama Arif, linux-mm, linux-kernel, akpm, muchun.song,
songmuchun, fam.zheng, liangma, punit.agrawal, Konrad Dybcio,
llvm
On 10/12/23 17:12, Mike Kravetz wrote:
> On 10/12/23 07:53, Mike Kravetz wrote:
> > On 10/11/23 17:03, Nathan Chancellor wrote:
> > > On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > > > On 10/09/23 15:56, Usama Arif wrote:
> >
> > Thank you Nathan! That is very helpful.
> >
> > I will use this information to try and recreate. If I can recreate, I
> > should be able to get to root cause.
>
> I could easily recreate the issue using the provided instructions. First
> thing I did was add a few printk's to check/verify state. The beginning
> of gather_bootmem_prealloc looked like this:
Hi Nathan,
This is looking more and more like a Clang issue to me. I did a little
more problem isolation today. Here is what I did:
- Check out commit "hugetlb: restructure pool allocations" in linux-next
- Fix the known issue with early disable/enable IRQs via locking by
applying:
commit 266789498210dff6cf9a14b64fa3a5cb2fcc5858
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date: Fri Oct 13 13:14:15 2023 -0700
fix prep_and_add_allocated_folios locking
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c843506654f8..d8ab2d9b391b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2246,15 +2246,16 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 static void prep_and_add_allocated_folios(struct hstate *h,
 					struct list_head *folio_list)
 {
+	unsigned long flags;
 	struct folio *folio, *tmp_f;
 
 	/* Add all new pool pages to free lists in one lock cycle */
-	spin_lock_irq(&hugetlb_lock);
+	spin_lock_irqsave(&hugetlb_lock, flags);
 	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
 		__prep_account_new_huge_page(h, folio_nid(folio));
 		enqueue_hugetlb_folio(h, folio);
 	}
-	spin_unlock_irq(&hugetlb_lock);
+	spin_unlock_irqrestore(&hugetlb_lock, flags);
 }
/*
- Add the following code, which triggers a BUG only if we traverse an
empty list; that should NEVER happen.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d8ab2d9b391b..be234831b33f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3294,11 +3294,21 @@ static void __init gather_bootmem_prealloc(void)
 	LIST_HEAD(folio_list);
 	struct huge_bootmem_page *m;
 	struct hstate *h, *prev_h = NULL;
+	bool empty;
+
+	empty = list_empty(&huge_boot_pages);
+	if (empty)
+		printk("gather_bootmem_prealloc: huge_boot_pages list empty\n");
 
 	list_for_each_entry(m, &huge_boot_pages, list) {
 		struct page *page = virt_to_page(m);
 		struct folio *folio = (void *)page;
 
+		if (empty) {
+			printk("  Traversing an empty list as if not empty!!!\n");
+			BUG();
+		}
+
 		h = m->hstate;
 		/*
 		 * It is possible to have multiple huge page sizes (hstates)
- As you have experienced, this will BUG if built with LLVM 17.0.2 and
CONFIG_INIT_STACK_NONE
- It will NOT BUG if built with LLVM 13.0.1, but will BUG if built with
LLVM 14.0.6 (llvm-14.0.6-x86_64) and later.
As mentioned in the previous email, the generated code for loop entry
looks wrong to my untrained eyes. Can you or someone on the llvm team
take a look?
--
Mike Kravetz
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-14 0:04 ` Mike Kravetz
@ 2023-10-18 20:54 ` Nick Desaulniers
2023-10-18 22:20 ` Mike Kravetz
0 siblings, 1 reply; 14+ messages in thread
From: Nick Desaulniers @ 2023-10-18 20:54 UTC (permalink / raw)
To: Mike Kravetz
Cc: Nathan Chancellor, Usama Arif, linux-mm, linux-kernel, akpm,
muchun.song, songmuchun, fam.zheng, liangma, punit.agrawal,
Konrad Dybcio, llvm
On Fri, Oct 13, 2023 at 5:05 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 10/12/23 17:12, Mike Kravetz wrote:
> > On 10/12/23 07:53, Mike Kravetz wrote:
> > > On 10/11/23 17:03, Nathan Chancellor wrote:
> > > > On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > > > > On 10/09/23 15:56, Usama Arif wrote:
> > >
> > > Thank you Nathan! That is very helpful.
> > >
> > > I will use this information to try and recreate. If I can recreate, I
> > > should be able to get to root cause.
> >
> > I could easily recreate the issue using the provided instructions. First
> > thing I did was add a few printk's to check/verify state. The beginning
> > of gather_bootmem_prealloc looked like this:
>
> Hi Nathan,
>
> This is looking more and more like a Clang issue to me. I did a little
> more problem isolation today. Here is what I did:
>
> - Check out commit "hugetlb: restructure pool allocations" in linux-next
> - Fix the known issue with early disable/enable IRQs via locking by
> applying:
>
> commit 266789498210dff6cf9a14b64fa3a5cb2fcc5858
> Author: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Fri Oct 13 13:14:15 2023 -0700
>
> fix prep_and_add_allocated_folios locking
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c843506654f8..d8ab2d9b391b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2246,15 +2246,16 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
>  static void prep_and_add_allocated_folios(struct hstate *h,
>  					struct list_head *folio_list)
>  {
> +	unsigned long flags;
>  	struct folio *folio, *tmp_f;
>  
>  	/* Add all new pool pages to free lists in one lock cycle */
> -	spin_lock_irq(&hugetlb_lock);
> +	spin_lock_irqsave(&hugetlb_lock, flags);
>  	list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
>  		__prep_account_new_huge_page(h, folio_nid(folio));
>  		enqueue_hugetlb_folio(h, folio);
>  	}
> -	spin_unlock_irq(&hugetlb_lock);
> +	spin_unlock_irqrestore(&hugetlb_lock, flags);
>  }
>
> /*
>
> - Add the following code which would only trigger a BUG if we were to
> traverse an empty list; which should NEVER happen.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d8ab2d9b391b..be234831b33f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3294,11 +3294,21 @@ static void __init gather_bootmem_prealloc(void)
>  	LIST_HEAD(folio_list);
>  	struct huge_bootmem_page *m;
>  	struct hstate *h, *prev_h = NULL;
> +	bool empty;
> +
> +	empty = list_empty(&huge_boot_pages);
> +	if (empty)
> +		printk("gather_bootmem_prealloc: huge_boot_pages list empty\n");
>  
>  	list_for_each_entry(m, &huge_boot_pages, list) {
>  		struct page *page = virt_to_page(m);
>  		struct folio *folio = (void *)page;
>  
> +		if (empty) {
> +			printk("  Traversing an empty list as if not empty!!!\n");
> +			BUG();
> +		}
> +
>  		h = m->hstate;
>  		/*
>  		 * It is possible to have multiple huge page sizes (hstates)
>
> - As you have experienced, this will BUG if built with LLVM 17.0.2 and
> CONFIG_INIT_STACK_NONE
>
> - It will NOT BUG if built with LLVM 13.0.1 but will BUG if built with
> LLVM llvm-14.0.6-x86_64 and later.
>
> As mentioned in the previous email, the generated code for loop entry
> looks wrong to my untrained eyes. Can you or someone on the llvm team
> take a look?
I think you need to initialize h; otherwise, what value is passed to
prep_and_add_allocated_folios if the loop is not run because the list is
empty? The compiler sees that `h` is only assigned inside the loop, so it
concludes the loop must run at least once. That's obviously hazardous, but
the compiler assumes there's no UB. At least that's my limited
understanding from looking at the IR diff Nathan got me in
https://github.com/ClangBuiltLinux/linux/issues/1946.
--
Thanks,
~Nick Desaulniers
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-18 20:54 ` Nick Desaulniers
@ 2023-10-18 22:20 ` Mike Kravetz
2023-10-19 4:33 ` Sergey Senozhatsky
0 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2023-10-18 22:20 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Nathan Chancellor, Usama Arif, linux-mm, linux-kernel, akpm,
muchun.song, songmuchun, fam.zheng, liangma, punit.agrawal,
Konrad Dybcio, llvm
On 10/18/23 13:54, Nick Desaulniers wrote:
> On Fri, Oct 13, 2023 at 5:05 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 10/12/23 17:12, Mike Kravetz wrote:
> > > On 10/12/23 07:53, Mike Kravetz wrote:
> > > > On 10/11/23 17:03, Nathan Chancellor wrote:
> > > > > On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > > > > > On 10/09/23 15:56, Usama Arif wrote:
> > > >
> > > > Thank you Nathan! That is very helpful.
> > > >
> > > > I will use this information to try and recreate. If I can recreate, I
> > > > should be able to get to root cause.
> > >
> > > I could easily recreate the issue using the provided instructions. First
> > > thing I did was add a few printk's to check/verify state. The beginning
> > > of gather_bootmem_prealloc looked like this:
> >
> > Hi Nathan,
> >
> > This is looking more and more like a Clang issue to me. I did a little
> > more problem isolation today. Here is what I did:
> >
> > - Check out commit "hugetlb: restructure pool allocations" in linux-next
> > - Fix the known issue with early disable/enable IRQs via locking by
> > applying:
> >
> > commit 266789498210dff6cf9a14b64fa3a5cb2fcc5858
> > Author: Mike Kravetz <mike.kravetz@oracle.com>
> > Date: Fri Oct 13 13:14:15 2023 -0700
> >
> > fix prep_and_add_allocated_folios locking
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index c843506654f8..d8ab2d9b391b 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -2246,15 +2246,16 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
> > static void prep_and_add_allocated_folios(struct hstate *h,
> > struct list_head *folio_list)
> > {
> > + unsigned long flags;
> > struct folio *folio, *tmp_f;
> >
> > /* Add all new pool pages to free lists in one lock cycle */
> > - spin_lock_irq(&hugetlb_lock);
> > + spin_lock_irqsave(&hugetlb_lock, flags);
> > list_for_each_entry_safe(folio, tmp_f, folio_list, lru) {
> > __prep_account_new_huge_page(h, folio_nid(folio));
> > enqueue_hugetlb_folio(h, folio);
> > }
> > - spin_unlock_irq(&hugetlb_lock);
> > + spin_unlock_irqrestore(&hugetlb_lock, flags);
> > }
> >
> > /*
> >
> > - Add the following code which would only trigger a BUG if we were to
> > traverse an empty list; which should NEVER happen.
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index d8ab2d9b391b..be234831b33f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -3294,11 +3294,21 @@ static void __init gather_bootmem_prealloc(void)
> > LIST_HEAD(folio_list);
> > struct huge_bootmem_page *m;
> > struct hstate *h, *prev_h = NULL;
> > + bool empty;
> > +
> > + empty = list_empty(&huge_boot_pages);
> > + if (empty)
> > + printk("gather_bootmem_prealloc: huge_boot_pages list empty\n");
> >
> > list_for_each_entry(m, &huge_boot_pages, list) {
> > struct page *page = virt_to_page(m);
> > struct folio *folio = (void *)page;
> >
> > + if (empty) {
> > + printk(" Traversing an empty list as if not empty!!!\n");
> > + BUG();
> > + }
> > +
> > h = m->hstate;
> > /*
> > * It is possible to have multiple huge page sizes (hstates)
> >
> > - As you have experienced, this will BUG if built with LLVM 17.0.2 and
> > CONFIG_INIT_STACK_NONE
> >
> > - It will NOT BUG if built with LLVM 13.0.1 but will BUG if built with
> > LLVM llvm-14.0.6-x86_64 and later.
> >
> > As mentioned in the previous email, the generated code for loop entry
> > looks wrong to my untrained eyes. Can you or someone on the llvm team
> > take a look?
>
> I think you need to initialize h, otherwise what value is passed to
> prep_and_add_bootmem_folios if the loop is not run because the list is
> empty. The compiler sees `h` is only given a value in the loop, so
> the loop must be run. That's obviously hazardous, but the compiler
> assumes there's no UB. At least that's my limited understanding
> looking at the IR diff Nathan got me in
> https://github.com/ClangBuiltLinux/linux/issues/1946.
Thanks for looking closer at this Nick and Nathan!
I think you are saying the compiler is running the loop because it wants
to initialize h before passing the value to another function. It does
this even if the explicit loop entry condition is false. Is that correct?
For me, that is unexpected.
Internally, someone brought up the possibility that this could have been
caused by h not being initialized. However, I dismissed this. Why?
If h is not initialized, then this means we did not enter the loop and
process any entries. Hence, the list (folio_list) also passed to
prep_and_add_bootmem_folios is empty. In this case, prep_and_add_bootmem_folios
does not use the passed value h. h only applies to values in the list.
Sure, the coding is a little sloppy. But, I really did not expect this
to result in making a run through the loop when the entry condition was
false.
I will verify that initializing h will address the issue and if so, send
another version of this series.
--
Mike Kravetz
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-09 14:56 [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages Usama Arif
2023-10-10 1:23 ` Mike Kravetz
@ 2023-10-19 2:38 ` Mike Kravetz
1 sibling, 0 replies; 14+ messages in thread
From: Mike Kravetz @ 2023-10-19 2:38 UTC (permalink / raw)
To: Andrew Morton, Usama Arif
Cc: linux-mm, linux-kernel, muchun.song, songmuchun, fam.zheng,
liangma, punit.agrawal
On 10/09/23 15:56, Usama Arif wrote:
> Calling prep_and_add_allocated_folios when allocating gigantic pages
> at boot time causes the kernel to crash as folio_list is empty
> and iterating it causes a NULL pointer dereference. Call this only
> for non-gigantic pages when folio_list has entries.
>
> Fixes: bfb41d6b2fe148 ("hugetlb: restructure pool allocations")
> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
> ---
> mm/hugetlb.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
Hi Andrew,
Can you remove this from mm-unstable? The root cause of Usama's crash
was improper irq enablement via locking calls. The changes Usama
verified (later in this thread) are in v8 of the "Batch hugetlb vmemmap
modification operations" series I just sent.
--
Mike Kravetz
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-18 22:20 ` Mike Kravetz
@ 2023-10-19 4:33 ` Sergey Senozhatsky
2023-10-19 14:20 ` Nathan Chancellor
0 siblings, 1 reply; 14+ messages in thread
From: Sergey Senozhatsky @ 2023-10-19 4:33 UTC (permalink / raw)
To: Mike Kravetz
Cc: Nick Desaulniers, Nathan Chancellor, Usama Arif, linux-mm,
linux-kernel, akpm, muchun.song, songmuchun, fam.zheng, liangma,
punit.agrawal, Konrad Dybcio, llvm
On (23/10/18 15:20), Mike Kravetz wrote:
> > I think you need to initialize h, otherwise what value is passed to
> > prep_and_add_bootmem_folios if the loop is not run because the list is
> > empty. The compiler sees `h` is only given a value in the loop, so
> > the loop must be run. That's obviously hazardous, but the compiler
> > assumes there's no UB. At least that's my limited understanding
> > looking at the IR diff Nathan got me in
> > https://github.com/ClangBuiltLinux/linux/issues/1946.
>
> Thanks for looking closer at this Nick and Nathan!
>
> I think you are saying the compiler is running the loop because it wants
> to initialize h before passing the value to another function. It does
> this even if the explicit loop entry condition is false. Is that correct?
The loop is getting promoted to an "infinite" loop; there is no
&pos->member != (head) condition check in the generated code
at all (at least on my machine).
I wish we could at least get the "possibly uninitialized variable"
warning from the compiler in this case, which we'd translate to
"hold my beer, I'm going to try one thing".
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages
2023-10-19 4:33 ` Sergey Senozhatsky
@ 2023-10-19 14:20 ` Nathan Chancellor
0 siblings, 0 replies; 14+ messages in thread
From: Nathan Chancellor @ 2023-10-19 14:20 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Mike Kravetz, Nick Desaulniers, Usama Arif, linux-mm,
linux-kernel, akpm, muchun.song, songmuchun, fam.zheng, liangma,
punit.agrawal, Konrad Dybcio, llvm
On Thu, Oct 19, 2023 at 01:33:05PM +0900, Sergey Senozhatsky wrote:
> On (23/10/18 15:20), Mike Kravetz wrote:
> > > I think you need to initialize h, otherwise what value is passed to
> > > prep_and_add_bootmem_folios if the loop is not run because the list is
> > > empty. The compiler sees `h` is only given a value in the loop, so
> > > the loop must be run. That's obviously hazardous, but the compiler
> > > assumes there's no UB. At least that's my limited understanding
> > > looking at the IR diff Nathan got me in
> > > https://github.com/ClangBuiltLinux/linux/issues/1946.
> >
> > Thanks for looking closer at this Nick and Nathan!
> >
> > I think you are saying the compiler is running the loop because it wants
> > to initialize h before passing the value to another function. It does
> > this even if the explicit loop entry condition is false. Is that correct?
>
> The loop is getting promoted to "infinite" loop, there is no
> &pos->member != (head) condition check in the generated code
> at all (at least on my machine).
>
> I wish we could at least get the "possibly uninitialized variable"
> warning from the compiler in this case, which we'd translate to
> "hold my beer, I'm going to try one thing".
GCC would warn about this under -Wmaybe-uninitialized but it has been
disabled in a normal build for the past three years, see commit
78a5255ffb6a ("Stop the ad-hoc games with -Wno-maybe-initialized").
In function 'gather_bootmem_prealloc',
inlined from 'hugetlb_init' at mm/hugetlb.c:4299:2:
mm/hugetlb.c:3203:9: warning: 'h' may be used uninitialized [-Wmaybe-uninitialized]
3203 | prep_and_add_allocated_folios(h, &folio_list);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/hugetlb.c: In function 'hugetlb_init':
mm/hugetlb.c:3166:24: note: 'h' was declared here
3166 | struct hstate *h, *prev_h = NULL;
| ^
Clang's -Wconditional-uninitialized would have flagged it too but it
suffers from the same problems as -Wmaybe-uninitialized.
mm/hugetlb.c:3203:32: warning: variable 'h' may be uninitialized when used here [-Wconditional-uninitialized]
3203 | prep_and_add_allocated_folios(h, &folio_list);
| ^
mm/hugetlb.c:3166:18: note: initialize the variable 'h' to silence this warning
3166 | struct hstate *h, *prev_h = NULL;
| ^
| = NULL
I know clang has some handling for loops in -Wsometimes-uninitialized;
I wonder why that does not trigger here...
Cheers,
Nathan
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2023-10-19 14:20 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-09 14:56 [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages Usama Arif
2023-10-10 1:23 ` Mike Kravetz
2023-10-10 17:01 ` [External] " Usama Arif
2023-10-10 21:30 ` Mike Kravetz
2023-10-10 21:31 ` Konrad Dybcio
2023-10-12 0:03 ` Nathan Chancellor
2023-10-12 14:53 ` Mike Kravetz
2023-10-13 0:12 ` Mike Kravetz
2023-10-14 0:04 ` Mike Kravetz
2023-10-18 20:54 ` Nick Desaulniers
2023-10-18 22:20 ` Mike Kravetz
2023-10-19 4:33 ` Sergey Senozhatsky
2023-10-19 14:20 ` Nathan Chancellor
2023-10-19 2:38 ` Mike Kravetz