All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
@ 2024-03-21  9:23 ` Dave Young
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Young @ 2024-03-21  9:23 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

crashkernel reservation failed on a Thinkpad t440s laptop recently. 
Actually the memblock reservation succeeded, but later insert_resource()
failed.

Test steps:
kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
    kexec reboot -> 
        dmesg|grep "crashkernel reserved";
            crashkernel memory range like below reserved successfully:
            0x00000000d0000000 - 0x00000000da000000
        But no such "Crash kernel" region in /proc/iomem

The background story is like below:

Currently E820 code reserves setup_data regions for both the current
kernel and the kexec kernel, and it inserts them into the resources list.
Before the kexec kernel reboots nobody passes the old setup_data, and
kexec only passes fresh SETUP_EFI and SETUP_IMA if needed.  Thus the old
setup data memory is not used at all.

Due to old kernel updates the kexec e820 table as well so kexec kernel
sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
regions are inserted into resources list in the kexec kernel by
e820__reserve_resources().

Note, due to no setup_data is passed in for those old regions they are not
early reserved (by function early_reserve_memory), and the crashkernel
memblock reservation will just treat them as usable memory and it could
reserve the crashkernel region which overlaps with the old setup_data
regions. And just like the bug I noticed here, kdump insert_resource
failed because e820__reserve_resources has added the overlapped chunks
in /proc/iomem already.

Finally, looking at the code, the old setup_data regions are not used
at all as no setup_data is passed in by the kexec boot loader. Although
something like SETUP_PCI etc could be needed, kexec should pass
the info as new setup_data so that kexec kernel can take care of them.
This should be taken care of in other separate patches if needed.

Thus drop the useless buggy code here.

Signed-off-by: Dave Young <dyoung@redhat.com>
---
V2: changelog grammar fixes [suggestions from Huang Kai]
 arch/x86/kernel/e820.c |   16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

Index: linux/arch/x86/kernel/e820.c
===================================================================
--- linux.orig/arch/x86/kernel/e820.c
+++ linux/arch/x86/kernel/e820.c
@@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
 		pa_next = data->next;
 
 		e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
-		/*
-		 * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
-		 * to be reserved.
-		 */
-		if (data->type != SETUP_EFI && data->type != SETUP_IMA)
-			e820__range_update_kexec(pa_data,
-						 sizeof(*data) + data->len,
-						 E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
 		if (data->type == SETUP_INDIRECT) {
 			len += data->len;
 			early_memunmap(data, sizeof(*data));
@@ -1036,12 +1026,9 @@ void __init e820__reserve_setup_data(voi
 
 			indirect = (struct setup_indirect *)data->data;
 
-			if (indirect->type != SETUP_INDIRECT) {
+			if (indirect->type != SETUP_INDIRECT)
 				e820__range_update(indirect->addr, indirect->len,
 						   E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-				e820__range_update_kexec(indirect->addr, indirect->len,
-							 E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-			}
 		}
 
 		pa_data = pa_next;
@@ -1049,7 +1036,6 @@ void __init e820__reserve_setup_data(voi
 	}
 
 	e820__update_table(e820_table);
-	e820__update_table(e820_table_kexec);
 
 	pr_info("extended physical RAM map:\n");
 	e820__print_table("reserve setup_data");


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
@ 2024-03-21  9:23 ` Dave Young
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Young @ 2024-03-21  9:23 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

crashkernel reservation failed on a Thinkpad t440s laptop recently. 
Actually the memblock reservation succeeded, but later insert_resource()
failed.

Test steps:
kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
    kexec reboot -> 
        dmesg|grep "crashkernel reserved";
            crashkernel memory range like below reserved successfully:
            0x00000000d0000000 - 0x00000000da000000
        But no such "Crash kernel" region in /proc/iomem

The background story is like below:

Currently E820 code reserves setup_data regions for both the current
kernel and the kexec kernel, and it inserts them into the resources list.
Before the kexec kernel reboots nobody passes the old setup_data, and
kexec only passes fresh SETUP_EFI and SETUP_IMA if needed.  Thus the old
setup data memory is not used at all.

Due to old kernel updates the kexec e820 table as well so kexec kernel
sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
regions are inserted into resources list in the kexec kernel by
e820__reserve_resources().

Note, due to no setup_data is passed in for those old regions they are not
early reserved (by function early_reserve_memory), and the crashkernel
memblock reservation will just treat them as usable memory and it could
reserve the crashkernel region which overlaps with the old setup_data
regions. And just like the bug I noticed here, kdump insert_resource
failed because e820__reserve_resources has added the overlapped chunks
in /proc/iomem already.

Finally, looking at the code, the old setup_data regions are not used
at all as no setup_data is passed in by the kexec boot loader. Although
something like SETUP_PCI etc could be needed, kexec should pass
the info as new setup_data so that kexec kernel can take care of them.
This should be taken care of in other separate patches if needed.

Thus drop the useless buggy code here.

Signed-off-by: Dave Young <dyoung@redhat.com>
---
V2: changelog grammar fixes [suggestions from Huang Kai]
 arch/x86/kernel/e820.c |   16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

Index: linux/arch/x86/kernel/e820.c
===================================================================
--- linux.orig/arch/x86/kernel/e820.c
+++ linux/arch/x86/kernel/e820.c
@@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
 		pa_next = data->next;
 
 		e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
-		/*
-		 * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
-		 * to be reserved.
-		 */
-		if (data->type != SETUP_EFI && data->type != SETUP_IMA)
-			e820__range_update_kexec(pa_data,
-						 sizeof(*data) + data->len,
-						 E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-
 		if (data->type == SETUP_INDIRECT) {
 			len += data->len;
 			early_memunmap(data, sizeof(*data));
@@ -1036,12 +1026,9 @@ void __init e820__reserve_setup_data(voi
 
 			indirect = (struct setup_indirect *)data->data;
 
-			if (indirect->type != SETUP_INDIRECT) {
+			if (indirect->type != SETUP_INDIRECT)
 				e820__range_update(indirect->addr, indirect->len,
 						   E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-				e820__range_update_kexec(indirect->addr, indirect->len,
-							 E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
-			}
 		}
 
 		pa_data = pa_next;
@@ -1049,7 +1036,6 @@ void __init e820__reserve_setup_data(voi
 	}
 
 	e820__update_table(e820_table);
-	e820__update_table(e820_table_kexec);
 
 	pr_info("extended physical RAM map:\n");
 	e820__print_table("reserve setup_data");


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
  2024-03-21  9:23 ` Dave Young
@ 2024-03-21 10:32   ` Jiri Bohac
  -1 siblings, 0 replies; 8+ messages in thread
From: Jiri Bohac @ 2024-03-21 10:32 UTC (permalink / raw)
  To: Dave Young
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

Hi,

On Thu, Mar 21, 2024 at 05:23:20PM +0800, Dave Young wrote:
> crashkernel reservation failed on a Thinkpad t440s laptop recently. 
> Actually the memblock reservation succeeded, but later insert_resource()
> failed.
> 
> Test steps:
> kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
>     kexec reboot -> 
>         dmesg|grep "crashkernel reserved";
>             crashkernel memory range like below reserved successfully:
>             0x00000000d0000000 - 0x00000000da000000
>         But no such "Crash kernel" region in /proc/iomem
> 
> The background story is like below:
> 
> Currently E820 code reserves setup_data regions for both the current
> kernel and the kexec kernel, and it inserts them into the resources list.
> Before the kexec kernel reboots nobody passes the old setup_data, and
> kexec only passes fresh SETUP_EFI and SETUP_IMA if needed.  Thus the old
> setup data memory is not used at all.
> 
> Due to old kernel updates the kexec e820 table as well so kexec kernel
> sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
> regions are inserted into resources list in the kexec kernel by
> e820__reserve_resources().
> 
> Note, due to no setup_data is passed in for those old regions they are not
> early reserved (by function early_reserve_memory), and the crashkernel
> memblock reservation will just treat them as usable memory and it could
> reserve the crashkernel region which overlaps with the old setup_data
> regions. And just like the bug I noticed here, kdump insert_resource
> failed because e820__reserve_resources has added the overlapped chunks
> in /proc/iomem already.

wouldn't this be caused by
4a693ce65b186fddc1a73621bd6f941e6e3eca21 ("kdump: defer the
insertion of crashkernel resources")?

Before that the crashkernel resources were inserted from
arch_reserve_crashkernel() which is called before
e820__reserve_resources().

The semantics of E820_TYPE_RESERVED_KERN wrt kexec quite
inconsistent. It's treated as E820_TYPE_RAM by
e820__memblock_setup() and e820_type_to_iomem_type().

The problem we're seeing here is the result of the former.
e820__memblock_setup() will add the E820_TYPE_RESERVED_KERN
region to the memblock, merging with the neighbouring memblocks,
allowing crashkernel region to span across the originally
reserved area.

e820_type_to_iomem_type() treating E820_TYPE_RESERVED_KERN as
E820_TYPE_RAM will make the E820_TYPE_RESERVED_KERN appear as
system ram in /proc/iomem. If the old kexec_load (not
kexec_file_load) syscall is used, the userspace kexec utility
will construct the e820 table based on the contents of
/proc/iomem and the kexec kernel will see the
E820_TYPE_RESERVED_KERN range as E820_TYPE_RAM.  When
kexec_file_load is used the E820_TYPE_RESERVED_KERN type is
propagated to the kexec kernel by bzImage64_load() /
setup_e820_entries().


> Index: linux/arch/x86/kernel/e820.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/e820.c
> +++ linux/arch/x86/kernel/e820.c
> @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
>  		pa_next = data->next;
>  
>  		e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -
> -		/*
> -		 * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
> -		 * to be reserved.
> -		 */
> -		if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> -			e820__range_update_kexec(pa_data,
> -						 sizeof(*data) + data->len,
> -						 E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -

Your tree is missing this recent commit:
7fd817c906503b6813ea3b41f5fdf4192449a707 ("x86/e820: Don't
reserve SETUP_RNG_SEED in e820").

Wouldn't this fix [/paper over] your problem as well? I.e., isn't
SETUP_RNG_SEED the setup_data item that's causing your problem?

Regards,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
@ 2024-03-21 10:32   ` Jiri Bohac
  0 siblings, 0 replies; 8+ messages in thread
From: Jiri Bohac @ 2024-03-21 10:32 UTC (permalink / raw)
  To: Dave Young
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

Hi,

On Thu, Mar 21, 2024 at 05:23:20PM +0800, Dave Young wrote:
> crashkernel reservation failed on a Thinkpad t440s laptop recently. 
> Actually the memblock reservation succeeded, but later insert_resource()
> failed.
> 
> Test steps:
> kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
>     kexec reboot -> 
>         dmesg|grep "crashkernel reserved";
>             crashkernel memory range like below reserved successfully:
>             0x00000000d0000000 - 0x00000000da000000
>         But no such "Crash kernel" region in /proc/iomem
> 
> The background story is like below:
> 
> Currently E820 code reserves setup_data regions for both the current
> kernel and the kexec kernel, and it inserts them into the resources list.
> Before the kexec kernel reboots nobody passes the old setup_data, and
> kexec only passes fresh SETUP_EFI and SETUP_IMA if needed.  Thus the old
> setup data memory is not used at all.
> 
> Due to old kernel updates the kexec e820 table as well so kexec kernel
> sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
> regions are inserted into resources list in the kexec kernel by
> e820__reserve_resources().
> 
> Note, due to no setup_data is passed in for those old regions they are not
> early reserved (by function early_reserve_memory), and the crashkernel
> memblock reservation will just treat them as usable memory and it could
> reserve the crashkernel region which overlaps with the old setup_data
> regions. And just like the bug I noticed here, kdump insert_resource
> failed because e820__reserve_resources has added the overlapped chunks
> in /proc/iomem already.

wouldn't this be caused by
4a693ce65b186fddc1a73621bd6f941e6e3eca21 ("kdump: defer the
insertion of crashkernel resources")?

Before that the crashkernel resources were inserted from
arch_reserve_crashkernel() which is called before
e820__reserve_resources().

The semantics of E820_TYPE_RESERVED_KERN wrt kexec quite
inconsistent. It's treated as E820_TYPE_RAM by
e820__memblock_setup() and e820_type_to_iomem_type().

The problem we're seeing here is the result of the former.
e820__memblock_setup() will add the E820_TYPE_RESERVED_KERN
region to the memblock, merging with the neighbouring memblocks,
allowing crashkernel region to span across the originally
reserved area.

e820_type_to_iomem_type() treating E820_TYPE_RESERVED_KERN as
E820_TYPE_RAM will make the E820_TYPE_RESERVED_KERN appear as
system ram in /proc/iomem. If the old kexec_load (not
kexec_file_load) syscall is used, the userspace kexec utility
will construct the e820 table based on the contents of
/proc/iomem and the kexec kernel will see the
E820_TYPE_RESERVED_KERN range as E820_TYPE_RAM.  When
kexec_file_load is used the E820_TYPE_RESERVED_KERN type is
propagated to the kexec kernel by bzImage64_load() /
setup_e820_entries().


> Index: linux/arch/x86/kernel/e820.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/e820.c
> +++ linux/arch/x86/kernel/e820.c
> @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
>  		pa_next = data->next;
>  
>  		e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -
> -		/*
> -		 * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
> -		 * to be reserved.
> -		 */
> -		if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> -			e820__range_update_kexec(pa_data,
> -						 sizeof(*data) + data->len,
> -						 E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> -

Your tree is missing this recent commit:
7fd817c906503b6813ea3b41f5fdf4192449a707 ("x86/e820: Don't
reserve SETUP_RNG_SEED in e820").

Wouldn't this fix [/paper over] your problem as well? I.e., isn't
SETUP_RNG_SEED the setup_data item that's causing your problem?

Regards,

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
  2024-03-21 10:32   ` Jiri Bohac
@ 2024-03-22  2:17     ` Dave Young
  -1 siblings, 0 replies; 8+ messages in thread
From: Dave Young @ 2024-03-22  2:17 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

Hi Jiri,

On Thu, 21 Mar 2024 at 18:32, Jiri Bohac <jbohac@suse.cz> wrote:
>
> Hi,
>
> On Thu, Mar 21, 2024 at 05:23:20PM +0800, Dave Young wrote:
> > crashkernel reservation failed on a Thinkpad t440s laptop recently.
> > Actually the memblock reservation succeeded, but later insert_resource()
> > failed.
> >
> > Test steps:
> > kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
> >     kexec reboot ->
> >         dmesg|grep "crashkernel reserved";
> >             crashkernel memory range like below reserved successfully:
> >             0x00000000d0000000 - 0x00000000da000000
> >         But no such "Crash kernel" region in /proc/iomem
> >
> > The background story is like below:
> >
> > Currently E820 code reserves setup_data regions for both the current
> > kernel and the kexec kernel, and it inserts them into the resources list.
> > Before the kexec kernel reboots nobody passes the old setup_data, and
> > kexec only passes fresh SETUP_EFI and SETUP_IMA if needed.  Thus the old
> > setup data memory is not used at all.
> >
> > Due to old kernel updates the kexec e820 table as well so kexec kernel
> > sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
> > regions are inserted into resources list in the kexec kernel by
> > e820__reserve_resources().
> >
> > Note, due to no setup_data is passed in for those old regions they are not
> > early reserved (by function early_reserve_memory), and the crashkernel
> > memblock reservation will just treat them as usable memory and it could
> > reserve the crashkernel region which overlaps with the old setup_data
> > regions. And just like the bug I noticed here, kdump insert_resource
> > failed because e820__reserve_resources has added the overlapped chunks
> > in /proc/iomem already.
>
> wouldn't this be caused by
> 4a693ce65b186fddc1a73621bd6f941e6e3eca21 ("kdump: defer the
> insertion of crashkernel resources")?
>
> Before that the crashkernel resources were inserted from
> arch_reserve_crashkernel() which is called before
> e820__reserve_resources().

I think reverting the commit you mentioned can paper out this issue
but it is not
the root cause.  Yes, arch_reserve_crashkernel can succeed, then e820
still tries
to reserve the setup_data overlapping with crashkernel for another purpose.

>
> The semantics of E820_TYPE_RESERVED_KERN wrt kexec quite
> inconsistent. It's treated as E820_TYPE_RAM by
> e820__memblock_setup() and e820_type_to_iomem_type().
>
> The problem we're seeing here is the result of the former.
> e820__memblock_setup() will add the E820_TYPE_RESERVED_KERN
> region to the memblock, merging with the neighbouring memblocks,
> allowing crashkernel region to span across the originally
> reserved area.
>
> e820_type_to_iomem_type() treating E820_TYPE_RESERVED_KERN as
> E820_TYPE_RAM will make the E820_TYPE_RESERVED_KERN appear as
> system ram in /proc/iomem. If the old kexec_load (not
> kexec_file_load) syscall is used, the userspace kexec utility
> will construct the e820 table based on the contents of
> /proc/iomem and the kexec kernel will see the
> E820_TYPE_RESERVED_KERN range as E820_TYPE_RAM.  When
> kexec_file_load is used the E820_TYPE_RESERVED_KERN type is
> propagated to the kexec kernel by bzImage64_load() /
> setup_e820_entries().

This is true, but it does not matter for the kexec kernel as they are
only reserved for
the 1st kernel, and it is not meaningful to the kexec kernel.  Use
them as system ram
is fine in the 2nd kexec kernel.

>
>
> > Index: linux/arch/x86/kernel/e820.c
> > ===================================================================
> > --- linux.orig/arch/x86/kernel/e820.c
> > +++ linux/arch/x86/kernel/e820.c
> > @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
> >               pa_next = data->next;
> >
> >               e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > -
> > -             /*
> > -              * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
> > -              * to be reserved.
> > -              */
> > -             if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> > -                     e820__range_update_kexec(pa_data,
> > -                                              sizeof(*data) + data->len,
> > -                                              E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > -
>
> Your tree is missing this recent commit:
> 7fd817c906503b6813ea3b41f5fdf4192449a707 ("x86/e820: Don't
> reserve SETUP_RNG_SEED in e820").
>
> Wouldn't this fix [/paper over] your problem as well? I.e., isn't
> SETUP_RNG_SEED the setup_data item that's causing your problem?

Thanks for catching this, I will rebase and repost.

But it does not "fix" the problem as my problem is related to the
other setup_data
range, I think it is SETUP_PCI (not 100% sure, but it is certainly not RNG_SEED)

>
> Regards,
>
> --
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
>
>
Thanks
Dave


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
@ 2024-03-22  2:17     ` Dave Young
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Young @ 2024-03-22  2:17 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

Hi Jiri,

On Thu, 21 Mar 2024 at 18:32, Jiri Bohac <jbohac@suse.cz> wrote:
>
> Hi,
>
> On Thu, Mar 21, 2024 at 05:23:20PM +0800, Dave Young wrote:
> > crashkernel reservation failed on a Thinkpad t440s laptop recently.
> > Actually the memblock reservation succeeded, but later insert_resource()
> > failed.
> >
> > Test steps:
> > kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
> >     kexec reboot ->
> >         dmesg|grep "crashkernel reserved";
> >             crashkernel memory range like below reserved successfully:
> >             0x00000000d0000000 - 0x00000000da000000
> >         But no such "Crash kernel" region in /proc/iomem
> >
> > The background story is like below:
> >
> > Currently E820 code reserves setup_data regions for both the current
> > kernel and the kexec kernel, and it inserts them into the resources list.
> > Before the kexec kernel reboots nobody passes the old setup_data, and
> > kexec only passes fresh SETUP_EFI and SETUP_IMA if needed.  Thus the old
> > setup data memory is not used at all.
> >
> > Due to old kernel updates the kexec e820 table as well so kexec kernel
> > sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
> > regions are inserted into resources list in the kexec kernel by
> > e820__reserve_resources().
> >
> > Note, due to no setup_data is passed in for those old regions they are not
> > early reserved (by function early_reserve_memory), and the crashkernel
> > memblock reservation will just treat them as usable memory and it could
> > reserve the crashkernel region which overlaps with the old setup_data
> > regions. And just like the bug I noticed here, kdump insert_resource
> > failed because e820__reserve_resources has added the overlapped chunks
> > in /proc/iomem already.
>
> wouldn't this be caused by
> 4a693ce65b186fddc1a73621bd6f941e6e3eca21 ("kdump: defer the
> insertion of crashkernel resources")?
>
> Before that the crashkernel resources were inserted from
> arch_reserve_crashkernel() which is called before
> e820__reserve_resources().

I think reverting the commit you mentioned can paper out this issue
but it is not
the root cause.  Yes, arch_reserve_crashkernel can succeed, then e820
still tries
to reserve the setup_data overlapping with crashkernel for another purpose.

>
> The semantics of E820_TYPE_RESERVED_KERN wrt kexec quite
> inconsistent. It's treated as E820_TYPE_RAM by
> e820__memblock_setup() and e820_type_to_iomem_type().
>
> The problem we're seeing here is the result of the former.
> e820__memblock_setup() will add the E820_TYPE_RESERVED_KERN
> region to the memblock, merging with the neighbouring memblocks,
> allowing crashkernel region to span across the originally
> reserved area.
>
> e820_type_to_iomem_type() treating E820_TYPE_RESERVED_KERN as
> E820_TYPE_RAM will make the E820_TYPE_RESERVED_KERN appear as
> system ram in /proc/iomem. If the old kexec_load (not
> kexec_file_load) syscall is used, the userspace kexec utility
> will construct the e820 table based on the contents of
> /proc/iomem and the kexec kernel will see the
> E820_TYPE_RESERVED_KERN range as E820_TYPE_RAM.  When
> kexec_file_load is used the E820_TYPE_RESERVED_KERN type is
> propagated to the kexec kernel by bzImage64_load() /
> setup_e820_entries().

This is true, but it does not matter for the kexec kernel as they are
only reserved for
the 1st kernel, and it is not meaningful to the kexec kernel.  Use
them as system ram
is fine in the 2nd kexec kernel.

>
>
> > Index: linux/arch/x86/kernel/e820.c
> > ===================================================================
> > --- linux.orig/arch/x86/kernel/e820.c
> > +++ linux/arch/x86/kernel/e820.c
> > @@ -1015,16 +1015,6 @@ void __init e820__reserve_setup_data(voi
> >               pa_next = data->next;
> >
> >               e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > -
> > -             /*
> > -              * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
> > -              * to be reserved.
> > -              */
> > -             if (data->type != SETUP_EFI && data->type != SETUP_IMA)
> > -                     e820__range_update_kexec(pa_data,
> > -                                              sizeof(*data) + data->len,
> > -                                              E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
> > -
>
> Your tree is missing this recent commit:
> 7fd817c906503b6813ea3b41f5fdf4192449a707 ("x86/e820: Don't
> reserve SETUP_RNG_SEED in e820").
>
> Wouldn't this fix [/paper over] your problem as well? I.e., isn't
> SETUP_RNG_SEED the setup_data item that's causing your problem?

Thanks for catching this, I will rebase and repost.

But it does not "fix" the problem as my problem is related to the
other setup_data
range, I think it is SETUP_PCI (not 100% sure, but it is certainly not RNG_SEED)

>
> Regards,
>
> --
> Jiri Bohac <jbohac@suse.cz>
> SUSE Labs, Prague, Czechia
>
>
Thanks
Dave


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
  2024-03-22  2:17     ` Dave Young
@ 2024-03-22  5:20       ` Dave Young
  -1 siblings, 0 replies; 8+ messages in thread
From: Dave Young @ 2024-03-22  5:20 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

> > Your tree is missing this recent commit:
> > 7fd817c906503b6813ea3b41f5fdf4192449a707 ("x86/e820: Don't
> > reserve SETUP_RNG_SEED in e820").
> >
> > Wouldn't this fix [/paper over] your problem as well? I.e., isn't
> > SETUP_RNG_SEED the setup_data item that's causing your problem?
>
> Thanks for catching this, I will rebase and repost.
>
> But it does not "fix" the problem as my problem is related to the
> other setup_data
> range, I think it is SETUP_PCI (not 100% sure, but it is certainly not RNG_SEED)
>

The webmail reply broke the lines randomly, sorry for that.  I have
resent a rebased version.  And I also confirmed that in my case it was
SETUP_PCI caused the issue.   Note, this SETUP_PCI is from previous
physical bootup, the old kernel reserved it in kexec e820, it is not
the RNG_SEED which was passed in by kexec.  I believe the RND_SEED
region could cause issues only on the 3rd+ boot with a kernel without
the commit you mentioned.

It is a little tricky, I suppose not obvious to understand..

Thanks
Dave


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data
@ 2024-03-22  5:20       ` Dave Young
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Young @ 2024-03-22  5:20 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: x86, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel, kexec, Baoquan He, Eric Biederman,
	Huang, Kai

> > Your tree is missing this recent commit:
> > 7fd817c906503b6813ea3b41f5fdf4192449a707 ("x86/e820: Don't
> > reserve SETUP_RNG_SEED in e820").
> >
> > Wouldn't this fix [/paper over] your problem as well? I.e., isn't
> > SETUP_RNG_SEED the setup_data item that's causing your problem?
>
> Thanks for catching this, I will rebase and repost.
>
> But it does not "fix" the problem as my problem is related to the
> other setup_data
> range, I think it is SETUP_PCI (not 100% sure, but it is certainly not RNG_SEED)
>

The webmail reply broke the lines randomly, sorry for that.  I have
resent a rebased version.  And I also confirmed that in my case it was
SETUP_PCI caused the issue.   Note, this SETUP_PCI is from previous
physical bootup, the old kernel reserved it in kexec e820, it is not
the RNG_SEED which was passed in by kexec.  I believe the RND_SEED
region could cause issues only on the 3rd+ boot with a kernel without
the commit you mentioned.

It is a little tricky, I suppose not obvious to understand..

Thanks
Dave


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-03-22  5:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-21  9:23 [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data Dave Young
2024-03-21  9:23 ` Dave Young
2024-03-21 10:32 ` Jiri Bohac
2024-03-21 10:32   ` Jiri Bohac
2024-03-22  2:17   ` Dave Young
2024-03-22  2:17     ` Dave Young
2024-03-22  5:20     ` Dave Young
2024-03-22  5:20       ` Dave Young

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.