* Re: Question about Address Range Validation in Crash Kernel Allocation
@ 2024-03-21  9:17 chenhaixiang (A)
  2024-03-21  9:48 ` Li Huafei
  0 siblings, 1 reply; 17+ messages in thread
From: chenhaixiang (A) @ 2024-03-21  9:17 UTC (permalink / raw)
  To: Baoquan He
  Cc: kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), lihuafei, wanghai (M),
	Wangkefeng (OS Kernel Lab)


> > I'm sorry for the delay. Here are some details from the boot log and
> /proc/iomem:
> > The Boot log:
> > [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
> 11:46:11 UTC 2024
> > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
> console=ttyS0,115200n8 console=tty0
> ......snip...
> > [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> from=0x0000000000000000 max_addr=0x0000000100000000
> reserve_crashkernel_generic+0x7c/0x220
> > [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> from=0x0000000100000000 max_addr=0x0000400000000000
> reserve_crashkernel_generic+0x7c/0x220
> > [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
> memblock_alloc_range_nid+0xee/0x170
> > [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
> from=0x0000000000000000 max_addr=0x0000000100000000
> reserve_crashkernel_generic+0x11d/0x220
> > [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
> memblock_alloc_range_nid+0xee/0x170
> > [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
> (256 MB)
> > [    0.022641] crashkernel reserved: 0x000000c01f000000 -
> 0x000000c03f000000 (512 MB)
> 
> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
> MB)
>       crashkernel,high is reserved in region: [0x000000c01f000000 -
> 0x000000c03f000000] (512 MB) ......
> > [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
> memblock_alloc_range_nid+0xee/0x170
> > [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> reserved
> > [    0.029861] TSC deadline timer available
> 
> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and prints the above
> "usable ==> reserved". This should be the step which prevents earlier reserved
> crashkernel,low from being added to iomem tree. I am not sure what triggered
> the e820 update.
Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.

> 
> How do you boot into your new 6.8.0 kernel? Used kexec -l to jump into the 2nd
> kernel, or reboot from bios/firmware boot up into 6.8.0?
It is a normal reboot: the BIOS/firmware boots straight into 6.8.0. I reverted the patch below,
 and this time the conflicting segment "53cbd000-53ccffff" also appears in /proc/iomem
 on the 6.8 kernel:

2d4fd058-60efefff : System RAM
  2d4fd058-58ffffff : System RAM
    49000000-58ffffff : Crash kernel
      53cbd000-53ccffff : Reserved
60eff000-704fefff : Reserved
--
  93dd424000-93dd9fffff : Kernel bss
  c01f000000-c03effffff : Crash kernel
d0000000000-d0fffffffff : PCI Bus 0000:00
  d0000000000-d00001fffff : PCI Bus 0000:01
> 
> Reverting below commit should fix your problem, can you try it?
> 
> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
> Author: Huacai Chen <chenhuacai@kernel.org>
> Date:   Fri Dec 29 16:02:13 2023 +0800
> 
>     kdump: defer the insertion of crashkernel resources


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-21  9:17 Question about Address Range Validation in Crash Kernel Allocation chenhaixiang (A)
@ 2024-03-21  9:48 ` Li Huafei
  2024-03-21 10:06   ` Dave Young
  0 siblings, 1 reply; 17+ messages in thread
From: Li Huafei @ 2024-03-21  9:48 UTC (permalink / raw)
  To: chenhaixiang (A), Baoquan He
  Cc: kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), wanghai (M),
	Wangkefeng (OS Kernel Lab)

Hi Baoquan,

On 2024/3/21 17:17, chenhaixiang (A) wrote:
> 
>>> I'm sorry for the delay. Here are some details from the boot log and
>> /proc/iomem:
>>> The Boot log:
>>> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
>> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
>> 11:46:11 UTC 2024
>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
>> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
>> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
>> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
>> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
>> console=ttyS0,115200n8 console=tty0
>> ......snip...
>>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
>> from=0x0000000000000000 max_addr=0x0000000100000000
>> reserve_crashkernel_generic+0x7c/0x220
>>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
>> from=0x0000000100000000 max_addr=0x0000400000000000
>> reserve_crashkernel_generic+0x7c/0x220
>>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
>> memblock_alloc_range_nid+0xee/0x170
>>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
>> from=0x0000000000000000 max_addr=0x0000000100000000
>> reserve_crashkernel_generic+0x11d/0x220
>>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
>> memblock_alloc_range_nid+0xee/0x170
>>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
>> (256 MB)
>>> [    0.022641] crashkernel reserved: 0x000000c01f000000 -
>> 0x000000c03f000000 (512 MB)
>>
>> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
>> MB)
>>       crashkernel,high is reserved in region: [0x000000c01f000000 -
>> 0x000000c03f000000] (512 MB) ......
>>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
>> memblock_alloc_range_nid+0xee/0x170
>>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
>> reserved
>>> [    0.029861] TSC deadline timer available
>>
>> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and prints the above
>> "usable ==> reserved". This should be the step which prevents earlier reserved
>> crashkernel,low from being added to iomem tree. I am not sure what triggered
>> the e820 update.

We added a dump_stack() call to efi_mem_reserve() and found that
[0x53cbd000-0x53ccffff] was reserved for the BGRT:

  [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
reserved
  [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
  [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
08/30/2022
  [    0.032264] Call Trace:
  [    0.032265]  ? dump_stack+0x57/0x6e
  [    0.032267]  ? bgrt_init+0xc2/0xc2
  [    0.032268]  ? __e820__range_update+0x7a/0x1d6
  [    0.032270]  ? bgrt_init+0xc2/0xc2
  [    0.032272]  ? bgrt_init+0xc2/0xc2
  [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
  [    0.032276]  ? efi_mem_reserve+0x2d/0x42
  [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
  [    0.032279]  ? acpi_table_parse+0x86/0xbc
  [    0.032281]  ? acpi_boot_init+0x79/0xad
  [    0.032282]  ? setup_arch+0x835/0x954
  [    0.032284]  ? start_kernel+0x5d/0x455
  [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb

efi_reserve_boot_services() is supposed to reserve memory of type
EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA before the crashkernel
reservation is made. efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA
has not been reserved by other modules, so it updates the e820 table
directly and reserves the BGRT memory.

However, memblock_is_region_reserved() in efi_reserve_boot_services()
returns true even when the queried range only partially overlaps an
existing reservation.

     already_reserved = memblock_is_region_reserved(start, size);

     /*
      * Because the following memblock_reserve() is paired
      * with memblock_free_late() for this region in
      * efi_free_boot_services(), we must be extremely
      * careful not to reserve, and subsequently free,
      * critical regions of memory (like the kernel image) or
      * those regions that somebody else has already
      * reserved.
      *
      * A good example of a critical region that must not be
      * freed is page zero (first 4Kb of memory), which may
      * contain boot services code/data but is marked
      * E820_TYPE_RESERVED by trim_bios_range().
      */
     if (!already_reserved) {
             memblock_reserve(start, size);

             /*
              * If we are the first to reserve the region, no
              * one else cares about it. We own it and can
              * free it later.
              */
             if (can_free_region(start, size))
                     continue;
     }

As a result, part of the EFI_BOOT_SERVICES_DATA memory is not reserved in
advance. The later crashkernel reservation happens to take this portion of
memory, which then conflicts with the BGRT.

> Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.
> 
>>
>> How do you boot into your new 6.8.0 kernel? Used kexec -l to jump into the 2nd
>> kernel, or reboot from bios/firmware boot up into 6.8.0?
> It's reboot from bios boot up into 6.8.0. I attempted to revert the below patch,
>  and this time the conflicting segment "53cbd000-53ccffff" also appeared in the /proc/iomem
>  of the 6.8 kernel.
> 
> 2d4fd058-60efefff : System RAM
>   2d4fd058-58ffffff : System RAM
>     49000000-58ffffff : Crash kernel
>       53cbd000-53ccffff : Reserved
> 60eff000-704fefff : Reserved
> --
>   93dd424000-93dd9fffff : Kernel bss
>   c01f000000-c03effffff : Crash kernel
> d0000000000-d0fffffffff : PCI Bus 0000:00
>   d0000000000-d00001fffff : PCI Bus 0000:01
>>
>> Reverting below commit should fix your problem, can you try it?
>>
>> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
>> Author: Huacai Chen <chenhuacai@kernel.org>
>> Date:   Fri Dec 29 16:02:13 2023 +0800
>>
>>     kdump: defer the insertion of crashkernel resources
> 
> .
> 


* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-21  9:48 ` Li Huafei
@ 2024-03-21 10:06   ` Dave Young
  2024-03-21 12:37     ` Li Huafei
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Young @ 2024-03-21 10:06 UTC (permalink / raw)
  To: Li Huafei
  Cc: chenhaixiang (A),
	Baoquan He, kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), wanghai (M),
	Wangkefeng (OS Kernel Lab)

Hi,

On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1@huawei.com> wrote:
>
> Hi Baoquan,
>
> On 2024/3/21 17:17, chenhaixiang (A) wrote:
> >
> >>> I'm sorry for the delay. Here are some details from the boot log and
> >> /proc/iomem:
> >>> The Boot log:
> >>> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
> >> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
> >> 11:46:11 UTC 2024
> >>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
> >> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
> >> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
> >> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
> >> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
> >> console=ttyS0,115200n8 console=tty0
> >> ......snip...
> >>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >> from=0x0000000000000000 max_addr=0x0000000100000000
> >> reserve_crashkernel_generic+0x7c/0x220
> >>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >> from=0x0000000100000000 max_addr=0x0000400000000000
> >> reserve_crashkernel_generic+0x7c/0x220
> >>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
> >> memblock_alloc_range_nid+0xee/0x170
> >>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
> >> from=0x0000000000000000 max_addr=0x0000000100000000
> >> reserve_crashkernel_generic+0x11d/0x220
> >>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
> >> memblock_alloc_range_nid+0xee/0x170
> >>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
> >> (256 MB)
> >>> [    0.022641] crashkernel reserved: 0x000000c01f000000 -
> >> 0x000000c03f000000 (512 MB)
> >>
> >> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
> >> MB)
> >>       crashkernel,high is reserved in region: [0x000000c01f000000 -
> >> 0x000000c03f000000] (512 MB) ......
> >>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
> >> memblock_alloc_range_nid+0xee/0x170
> >>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> >> reserved
> >>> [    0.029861] TSC deadline timer available
> >>
> >> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and prints the above
> >> "usable ==> reserved". This should be the step which prevents earlier reserved
> >> crashkernel,low from being added to iomem tree. I am not sure what triggered
> >> the e820 update.
>
> We added dump_stack () printing in efi_mem_reserve () and found that
> [0x53cbd000-0x53ccffff] was reserved by BGRT:
>
>   [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> reserved
>   [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
> 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
>   [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
> 08/30/2022
>   [    0.032264] Call Trace:
>   [    0.032265]  ? dump_stack+0x57/0x6e
>   [    0.032267]  ? bgrt_init+0xc2/0xc2
>   [    0.032268]  ? __e820__range_update+0x7a/0x1d6
>   [    0.032270]  ? bgrt_init+0xc2/0xc2
>   [    0.032272]  ? bgrt_init+0xc2/0xc2
>   [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
>   [    0.032276]  ? efi_mem_reserve+0x2d/0x42
>   [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
>   [    0.032279]  ? acpi_table_parse+0x86/0xbc
>   [    0.032281]  ? acpi_boot_init+0x79/0xad
>   [    0.032282]  ? setup_arch+0x835/0x954
>   [    0.032284]  ? start_kernel+0x5d/0x455
>   [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb
>
> efi_reserve_boot_services() has reserved memory of type
> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA  before crashkernel.
> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
> other modules. Then, the e820_table is directly updated, and the BGRT
> memory is reserved.
>
> However, memblock_is_region_reserved() in efi_reserve_boot_services()
> returns true when the ranges only overlap.
>
>      already_reserved = memblock_is_region_reserved(start, size);

Do you mean efi_reserve_boot_services is supposed to reserve the bgrt
memory but it does not reserve it due to the region overlapping with
some other reserved region?  If so can you debug and find what exact
memblock reserved region overlaps with the bgrt?

BTW, the previous emails are not threading correctly, which makes it
hard to find information.

>
>      /*
>       * Because the following memblock_reserve() is paired
>       * with memblock_free_late() for this region in
>       * efi_free_boot_services(), we must be extremely
>       * careful not to reserve, and subsequently free,
>       * critical regions of memory (like the kernel image) or
>       * those regions that somebody else has already
>       * reserved.
>       *
>       * A good example of a critical region that must not be
>       * freed is page zero (first 4Kb of memory), which may
>       * contain boot services code/data but is marked
>       * E820_TYPE_RESERVED by trim_bios_range().
>       */
>      if (!already_reserved) {
>              memblock_reserve(start, size);
>
>              /*
>               * If we are the first to reserve the region, no
>               * one else cares about it. We own it and can
>               * free it later.
>               */
>              if (can_free_region(start, size))
>                      continue;
>      }
>
> As a result, some memory of EFI_BOOT_SERVICES_DATA is not reserved in
> advance. The subsequent crashkernel happens to reserve this portion of
> memory, which conflicts with BGRT.
>
> > Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.
> >
> >>
> >> How do you boot into your new 6.8.0 kernel? Used kexec -l to jump into the 2nd
> >> kernel, or reboot from bios/firmware boot up into 6.8.0?
> > It's reboot from bios boot up into 6.8.0. I attempted to revert the below patch,
> >  and this time the conflicting segment "53cbd000-53ccffff" also appeared in the /proc/iomem
> >  of the 6.8 kernel.
> >
> > 2d4fd058-60efefff : System RAM
> >   2d4fd058-58ffffff : System RAM
> >     49000000-58ffffff : Crash kernel
> >       53cbd000-53ccffff : Reserved
> > 60eff000-704fefff : Reserved
> > --
> >   93dd424000-93dd9fffff : Kernel bss
> >   c01f000000-c03effffff : Crash kernel
> > d0000000000-d0fffffffff : PCI Bus 0000:00
> >   d0000000000-d00001fffff : PCI Bus 0000:01
> >>
> >> Reverting below commit should fix your problem, can you try it?
> >>
> >> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
> >> Author: Huacai Chen <chenhuacai@kernel.org>
> >> Date:   Fri Dec 29 16:02:13 2023 +0800
> >>
> >>     kdump: defer the insertion of crashkernel resources
> >
> > .
> >
>

* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-21 10:06   ` Dave Young
@ 2024-03-21 12:37     ` Li Huafei
  2024-03-22  1:16       ` Baoquan He
  2024-03-22  7:18         ` Dave Young
  0 siblings, 2 replies; 17+ messages in thread
From: Li Huafei @ 2024-03-21 12:37 UTC (permalink / raw)
  To: Dave Young
  Cc: chenhaixiang (A),
	Baoquan He, kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), wanghai (M),
	Wangkefeng (OS Kernel Lab)



On 2024/3/21 18:06, Dave Young wrote:
> Hi,
> 
> On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1@huawei.com> wrote:
>>
>> Hi Baoquan,
>>
>> On 2024/3/21 17:17, chenhaixiang (A) wrote:
>>>
>>>>> I'm sorry for the delay. Here are some details from the boot log and
>>>> /proc/iomem:
>>>>> The Boot log:
>>>>> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
>>>> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
>>>> 11:46:11 UTC 2024
>>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
>>>> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
>>>> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
>>>> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
>>>> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
>>>> console=ttyS0,115200n8 console=tty0
>>>> ......snip...
>>>>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
>>>> from=0x0000000000000000 max_addr=0x0000000100000000
>>>> reserve_crashkernel_generic+0x7c/0x220
>>>>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
>>>> from=0x0000000100000000 max_addr=0x0000400000000000
>>>> reserve_crashkernel_generic+0x7c/0x220
>>>>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
>>>> memblock_alloc_range_nid+0xee/0x170
>>>>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
>>>> from=0x0000000000000000 max_addr=0x0000000100000000
>>>> reserve_crashkernel_generic+0x11d/0x220
>>>>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
>>>> memblock_alloc_range_nid+0xee/0x170
>>>>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
>>>> (256 MB)
>>>>> [    0.022641] crashkernel reserved: 0x000000c01f000000 -
>>>> 0x000000c03f000000 (512 MB)
>>>>
>>>> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
>>>> MB)
>>>>       crashkernel,high is reserved in region: [0x000000c01f000000 -
>>>> 0x000000c03f000000] (512 MB) ......
>>>>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
>>>> memblock_alloc_range_nid+0xee/0x170
>>>>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
>>>> reserved
>>>>> [    0.029861] TSC deadline timer available
>>>>
>>>> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and prints the above
>>>> "usable ==> reserved". This should be the step which prevents earlier reserved
>>>> crashkernel,low from being added to iomem tree. I am not sure what triggered
>>>> the e820 update.
>>
>> We added dump_stack () printing in efi_mem_reserve () and found that
>> [0x53cbd000-0x53ccffff] was reserved by BGRT:
>>
>>   [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
>> reserved
>>   [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
>> 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
>>   [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
>> 08/30/2022
>>   [    0.032264] Call Trace:
>>   [    0.032265]  ? dump_stack+0x57/0x6e
>>   [    0.032267]  ? bgrt_init+0xc2/0xc2
>>   [    0.032268]  ? __e820__range_update+0x7a/0x1d6
>>   [    0.032270]  ? bgrt_init+0xc2/0xc2
>>   [    0.032272]  ? bgrt_init+0xc2/0xc2
>>   [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
>>   [    0.032276]  ? efi_mem_reserve+0x2d/0x42
>>   [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
>>   [    0.032279]  ? acpi_table_parse+0x86/0xbc
>>   [    0.032281]  ? acpi_boot_init+0x79/0xad
>>   [    0.032282]  ? setup_arch+0x835/0x954
>>   [    0.032284]  ? start_kernel+0x5d/0x455
>>   [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb
>>
>> efi_reserve_boot_services() has reserved memory of type
>> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA  before crashkernel.
>> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
>> other modules. Then, the e820_table is directly updated, and the BGRT
>> memory is reserved.
>>
>> However, memblock_is_region_reserved() in efi_reserve_boot_services()
>> returns true when the ranges only overlap.
>>
>>      already_reserved = memblock_is_region_reserved(start, size);
> 
> Do you mean efi_reserve_boot_services is supposed to reserve the bgrt
> memory but it does not reserve it due to the region overlapping with
> some other reserved region?  If so can you debug and find what exact
> memblock reserved region overlaps with the bgrt?

Yes. I added the following debug print to efi_reserve_boot_services():

--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -339,6 +339,10 @@ void __init efi_reserve_boot_services(void)

                already_reserved = memblock_is_region_reserved(start, size);

+               pr_info("kdumpdebug: efi_reserve_boot_services start 0x%llx, "
+                       "size 0x%llx, type 0x%x, already_reserved %d\n",
+                       start, size, md->type, already_reserved);
+
                /*
                 * Because the following memblock_reserve() is paired
                 * with memblock_free_late() for this region in

The range [0x000000005976a018-0x000000005976abc7], which belongs to EFI_BOOT_SERVICES_DATA, is reserved here:
    [    0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
It falls within the following range:
    [    0.000000] efi: mem22: [Boot Data   |   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)

In efi_reserve_boot_services(), [0x0000000051329000-0x000000005cefefff] is not reserved, because the already-reserved
[0x000000005976a018-0x000000005976abc7] lies inside it and memblock_is_region_reserved() therefore reports the whole range as reserved:

    [    0.021316] efi: kdumpdebug: efi_reserve_boot_services start 0x51329000, size 0xbbd6000, type 0x4, already_reserved 1

Since the rest of this range is not actually reserved in memblock, it is still free, and the crashkernel reservation allocates from it:

    [    0.022597] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
    [    0.022599] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)

efi_bgrt_init() assumes that memory of type EFI_BOOT_SERVICES_DATA has already been
reserved successfully, so it uses the BGRT image address within that range directly. As a result,
the BGRT memory overlaps the crashkernel region:

    [    0.029694] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
> 
> BTW, the previous email threads are weird, and not threading
> correctly, hard to find information.

That is probably because the log was too large and the message was held for moderation. For my previous email I received this notice:

 The reason it is being held:

    Message body is too big: 248998 bytes with a limit of 40 KB


> 
>>
>>      /*
>>       * Because the following memblock_reserve() is paired
>>       * with memblock_free_late() for this region in
>>       * efi_free_boot_services(), we must be extremely
>>       * careful not to reserve, and subsequently free,
>>       * critical regions of memory (like the kernel image) or
>>       * those regions that somebody else has already
>>       * reserved.
>>       *
>>       * A good example of a critical region that must not be
>>       * freed is page zero (first 4Kb of memory), which may
>>       * contain boot services code/data but is marked
>>       * E820_TYPE_RESERVED by trim_bios_range().
>>       */
>>      if (!already_reserved) {
>>              memblock_reserve(start, size);
>>
>>              /*
>>               * If we are the first to reserve the region, no
>>               * one else cares about it. We own it and can
>>               * free it later.
>>               */
>>              if (can_free_region(start, size))
>>                      continue;
>>      }
>>
>> As a result, some memory of EFI_BOOT_SERVICES_DATA is not reserved in
>> advance. The subsequent crashkernel happens to reserve this portion of
>> memory, which conflicts with BGRT.
>>
>>> Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.
>>>
>>>>
>>>> How do you boot into your new 6.8.0 kernel? Used kexec -l to jump into the 2nd
>>>> kernel, or reboot from bios/firmware boot up into 6.8.0?
>>> It's reboot from bios boot up into 6.8.0. I attempted to revert the below patch,
>>>  and this time the conflicting segment "53cbd000-53ccffff" also appeared in the /proc/iomem
>>>  of the 6.8 kernel.
>>>
>>> 2d4fd058-60efefff : System RAM
>>>   2d4fd058-58ffffff : System RAM
>>>     49000000-58ffffff : Crash kernel
>>>       53cbd000-53ccffff : Reserved
>>> 60eff000-704fefff : Reserved
>>> --
>>>   93dd424000-93dd9fffff : Kernel bss
>>>   c01f000000-c03effffff : Crash kernel
>>> d0000000000-d0fffffffff : PCI Bus 0000:00
>>>   d0000000000-d00001fffff : PCI Bus 0000:01
>>>>
>>>> Reverting below commit should fix your problem, can you try it?
>>>>
>>>> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
>>>> Author: Huacai Chen <chenhuacai@kernel.org>
>>>> Date:   Fri Dec 29 16:02:13 2023 +0800
>>>>
>>>>     kdump: defer the insertion of crashkernel resources
>>>
>>> .
>>>
>>

* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-21 12:37     ` Li Huafei
@ 2024-03-22  1:16       ` Baoquan He
  2024-03-22  7:26         ` Dave Young
  2024-03-22  7:18         ` Dave Young
  1 sibling, 1 reply; 17+ messages in thread
From: Baoquan He @ 2024-03-22  1:16 UTC (permalink / raw)
  To: Li Huafei
  Cc: Dave Young, chenhaixiang (A),
	kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), wanghai (M),
	Wangkefeng (OS Kernel Lab)

On 03/21/24 at 08:37pm, Li Huafei wrote:
> 
> 
> On 2024/3/21 18:06, Dave Young wrote:
> > Hi,
> > 
> > On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1@huawei.com> wrote:
> >>
> >> Hi Baoquan,
> >>
> >> On 2024/3/21 17:17, chenhaixiang (A) wrote:
> >>>
> >>>>> I'm sorry for the delay. Here are some details from the boot log and
> >>>> /proc/iomem:
> >>>>> The Boot log:
> >>>>> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
> >>>> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
> >>>> 11:46:11 UTC 2024
> >>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
> >>>> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
> >>>> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
> >>>> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
> >>>> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
> >>>> console=ttyS0,115200n8 console=tty0
> >>>> ......snip...
> >>>>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> >>>> reserve_crashkernel_generic+0x7c/0x220
> >>>>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >>>> from=0x0000000100000000 max_addr=0x0000400000000000
> >>>> reserve_crashkernel_generic+0x7c/0x220
> >>>>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
> >>>> memblock_alloc_range_nid+0xee/0x170
> >>>>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
> >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> >>>> reserve_crashkernel_generic+0x11d/0x220
> >>>>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
> >>>> memblock_alloc_range_nid+0xee/0x170
> >>>>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
> >>>> (256 MB)
> >>>>> [    0.022641] crashkernel reserved: 0x000000c01f000000 -
> >>>> 0x000000c03f000000 (512 MB)
> >>>>
> >>>> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
> >>>> MB)
> >>>>       crashkernel,high is reserved in region: [0x000000c01f000000 -
> >>>> 0x000000c03f000000] (512 MB) ......
> >>>>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
> >>>> memblock_alloc_range_nid+0xee/0x170
> >>>>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> >>>> reserved
> >>>>> [    0.029861] TSC deadline timer available
> >>>>
> >>>> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and prints the above
> >>>> "usable ==> reserved". This should be the step which prevents earlier reserved
> >>>> crashkernel,low from being added to iomem tree. I am not sure what triggered
> >>>> the e820 update.
> >>
> >> We added dump_stack () printing in efi_mem_reserve () and found that
> >> [0x53cbd000-0x53ccffff] was reserved by BGRT:
> >>
> >>   [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> >> reserved
> >>   [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
> >> 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
> >>   [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
> >> 08/30/2022
> >>   [    0.032264] Call Trace:
> >>   [    0.032265]  ? dump_stack+0x57/0x6e
> >>   [    0.032267]  ? bgrt_init+0xc2/0xc2
> >>   [    0.032268]  ? __e820__range_update+0x7a/0x1d6
> >>   [    0.032270]  ? bgrt_init+0xc2/0xc2
> >>   [    0.032272]  ? bgrt_init+0xc2/0xc2
> >>   [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
> >>   [    0.032276]  ? efi_mem_reserve+0x2d/0x42
> >>   [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
> >>   [    0.032279]  ? acpi_table_parse+0x86/0xbc
> >>   [    0.032281]  ? acpi_boot_init+0x79/0xad
> >>   [    0.032282]  ? setup_arch+0x835/0x954
> >>   [    0.032284]  ? start_kernel+0x5d/0x455
> >>   [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb
> >>
> >> efi_reserve_boot_services() has reserved memory of type
> >> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA  before crashkernel.
> >> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
> >> other modules. Then, the e820_table is directly updated, and the BGRT
> >> memory is reserved.
> >>
> >> However, memblock_is_region_reserved() in efi_reserve_boot_services()
> >> returns true when the ranges only overlap.
> >>
> >>      already_reserved = memblock_is_region_reserved(start, size);
> > 
> > Do you mean efi_reserve_boot_services is supposed to reserve the bgrt
> > memory but it does not reserve it due to the region overlapping with
> > some other reserved region?  If so can you debug and find what exact
> > memblock reserved region overlaps with the bgrt?
> 
> Yes. I added the following debug print to efi_reserve_boot_services():
> 
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -339,6 +339,10 @@ void __init efi_reserve_boot_services(void)
> 
>                 already_reserved = memblock_is_region_reserved(start, size);
> 
> +               pr_info("kdumpdebug: efi_reserve_boot_services start 0x%llx, "
> +                       "size 0x%llx, type 0x%x, already_reserved %d\n",
> +                       start, size, md->type, already_reserved);
> +
>                 /*
>                  * Because the following memblock_reserve() is paired
>                  * with memblock_free_late() for this region in
> 

Great debugging and analysis, thank you both. A few questions remain:

1) Why is the memory region [0x5976a018-0x5976abc7] reserved by memblock
for efi_mem_attr_table? Isn't it supposed to be outside of the
EFI_BOOT_SERVICES_DATA area? We may need to check whether this is a bug.

[    0.000000] random: crng init done
[    0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0

> This memory [0x000000005976a018-0x000000005976abc7] is reserved here, which belongs to EFI_BOOT_SERVICES_DATA.
>     [    0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
> It falls in the following range
>     [    0.000000] efi: mem22: [Boot Data   |   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)
> 
> In efi_reserve_boot_services(), [0x0000000051329000-0x000000005cefefff] will not be fully reserved because [0x000000005976a018-0x000000005976abc7]
> has already been reserved and overlaps with it.

2) Because efi_mem_attr_table has memblock-reserved
[0x5976a018-0x5976abc7], the whole EFI_BOOT_SERVICES_DATA area
[0x51329000-0x5cefefff] is not memblock-reserved for later freeing.
Apart from that small area, do we still need to memblock-reserve the
remaining area? We may need to check whether this is a bug.

> 
>     [    0.021316] efi: kdumpdebug: efi_reserve_boot_services start 0x51329000, size 0xbbd6000, type 0x4, already_reserved 1
> 
> It is not reserved by memblock, this free memory region is allocated by crashkernel
> 
>     [    0.022597] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
>     [    0.022599] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
> 
> In efi_bgrt_init (), it is assumed that the memory of the EFI_BOOT_SERVICES_DATA type has been successfully 
> reserved. Therefore, the address in the range is directly used. As a result, the memory overlaps with
> the crashkernel region. 

(3) efi_bgrt_init() should be innocent because it's supposed to safely
use the area according to the existing efi quirk handling.


(4) Deferring the insertion of crashk_low_res into iomem exposed the
above EFI issue. When we cancel the deferred insertion of crashk_res
into iomem, we can see that the BGRT area is reserved inside the
crashkernel region, which is problematic.

2d4fd058-60efefff : System RAM
  2d4fd058-58ffffff : System RAM
    49000000-58ffffff : Crash kernel
      53cbd000-53ccffff : Reserved     <--- 
60eff000-704fefff : Reserved
--
  93dd424000-93dd9fffff : Kernel bss
  c01f000000-c03effffff : Crash kernel
d0000000000-d0fffffffff : PCI Bus 0000:00
  d0000000000-d00001fffff : PCI Bus 0000:01

> 
>     [    0.029694] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
> > 
> > BTW, the previous email threads are weird, and not threading
> > correctly, hard to find information.
> 
> It should be because the log content is too large and has been put on hold. In my previous email, I received a prompt:
> 
>  The reason it is being held:
> 
>     Message body is too big: 248998 bytes with a limit of 40 KB
> 
> 
> > 
> >>
> >>      /*
> >>       * Because the following memblock_reserve() is paired
> >>       * with memblock_free_late() for this region in
> >>       * efi_free_boot_services(), we must be extremely
> >>       * careful not to reserve, and subsequently free,
> >>       * critical regions of memory (like the kernel image) or
> >>       * those regions that somebody else has already
> >>       * reserved.
> >>       *
> >>       * A good example of a critical region that must not be
> >>       * freed is page zero (first 4Kb of memory), which may
> >>       * contain boot services code/data but is marked
> >>       * E820_TYPE_RESERVED by trim_bios_range().
> >>       */
> >>      if (!already_reserved) {
> >>              memblock_reserve(start, size);
> >>
> >>              /*
> >>               * If we are the first to reserve the region, no
> >>               * one else cares about it. We own it and can
> >>               * free it later.
> >>               */
> >>              if (can_free_region(start, size))
> >>                      continue;
> >>      }
> >>
> >> As a result, some memory of EFI_BOOT_SERVICES_DATA is not reserved in
> >> advance. The subsequent crashkernel happens to reserve this portion of
> >> memory, which conflicts with BGRT.
> >>
> >>> Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.
> >>>
> >>>>
> >>>> How do you boot into your new 6.8.0 kernel? Used kexec -l to jump into the 2nd
> >>>> kernel, or reboot from bios/firmware boot up into 6.8.0?
> >>> It's reboot from bios boot up into 6.8.0. I attempted to revert the below patch,
> >>>  and this time the conflicting segment "53cbd000-53ccffff" also appeared in the /proc/iomem
> >>>  of the 6.8 kernel.
> >>>
> >>> 2d4fd058-60efefff : System RAM
> >>>   2d4fd058-58ffffff : System RAM
> >>>     49000000-58ffffff : Crash kernel
> >>>       53cbd000-53ccffff : Reserved
> >>> 60eff000-704fefff : Reserved
> >>> --
> >>>   93dd424000-93dd9fffff : Kernel bss
> >>>   c01f000000-c03effffff : Crash kernel
> >>> d0000000000-d0fffffffff : PCI Bus 0000:00
> >>>   d0000000000-d00001fffff : PCI Bus 0000:01
> >>>>
> >>>> Reverting below commit should fix your problem, can you try it?
> >>>>
> >>>> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
> >>>> Author: Huacai Chen <chenhuacai@kernel.org>
> >>>> Date:   Fri Dec 29 16:02:13 2023 +0800
> >>>>
> >>>>     kdump: defer the insertion of crashkernel resources
> >>>
> >>> .
> >>>
> >>
> >> _______________________________________________
> >> kexec mailing list
> >> kexec@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/kexec
> > 
> > .
> > 
> 


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-21 12:37     ` Li Huafei
@ 2024-03-22  7:18         ` Dave Young
  2024-03-22  7:18         ` Dave Young
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Young @ 2024-03-22  7:18 UTC (permalink / raw)
  To: Li Huafei, Ard Biesheuvel
  Cc: chenhaixiang (A),
	Baoquan He, kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), wanghai (M),
	Wangkefeng (OS Kernel Lab),
	linux-efi

On Thu, 21 Mar 2024 at 20:37, Li Huafei <lihuafei1@huawei.com> wrote:
>
>
>
> On 2024/3/21 18:06, Dave Young wrote:
> > Hi,
> >
> > On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1@huawei.com> wrote:
> >>
> >> Hi Baoquan,
> >>
> >> On 2024/3/21 17:17, chenhaixiang (A) wrote:
> >>>
> >>>>> I'm sorry for the delay. Here are some details from the boot log and
> >>>> /proc/iomem:
> >>>>> The Boot log:
> >>>>> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
> >>>> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
> >>>> 11:46:11 UTC 2024
> >>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
> >>>> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
> >>>> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
> >>>> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
> >>>> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
> >>>> console=ttyS0,115200n8 console=tty0
> >>>> ......snip...
> >>>>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> >>>> reserve_crashkernel_generic+0x7c/0x220
> >>>>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> >>>> from=0x0000000100000000 max_addr=0x0000400000000000
> >>>> reserve_crashkernel_generic+0x7c/0x220
> >>>>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
> >>>> memblock_alloc_range_nid+0xee/0x170
> >>>>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
> >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> >>>> reserve_crashkernel_generic+0x11d/0x220
> >>>>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
> >>>> memblock_alloc_range_nid+0xee/0x170
> >>>>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
> >>>> (256 MB)
> >>>>> [    0.022641] crashkernel reserved: 0x000000c01f000000 -
> >>>> 0x000000c03f000000 (512 MB)
> >>>>
> >>>> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
> >>>> MB)
> >>>>       crashkernel,high is reserved in region: [0x000000c01f000000 -
> >>>> 0x000000c03f000000] (512 MB) ......
> >>>>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
> >>>> memblock_alloc_range_nid+0xee/0x170
> >>>>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> >>>> reserved
> >>>>> [    0.029861] TSC deadline timer available
> >>>>
> >>>> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and print above
> >>>> "usable ==> reserved". This should be the step which prevents earlier reserved
> >>>> crashkernel,low from being added to iomem tree. I am not sure what triggered
> >>>> the e820 update.
> >>
> >> We added dump_stack () printing in efi_mem_reserve () and found that
> >> [0x53cbd000-0x53ccffff] was reserved by BGRT:
> >>
> >>   [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> >> reserved
> >>   [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
> >> 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
> >>   [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
> >> 08/30/2022
> >>   [    0.032264] Call Trace:
> >>   [    0.032265]  ? dump_stack+0x57/0x6e
> >>   [    0.032267]  ? bgrt_init+0xc2/0xc2
> >>   [    0.032268]  ? __e820__range_update+0x7a/0x1d6
> >>   [    0.032270]  ? bgrt_init+0xc2/0xc2
> >>   [    0.032272]  ? bgrt_init+0xc2/0xc2
> >>   [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
> >>   [    0.032276]  ? efi_mem_reserve+0x2d/0x42
> >>   [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
> >>   [    0.032279]  ? acpi_table_parse+0x86/0xbc
> >>   [    0.032281]  ? acpi_boot_init+0x79/0xad
> >>   [    0.032282]  ? setup_arch+0x835/0x954
> >>   [    0.032284]  ? start_kernel+0x5d/0x455
> >>   [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb
> >>
> >> efi_reserve_boot_services() has reserved memory of type
> >> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA  before crashkernel.
> >> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
> >> other modules. Then, the e820_table is directly updated, and the BGRT
> >> memory is reserved.
> >>
> >> However, memblock_is_region_reserved() in efi_reserve_boot_services()
> >> returns true when the ranges only overlap.
> >>
> >>      already_reserved = memblock_is_region_reserved(start, size);
> >
> > Do you mean efi_reserve_boot_services is supposed to reserve the bgrt
> > memory but it does not reserve it due to the region overlapping with
> > some other reserved region?  If so can you debug and find what exact
> > memblock reserved region overlaps with the bgrt?
>
> Yes. I added the following debug print to efi_reserve_boot_services():
>
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -339,6 +339,10 @@ void __init efi_reserve_boot_services(void)
>
>                 already_reserved = memblock_is_region_reserved(start, size);
>
> +               pr_info("kdumpdebug: efi_reserve_boot_services start 0x%llx, "
> +                       "size 0x%llx, type 0x%x, already_reserved %d\n",
> +                       start, size, md->type, already_reserved);
> +
>                 /*
>                  * Because the following memblock_reserve() is paired
>                  * with memblock_free_late() for this region in
>
> This memory [0x000000005976a018-0x000000005976abc7] is reserved here, which belongs to EFI_BOOT_SERVICES_DATA.
>     [    0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
> It falls in the following range
>     [    0.000000] efi: mem22: [Boot Data   |   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)
>
> In efi_reserve_boot_services(), [0x0000000051329000-0x000000005cefefff] will not be fully reserved because [0x000000005976a018-0x000000005976abc7]
> has already been reserved and overlaps with it.

OK, it looks to me like this:

efi_memattr_init() reserved the memattr table with memblock_reserve().
efi_reserve_boot_services() then failed to reserve the boot data region
that includes the memattr table, because the region had already been
partially reserved.

So this should be a UEFI issue revealed by the commit that defers the
crashkernel resource insertion. I suspect the memblock_reserve() in
efi_memattr_init() can simply be removed, leaving the reservation to
efi_reserve_boot_services().

Added Ard and the EFI list for their opinion.


>
>     [    0.021316] efi: kdumpdebug: efi_reserve_boot_services start 0x51329000, size 0xbbd6000, type 0x4, already_reserved 1
>
> It is not reserved by memblock, this free memory region is allocated by crashkernel
>
>     [    0.022597] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
>     [    0.022599] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
>
> In efi_bgrt_init (), it is assumed that the memory of the EFI_BOOT_SERVICES_DATA type has been successfully
> reserved. Therefore, the address in the range is directly used. As a result, the memory overlaps with
> the crashkernel region.
>
>     [    0.029694] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved

Thanks
Dave


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-22  1:16       ` Baoquan He
@ 2024-03-22  7:26         ` Dave Young
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Young @ 2024-03-22  7:26 UTC (permalink / raw)
  To: Baoquan He
  Cc: Li Huafei, chenhaixiang (A),
	kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), wanghai (M),
	Wangkefeng (OS Kernel Lab)

Hi,

On Fri, 22 Mar 2024 at 09:16, Baoquan He <bhe@redhat.com> wrote:
>
> On 03/21/24 at 08:37pm, Li Huafei wrote:
> >
> >
> > On 2024/3/21 18:06, Dave Young wrote:
> > > Hi,
> > >
> > > On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1@huawei.com> wrote:
> > >>
> > >> Hi Baoquan,
> > >>
> > >> On 2024/3/21 17:17, chenhaixiang (A) wrote:
> > >>>
> > >>>>> I'm sorry for the delay. Here are some details from the boot log and
> > >>>> /proc/iomem:
> > >>>>> The Boot log:
> > >>>>> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC)
> > >>>> 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20
> > >>>> 11:46:11 UTC 2024
> > >>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0
> > >>>> root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap
> > >>>> rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1
> > >>>> reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3
> > >>>> nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug
> > >>>> console=ttyS0,115200n8 console=tty0
> > >>>> ......snip...
> > >>>>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> > >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> > >>>> reserve_crashkernel_generic+0x7c/0x220
> > >>>>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000
> > >>>> from=0x0000000100000000 max_addr=0x0000400000000000
> > >>>> reserve_crashkernel_generic+0x7c/0x220
> > >>>>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff]
> > >>>> memblock_alloc_range_nid+0xee/0x170
> > >>>>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000
> > >>>> from=0x0000000000000000 max_addr=0x0000000100000000
> > >>>> reserve_crashkernel_generic+0x11d/0x220
> > >>>>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff]
> > >>>> memblock_alloc_range_nid+0xee/0x170
> > >>>>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000
> > >>>> (256 MB)
> > >>>>> [    0.022641] crashkernel reserved: 0x000000c01f000000 -
> > >>>> 0x000000c03f000000 (512 MB)
> > >>>>
> > >>>> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256
> > >>>> MB)
> > >>>>       crashkernel,high is reserved in region: [0x000000c01f000000 -
> > >>>> 0x000000c03f000000] (512 MB) ......
> > >>>>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f]
> > >>>> memblock_alloc_range_nid+0xee/0x170
> > >>>>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> > >>>> reserved
> > >>>>> [    0.029861] TSC deadline timer available
> > >>>>
> > >>>> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and print above
> > >>>> "usable ==> reserved". This should be the step which prevents earlier reserved
> > >>>> crashkernel,low from being added to iomem tree. I am not sure what triggered
> > >>>> the e820 update.
> > >>
> > >> We added dump_stack () printing in efi_mem_reserve () and found that
> > >> [0x53cbd000-0x53ccffff] was reserved by BGRT:
> > >>
> > >>   [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==>
> > >> reserved
> > >>   [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted
> > >> 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
> > >>   [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25
> > >> 08/30/2022
> > >>   [    0.032264] Call Trace:
> > >>   [    0.032265]  ? dump_stack+0x57/0x6e
> > >>   [    0.032267]  ? bgrt_init+0xc2/0xc2
> > >>   [    0.032268]  ? __e820__range_update+0x7a/0x1d6
> > >>   [    0.032270]  ? bgrt_init+0xc2/0xc2
> > >>   [    0.032272]  ? bgrt_init+0xc2/0xc2
> > >>   [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
> > >>   [    0.032276]  ? efi_mem_reserve+0x2d/0x42
> > >>   [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
> > >>   [    0.032279]  ? acpi_table_parse+0x86/0xbc
> > >>   [    0.032281]  ? acpi_boot_init+0x79/0xad
> > >>   [    0.032282]  ? setup_arch+0x835/0x954
> > >>   [    0.032284]  ? start_kernel+0x5d/0x455
> > >>   [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb
> > >>
> > >> efi_reserve_boot_services() has reserved memory of type
> > >> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA  before crashkernel.
> > >> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
> > >> other modules. Then, the e820_table is directly updated, and the BGRT
> > >> memory is reserved.
> > >>
> > >> However, memblock_is_region_reserved() in efi_reserve_boot_services()
> > >> returns true when the ranges only overlap.
> > >>
> > >>      already_reserved = memblock_is_region_reserved(start, size);
> > >
> > > Do you mean efi_reserve_boot_services is supposed to reserve the bgrt
> > > memory but it does not reserve it due to the region overlapping with
> > > some other reserved region?  If so can you debug and find what exact
> > > memblock reserved region overlaps with the bgrt?
> >
> > Yes. I added the following debug print to efi_reserve_boot_services():
> >
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> > @@ -339,6 +339,10 @@ void __init efi_reserve_boot_services(void)
> >
> >                 already_reserved = memblock_is_region_reserved(start, size);
> >
> > +               pr_info("kdumpdebug: efi_reserve_boot_services start 0x%llx, "
> > +                       "size 0x%llx, type 0x%x, already_reserved %d\n",
> > +                       start, size, md->type, already_reserved);
> > +
> >                 /*
> >                  * Because the following memblock_reserve() is paired
> >                  * with memblock_free_late() for this region in
> >
>
> Great debugging and analysis, thank you both. A few questions remain:
>
> 1) Why is the memory region [0x5976a018-0x5976abc7] reserved by memblock
> for efi_mem_attr_table? Isn't it supposed to be outside of the
> EFI_BOOT_SERVICES_DATA area? We may need to check whether this is a bug.

The mem_attr_table memory falls within an EFI Boot Services Data region:
[    0.000000] efi: mem22: [Boot Data   |   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)

>
> [    0.000000] random: crng init done
> [    0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
>
> > This memory [0x000000005976a018-0x000000005976abc7] is reserved here, which belongs to EFI_BOOT_SERVICES_DATA.
> >     [    0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
> > It falls in the following range
> >     [    0.000000] efi: mem22: [Boot Data   |   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)
> >
> > In efi_reserve_boot_services(), [0x0000000051329000-0x000000005cefefff] will not be fully reserved because [0x000000005976a018-0x000000005976abc7]
> > has already been reserved and overlaps with it.
>
> 2) Because efi_mem_attr_table has memblock-reserved
> [0x5976a018-0x5976abc7], the whole EFI_BOOT_SERVICES_DATA area
> [0x51329000-0x5cefefff] is not memblock-reserved for later freeing.
> Apart from that small area, do we still need to memblock-reserve the
> remaining area? We may need to check whether this is a bug.

I think the whole EFI Boot Data region should be reserved temporarily
by efi_reserve_boot_services(). Whether it should instead be reserved
partially, as multiple smaller regions, I'm not sure. I added Ard and
the EFI list in another reply; let's see what the EFI people think.

>
> >
> >     [    0.021316] efi: kdumpdebug: efi_reserve_boot_services start 0x51329000, size 0xbbd6000, type 0x4, already_reserved 1
> >
> > It is not reserved by memblock, this free memory region is allocated by crashkernel
> >
> >     [    0.022597] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
> >     [    0.022599] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
> >
> > In efi_bgrt_init (), it is assumed that the memory of the EFI_BOOT_SERVICES_DATA type has been successfully
> > reserved. Therefore, the address in the range is directly used. As a result, the memory overlaps with
> > the crashkernel region.
>
> (3) efi_bgrt_init() should be innocent because it's supposed to safely
> use the area according to the existing efi quirk handling.

Agreed

>
>
> (4) the deferring of adding crashk_low_res to iomem exposed the above
> efi issue. When we cancel the deferring of crashk_res insertion into
> iomem, we can see that the bgrt area is reserved inside the crashkernel
> region, which is problematic.
>
> 2d4fd058-60efefff : System RAM
>   2d4fd058-58ffffff : System RAM
>     49000000-58ffffff : Crash kernel
>       53cbd000-53ccffff : Reserved     <---
> 60eff000-704fefff : Reserved
> --
>   93dd424000-93dd9fffff : Kernel bss
>   c01f000000-c03effffff : Crash kernel
> d0000000000-d0fffffffff : PCI Bus 0000:00
>   d0000000000-d00001fffff : PCI Bus 0000:01
>
> >
> >     [    0.029694] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
> > >
> > > BTW, the previous email threads are weird, and not threading
> > > correctly, hard to find information.
> >
> > It should be because the log content is too large and has been put on hold. In my previous email, I received a prompt:
> >
> >  The reason it is being held:
> >
> >     Message body is too big: 248998 bytes with a limit of 40 KB
> >
> >
> > >
> > >>
> > >>      /*
> > >>       * Because the following memblock_reserve() is paired
> > >>       * with memblock_free_late() for this region in
> > >>       * efi_free_boot_services(), we must be extremely
> > >>       * careful not to reserve, and subsequently free,
> > >>       * critical regions of memory (like the kernel image) or
> > >>       * those regions that somebody else has already
> > >>       * reserved.
> > >>       *
> > >>       * A good example of a critical region that must not be
> > >>       * freed is page zero (first 4Kb of memory), which may
> > >>       * contain boot services code/data but is marked
> > >>       * E820_TYPE_RESERVED by trim_bios_range().
> > >>       */
> > >>      if (!already_reserved) {
> > >>              memblock_reserve(start, size);
> > >>
> > >>              /*
> > >>               * If we are the first to reserve the region, no
> > >>               * one else cares about it. We own it and can
> > >>               * free it later.
> > >>               */
> > >>              if (can_free_region(start, size))
> > >>                      continue;
> > >>      }
> > >>
> > >> As a result, some memory of EFI_BOOT_SERVICES_DATA is not reserved in
> > >> advance. The subsequent crashkernel happens to reserve this portion of
> > >> memory, which conflicts with BGRT.
> > >>
> > >>> Current analysis suggests that efi_reserve_boot_services() is causing the update of the e820 table.
> > >>>
> > >>>>
> > >>>> How do you boot into your new 6.8.0 kernel? Used kexec -l to jump into the 2nd
> > >>>> kernel, or reboot from bios/firmware boot up into 6.8.0?
> > >>> It's reboot from bios boot up into 6.8.0. I attempted to revert the below patch,
> > >>>  and this time the conflicting segment "53cbd000-53ccffff" also appeared in the /proc/iomem
> > >>>  of the 6.8 kernel.
> > >>>
> > >>> 2d4fd058-60efefff : System RAM
> > >>>   2d4fd058-58ffffff : System RAM
> > >>>     49000000-58ffffff : Crash kernel
> > >>>       53cbd000-53ccffff : Reserved
> > >>> 60eff000-704fefff : Reserved
> > >>> --
> > >>>   93dd424000-93dd9fffff : Kernel bss
> > >>>   c01f000000-c03effffff : Crash kernel
> > >>> d0000000000-d0fffffffff : PCI Bus 0000:00
> > >>>   d0000000000-d00001fffff : PCI Bus 0000:01
> > >>>>
> > >>>> Reverting below commit should fix your problem, can you try it?
> > >>>>
> > >>>> commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
> > >>>> Author: Huacai Chen <chenhuacai@kernel.org>
> > >>>> Date:   Fri Dec 29 16:02:13 2023 +0800
> > >>>>
> > >>>>     kdump: defer the insertion of crashkernel resources
> > >>>
> > >>> .
> > >>>
> > >>
> > >> _______________________________________________
> > >> kexec mailing list
> > >> kexec@lists.infradead.org
> > >> http://lists.infradead.org/mailman/listinfo/kexec
> > >
> > > .
> > >
> >
>


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-22  7:18         ` Dave Young
@ 2024-03-22  7:58           ` Li Huafei
  -1 siblings, 0 replies; 17+ messages in thread
From: Li Huafei @ 2024-03-22  7:58 UTC (permalink / raw)
  To: Dave Young, Ard Biesheuvel
  Cc: chenhaixiang (A),
	Baoquan He, kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), wanghai (M),
	Wangkefeng (OS Kernel Lab),
	linux-efi



On 2024/3/22 15:18, Dave Young wrote:
> On Thu, 21 Mar 2024 at 20:37, Li Huafei <lihuafei1@huawei.com> wrote:
>>
>>
>>
>> On 2024/3/21 18:06, Dave Young wrote:
>>> Hi,
>>>
>>> On Thu, 21 Mar 2024 at 17:49, Li Huafei <lihuafei1@huawei.com> wrote:
>>>>
>>>> Hi Baoquan,
>>>>
>>>> On 2024/3/21 17:17, chenhaixiang (A) wrote:
>>>>>
>>>>>>> I'm sorry for the delay. Here are some details from the boot log and /proc/iomem:
>>>>>>> The Boot log:
>>>>>>> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20 11:46:11 UTC 2024
>>>>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0 root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1 reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3 nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug console=ttyS0,115200n8 console=tty0
>>>>>> ......snip...
>>>>>>> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000 from=0x0000000000000000 max_addr=0x0000000100000000 reserve_crashkernel_generic+0x7c/0x220
>>>>>>> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000 from=0x0000000100000000 max_addr=0x0000400000000000 reserve_crashkernel_generic+0x7c/0x220
>>>>>>> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff] memblock_alloc_range_nid+0xee/0x170
>>>>>>> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000 from=0x0000000000000000 max_addr=0x0000000100000000 reserve_crashkernel_generic+0x11d/0x220
>>>>>>> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff] memblock_alloc_range_nid+0xee/0x170
>>>>>>> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
>>>>>>> [    0.022641] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
>>>>>>
>>>>>> Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256 MB)
>>>>>>       crashkernel,high is reserved in region: [0x000000c01f000000 - 0x000000c03f000000] (512 MB)
>>>>>> ......
>>>>>>> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f] memblock_alloc_range_nid+0xee/0x170
>>>>>>> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
>>>>>>> [    0.029861] TSC deadline timer available
>>>>>>
>>>>>> Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and the
>>>>>> "usable ==> reserved" message above is printed. This should be the step which prevents the earlier reserved
>>>>>> crashkernel,low from being added to the iomem tree. I am not sure what triggered
>>>>>> the e820 update.
>>>>
>>>> We added dump_stack() printing in efi_mem_reserve() and found that
>>>> [0x53cbd000-0x53ccffff] was reserved by BGRT:
>>>>
>>>>   [    0.032259] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
>>>>   [    0.032262] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-60.18.0.50.h820.eulerosv2r11.x86_64 #7
>>>>   [    0.032263] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 8.25 08/30/2022
>>>>   [    0.032264] Call Trace:
>>>>   [    0.032265]  ? dump_stack+0x57/0x6e
>>>>   [    0.032267]  ? bgrt_init+0xc2/0xc2
>>>>   [    0.032268]  ? __e820__range_update+0x7a/0x1d6
>>>>   [    0.032270]  ? bgrt_init+0xc2/0xc2
>>>>   [    0.032272]  ? bgrt_init+0xc2/0xc2
>>>>   [    0.032274]  ? efi_arch_mem_reserve+0x1a3/0x1d0
>>>>   [    0.032276]  ? efi_mem_reserve+0x2d/0x42
>>>>   [    0.032278]  ? acpi_parse_bgrt+0xa/0x11
>>>>   [    0.032279]  ? acpi_table_parse+0x86/0xbc
>>>>   [    0.032281]  ? acpi_boot_init+0x79/0xad
>>>>   [    0.032282]  ? setup_arch+0x835/0x954
>>>>   [    0.032284]  ? start_kernel+0x5d/0x455
>>>>   [    0.032286]  ? secondary_startup_64_no_verify+0xc2/0xcb
>>>>
>>>> efi_reserve_boot_services() has reserved memory of type
>>>> EFI_BOOT_SERVICES_CODE & EFI_BOOT_SERVICES_DATA  before crashkernel.
>>>> efi_bgrt_init() assumes that EFI_BOOT_SERVICES_DATA is not reserved by
>>>> other modules. Then, the e820_table is directly updated, and the BGRT
>>>> memory is reserved.
>>>>
>>>> However, memblock_is_region_reserved() in efi_reserve_boot_services()
>>>> returns true even when the ranges only partially overlap.
>>>>
>>>>      already_reserved = memblock_is_region_reserved(start, size);
>>>
>>> Do you mean efi_reserve_boot_services is supposed to reserve the bgrt
>>> memory but it does not reserve it due to the region overlapping with
>>> some other reserved region?  If so can you debug and find what exact
>>> memblock reserved region overlaps with the bgrt?
>>
>> Yes. I added the following debug print to efi_reserve_boot_services():
>>
>> --- a/arch/x86/platform/efi/quirks.c
>> +++ b/arch/x86/platform/efi/quirks.c
>> @@ -339,6 +339,10 @@ void __init efi_reserve_boot_services(void)
>>
>>                 already_reserved = memblock_is_region_reserved(start, size);
>>
>> +               pr_info("kdumpdebug: efi_reserve_boot_services start 0x%llx, "
>> +                       "size 0x%llx, type 0x%x, already_reserved %d\n",
>> +                       start, size, md->type, already_reserved);
>> +
>>                 /*
>>                  * Because the following memblock_reserve() is paired
>>                  * with memblock_free_late() for this region in
>>
>> This memory [0x000000005976a018-0x000000005976abc7] is reserved here, which belongs to EFI_BOOT_SERVICES_DATA.
>>     [    0.000000] memblock_reserve: [0x000000005976a018-0x000000005976abc7] efi_memattr_init+0x51/0xa0
>> It falls in the following range
>>     [    0.000000] efi: mem22: [Boot Data   |   |  |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000051329000-0x000000005cefefff] (187MB)
>>
>> In efi_reserve_boot_services(), [0x0000000051329000-0x000000005cefefff] will not be fully reserved because [0x000000005976a018-0x000000005976abc7]
>> has already been reserved and overlaps with it.
> 
> Ok, it looks to me it is like this:
> 
> efi_memattr_init() reserved the memattr table with memblock_reserve().
> efi_reserve_boot_services() then failed to reserve the boot data region which
> includes the memattr table, because it had already been partially reserved.
>

Yes, that's my analysis.
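
For reference, a userspace sketch of the difference between "overlaps
something reserved" and "is completely reserved". The helper names are
hypothetical and this is not the memblock implementation; ranges are
half-open, [base, base + size):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if [base1, base1 + size1) and [base2, base2 + size2) share at
 * least one byte. This models the "any overlap" behaviour described
 * above; it is a stand-in, not the kernel function.
 */
static bool ranges_overlap(uint64_t base1, uint64_t size1,
			   uint64_t base2, uint64_t size2)
{
	return base1 < base2 + size2 && base2 < base1 + size1;
}

/*
 * Stricter check: true only if [base, base + size) is entirely covered
 * by the single reserved range [rbase, rbase + rsize).
 */
static bool range_fully_covered(uint64_t base, uint64_t size,
				uint64_t rbase, uint64_t rsize)
{
	return rbase <= base && base + size <= rbase + rsize;
}
```

Plugging in the numbers from the log (start 0x51329000, size 0xbbd6000 for
mem22; base 0x5976a018, size 0xbb0 for the memattr table), the first check
is true and the second is false. So a 2992-byte reservation is enough to
make the whole 187MB region look already reserved.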

> So this should be a UEFI issue revealed by the crashkernel resource
> late insert commit.   I suspect the memblock_reserve in
> efi_memattr_init can just be removed, leaving it to
> efi_reserve_boot_services to do that.
> 
> Added Ard and efi list for opinion.
> 

Thanks a lot! I was just about to ask the efi guys for help.

> 
>>
>>     [    0.021316] efi: kdumpdebug: efi_reserve_boot_services start 0x51329000, size 0xbbd6000, type 0x4, already_reserved 1
>>
>> Since it is not reserved by memblock, this free memory region is then allocated for the crashkernel:
>>
>>     [    0.022597] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
>>     [    0.022599] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
>>
>> In efi_bgrt_init (), it is assumed that the memory of the EFI_BOOT_SERVICES_DATA type has been successfully
>> reserved. Therefore, the address in the range is directly used. As a result, the memory overlaps with
>> the crashkernel region.
>>
>>     [    0.029694] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
> 
> Thanks
> Dave
> 
> .
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about Address Range Validation in Crash Kernel Allocation
       [not found] <45065451d7d343679e150313c1ee2b62@huawei.com>
@ 2024-03-21  7:09 ` Baoquan He
  0 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2024-03-21  7:09 UTC (permalink / raw)
  To: chenhaixiang (A)
  Cc: kexec, chenhuacai, x86, Louhongxiang, wangbin (A),
	Fangchuangchuang(Fcc,Euler), lihuafei, wanghai (M)

On 03/21/24 at 03:22am, chenhaixiang (A) wrote:
> I'm sorry for the delay. Here are some details from the boot log and /proc/iomem:
> The Boot log:
> [    0.000000] Linux version 6.8.0 (root@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #3 SMP PREEMPT_DYNAMIC Wed Mar 20 11:46:11 UTC 2024
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.8.0 root=/dev/mapper/root ro crashkernel=512M resume=/dev/mapper/swap rd.lvm.lv=root rd.lvm.lv=swap crash_kexec_post_notifiers softlockup_panic=1 reserve_kbox_mem=16M fsck.mode=auto fsck.repair=yes panic=3 nmi_watchdog=1 quiet rd.shell=0 memblock=debug efi=debug console=ttyS0,115200n8 console=tty0
......snip...
> [    0.022622] memblock_phys_alloc_range: 536870912 bytes align=0x1000000 from=0x0000000000000000 max_addr=0x0000000100000000 reserve_crashkernel_generic+0x7c/0x220
> [    0.022628] memblock_phys_alloc_range: 536870912 bytes align=0x1000000 from=0x0000000100000000 max_addr=0x0000400000000000 reserve_crashkernel_generic+0x7c/0x220
> [    0.022632] memblock_reserve: [0x000000c01f000000-0x000000c03effffff] memblock_alloc_range_nid+0xee/0x170
> [    0.022634] memblock_phys_alloc_range: 268435456 bytes align=0x1000000 from=0x0000000000000000 max_addr=0x0000000100000000 reserve_crashkernel_generic+0x11d/0x220
> [    0.022638] memblock_reserve: [0x0000000049000000-0x0000000058ffffff] memblock_alloc_range_nid+0xee/0x170
> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
> [    0.022641] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)

Here, crashkernel,low is reserved in region:  [0x49000000 - 0x59000000] (256 MB)
      crashkernel,high is reserved in region: [0x000000c01f000000 - 0x000000c03f000000] (512 MB)
......
> [    0.029839] memblock_reserve: [0x000000c03ffff740-0x000000c03fffff7f] memblock_alloc_range_nid+0xee/0x170
> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
> [    0.029861] TSC deadline timer available

Then here, region [0x53cbd000-0x53ccffff] is reserved in e820, and the
"usable ==> reserved" message above is printed. This should be the step which
prevents the earlier reserved crashkernel,low from being added to the iomem tree. I am not sure
what triggered the e820 update.
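
Note that the "crashkernel ... reserved" lines print an exclusive end
(0x59000000), while memblock_reserve and /proc/iomem print an inclusive end
(0x58ffffff). A quick userspace sketch for checking that the e820-reserved
range really sits inside crashkernel,low; the helper names are made up for
illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Last byte of an allocation of 'size' bytes starting at 'base'. */
static uint64_t inclusive_end(uint64_t base, uint64_t size)
{
	return base + size - 1;
}

/* True if inclusive [start, end] lies within inclusive [lo, hi]. */
static bool within(uint64_t start, uint64_t end, uint64_t lo, uint64_t hi)
{
	return start >= lo && end <= hi;
}
```

For the 268435456-byte allocation at 0x49000000 the inclusive end is
0x58ffffff, and [0x53cbd000-0x53ccffff] lies within that range. It does not
lie within crashkernel,high [0xc01f000000-0xc03effffff].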

How do you boot into your new 6.8.0 kernel? Did you use kexec -l to jump into
the 2nd kernel, or reboot from BIOS/firmware into 6.8.0?

Reverting the commit below should fix your problem; can you try it?

commit 4a693ce65b186fddc1a73621bd6f941e6e3eca21
Author: Huacai Chen <chenhuacai@kernel.org>
Date:   Fri Dec 29 16:02:13 2023 +0800

    kdump: defer the insertion of crashkernel resources


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-20 13:12 chenhaixiang (A)
@ 2024-03-20 14:08 ` Baoquan He
  0 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2024-03-20 14:08 UTC (permalink / raw)
  To: chenhaixiang (A)
  Cc: kexec, Louhongxiang, wangbin (A), Fangchuangchuang(Fcc,Euler),
	lihuafei, wanghai (M)

On 03/20/24 at 01:12pm, chenhaixiang (A) wrote:
> I tested the kernel-6.8 on my machine and found that the crashkernel memory reservation range is consistent with kernel-5.10. However, it's strange that when crashkernel=512M, the kernel still allocates two memory segments for crashkernel, as seen in the logs:
> [    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
> [    0.022641] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
> But only one segment is shown in /proc/iomem:
> 	c01f000000-c03effffff : Crash kernel
> Moreover, the conflicting address 53cbd000-53ccffff is still reserved by someone else:
> 	53cbd000-53ccffff : Reserved
> [    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
> It seems there is a kernel bug here.
> If you need the complete log, I can send it later.

Yeah, please attach the complete log. I will have a look.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about Address Range Validation in Crash Kernel Allocation
@ 2024-03-20 13:12 chenhaixiang (A)
  2024-03-20 14:08 ` Baoquan He
  0 siblings, 1 reply; 17+ messages in thread
From: chenhaixiang (A) @ 2024-03-20 13:12 UTC (permalink / raw)
  To: Baoquan He
  Cc: kexec, Louhongxiang, wangbin (A), Fangchuangchuang(Fcc,Euler),
	lihuafei, wanghai (M)

I tested the kernel-6.8 on my machine and found that the crashkernel memory reservation range is consistent with kernel-5.10. However, it's strange that when crashkernel=512M, the kernel still allocates two memory segments for crashkernel, as seen in the logs:
[    0.022640] crashkernel low memory reserved: 0x49000000 - 0x59000000 (256 MB)
[    0.022641] crashkernel reserved: 0x000000c01f000000 - 0x000000c03f000000 (512 MB)
But only one segment is shown in /proc/iomem:
	c01f000000-c03effffff : Crash kernel
Moreover, the conflicting address 53cbd000-53ccffff is still reserved by someone else:
	53cbd000-53ccffff : Reserved
[    0.029843] e820: update [mem 0x53cbd000-0x53ccffff] usable ==> reserved
It seems there is a kernel bug here.
If you need the complete log, I can send it later.
---------
On 03/19/24 at 4:22pm, Baoquan He wrote:
> On 03/19/24 at 07:24am, chenhaixiang (A) wrote:
> > Thank you for your reply!
> > The kernel version on my machine is kernel-5.10, and the kexec-tools version is kexec-tools-2.0.27.
> > However, my issue seems to be a bit different. On my machine, I can see the crashkernel memory segment in /proc/iomem. However, for some reason, within the address range allocated for crashkernel, there is also a segment marked as 'Reserved' (I'm not sure who marked it). In this scenario, kexec-tools calculates the CRASH MEMORY RANGES incorrectly.
> > ```
> 
> The crashkernel region can't be reserved again once it's allocated and reserved in
> memblock. There must be something wrong with the code. You can try the upstream
> kernel and kexec-tools to see if the problem exists there too. Since you are using an old kernel and
> may be on a distro kernel, we may not be able to cover it. Sorry about that.
> 
> If you want to debug to find out the reason, I can help give suggestions.
> 
> > cat /proc/iomem
> > 2d4fd058-58ffffff : System RAM
> >   49000000-58ffffff : Crash kernel
> >     53cbd000-53ccffff : Reserved
> > ```
> > I'm not sure whether the crashkernel memory segment is allowed to contain other
> > markings and, if that is not supported, whether kexec-tools should raise an error.
> > Thanks
> > Chen Haixiang
> > ----------
> > On 03/19/24 at 9:38am, Baoquan He wrote:
> > > Hi,
> > >
> > > On 03/18/24 at 12:00pm, chenhaixiang (A) wrote:
> > > > Dear kexec Community Members,
> > > >
> > > > I encountered an issue while using kexec-tools on my x86_64 machine.
> > > > When there is a segment marked as 'reserved' within the memory range
> > > > allocated for the crash kernel in /proc/iomem, the output appears as follows:
> > > > 2d4fd058-60efefff : System RAM
> > > >   2d4fd058-58ffffff : System RAM
> > > >     49000000-58ffffff : Crash kernel
> > > >       53cbd000-53ccffff : Reserved
> > >
> > > What kernel are you using? the version of kernel, and kexec-tools?
> > >
> > > If you are testing on the latest mainline kernel, you could be hitting the
> > > issue Dave has met and fixed in the patch below:
> > >
> > > [PATCH] x86/kexec: do not update E820 kexec table for setup_data
> > > https://lore.kernel.org/all/ZeZ2Kos-OOZNSrmO@darkstar.users.ipa.redhat.com/T/#u
> > >
> > > Thanks
> > > Baoquan
> > >
> > > >
> > > > The crash_memory_range array will encounter incorrect address ranges:
> > > > CRASH MEMORY RANGES
> > > > 000000002d4fd058-0000000048ffffff (0)
> > > > 0000000053cbd000-0000000048ffffff (1)
> > > > 0000000059000000-0000000053ccffff (0)
> > > >
> > > > Reading the code, I noticed that the get_crash_memory_ranges() function
> > > > invokes exclude_region() to handle the splitting of memory regions,
> > > > but it seems unable to properly handle the scenario described above.
> > > > The code logic is as follows:
> > > > ...
> > > > 	if (start < mend && end > mstart) {
> > > > 		if (start != mstart && end != mend) {
> > > > 			/* Split memory region */
> > > > 			crash_memory_range[i].end = start - 1;
> > > > 			temp_region.start = end + 1;
> > > > 			temp_region.end = mend;
> > > > 			temp_region.type = RANGE_RAM;
> > > > 			tidx = i+1;
> > > > 		} else if (start != mstart)
> > > > 			crash_memory_range[i].end = start - 1;
> > > > 		else
> > > > 			crash_memory_range[i].start = end + 1;
> > > > 	}
> > > > ...
> > > > If start < mstart < mend < end, crash_memory_range[i].end
> > > > becomes less than crash_memory_range[i].start, leading to incorrect
> > > > address ranges.
> > > > I would like to know if this behavior is reasonable and whether it
> > > > is necessary to validate the address ranges for compliance at the end.
> > > >
> > > > Thank you for your time and assistance.
> > > >
> > > > Chen Haixiang
> > > >
> > > > _______________________________________________
> > > > kexec mailing list
> > > > kexec@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/kexec
> > > >
> >
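
For discussion, here is a minimal sketch of how the exclusion could handle
every overlap case, including the one where the excluded region covers a
whole range (start < mstart < mend < end). This is not the kexec-tools code:
struct mem_range and exclude_range() are names I made up, the type field is
left out, and all ranges are inclusive.

```c
#include <stdint.h>

/* Inclusive address range; simplified, no type field. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove the inclusive range [start, end] from ranges[0..*nr-1].
 * 'max' is the capacity of the array. Returns 0 on success, or -1 if a
 * split is needed but there is no room for the extra entry.
 */
static int exclude_range(struct mem_range *ranges, int *nr, int max,
			 uint64_t start, uint64_t end)
{
	int i = 0;

	while (i < *nr) {
		uint64_t mstart = ranges[i].start;
		uint64_t mend = ranges[i].end;

		/* No overlap: leave this range alone. */
		if (end < mstart || start > mend) {
			i++;
			continue;
		}

		/* Range fully covered by the exclusion: delete it. */
		if (start <= mstart && end >= mend) {
			for (int j = i; j < *nr - 1; j++)
				ranges[j] = ranges[j + 1];
			(*nr)--;
			continue;	/* re-examine the entry now at index i */
		}

		/* Exclusion strictly inside the range: split in two. */
		if (start > mstart && end < mend) {
			if (*nr >= max)
				return -1;
			for (int j = *nr; j > i + 1; j--)
				ranges[j] = ranges[j - 1];
			ranges[i].end = start - 1;
			ranges[i + 1].start = end + 1;
			ranges[i + 1].end = mend;
			(*nr)++;
			i += 2;
			continue;
		}

		/* Partial overlap: trim the tail or the head. */
		if (start > mstart)
			ranges[i].end = start - 1;
		else
			ranges[i].start = end + 1;
		i++;
	}
	return 0;
}
```

Starting from [0x2d4fd058-0x58ffffff] and [0x53cbd000-0x53ccffff] and
excluding [0x49000000-0x58ffffff], this leaves only [0x2d4fd058-0x48ffffff],
and no entry ends up with end < start.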


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-19  7:24 chenhaixiang (A)
@ 2024-03-19  8:21 ` Baoquan He
  0 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2024-03-19  8:21 UTC (permalink / raw)
  To: chenhaixiang (A)
  Cc: kexec, Louhongxiang, wangbin (A), Fangchuangchuang(Fcc,Euler)

On 03/19/24 at 07:24am, chenhaixiang (A) wrote:
> Thank you for your reply!
> The kernel version on my machine is kernel-5.10, and the kexec-tools version is kexec-tools-2.0.27. 
> However, my issue seems to be a bit different. On my machine, I can see the crashkernel memory segment in /proc/iomem. However, for some reason, within the address range allocated for crashkernel, there is also a segment marked as 'Reserved' (I'm not sure who marked it). In this scenario, kexec-tools calculates the CRASH MEMORY RANGES incorrectly.
> ```

The crashkernel region can't be reserved again once it's allocated and
reserved in memblock. There must be something wrong with the code. You
can try the upstream kernel and kexec-tools to see if the problem exists
there too. Since you are using an old kernel and may be on a distro kernel,
we may not be able to cover it. Sorry about that.

If you want to debug to find out the reason, I can help give suggestions.

> cat /proc/iomem
> 2d4fd058-58ffffff : System RAM
>   49000000-58ffffff : Crash kernel
>     53cbd000-53ccffff : Reserved
> ```
> I'm not sure whether the crashkernel memory segment is allowed to contain other markings and, if that is not supported, whether kexec-tools should raise an error.
> Thanks
> Chen Haixiang
> ----------
> On 03/19/24 at 9:38am, Baoquan He wrote:
> > Hi,
> > 
> > On 03/18/24 at 12:00pm, chenhaixiang (A) wrote:
> > > Dear kexec Community Members,
> > >
> > > I encountered an issue while using kexec-tools on my x86_64 machine.
> > > > When there is a segment marked as 'reserved' within the memory range
> > > > allocated for the crash kernel in /proc/iomem, the output appears as follows:
> > > 2d4fd058-60efefff : System RAM
> > >   2d4fd058-58ffffff : System RAM
> > >     49000000-58ffffff : Crash kernel
> > >       53cbd000-53ccffff : Reserved
> > 
> > > Which kernel are you using? Please give the kernel version and the kexec-tools version.
> > > 
> > > If you are testing on the latest mainline kernel, you may be hitting the issue
> > > that Dave hit and fixed in the patch below:
> > 
> > [PATCH] x86/kexec: do not update E820 kexec table for setup_data
> > > https://lore.kernel.org/all/ZeZ2Kos-OOZNSrmO@darkstar.users.ipa.redhat.com/T/#u
> > 
> > Thanks
> > Baoquan
> > 
> > >
> > > The crash_memory_range array will encounter incorrect address ranges:
> > > CRASH MEMORY RANGES
> > > 000000002d4fd058-0000000048ffffff (0)
> > > 0000000053cbd000-0000000048ffffff (1)
> > > 0000000059000000-0000000053ccffff (0)
> > >
> > > > Reading the code, I noticed that the get_crash_memory_ranges() function
> > > > invokes exclude_region() to handle the splitting of memory regions, but it
> > > > seems unable to properly handle the scenario described above.
> > > The code logic is as follows:
> > > ...
> > > 	if (start < mend && end > mstart) {
> > > 		if (start != mstart && end != mend) {
> > > 			/* Split memory region */
> > > 			crash_memory_range[i].end = start - 1;
> > > 			temp_region.start = end + 1;
> > > 			temp_region.end = mend;
> > > 			temp_region.type = RANGE_RAM;
> > > 			tidx = i+1;
> > > 		} else if (start != mstart)
> > > 			crash_memory_range[i].end = start - 1;
> > > 		else
> > > 			crash_memory_range[i].start = end + 1;
> > > 	}
> > > ...
> > > > If start < mstart < mend < end, crash_memory_range[i].end becomes less
> > > > than crash_memory_range[i].start, leading to incorrect address ranges.
> > > > I would like to know if this behavior is reasonable and whether it is
> > > > necessary to validate the address ranges for compliance at the end.
> > >
> > > Thank you for your time and assistance.
> > >
> > > Chen Haixiang
> > >
> > >
> 



* Re: Question about Address Range Validation in Crash Kernel Allocation
@ 2024-03-19  7:24 chenhaixiang (A)
  2024-03-19  8:21 ` Baoquan He
  0 siblings, 1 reply; 17+ messages in thread
From: chenhaixiang (A) @ 2024-03-19  7:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: kexec, Louhongxiang, wangbin (A), Fangchuangchuang(Fcc,Euler),
	chenhaixiang (A)

Thank you for your reply!
The kernel version on my machine is kernel-5.10, and the kexec-tools version is kexec-tools-2.0.27. 
However, my issue seems to be a bit different. On my machine, I can see the crashkernel memory segment in /proc/iomem. However, for some reason, within the address range allocated for crashkernel, there is also a segment marked as 'Reserved' (I'm not sure who marked it). In this scenario, kexec-tools calculates the CRASH MEMORY RANGES incorrectly.
```
cat /proc/iomem
2d4fd058-58ffffff : System RAM
  49000000-58ffffff : Crash kernel
    53cbd000-53ccffff : Reserved
```
I'm not sure whether the crashkernel memory segment is allowed to contain other markings and, if that is not supported, whether kexec-tools should raise an error.
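
A minimal sketch of the kind of sanity check being asked about, for illustration only. It is not kexec-tools code: `struct range` and `validate_ranges()` are hypothetical names, and end addresses are assumed to be inclusive, as in the CRASH MEMORY RANGES output. It reports the first range whose start is greater than its end and returns an error.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a crash memory range; 'end' is inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Return 0 if every range satisfies start <= end.
 * Otherwise print the first malformed range to stderr and return -1.
 */
static int validate_ranges(const struct range *ranges, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (ranges[i].start > ranges[i].end) {
			fprintf(stderr, "invalid range %d: %016llx-%016llx\n", i,
				(unsigned long long)ranges[i].start,
				(unsigned long long)ranges[i].end);
			return -1;
		}
	}
	return 0;
}
```

Run over the three ranges reported in this thread, such a check would fail on the second entry (53cbd000-48ffffff).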
Thanks
Chen Haixiang
----------
On 03/19/24 at 9:38am, Baoquan He wrote:
> Hi,
> 
> On 03/18/24 at 12:00pm, chenhaixiang (A) wrote:
> > Dear kexec Community Members,
> >
> > I encountered an issue while using kexec-tools on my x86_64 machine.
> > When there is a segment marked as 'reserved' within the memory range
> > allocated for the crash kernel in /proc/iomem, the output appears as follows:
> > 2d4fd058-60efefff : System RAM
> >   2d4fd058-58ffffff : System RAM
> >     49000000-58ffffff : Crash kernel
> >       53cbd000-53ccffff : Reserved
> 
> Which kernel are you using? Please give the kernel version and the kexec-tools version.
> 
> If you are testing on the latest mainline kernel, you may be hitting the issue
> that Dave hit and fixed in the patch below:
> 
> [PATCH] x86/kexec: do not update E820 kexec table for setup_data
> https://lore.kernel.org/all/ZeZ2Kos-OOZNSrmO@darkstar.users.ipa.redhat.com/T/#u
> 
> Thanks
> Baoquan
> 
> >
> > The crash_memory_range array will encounter incorrect address ranges:
> > CRASH MEMORY RANGES
> > 000000002d4fd058-0000000048ffffff (0)
> > 0000000053cbd000-0000000048ffffff (1)
> > 0000000059000000-0000000053ccffff (0)
> >
> > Reading the code, I noticed that the get_crash_memory_ranges() function
> > invokes exclude_region() to handle the splitting of memory regions, but it
> > seems unable to properly handle the scenario described above.
> > The code logic is as follows:
> > ...
> > 	if (start < mend && end > mstart) {
> > 		if (start != mstart && end != mend) {
> > 			/* Split memory region */
> > 			crash_memory_range[i].end = start - 1;
> > 			temp_region.start = end + 1;
> > 			temp_region.end = mend;
> > 			temp_region.type = RANGE_RAM;
> > 			tidx = i+1;
> > 		} else if (start != mstart)
> > 			crash_memory_range[i].end = start - 1;
> > 		else
> > 			crash_memory_range[i].start = end + 1;
> > 	}
> > ...
> > If start < mstart < mend < end, crash_memory_range[i].end becomes less
> > than crash_memory_range[i].start, leading to incorrect address ranges.
> > I would like to know if this behavior is reasonable and whether it is
> > necessary to validate the address ranges for compliance at the end.
> >
> > Thank you for your time and assistance.
> >
> > Chen Haixiang
> >
> >



* Re: Question about Address Range Validation in Crash Kernel Allocation
  2024-03-18 12:00 chenhaixiang (A)
@ 2024-03-19  1:38 ` Baoquan He
  0 siblings, 0 replies; 17+ messages in thread
From: Baoquan He @ 2024-03-19  1:38 UTC (permalink / raw)
  To: chenhaixiang (A)
  Cc: kexec, Louhongxiang, wangbin (A), Fangchuangchuang(Fcc,Euler)

Hi,

On 03/18/24 at 12:00pm, chenhaixiang (A) wrote:
> Dear kexec Community Members,
> 
> I encountered an issue while using kexec-tools on my x86_64 machine.
> When there is a segment marked as 'reserved' within the memory range allocated for the crash kernel in /proc/iomem, the output appears as follows:
> 2d4fd058-60efefff : System RAM
>   2d4fd058-58ffffff : System RAM
>     49000000-58ffffff : Crash kernel
>       53cbd000-53ccffff : Reserved

Which kernel are you using? Please give the kernel version and the
kexec-tools version.

If you are testing on the latest mainline kernel, you may be hitting the
issue that Dave hit and fixed in the patch below:

[PATCH] x86/kexec: do not update E820 kexec table for setup_data
https://lore.kernel.org/all/ZeZ2Kos-OOZNSrmO@darkstar.users.ipa.redhat.com/T/#u

Thanks
Baoquan

> 
> The crash_memory_range array will encounter incorrect address ranges:
> CRASH MEMORY RANGES
> 000000002d4fd058-0000000048ffffff (0)
> 0000000053cbd000-0000000048ffffff (1)
> 0000000059000000-0000000053ccffff (0)
> 
> Reading the code, I noticed that the get_crash_memory_ranges() function invokes exclude_region() to handle the splitting of memory regions, but it seems unable to properly handle the scenario described above.
> The code logic is as follows:
> ...
> 	if (start < mend && end > mstart) {
> 		if (start != mstart && end != mend) {
> 			/* Split memory region */
> 			crash_memory_range[i].end = start - 1;
> 			temp_region.start = end + 1;
> 			temp_region.end = mend;
> 			temp_region.type = RANGE_RAM;
> 			tidx = i+1;
> 		} else if (start != mstart)
> 			crash_memory_range[i].end = start - 1;
> 		else
> 			crash_memory_range[i].start = end + 1;
> 	}
> ...
> If start < mstart < mend < end, crash_memory_range[i].end becomes less than crash_memory_range[i].start, leading to incorrect address ranges.
> I would like to know if this behavior is reasonable and whether it is necessary to validate the address ranges for compliance at the end.
> 
> Thank you for your time and assistance.
> 
> Chen Haixiang
> 
> 



* Question about Address Range Validation in Crash Kernel Allocation
@ 2024-03-18 12:00 chenhaixiang (A)
  2024-03-19  1:38 ` Baoquan He
  0 siblings, 1 reply; 17+ messages in thread
From: chenhaixiang (A) @ 2024-03-18 12:00 UTC (permalink / raw)
  To: kexec
  Cc: Louhongxiang, wangbin (A), chenhaixiang (A), Fangchuangchuang(Fcc,Euler)

Dear kexec Community Members,

I encountered an issue while using kexec-tools on my x86_64 machine.
When there is a segment marked as 'reserved' within the memory range allocated for the crash kernel in /proc/iomem, the output appears as follows:
2d4fd058-60efefff : System RAM
  2d4fd058-58ffffff : System RAM
    49000000-58ffffff : Crash kernel
      53cbd000-53ccffff : Reserved

The crash_memory_range array will encounter incorrect address ranges:
CRASH MEMORY RANGES
000000002d4fd058-0000000048ffffff (0)
0000000053cbd000-0000000048ffffff (1)
0000000059000000-0000000053ccffff (0)

Reading the code, I noticed that the get_crash_memory_ranges() function invokes exclude_region() to handle the splitting of memory regions, but it seems unable to properly handle the scenario described above.
The code logic is as follows:
...
	if (start < mend && end > mstart) {
		if (start != mstart && end != mend) {
			/* Split memory region */
			crash_memory_range[i].end = start - 1;
			temp_region.start = end + 1;
			temp_region.end = mend;
			temp_region.type = RANGE_RAM;
			tidx = i+1;
		} else if (start != mstart)
			crash_memory_range[i].end = start - 1;
		else
			crash_memory_range[i].start = end + 1;
	}
...
If start < mstart < mend < end, crash_memory_range[i].end becomes less than crash_memory_range[i].start, leading to incorrect address ranges.
I would like to know if this behavior is reasonable and whether it is necessary to validate the address ranges for compliance at the end.
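
As an illustration of the case described above, the sketch below handles every overlap case explicitly, including the one where the excluded region fully covers a memory range (start <= mstart and end >= mend), by dropping that range instead of splitting it. This is one possible approach under stated assumptions, not the kexec-tools implementation: `struct range` and `exclude_range()` are hypothetical names, and end addresses are treated as inclusive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a crash memory range; 'end' is inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove [start, end] from ranges[0..*nr-1] in place.
 * 'cap' is the capacity of the array. Returns 0 on success,
 * or -1 if a split would need more entries than 'cap' allows.
 */
static int exclude_range(struct range *ranges, size_t *nr, size_t cap,
			 uint64_t start, uint64_t end)
{
	size_t i = 0;

	while (i < *nr) {
		uint64_t mstart = ranges[i].start;
		uint64_t mend = ranges[i].end;

		if (end < mstart || start > mend) {
			/* No overlap: keep the entry untouched. */
			i++;
		} else if (start <= mstart && end >= mend) {
			/* Fully covered: drop the entry, do not advance i. */
			for (size_t j = i; j + 1 < *nr; j++)
				ranges[j] = ranges[j + 1];
			(*nr)--;
		} else if (start > mstart && end < mend) {
			/* Hole in the middle: split into two entries. */
			if (*nr >= cap)
				return -1;
			for (size_t j = *nr; j > i + 1; j--)
				ranges[j] = ranges[j - 1];
			ranges[i].end = start - 1;
			ranges[i + 1].start = end + 1;
			ranges[i + 1].end = mend;
			(*nr)++;
			i += 2;
		} else if (start > mstart) {
			/* Excluded region covers the tail of the entry. */
			ranges[i].end = start - 1;
			i++;
		} else {
			/* Excluded region covers the head of the entry. */
			ranges[i].start = end + 1;
			i++;
		}
	}
	return 0;
}
```

With the ranges from this report (RAM 2d4fd058-58ffffff and Reserved 53cbd000-53ccffff) and the crash kernel region 49000000-58ffffff excluded, this leaves a single range, 2d4fd058-48ffffff.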

Thank you for your time and assistance.

Chen Haixiang


end of thread, other threads:[~2024-03-22  7:58 UTC | newest]

Thread overview: 17+ messages
2024-03-21  9:17 Question about Address Range Validation in Crash Kernel Allocation chenhaixiang (A)
2024-03-21  9:48 ` Li Huafei
2024-03-21 10:06   ` Dave Young
2024-03-21 12:37     ` Li Huafei
2024-03-22  1:16       ` Baoquan He
2024-03-22  7:26         ` Dave Young
2024-03-22  7:18       ` Dave Young
2024-03-22  7:18         ` Dave Young
2024-03-22  7:58         ` Li Huafei
2024-03-22  7:58           ` Li Huafei
     [not found] <45065451d7d343679e150313c1ee2b62@huawei.com>
2024-03-21  7:09 ` Baoquan He
  -- strict thread matches above, loose matches on Subject: below --
2024-03-20 13:12 chenhaixiang (A)
2024-03-20 14:08 ` Baoquan He
2024-03-19  7:24 chenhaixiang (A)
2024-03-19  8:21 ` Baoquan He
2024-03-18 12:00 chenhaixiang (A)
2024-03-19  1:38 ` Baoquan He
