* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
@ 2022-04-19 17:23 abuehaze14
  2022-04-19 17:36 ` Mohamed
  2022-04-26  0:14 ` Baoquan He
  0 siblings, 2 replies; 8+ messages in thread
From: abuehaze14 @ 2022-04-19 17:23 UTC
  To: kexec

On ARM64-based VMs, hotplugging memory until more than 31GB is online
(32 or more 1GB memory blocks) causes kdump to fail to load, because it
hits the CRASH_MAX_MEMORY_RANGES limit, which is currently 32 on ARM64
while the memory block size is 1GB. This patch raises
CRASH_MAX_MEMORY_RANGES to 32K, similar to what we have on x86. This
should allow kdump to work until the VM has 32TB of memory, which
should be enough for a long time.

Signed-off-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
---
 kexec/arch/arm64/crashdump-arm64.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kexec/arch/arm64/crashdump-arm64.h b/kexec/arch/arm64/crashdump-arm64.h
index 12f4308..82fa69b 100644
--- a/kexec/arch/arm64/crashdump-arm64.h
+++ b/kexec/arch/arm64/crashdump-arm64.h
@@ -14,7 +14,7 @@
 
 #include "kexec.h"
 
-#define CRASH_MAX_MEMORY_RANGES	32
+#define CRASH_MAX_MEMORY_RANGES	32768
 
 /* crash dump kernel support at most two regions, low_region and high region. */
 #define CRASH_MAX_RESERVED_RANGES	2
-- 
2.32.0




* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
  2022-04-19 17:23 [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k abuehaze14
@ 2022-04-19 17:36 ` Mohamed
  2022-04-19 17:48   ` Mohamed
  2022-04-26  0:14 ` Baoquan He
  1 sibling, 1 reply; 8+ messages in thread
From: Mohamed @ 2022-04-19 17:36 UTC
  To: kexec

Hey Team,

- I am sending this e-mail to give more context for the previously submitted patch. We are seeing an issue on aarch64-based EC2 instances where kdump fails to load with "Number of crash memory ranges excedeed the max limit" once hotplugging brings the instance to 32 GB of memory, i.e. 32 memory blocks of 1 GB each. It looks like we are hitting the CRASH_MAX_MEMORY_RANGES limit, which is 32 on aarch64; on x86 the equivalent limit was around 2k before it was increased to 32K, as mentioned in https://www.spinics.net/lists/kexec/msg26574.html.

When we hotplug a new memory region, kexec udev rules reload kdump so that the elfcorehdr is updated for memory bank/CPU changes. This works fine until we hit the CRASH_MAX_MEMORY_RANGES limit, at which point kdump fails to load, as shown below.


[root@ip-xx-xx-xx-xx ec2-user]# echo 0x0000000b80000000  > /sys/devices/system/memory/probe
[root@ip-xx-xx-xx-xx ec2-user]# lsmem
RANGE                                 SIZE  STATE REMOVABLE BLOCK
0x0000000040000000-0x000000007fffffff   1G online       yes     1
0x0000000400000000-0x00000004bfffffff   3G online       yes 16-18
0x0000000500000000-0x0000000bbfffffff  27G online       yes 20-46

Memory block size:         1G
Total online memory:      31G
Total offline memory:      0B

[root@ip-xx-xx-xx-xx ec2-user]# service kdump status
Redirecting to /bin/systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Fri 2022-04-15 22:16:34 UTC; 9s ago
  Process: 6185 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 6194 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 6194 (code=exited, status=0/SUCCESS)

Apr 15 22:16:33 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Apr 15 22:16:34 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[6194]: kexec: loaded kdump kernel
Apr 15 22:16:34 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: Started Crash recovery kernel arming.
Apr 15 22:16:34 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[6194]: Starting kdump: [OK]
[root@ip-xx-xx-xx-xx ec2-user]# echo 0x0000000bc0000000  > /sys/devices/system/memory/probe

[root@ip-xx-xx-xx-xx ec2-user]# lsmem
RANGE                                 SIZE  STATE REMOVABLE BLOCK
0x0000000040000000-0x000000007fffffff   1G online       yes     1
0x0000000400000000-0x00000004bfffffff   3G online       yes 16-18
0x0000000500000000-0x0000000bffffffff  28G online       yes 20-47

Memory block size:         1G
Total online memory:      32G
Total offline memory:      0B

[root@ip-xx-xx-xx-xx ec2-user]# service kdump status
Redirecting to /bin/systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2022-04-15 22:17:14 UTC; 1s ago
  Process: 6362 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 6371 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 6371 (code=exited, status=1/FAILURE)

Apr 15 22:17:13 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[6371]: Error: Number of crash memory ranges excedeed the max limit
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[6371]: kexec: load failed.
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[6371]: Cannot load /boot/vmlinuz-5.10.102-99.473.amzn2.aarch64
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[6371]: kexec: failed to load kdump kernel
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[6371]: Starting kdump: [FAILED]
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: Failed to start Crash recovery kernel arming.
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: Unit kdump.service entered failed state.
Apr 15 22:17:14 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: kdump.service failed.


- With the proposed patch, I am able to hotplug 256 GB of memory to the EC2 instance, and kdump loads successfully.

[root@ip-xx-xx-xx-xx ec2-user]# lsmem
RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000040000000-0x000000007fffffff    1G online       yes      1
0x0000000400000000-0x00000004bfffffff    3G online       yes  16-18
0x0000000500000000-0x000000433fffffff  249G online       yes 20-268

Memory block size:         1G
Total online memory:     253G
Total offline memory:      0B
[root@ip-172-31-1-51 ec2-user]# service kdump status
Redirecting to /bin/systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Sat 2022-04-16 01:10:38 UTC; 32s ago
  Process: 15653 ExecStop=/usr/bin/kdumpctl stop (code=exited, status=0/SUCCESS)
  Process: 15662 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 15662 (code=exited, status=0/SUCCESS)

Apr 16 01:10:37 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Apr 16 01:10:38 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[15662]: kexec: loaded kdump kernel
Apr 16 01:10:38 ip-xx-xx-xx-xx.eu-west-1.compute.internal kdumpctl[15662]: Starting kdump: [OK]
Apr 16 01:10:38 ip-xx-xx-xx-xx.eu-west-1.compute.internal systemd[1]: Started Crash recovery kernel arming.


On 19/04/2022, 19:23, "abuehaze14" <abuehaze@amazon.com> wrote:

    [...]




* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
  2022-04-19 17:36 ` Mohamed
@ 2022-04-19 17:48   ` Mohamed
  2022-04-25 20:49     ` Mohamed
  0 siblings, 1 reply; 8+ messages in thread
From: Mohamed @ 2022-04-19 17:48 UTC
  To: kexec

Adding Simon to the thread.

Thank you.

Hazem

On 19/04/2022, 19:36, "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com> wrote:

    [...]




* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
  2022-04-19 17:48   ` Mohamed
@ 2022-04-25 20:49     ` Mohamed
  0 siblings, 0 replies; 8+ messages in thread
From: Mohamed @ 2022-04-25 20:49 UTC
  To: kexec

Ping!

On 19/04/2022, 19:48, "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com> wrote:

    [...]






* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
  2022-04-19 17:23 [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k abuehaze14
  2022-04-19 17:36 ` Mohamed
@ 2022-04-26  0:14 ` Baoquan He
  2022-04-28 12:15   ` Mohamed
  2022-04-29  9:58   ` Simon Horman
  1 sibling, 2 replies; 8+ messages in thread
From: Baoquan He @ 2022-04-26  0:14 UTC
  To: kexec

On 04/19/22 at 05:23pm, abuehaze14 wrote:
> On ARM64 based VMs hotplugging more than 31GB of memory will cause
> kdump to fail loading as it's hitting the CRASH_MAX_MEMORY_RANGES
> limit which is currently 32 on ARM64 given that the  memory block size
> is 1GB. This patch is raising CRASH_MAX_MEMORY_RANGES
> to 32K similar to what we have on x86, this should allow
> kdump to work until the VM has 32TB which should be
> enough for a long time.
> 
> Signed-off-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>

Sounds reasonable.

Acked-by: Baoquan He <bhe@redhat.com>

By the way, Simon usually collects kexec-tools patches every one to two
weeks, so there is no need to ping again so soon.

> ---
>  kexec/arch/arm64/crashdump-arm64.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kexec/arch/arm64/crashdump-arm64.h b/kexec/arch/arm64/crashdump-arm64.h
> index 12f4308..82fa69b 100644
> --- a/kexec/arch/arm64/crashdump-arm64.h
> +++ b/kexec/arch/arm64/crashdump-arm64.h
> @@ -14,7 +14,7 @@
>  
>  #include "kexec.h"
>  
> -#define CRASH_MAX_MEMORY_RANGES	32
> +#define CRASH_MAX_MEMORY_RANGES	32768
>  
>  /* crash dump kernel support at most two regions, low_region and high region. */
>  #define CRASH_MAX_RESERVED_RANGES	2
> -- 
> 2.32.0




* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
  2022-04-26  0:14 ` Baoquan He
@ 2022-04-28 12:15   ` Mohamed
  2022-04-29  9:58   ` Simon Horman
  1 sibling, 0 replies; 8+ messages in thread
From: Mohamed @ 2022-04-28 12:15 UTC
  To: kexec

Thanks, Baoquan. Will this patch make it into the next kexec-tools release?

Hazem.

On 26/04/2022, 02:15, "Baoquan He" <bhe@redhat.com> wrote:

    [...]




* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
  2022-04-26  0:14 ` Baoquan He
  2022-04-28 12:15   ` Mohamed
@ 2022-04-29  9:58   ` Simon Horman
  2022-05-02 12:43     ` Mohamed
  1 sibling, 1 reply; 8+ messages in thread
From: Simon Horman @ 2022-04-29  9:58 UTC
  To: kexec

On Tue, Apr 26, 2022 at 08:14:15AM +0800, Baoquan He wrote:
> On 04/19/22 at 05:23pm, abuehaze14 wrote:
> > On ARM64 based VMs hotplugging more than 31GB of memory will cause
> > kdump to fail loading as it's hitting the CRASH_MAX_MEMORY_RANGES
> > limit which is currently 32 on ARM64 given that the  memory block size
> > is 1GB. This patch is raising CRASH_MAX_MEMORY_RANGES
> > to 32K similar to what we have on x86, this should allow
> > kdump to work until the VM has 32TB which should be
> > enough for a long time.
> > 
> > Signed-off-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
> 
> Sounds reasonable.
> 
> Acked-by: Baoquan He <bhe@redhat.com>
> 
> By the way, Simon usually collects kexec-tools patches every one to two
> weeks, no need to always ping in a short time.

Thanks and sorry for the delay.

I have applied this patch to main.



* [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k
  2022-04-29  9:58   ` Simon Horman
@ 2022-05-02 12:43     ` Mohamed
  0 siblings, 0 replies; 8+ messages in thread
From: Mohamed @ 2022-05-02 12:43 UTC
  To: kexec

Thanks Simon & Baoquan.

Hazem

On 29/04/2022, 11:59, "Simon Horman" <horms@kernel.org> wrote:

    [...]



Thread overview: 8+ messages
2022-04-19 17:23 [PATCH] arm64/crashdump-arm64: increase CRASH_MAX_MEMORY_RANGES to 32k abuehaze14
2022-04-19 17:36 ` Mohamed
2022-04-19 17:48   ` Mohamed
2022-04-25 20:49     ` Mohamed
2022-04-26  0:14 ` Baoquan He
2022-04-28 12:15   ` Mohamed
2022-04-29  9:58   ` Simon Horman
2022-05-02 12:43     ` Mohamed
