Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
@ 2020-03-26 18:07 James Morse
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

Hello!

arm64 recently queued support for memory hotremove, which led to some
new corner cases for kexec.

If the kexec segments are loaded for a removable region, that region may
be removed before kexec actually occurs. This causes the first kernel to
lockup when applying the relocations. (I've triggered this on x86 too).

The first patch adds a memory notifier for kexec so that it can refuse
to allow in-use regions to be taken offline.


This doesn't solve the problem for arm64, where the new kernel must
initially rely on the data structures from the first boot to describe
memory. These don't describe hotpluggable memory.
If kexec places the kernel in one of these regions, it must also provide
a DT that describes the region in which the kernel was mapped as memory.
(and somehow ensure its always present in the future...)

To prevent this from happening accidentally with unaware user-space,
patches two and three allow arm64 to give these regions a different
name.

This is a change in behaviour for arm64 as memory hotadd and hotremove
were added separately.


I haven't tried kdump.
Unaware kdump from user-space probably won't describe the hotplug
regions if the name is different, which saves us from problems if
the memory is no longer present at kdump time, but means the vmcore
is incomplete.


These patches are based on arm64's for-next/core branch, but can all
be merged independently.

Thanks,

James Morse (3):
  kexec: Prevent removal of memory in use by a loaded kexec image
  mm/memory_hotplug: Allow arch override of non boot memory resource
    names
  arm64: memory: Give hotplug memory a different resource name

 arch/arm64/include/asm/memory.h | 11 +++++++
 kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
 mm/memory_hotplug.c             |  6 +++-
 3 files changed, 72 insertions(+), 1 deletion(-)

-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
@ 2020-03-26 18:07 ` James Morse
  2020-03-27  0:43   ` Anshuman Khandual
                     ` (2 more replies)
  2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 33+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

An image loaded for kexec is not stored in place, instead its segments
are scattered through memory, and are re-assembled when needed. In the
meantime, the target memory may have been removed.

Because mm is not aware that this memory is still in use, it allows it
to be removed.

Add a memory notifier to prevent the removal of memory regions that
overlap with a loaded kexec image segment. e.g., when triggered from the
Qemu console:
| kexec_core: memory region in use
| memory memory32: Offline failed.

Signed-off-by: James Morse <james.morse@arm.com>
---
 kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..ba1d91e868ca 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/kexec.h>
+#include <linux/memory.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/highmem.h>
@@ -22,10 +23,12 @@
 #include <linux/elf.h>
 #include <linux/elfcore.h>
 #include <linux/utsname.h>
+#include <linux/notifier.h>
 #include <linux/numa.h>
 #include <linux/suspend.h>
 #include <linux/device.h>
 #include <linux/freezer.h>
+#include <linux/pfn.h>
 #include <linux/pm.h>
 #include <linux/cpu.h>
 #include <linux/uaccess.h>
@@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+/*
+ * If user-space wants to offline memory that is in use by a loaded kexec
+ * image, it should unload the image first.
+ */
+static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
+			 void *data)
+{
+	int rv = NOTIFY_OK, i;
+	struct memory_notify *arg = data;
+	unsigned long pfn = arg->start_pfn;
+	unsigned long nr_segments, sstart, send;
+	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
+
+	might_sleep();
+
+	if (action != MEM_GOING_OFFLINE)
+		return NOTIFY_DONE;
+
+	mutex_lock(&kexec_mutex);
+	if (kexec_image) {
+		nr_segments = kexec_image->nr_segments;
+
+		for (i = 0; i < nr_segments; i++) {
+			sstart = PFN_DOWN(kexec_image->segment[i].mem);
+			send = PFN_UP(kexec_image->segment[i].mem +
+				      kexec_image->segment[i].memsz);
+
+			if ((pfn <= sstart && sstart < end_pfn) ||
+			    (pfn <= send && send < end_pfn)) {
+				pr_warn("Memory region in use\n");
+				rv = NOTIFY_BAD;
+				break;
+			}
+		}
+	}
+	mutex_unlock(&kexec_mutex);
+
+	return rv;
+}
+
+static struct notifier_block mem_remove_nb = {
+	.notifier_call = mem_remove_cb,
+};
+
+static int __init register_mem_remove_cb(void)
+{
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		return register_memory_notifier(&mem_remove_nb);
+
+	return 0;
+}
+device_initcall(register_mem_remove_cb);
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
@ 2020-03-26 18:07 ` James Morse
  2020-03-27  9:59   ` David Hildenbrand
  2020-04-02  5:49   ` Dave Young
  2020-03-26 18:07 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name James Morse
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 33+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

Memory added to the system by hotplug has a 'System RAM' resource created
for it. This is exposed to user-space via /proc/iomem.

This poses problems for kexec on arm64. If kexec decides to place the
kernel in one of these newly onlined regions, the new kernel will find
itself booting from a region not described as memory in the firmware
tables.

Arm64 doesn't have a structure like the e820 memory map that can be
re-written when memory is brought online. Instead arm64 uses the UEFI
memory map, or the memory node from the DT, sometimes both. We never
rewrite these.

Allow an architecture to specify a different name for these hotplug
regions.

Signed-off-by: James Morse <james.morse@arm.com>
---
 mm/memory_hotplug.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a54ffac8c68..69b03dd7fc74 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -42,6 +42,10 @@
 #include "internal.h"
 #include "shuffle.h"
 
+#ifndef MEMORY_HOTPLUG_RES_NAME
+#define MEMORY_HOTPLUG_RES_NAME "System RAM"
+#endif
+
 /*
  * online_page_callback contains pointer to current page onlining function.
  * Initially it is generic_online_page(). If it is required it could be
@@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
 {
 	struct resource *res;
 	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-	char *resource_name = "System RAM";
+	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
 
 	if (start + size > max_mem_size)
 		return ERR_PTR(-E2BIG);
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
  2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
@ 2020-03-26 18:07 ` James Morse
  2020-03-30 19:01   ` David Hildenbrand
  2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

If kexec chooses to place the kernel in a memory region that was
added after boot, we fail to boot as the kernel is running from a
location that is not described as memory by the UEFI memory map or
the original DT.

To prevent unaware user-space kexec from doing this accidentally,
give these regions a different name.

Signed-off-by: James Morse <james.morse@arm.com>
---
This is a change in behaviour as seen by user-space, because memory hot-add
has already been merged.

 arch/arm64/include/asm/memory.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 2be67b232499..ef1686518469 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -166,6 +166,17 @@
 #define IOREMAP_MAX_ORDER	(PMD_SHIFT)
 #endif
 
+/*
+ * Memory hotplug allows new regions of 'System RAM' to be added to the system.
+ * These aren't described as memory by the UEFI memory map, or DT memory node.
+ * If we kexec from one of these regions, the new kernel boots from a location
+ * that isn't described as RAM.
+ *
+ * Give these resources a different name, so unaware kexec doesn't do this by
+ * accident.
+ */
+#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"
+
 #ifndef __ASSEMBLY__
 extern u64			vabits_actual;
 #define PAGE_END		(_PAGE_END(vabits_actual))
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
@ 2020-03-27  0:43   ` Anshuman Khandual
  2020-03-27  2:54     ` Baoquan He
  2020-03-27 15:46     ` James Morse
  2020-03-27  2:34   ` Baoquan He
  2020-03-27  9:30   ` David Hildenbrand
  2 siblings, 2 replies; 33+ messages in thread
From: Anshuman Khandual @ 2020-03-27  0:43 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Catalin Marinas, Andrew Morton, Bhupesh Sharma, Will Deacon,
	Eric Biederman



On 03/26/2020 11:37 PM, James Morse wrote:
> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
> 
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.

Why the isolation process does not fail when these pages are currently
being used by kexec ?

> 
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.

Yes this is definitely an added protection for these kexec loaded kernels
memory areas from being offlined but I would have expected the preceding
offlining to have failed as well.

> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..ba1d91e868ca 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -12,6 +12,7 @@
>  #include <linux/slab.h>
>  #include <linux/fs.h>
>  #include <linux/kexec.h>
> +#include <linux/memory.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/highmem.h>
> @@ -22,10 +23,12 @@
>  #include <linux/elf.h>
>  #include <linux/elfcore.h>
>  #include <linux/utsname.h>
> +#include <linux/notifier.h>
>  #include <linux/numa.h>
>  #include <linux/suspend.h>
>  #include <linux/device.h>
>  #include <linux/freezer.h>
> +#include <linux/pfn.h>
>  #include <linux/pm.h>
>  #include <linux/cpu.h>
>  #include <linux/uaccess.h>
> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>  
>  void __weak arch_kexec_unprotect_crashkres(void)
>  {}
> +
> +/*
> + * If user-space wants to offline memory that is in use by a loaded kexec
> + * image, it should unload the image first.
> + */

Probably this would need kexec user manual and related system call man pages
update as well.

> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> +			 void *data)
> +{
> +	int rv = NOTIFY_OK, i;
> +	struct memory_notify *arg = data;
> +	unsigned long pfn = arg->start_pfn;
> +	unsigned long nr_segments, sstart, send;
> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> +
> +	might_sleep();

Required ?

> +
> +	if (action != MEM_GOING_OFFLINE)
> +		return NOTIFY_DONE;
> +
> +	mutex_lock(&kexec_mutex);
> +	if (kexec_image) {
> +		nr_segments = kexec_image->nr_segments;
> +
> +		for (i = 0; i < nr_segments; i++) {
> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> +			send = PFN_UP(kexec_image->segment[i].mem +
> +				      kexec_image->segment[i].memsz);
> +
> +			if ((pfn <= sstart && sstart < end_pfn) ||
> +			    (pfn <= send && send < end_pfn)) {
> +				pr_warn("Memory region in use\n");
> +				rv = NOTIFY_BAD;
> +				break;
> +			}
> +		}
> +	}
> +	mutex_unlock(&kexec_mutex);
> +
> +	return rv;

Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead.

> +}
> +
> +static struct notifier_block mem_remove_nb = {
> +	.notifier_call = mem_remove_cb,
> +};
> +
> +static int __init register_mem_remove_cb(void)
> +{
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))

Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE
to reduce the scope as well as final code size when the config is disabled.

> +		return register_memory_notifier(&mem_remove_nb);
> +
> +	return 0;
> +}
> +device_initcall(register_mem_remove_cb);
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
                   ` (2 preceding siblings ...)
  2020-03-26 18:07 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name James Morse
@ 2020-03-27  2:11 ` Baoquan He
  2020-03-27 15:40   ` James Morse
  2020-03-27  9:27 ` David Hildenbrand
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Baoquan He @ 2020-03-27  2:11 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 03/26/20 at 06:07pm, James Morse wrote:
> Hello!
> 
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
> 
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).

Do you mean you use 'kexec -l /boot/vmlinuz-xxxx --initrd ...' to load a
kernel, next you hot remove some memory regions, then you execute
'kexec -e' to trigger kexec reboot?

I may not get the point clearly, but we usually do the loading and
triggering of kexec-ed kernel at the same time. 

> 
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.
> 
> 
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
> 
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
> 
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
> 
> 
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.

Currently, we will monitor udev events of mem hot add/remove, then
reload kdump kernel. That reloading is only update the elfcorehdr,
because crashkernel has to be reserved during 1st kernel bootup. I don't
think this will have problem.

> 
> 
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.
> 
> Thanks,
> 
> James Morse (3):
>   kexec: Prevent removal of memory in use by a loaded kexec image
>   mm/memory_hotplug: Allow arch override of non boot memory resource
>     names
>   arm64: memory: Give hotplug memory a different resource name
> 
>  arch/arm64/include/asm/memory.h | 11 +++++++
>  kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c             |  6 +++-
>  3 files changed, 72 insertions(+), 1 deletion(-)
> 
> -- 
> 2.25.1
> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
  2020-03-27  0:43   ` Anshuman Khandual
@ 2020-03-27  2:34   ` Baoquan He
  2020-03-27  9:30   ` David Hildenbrand
  2 siblings, 0 replies; 33+ messages in thread
From: Baoquan He @ 2020-03-27  2:34 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 03/26/20 at 06:07pm, James Morse wrote:
> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
> 
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.
> 
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.

As I replied to the cover letter, usually we do loading and juming of
kexec-ed kernel at one time. If we expect to do both of them at
different time, I agree we should do something to make thing safer if
someone really want to do since it's allowed, can we do anything with
the existing notifier?

Mem hotplug has got a notifier to notice it will offline a memory
region.
memory_notify(MEM_OFFLINE, &arg);

> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..ba1d91e868ca 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -12,6 +12,7 @@
>  #include <linux/slab.h>
>  #include <linux/fs.h>
>  #include <linux/kexec.h>
> +#include <linux/memory.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/highmem.h>
> @@ -22,10 +23,12 @@
>  #include <linux/elf.h>
>  #include <linux/elfcore.h>
>  #include <linux/utsname.h>
> +#include <linux/notifier.h>
>  #include <linux/numa.h>
>  #include <linux/suspend.h>
>  #include <linux/device.h>
>  #include <linux/freezer.h>
> +#include <linux/pfn.h>
>  #include <linux/pm.h>
>  #include <linux/cpu.h>
>  #include <linux/uaccess.h>
> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>  
>  void __weak arch_kexec_unprotect_crashkres(void)
>  {}
> +
> +/*
> + * If user-space wants to offline memory that is in use by a loaded kexec
> + * image, it should unload the image first.
> + */
> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> +			 void *data)
> +{
> +	int rv = NOTIFY_OK, i;
> +	struct memory_notify *arg = data;
> +	unsigned long pfn = arg->start_pfn;
> +	unsigned long nr_segments, sstart, send;
> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> +
> +	might_sleep();
> +
> +	if (action != MEM_GOING_OFFLINE)
> +		return NOTIFY_DONE;
> +
> +	mutex_lock(&kexec_mutex);
> +	if (kexec_image) {
> +		nr_segments = kexec_image->nr_segments;
> +
> +		for (i = 0; i < nr_segments; i++) {
> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> +			send = PFN_UP(kexec_image->segment[i].mem +
> +				      kexec_image->segment[i].memsz);
> +
> +			if ((pfn <= sstart && sstart < end_pfn) ||
> +			    (pfn <= send && send < end_pfn)) {
> +				pr_warn("Memory region in use\n");
> +				rv = NOTIFY_BAD;
> +				break;
> +			}
> +		}
> +	}
> +	mutex_unlock(&kexec_mutex);
> +
> +	return rv;
> +}
> +
> +static struct notifier_block mem_remove_nb = {
> +	.notifier_call = mem_remove_cb,
> +};
> +
> +static int __init register_mem_remove_cb(void)
> +{
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		return register_memory_notifier(&mem_remove_nb);
> +
> +	return 0;
> +}
> +device_initcall(register_mem_remove_cb);
> -- 
> 2.25.1
> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27  0:43   ` Anshuman Khandual
@ 2020-03-27  2:54     ` Baoquan He
  2020-03-27 15:46     ` James Morse
  1 sibling, 0 replies; 33+ messages in thread
From: Baoquan He @ 2020-03-27  2:54 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel

On 03/27/20 at 06:13am, Anshuman Khandual wrote:
> 
> 
> On 03/26/2020 11:37 PM, James Morse wrote:
> > An image loaded for kexec is not stored in place, instead its segments
> > are scattered through memory, and are re-assembled when needed. In the
> > meantime, the target memory may have been removed.
> > 
> > Because mm is not aware that this memory is still in use, it allows it
> > to be removed.
> 
> Why the isolation process does not fail when these pages are currently
> being used by kexec ?

That is trick of kexec implementaiton. When loading kexec-ed kernel, it
just reads in the kenrel image in user space, then searches a place
where it's going to put kernel image in the whole system RAM, but won't
put kernel image in the searched region immediately, just book ahead a
room. When you execute 'kexec -e' to trigger kexec jumping, it will copy
kernel image into the booked room. So the booking is only recorded in
kexec's own data structure.

> 
> > 
> > Add a memory notifier to prevent the removal of memory regions that
> > overlap with a loaded kexec image segment. e.g., when triggered from the
> > Qemu console:
> > | kexec_core: memory region in use
> > | memory memory32: Offline failed.
> 
> Yes this is definitely an added protection for these kexec loaded kernels
> memory areas from being offlined but I would have expected the preceding
> offlining to have failed as well.
> 
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > ---
> >  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 56 insertions(+)
> > 
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index c19c0dad1ebe..ba1d91e868ca 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -12,6 +12,7 @@
> >  #include <linux/slab.h>
> >  #include <linux/fs.h>
> >  #include <linux/kexec.h>
> > +#include <linux/memory.h>
> >  #include <linux/mutex.h>
> >  #include <linux/list.h>
> >  #include <linux/highmem.h>
> > @@ -22,10 +23,12 @@
> >  #include <linux/elf.h>
> >  #include <linux/elfcore.h>
> >  #include <linux/utsname.h>
> > +#include <linux/notifier.h>
> >  #include <linux/numa.h>
> >  #include <linux/suspend.h>
> >  #include <linux/device.h>
> >  #include <linux/freezer.h>
> > +#include <linux/pfn.h>
> >  #include <linux/pm.h>
> >  #include <linux/cpu.h>
> >  #include <linux/uaccess.h>
> > @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
> >  
> >  void __weak arch_kexec_unprotect_crashkres(void)
> >  {}
> > +
> > +/*
> > + * If user-space wants to offline memory that is in use by a loaded kexec
> > + * image, it should unload the image first.
> > + */
> 
> Probably this would need kexec user manual and related system call man pages
> update as well.
> 
> > +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> > +			 void *data)
> > +{
> > +	int rv = NOTIFY_OK, i;
> > +	struct memory_notify *arg = data;
> > +	unsigned long pfn = arg->start_pfn;
> > +	unsigned long nr_segments, sstart, send;
> > +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> > +
> > +	might_sleep();
> 
> Required ?
> 
> > +
> > +	if (action != MEM_GOING_OFFLINE)
> > +		return NOTIFY_DONE;
> > +
> > +	mutex_lock(&kexec_mutex);
> > +	if (kexec_image) {
> > +		nr_segments = kexec_image->nr_segments;
> > +
> > +		for (i = 0; i < nr_segments; i++) {
> > +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> > +			send = PFN_UP(kexec_image->segment[i].mem +
> > +				      kexec_image->segment[i].memsz);
> > +
> > +			if ((pfn <= sstart && sstart < end_pfn) ||
> > +			    (pfn <= send && send < end_pfn)) {
> > +				pr_warn("Memory region in use\n");
> > +				rv = NOTIFY_BAD;
> > +				break;
> > +			}
> > +		}
> > +	}
> > +	mutex_unlock(&kexec_mutex);
> > +
> > +	return rv;
> 
> Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead.
> 
> > +}
> > +
> > +static struct notifier_block mem_remove_nb = {
> > +	.notifier_call = mem_remove_cb,
> > +};
> > +
> > +static int __init register_mem_remove_cb(void)
> > +{
> > +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> 
> Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE
> to reduce the scope as well as final code size when the config is disabled.
> 
> > +		return register_memory_notifier(&mem_remove_nb);
> > +
> > +	return 0;
> > +}
> > +device_initcall(register_mem_remove_cb);
> > 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
                   ` (3 preceding siblings ...)
  2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
@ 2020-03-27  9:27 ` David Hildenbrand
  2020-03-27 15:42   ` James Morse
  2020-03-30 13:55 ` Baoquan He
  2020-03-31  3:38 ` Dave Young
  6 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2020-03-27  9:27 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	Eric Biederman, Andrew Morton, Will Deacon

On 26.03.20 19:07, James Morse wrote:
> Hello!
> 
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
> 
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).
> 
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.

IIRC other architectures handle that by setting the affected pages
PageReserved. Any reason why to not stick to the same?

> 
> 
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
> 
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
> 
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
> 
> 
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.

Whenever memory is added/removed, kdump.service is to be restarted from
user space, which will fixup the data structures such that kdump will
not try to dump unplugged memory. Also, makedumpfile will check if the
sections are still around IIRC.

Not sure what you mean by "Unaware kdump from user-space".


-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
  2020-03-27  0:43   ` Anshuman Khandual
  2020-03-27  2:34   ` Baoquan He
@ 2020-03-27  9:30   ` David Hildenbrand
  2020-03-27 16:56     ` James Morse
  2 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2020-03-27  9:30 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	Eric Biederman, Andrew Morton, Will Deacon

On 26.03.20 19:07, James Morse wrote:
> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
> 
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.
> 
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..ba1d91e868ca 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -12,6 +12,7 @@
>  #include <linux/slab.h>
>  #include <linux/fs.h>
>  #include <linux/kexec.h>
> +#include <linux/memory.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/highmem.h>
> @@ -22,10 +23,12 @@
>  #include <linux/elf.h>
>  #include <linux/elfcore.h>
>  #include <linux/utsname.h>
> +#include <linux/notifier.h>
>  #include <linux/numa.h>
>  #include <linux/suspend.h>
>  #include <linux/device.h>
>  #include <linux/freezer.h>
> +#include <linux/pfn.h>
>  #include <linux/pm.h>
>  #include <linux/cpu.h>
>  #include <linux/uaccess.h>
> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>  
>  void __weak arch_kexec_unprotect_crashkres(void)
>  {}
> +
> +/*
> + * If user-space wants to offline memory that is in use by a loaded kexec
> + * image, it should unload the image first.
> + */
> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> +			 void *data)
> +{
> +	int rv = NOTIFY_OK, i;
> +	struct memory_notify *arg = data;
> +	unsigned long pfn = arg->start_pfn;
> +	unsigned long nr_segments, sstart, send;
> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> +
> +	might_sleep();
> +
> +	if (action != MEM_GOING_OFFLINE)
> +		return NOTIFY_DONE;
> +
> +	mutex_lock(&kexec_mutex);
> +	if (kexec_image) {
> +		nr_segments = kexec_image->nr_segments;
> +
> +		for (i = 0; i < nr_segments; i++) {
> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> +			send = PFN_UP(kexec_image->segment[i].mem +
> +				      kexec_image->segment[i].memsz);
> +
> +			if ((pfn <= sstart && sstart < end_pfn) ||
> +			    (pfn <= send && send < end_pfn)) {
> +				pr_warn("Memory region in use\n");
> +				rv = NOTIFY_BAD;
> +				break;
> +			}
> +		}
> +	}
> +	mutex_unlock(&kexec_mutex);
> +
> +	return rv;
> +}
> +
> +static struct notifier_block mem_remove_nb = {
> +	.notifier_call = mem_remove_cb,
> +};
> +
> +static int __init register_mem_remove_cb(void)
> +{
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		return register_memory_notifier(&mem_remove_nb);
> +
> +	return 0;
> +}
> +device_initcall(register_mem_remove_cb);
> 

E.g., in kernel/kexec_core.c:kimage_alloc_pages()

"SetPageReserved(pages + i);"

Pages that are reserved cannot get offlined. How are you able to trigger
that before this patch? (where is the allocation path for kexec, which
will not set the pages reserved?)

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
@ 2020-03-27  9:59   ` David Hildenbrand
  2020-03-27 15:39     ` James Morse
  2020-04-02  5:49   ` Dave Young
  1 sibling, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2020-03-27  9:59 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	Eric Biederman, Andrew Morton, Will Deacon

On 26.03.20 19:07, James Morse wrote:
> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
> 
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
> 
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.
> 
> Allow an architecture to specify a different name for these hotplug
> regions.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  mm/memory_hotplug.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a54ffac8c68..69b03dd7fc74 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>  
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif

So I assume changing this for all architectures would result in some
user space tool breaking? Are we aware of any?

I do wonder if we should simply change it for all architectures if possible.


> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>  
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);
> 


-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-27  9:59   ` David Hildenbrand
@ 2020-03-27 15:39     ` James Morse
  2020-03-30 13:23       ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-27 15:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 9:59 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.


>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0a54ffac8c68..69b03dd7fc74 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>  
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
> 
> So I assume changing this for all architectures would result in some
> user space tool breaking? Are we aware of any?

Last time we had to touch arm64's /proc/iomem strings I went through debian's
codesearch for stuff that reads it, kexec-tools was the only thing that parsed
it in anger. (From memory, the other tools were looking for PCIe windows to do
firmware flashing..)

Looking again, having qualifiers on the end of 'System RAM' looks like it could
confuse 's390-tools's detect_mem_chunks parser.

It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate set.


> I do wonder if we should simply change it for all architectures if possible.

If its possible that would be great. But I suspect that ship has sailed,
changing it on other architectures could break some fragile parsing code.

I'm wary of changing it on arm64, the only thing that makes it tolerable is that
memory hot-add was relatively recently merged, and we don't anticipate it being
widely used until you can remove memory as well.

Changing it on arm64 is to prevent today's versions of kexec-tools from
accidentally placing the new kernel in memory that wasn't described at boot.
This leads to an unhandled exception during boot[0] because the kernel can't
access itself via the mapping of all memory. (hotpluggable regions are only
discovered by suitably configured ACPI systems much later)


Thanks,

James

[0]
| NUMA: NODE_DATA [mem 0x7fdf1780-0x7fdf3fff]
| Unable to handle kernel paging request at virtual address ffff00004230aff8
| Mem abort info:
|   ESR = 0x96000006
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
| Data abort info:
|   ISV = 0, ISS = 0x00000006
|   CM = 0, WnR = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000008181d000
| [ffff00004230aff8] pgd=000000007fff9003, pud=000000007fdf7003,
pmd=0000000000000000
| Internal error: Oops: 96000006 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-rc3-00098-g3f6c690f5dfe
#11618
| Hardware name: linux,dummy-virt (DT)
| pstate: 80400085 (Nzcv daIf +PAN -UAO BTYPE=--)
| pc : vmemmap_pud_populate+0x2c/0xa0
| lr : vmemmap_populate+0x78/0x154

| Call trace:
|  vmemmap_pud_populate+0x2c/0xa0
|  vmemmap_populate+0x78/0x154
|  __populate_section_memmap+0x3c/0x60
|  sparse_init_nid+0x29c/0x414
|  sparse_init+0x154/0x170
|  bootmem_init+0x78/0xdc
|  setup_arch+0x280/0x5d0
|  start_kernel+0x98/0x4f8
| Code: f9469a84 92748e73 8b010e61 cb040033 (f9400261)
| random: get_random_bytes called from print_oops_end_marker+0x34/0x60
with crng_init=0
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill the idle task!
| ---[ end Kernel panic - not syncing: Attempted to kill the idle task!



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
@ 2020-03-27 15:40   ` James Morse
  0 siblings, 0 replies; 33+ messages in thread
From: James Morse @ 2020-03-27 15:40 UTC (permalink / raw)
  To: Baoquan He
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi Baoquan,

On 3/27/20 2:11 AM, Baoquan He wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).

> Do you mean you use 'kexec -l /boot/vmlinuz-xxxx --initrd ...' to load a
> kernel, next you hot remove some memory regions, then you execute
> 'kexec -e' to trigger kexec reboot?

Yes. But to make it more fun, get someone else to trigger the hot-remove behind
your back!


> I may not get the point clearly, but we usually do the loading and
> triggering of kexec-ed kernel at the same time. 

But its two syscalls. Should the second one fail if the memory layout has
changed since the first?

(UEFI does this for exit-boot-services, there is handshake to prove you know
what the current memory map is)


>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.
>>
>>
>> This doesn't solve the problem for arm64, where the new kernel must
>> initially rely on the data structures from the first boot to describe
>> memory. These don't describe hotpluggable memory.
>> If kexec places the kernel in one of these regions, it must also provide
>> a DT that describes the region in which the kernel was mapped as memory.
>> (and somehow ensure its always present in the future...)
>>
>> To prevent this from happening accidentally with unaware user-space,
>> patches two and three allow arm64 to give these regions a different
>> name.
>>
>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>> were added separately.
>>
>>
>> I haven't tried kdump.
>> Unaware kdump from user-space probably won't describe the hotplug
>> regions if the name is different, which saves us from problems if
>> the memory is no longer present at kdump time, but means the vmcore
>> is incomplete.

> Currently, we will monitor udev events of mem hot add/remove, then
> reload kdump kernel. That reloading is only update the elfcorehdr,
> because crashkernel has to be reserved during 1st kernel bootup. I don't
> think this will have problem.

Great. I don't think there is much the kernel can do for the kdump case, so its
good to know the tools already exist for detecting and restarting the kdump load
when the memory layout changes.

For kdump via kexec-file-load, we would need to regenerate the elfcorehdr, I'm
hoping that can be done in core code.


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-27  9:27 ` David Hildenbrand
@ 2020-03-27 15:42   ` James Morse
  2020-03-30 13:18     ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-27 15:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 9:27 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).
>>
>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.

> IIRC other architectures handle that by setting the affected pages
> PageReserved. Any reason why to not stick to the same?

Hmm, I didn't spot this. How come core code doesn't do it if its needed?

Doesn't PG_Reserved prevent the page from being used for regular allocations?
(or is that only if its done early)

I prefer the runtime check as the dmesg output gives the user some chance of
knowing why their memory-offline failed, and doing something about it!


>> This doesn't solve the problem for arm64, where the new kernel must
>> initially rely on the data structures from the first boot to describe
>> memory. These don't describe hotpluggable memory.
>> If kexec places the kernel in one of these regions, it must also provide
>> a DT that describes the region in which the kernel was mapped as memory.
>> (and somehow ensure its always present in the future...)
>>
>> To prevent this from happening accidentally with unaware user-space,
>> patches two and three allow arm64 to give these regions a different
>> name.
>>
>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>> were added separately.
>>
>>
>> I haven't tried kdump.
>> Unaware kdump from user-space probably won't describe the hotplug
>> regions if the name is different, which saves us from problems if
>> the memory is no longer present at kdump time, but means the vmcore
>> is incomplete.

> Whenever memory is added/removed, kdump.service is to be restarted from
> user space, which will fixup the data structures such that kdump will
> not try to dump unplugged memory.

Cunning.


> Also, makedumpfile will check if the
> sections are still around IIRC.

Curious. I thought the vmcore was virtually addressed, how does it know which
linear-map portions correspond to sysfs memory nodes with KASLR?


> Not sure what you mean by "Unaware kdump from user-space".

The existing kexec-tools binaries, that (I assume) don't go probing to find out
if 'System RAM' is removable or not, loading a kdump kernel, along with the
user-space generated blob that describes the first kernel's memory usage to the
second kernel.

'user-space' here to distinguish all this from kexec_file_load().



Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27  0:43   ` Anshuman Khandual
  2020-03-27  2:54     ` Baoquan He
@ 2020-03-27 15:46     ` James Morse
  1 sibling, 0 replies; 33+ messages in thread
From: James Morse @ 2020-03-27 15:46 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman,
	Andrew Morton, Will Deacon, linux-arm-kernel

Hi Anshuman,

On 3/27/20 12:43 AM, Anshuman Khandual wrote:
> On 03/26/2020 11:37 PM, James Morse wrote:
>> An image loaded for kexec is not stored in place, instead its segments
>> are scattered through memory, and are re-assembled when needed. In the
>> meantime, the target memory may have been removed.
>>
>> Because mm is not aware that this memory is still in use, it allows it
>> to be removed.

> Why the isolation process does not fail when these pages are currently
> being used by kexec ?

Kexec isn't using them right now, but it will once kexec is triggered.
Those physical addresses are held in some internal kexec data structures until
kexec is triggered.


>> Add a memory notifier to prevent the removal of memory regions that
>> overlap with a loaded kexec image segment. e.g., when triggered from the
>> Qemu console:
>> | kexec_core: memory region in use
>> | memory memory32: Offline failed.
> 
> Yes this is definitely an added protection for these kexec loaded kernels
> memory areas from being offlined but I would have expected the preceding
> offlining to have failed as well.

kexec hasn't allocate the memory, part of the regions user-space may specify for
the next kernel may be in use. There is nothing to stop the memory being used in
the meantime.


Any other way of doing this would prevent us saying why it failed.
Like this, the user can spot the 'kexec: Memory region in use message', and
unload kexec.


>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index c19c0dad1ebe..ba1d91e868ca 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>>  
>>  void __weak arch_kexec_unprotect_crashkres(void)
>>  {}
>> +
>> +/*
>> + * If user-space wants to offline memory that is in use by a loaded kexec
>> + * image, it should unload the image first.
>> + */

> Probably this would need kexec user manual and related system call man pages
> update as well.

I can't see anything relevant under Documentation. (kdump yes, kexec no...)


>> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
>> +			 void *data)
>> +{
>> +	int rv = NOTIFY_OK, i;
>> +	struct memory_notify *arg = data;
>> +	unsigned long pfn = arg->start_pfn;
>> +	unsigned long nr_segments, sstart, send;
>> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
>> +
>> +	might_sleep();
> 
> Required ?

Habit, and I think best practice. We take a mutex, so might_sleep(), but we also
conditionally return before lockdep would see the mutex. Having this annotation
means a dangerous change to the way this is called triggers a warning without
having to test memory hotplug explicitly.


>> +
>> +	if (action != MEM_GOING_OFFLINE)
>> +		return NOTIFY_DONE;
>> +
>> +	mutex_lock(&kexec_mutex);
>> +	if (kexec_image) {
>> +		nr_segments = kexec_image->nr_segments;
>> +
>> +		for (i = 0; i < nr_segments; i++) {
>> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
>> +			send = PFN_UP(kexec_image->segment[i].mem +
>> +				      kexec_image->segment[i].memsz);
>> +
>> +			if ((pfn <= sstart && sstart < end_pfn) ||
>> +			    (pfn <= send && send < end_pfn)) {
>> +				pr_warn("Memory region in use\n");
>> +				rv = NOTIFY_BAD;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +	mutex_unlock(&kexec_mutex);
>> +
>> +	return rv;
> 
> Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead.

You'd prefer a mutex_unlock() in the middle of the loop? ... or goto?
(I'm not convinced)

>> +}
>> +
>> +static struct notifier_block mem_remove_nb = {
>> +	.notifier_call = mem_remove_cb,
>> +};
>> +
>> +static int __init register_mem_remove_cb(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> 
> Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE
> to reduce the scope as well as final code size when the config is disabled.

The compiler is really good at this. "if (false)" means this is all dead code,
and static means its not exported, so the compiler is free to remove it.

Not #ifdef-ing it makes it much more readable, and means the compiler checks its
valid C before it removes it. This avoids weird header include problems that
only show up on some rand-config builds.


Thanks,

James

>> +		return register_memory_notifier(&mem_remove_nb);
>> +
>> +	return 0;
>> +}
>> +device_initcall(register_mem_remove_cb);
>>
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27  9:30   ` David Hildenbrand
@ 2020-03-27 16:56     ` James Morse
  2020-03-27 17:06       ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-27 16:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 9:30 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> An image loaded for kexec is not stored in place, instead its segments
>> are scattered through memory, and are re-assembled when needed. In the
>> meantime, the target memory may have been removed.
>>
>> Because mm is not aware that this memory is still in use, it allows it
>> to be removed.
>>
>> Add a memory notifier to prevent the removal of memory regions that
>> overlap with a loaded kexec image segment. e.g., when triggered from the
>> Qemu console:
>> | kexec_core: memory region in use
>> | memory memory32: Offline failed.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 56 insertions(+)
>>
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index c19c0dad1ebe..ba1d91e868ca 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c

> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
> 
> "SetPageReserved(pages + i);"
> 
> Pages that are reserved cannot get offlined. How are you able to trigger
> that before this patch? (where is the allocation path for kexec, which
> will not set the pages reserved?)

This sets page reserved on the memory it gets back from
alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].

The problem I see is for the target or destination memory once you execute the
image. Once machine_kexec() runs, it tries to write to this, assuming it is
still present...


How can I make the commit message clearer?

're-assembled' and 'target memory' aren't quite cutting it, is there are a
correct term to use?
(destination?)



Thanks,

James


[0] Just to convince myself:
| kimage_alloc_pages+0x30/0x15c
| kimage_alloc_page+0x210/0x7d8
| kimage_load_segment+0x14c/0x8c8
| __arm64_sys_kexec_load+0x4f0/0x720
| do_el0_svc+0x13c/0x3c0
| el0_sync_handler+0x9c/0x3c0
| el0_sync+0x158/0x180

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 16:56     ` James Morse
@ 2020-03-27 17:06       ` David Hildenbrand
  2020-03-27 18:07         ` James Morse
  0 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2020-03-27 17:06 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 27.03.20 17:56, James Morse wrote:
> Hi David,
> 
> On 3/27/20 9:30 AM, David Hildenbrand wrote:
>> On 26.03.20 19:07, James Morse wrote:
>>> An image loaded for kexec is not stored in place, instead its segments
>>> are scattered through memory, and are re-assembled when needed. In the
>>> meantime, the target memory may have been removed.
>>>
>>> Because mm is not aware that this memory is still in use, it allows it
>>> to be removed.
>>>
>>> Add a memory notifier to prevent the removal of memory regions that
>>> overlap with a loaded kexec image segment. e.g., when triggered from the
>>> Qemu console:
>>> | kexec_core: memory region in use
>>> | memory memory32: Offline failed.
>>>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> ---
>>>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 56 insertions(+)
>>>
>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>> index c19c0dad1ebe..ba1d91e868ca 100644
>>> --- a/kernel/kexec_core.c
>>> +++ b/kernel/kexec_core.c
> 
>> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
>>
>> "SetPageReserved(pages + i);"
>>
>> Pages that are reserved cannot get offlined. How are you able to trigger
>> that before this patch? (where is the allocation path for kexec, which
>> will not set the pages reserved?)
> 
> This sets page reserved on the memory it gets back from
> alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].
> 
> The problem I see is for the target or destination memory once you execute the
> image. Once machine_kexec() runs, it tries to write to this, assuming it is
> still present...

Let's recap

1. You load the image. You allocate memory for e.g., the kexec kernel.
The pages will be marked PG_reserved, so they cannot be offlined.

2. You do the kexec. The kexec kernel will only operate on a reserved
memory region (reserved via e.g., kernel cmdline crashkernel=128M).

Is it that in 2., the reserved memory region (for the crashkernel) could
have been offlined in the meantime?

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 17:06       ` David Hildenbrand
@ 2020-03-27 18:07         ` James Morse
  2020-03-27 18:52           ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-27 18:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 5:06 PM, David Hildenbrand wrote:
> On 27.03.20 17:56, James Morse wrote:
>> On 3/27/20 9:30 AM, David Hildenbrand wrote:
>>> On 26.03.20 19:07, James Morse wrote:
>>>> An image loaded for kexec is not stored in place, instead its segments
>>>> are scattered through memory, and are re-assembled when needed. In the
>>>> meantime, the target memory may have been removed.
>>>>
>>>> Because mm is not aware that this memory is still in use, it allows it
>>>> to be removed.
>>>>
>>>> Add a memory notifier to prevent the removal of memory regions that
>>>> overlap with a loaded kexec image segment. e.g., when triggered from the
>>>> Qemu console:
>>>> | kexec_core: memory region in use
>>>> | memory memory32: Offline failed.

>>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>>> index c19c0dad1ebe..ba1d91e868ca 100644
>>>> --- a/kernel/kexec_core.c
>>>> +++ b/kernel/kexec_core.c
>>
>>> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
>>>
>>> "SetPageReserved(pages + i);"
>>>
>>> Pages that are reserved cannot get offlined. How are you able to trigger
>>> that before this patch? (where is the allocation path for kexec, which
>>> will not set the pages reserved?)
>>
>> This sets page reserved on the memory it gets back from
>> alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].
>>
>> The problem I see is for the target or destination memory once you execute the
>> image. Once machine_kexec() runs, it tries to write to this, assuming it is
>> still present...

> Let's recap
> 
> 1. You load the image. You allocate memory for e.g., the kexec kernel.
> The pages will be marked PG_reserved, so they cannot be offlined.
> 
> 2. You do the kexec. The kexec kernel will only operate on a reserved
> memory region (reserved via e.g., kernel cmdline crashkernel=128M).

I think you are merging the kexec and kdump behaviours.
(Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')

For kdump, yes, the new kernel is loaded into the crashkernel reservation, and
confined to it.


For regular kexec, the new kernel can be loaded any where in memory. There might
be a difference with how this works on arm64....

The regular kexec kernel isn't stored in its final location when its loaded, its
relocated there when the image is executed. The target/destination memory may
have been removed in the meantime.

(an example recipe below should clarify this)


> Is it that in 2., the reserved memory region (for the crashkernel) could
> have been offlined in the meantime?

No, for kdump: the crashkernel reservation is PG_reserved, and its not something
mm knows how to move, so that region can't be taken offline.

(On arm64 we additionally prevent the boot-memory from being removed as it is
all described as present by UEFI. The crashkernel reservation would always be
from this type of memory)


This is about a regular kexec, any crashdump reservation is irrelevant.
This kexec kernel is temporarily stored out of line, then relocated when executed.

A recipe so that we're at least on the same terminal! This is on a TX2 running
arm64's for-next/core using Qemu-TCG to emulate x86. (Sorry for the bizarre
config, its because Qemu supports hotremove on x86, but not yet on arm64).


Insert the memory:
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

| root@vm:~# free -m
|               total        used        free      shared  ...
| Mem:            918          52         814           0  ...
| Swap:             0           0           0


Bring it online:
| root@vm:~# cd /sys/devices/system/memory/
| root@vm:/sys/devices/system/memory# for F in memory3*; do echo \
| online_movable > $F/state; done

| Built 1 zonelists, mobility grouping on.  Total pages: 251049
| Policy zone: DMA32

| -bash: echo: write error: Invalid argument
| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:           1942          53        1836           0  ...
| Swap:             0           0           0


Load kexec:
| root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline

Press the Attention button to request removal:

(qemu) device_del dimm1

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Built 1 zonelists, mobility grouping on.  Total pages: 233728
| Policy zone: DMA32

The memory is gone:
| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:            918          89         769           0  ...
| Swap:             0           0           0

Trigger kexec:
| root@vm:/sys/devices/system/memory# kexec -e

[...]

| sd 0:0:0:0: [sda] Synchronizing SCSI cache
| kexec_core: Starting new kernel

... and Qemu restarts the platform firmware instead of proceeding with kexec.
(I assume this is a triple fault)

You can use mem-min and mem-max to control where kexec's user space will place
the memory.


If you apply this patch, the above sequence will fail at the device remove step,
as the physical addresses match the loaded kexec image:

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| kexec_core: Memory region in use
| kexec_core: Memory region in use
| memory memory39: Offline failed.
| Built 1 zonelists, mobility grouping on.  Total pages: 299212
| Policy zone: Normal

| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:           1942          90        1793           0    ...
| Swap:             0           0           0

I can't remove the DIMM, because we failed to offline it:

(qemu) object_del mem1
object 'mem1' is in use, can not be deleted

and I can trigger kexec and boot the new kernel.

kexec user-space here comes from debian bullseye. It picked the removable memory
all by itself without any additional arguments.


(a different issue that can be ignored for now: x86 additionally fails to reboot
if I remove memory, even if its not in use by the kexec image. This doesn't
cause qemu to reboot via firmware, I think it dies before the console. It
doesn't happen on arm64. I suspect the memory map is snapshotted and assumed to
still be correct when the image is executed.)


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 18:07         ` James Morse
@ 2020-03-27 18:52           ` David Hildenbrand
  2020-03-30 13:00             ` James Morse
  0 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2020-03-27 18:52 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

>> 2. You do the kexec. The kexec kernel will only operate on a reserved
>> memory region (reserved via e.g., kernel cmdline crashkernel=128M).
> 
> I think you are merging the kexec and kdump behaviours.
> (Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')

Oh, I see - I think your example below clarifies things. Something like
that should go in the cover letter if we end up in this patch being
required :)

(I missed that the problematic part is "random" addresses passed by user
space to the kernel, where it wants data to be loaded to on kexec -e)

> 
> For kdump, yes, the new kernel is loaded into the crashkernel reservation, and
> confined to it.
> 
> 
> For regular kexec, the new kernel can be loaded any where in memory. There might
> be a difference with how this works on arm64....
> 
> The regular kexec kernel isn't stored in its final location when its loaded, its
> relocated there when the image is executed. The target/destination memory may
> have been removed in the meantime.
> 
> (an example recipe below should clarify this)
> 
> 
>> Is it that in 2., the reserved memory region (for the crashkernel) could
>> have been offlined in the meantime?
> 
> No, for kdump: the crashkernel reservation is PG_reserved, and its not something
> mm knows how to move, so that region can't be taken offline.
> 
> (On arm64 we additionally prevent the boot-memory from being removed as it is
> all described as present by UEFI. The crashkernel reservation would always be
> from this type of memory)

Right.

> 
> 
> This is about a regular kexec, any crashdump reservation is irrelevant.
> This kexec kernel is temporarily stored out of line, then relocated when executed.
> 
> A recipe so that we're at least on the same terminal! This is on a TX2 running
> arm64's for-next/core using Qemu-TCG to emulate x86. (Sorry for the bizarre
> config, its because Qemu supports hotremove on x86, but not yet on arm64).
> 
> 
> Insert the memory:
> (qemu) object_add memory-backend-ram,id=mem1,size=1G
> (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
> 
> | root@vm:~# free -m
> |               total        used        free      shared  ...
> | Mem:            918          52         814           0  ...
> | Swap:             0           0           0
> 
> 
> Bring it online:
> | root@vm:~# cd /sys/devices/system/memory/
> | root@vm:/sys/devices/system/memory# for F in memory3*; do echo \
> | online_movable > $F/state; done
> 
> | Built 1 zonelists, mobility grouping on.  Total pages: 251049
> | Policy zone: DMA32
> 
> | -bash: echo: write error: Invalid argument
> | root@vm:/sys/devices/system/memory# free -m
> |               total        used        free      shared  ...
> | Mem:           1942          53        1836           0  ...
> | Swap:             0           0           0
> 
> 
> Load kexec:
> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
> 

I assume this will trigger

kexec_load -> do_kexec_load -> kimage_load_segment ->
kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages

Which will just allocate a bunch of pages and mark them reserved.

Now, AFAIKs, all allocations will be unmovable. So none of the kexec
segment allocations will actually end up on your DIMM (as it is onlined
online_movable).

So, the loaded image (with its segments) from user won't be problematic
and not get placed on your DIMM.


Now, the problematic part is (via man kexec_load) "mem and memsz specify
a physical address range that is the target of the copy."

So the place where the image will be "assembled" at when doing the
reboot. Understood :)

> Press the Attention button to request removal:
> 
> (qemu) device_del dimm1
> 
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Built 1 zonelists, mobility grouping on.  Total pages: 233728
> | Policy zone: DMA32
> 
> The memory is gone:
> | root@vm:/sys/devices/system/memory# free -m
> |               total        used        free      shared  ...
> | Mem:            918          89         769           0  ...
> | Swap:             0           0           0
> 
> Trigger kexec:
> | root@vm:/sys/devices/system/memory# kexec -e
> 
> [...]
> 
> | sd 0:0:0:0: [sda] Synchronizing SCSI cache
> | kexec_core: Starting new kernel
> 
> ... and Qemu restarts the platform firmware instead of proceeding with kexec.
> (I assume this is a triple fault)
> 
> You can use mem-min and mem-max to control where kexec's user space will place
> the memory.
> 
> 
> If you apply this patch, the above sequence will fail at the device remove step,
> as the physical addresses match the loaded kexec image:
> 
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | kexec_core: Memory region in use
> | kexec_core: Memory region in use

Okay, so I assume the kexec userspace tool provided target kernel
addresses for segments that reside on the DIMM.


> | memory memory39: Offline failed.
> | Built 1 zonelists, mobility grouping on.  Total pages: 299212
> | Policy zone: Normal
> 
> | root@vm:/sys/devices/system/memory# free -m
> |               total        used        free      shared  ...
> | Mem:           1942          90        1793           0    ...
> | Swap:             0           0           0
> 
> I can't remove the DIMM, because we failed to offline it:

I wonder if we should instead make the "kexec -e" fail. It tries to
touch random system memory.

Denying to offline MOVABLE memory should be avoided - and what kexec
does here sounds dangerous to me (allowing it to write random system
memory).

Roughly what I am thinking is this:

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ba1d91e868ca..70c39a5307e5 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1135,6 +1135,10 @@ int kernel_kexec(void)
                error = -EINVAL;
                goto Unlock;
        }
+       if (!kexec_image_validate()) {
+               error = -EINVAL;
+               goto Unlock;
+       }

 #ifdef CONFIG_KEXEC_JUMP
        if (kexec_image->preserve_context) {


kexec_image_validate() would go over all segments and validate that the
involved pages are actual valid memory (pfn_to_online_page()).

All we have to do is protect from memory hotplug until we switch to the
new kernel.

Will probably need some thought. But it will actually also bail out when
user space passes wrong physical memory addresses, instead of
triple-faulting silently.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 18:52           ` David Hildenbrand
@ 2020-03-30 13:00             ` James Morse
  2020-03-30 13:13               ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-30 13:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 6:52 PM, David Hildenbrand wrote:
>>> 2. You do the kexec. The kexec kernel will only operate on a reserved
>>> memory region (reserved via e.g., kernel cmdline crashkernel=128M).
>>
>> I think you are merging the kexec and kdump behaviours.
>> (Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')
> 
> Oh, I see - I think your example below clarifies things. Something like
> that should go in the cover letter if we end up in this patch being
> required :)

Do you mean the commit message? I think its far too long...

Adding a sentence about the way kexec load works may help, the first paragraph
would read:

| Kexec allows user-space to specify the address that the kexec image should be
| loaded to. Because this memory may be in use, an image loaded for kexec is not
| stored in place, instead its segments are scattered through memory, and are
| re-assembled when needed. In the meantime, the target memory may have been
| removed.

Do you think thats clearer?


> (I missed that the problematic part is "random" addresses passed by user
> space to the kernel, where it wants data to be loaded to on kexec -e)

[...]

>> Load kexec:
>> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
>>
> 
> I assume this will trigger
> 
> kexec_load -> do_kexec_load -> kimage_load_segment ->
> kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages
> 
> Which will just allocate a bunch of pages and mark them reserved.
> 
> Now, AFAIKs, all allocations will be unmovable. So none of the kexec
> segment allocations will actually end up on your DIMM (as it is onlined
> online_movable).
> 
> So, the loaded image (with its segments) from user won't be problematic
> and not get placed on your DIMM.
> 
> 
> Now, the problematic part is (via man kexec_load) "mem and memsz specify
> a physical address range that is the target of the copy."
> 
> So the place where the image will be "assembled" at when doing the
> reboot. Understood :)

Yup.

[...]

> I wonder if we should instead make the "kexec -e" fail. It tries to
> touch random system memory.

Heh, isn't touching random system memory what kexec does?!

Its all described to user-space as 'System RAM'. Teaching it to probe
/sys/devices/memory/... would require a user-space change.


> Denying to offline MOVABLE memory should be avoided - and what kexec
> does here sounds dangerous to me (allowing it to write random system
> memory).

> Roughly what I am thinking is this:
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index ba1d91e868ca..70c39a5307e5 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1135,6 +1135,10 @@ int kernel_kexec(void)
>                 error = -EINVAL;
>                 goto Unlock;
>         }
> +       if (!kexec_image_validate()) {
> +               error = -EINVAL;
> +               goto Unlock;
> +       }
> 
>  #ifdef CONFIG_KEXEC_JUMP
>         if (kexec_image->preserve_context) {
> 
> 
> kexec_image_validate() would go over all segments and validate that the
> involved pages are actual valid memory (pfn_to_online_page()).
> 
> All we have to do is protect from memory hotplug until we switch to the
> new kernel.

(migrate_to_reboot_cpu() can sleep), I think you'd end up with something like
this patch, but only while kexec_in_progress. I don't think letting kexec fail
if the events occur in a different order is good for user-space.


> Will probably need some thought. But it will actually also bail out when
> user space passes wrong physical memory addresses, instead of
> triple-faulting silently.

With this change, the reboot(LINUX_REBOOT_CMD_KEXEC), call would fail. This
thing doesn't usually return, so we're likely to trigger error-handling that has
never run before.

(Last time I debugged one of these, it turned out kexec had taken the network
interfaces down, meaning the nfsroot was no longer accessible)

How can user-space know whether kexec is going to succeed, or fail like this?
Any loaded kexec kernel could secretly be in this broken state.

Can user-space know what caused this to become unreliable? (without reading the
kernel source)


Given kexec can be unloaded by user-space, I think its better to prevent us
getting into the broken state, preferably giving the hint that kexec us using
that memory. The user can 'kexec -u', then retry removing the memory.

I think forbidding the memory-offline is simpler for user-space to deal with.


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-30 13:00             ` James Morse
@ 2020-03-30 13:13               ` David Hildenbrand
  2020-03-30 17:17                 ` James Morse
  0 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2020-03-30 13:13 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

> Adding a sentence about the way kexec load works may help, the first paragraph
> would read:
> 
> | Kexec allows user-space to specify the address that the kexec image should be
> | loaded to. Because this memory may be in use, an image loaded for kexec is not
> | stored in place, instead its segments are scattered through memory, and are
> | re-assembled when needed. In the meantime, the target memory may have been
> | removed.
> 
> Do you think thats clearer?

Yes, very much. Maybe add, that the target is described by user space
during kexec_load() and that user space - right now - parses /proc/iomem
to find applicable system memory.

> [...]
> 
>>> Load kexec:
>>> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
>>>
>>
>> I assume this will trigger
>>
>> kexec_load -> do_kexec_load -> kimage_load_segment ->
>> kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages
>>
>> Which will just allocate a bunch of pages and mark them reserved.
>>
>> Now, AFAIKs, all allocations will be unmovable. So none of the kexec
>> segment allocations will actually end up on your DIMM (as it is onlined
>> online_movable).
>>
>> So, the loaded image (with its segments) from user won't be problematic
>> and not get placed on your DIMM.
>>
>>
>> Now, the problematic part is (via man kexec_load) "mem and memsz specify
>> a physical address range that is the target of the copy."
>>
>> So the place where the image will be "assembled" at when doing the
>> reboot. Understood :)
> 
> Yup.
> 
> [...]
> 
>> I wonder if we should instead make the "kexec -e" fail. It tries to
>> touch random system memory.
> 
> Heh, isn't touching random system memory what kexec does?!

Having a racy user interface that can trigger kernel crashes feels very
wrong. We should limit the impact.

> 
> Its all described to user-space as 'System RAM'. Teaching it to probe
> /sys/devices/memory/... would require a user-space change.

I think we should really rename hotplugged memory on all architectures.

Especially also relevant for virtio-mem/hyper-v balloon, where some
pieces of (hotplugged )memory blocks are partially unavailable and
should not be touched - accessing them results in unpredictable behavior
(e.g., crashes or discarded writes).

[...]

>> Will probably need some thought. But it will actually also bail out when
>> user space passes wrong physical memory addresses, instead of
>> triple-faulting silently.
> 
> With this change, the reboot(LINUX_REBOOT_CMD_KEXEC), call would fail. This
> thing doesn't usually return, so we're likely to trigger error-handling that has
> never run before.
> 
> (Last time I debugged one of these, it turned out kexec had taken the network
> interfaces down, meaning the nfsroot was no longer accessible)
> 
> How can user-space know whether kexec is going to succeed, or fail like this?
> Any loaded kexec kernel could secretly be in this broken state.
> 
> Can user-space know what caused this to become unreliable? (without reading the
> kernel source)
> 
> 
> Given kexec can be unloaded by user-space, I think its better to prevent us
> getting into the broken state, preferably giving the hint that kexec us using
> that memory. The user can 'kexec -u', then retry removing the memory.
> 
> I think forbidding the memory-offline is simpler for user-space to deal with.

I thought about this over the weekend, and I don't think it's the right
approach.

1. It's racy. If memory is getting offlined/unplugged just while user
space is about to trigger the kexec_load(), you end up with the very
same triple-fault.

2. It's semantically wrong. kexec does not need online memory ("managed
by the buddy"), but still you disallow offlining memory.


I would really much rather want to see user-space choosing boot memory
(e.g., renaming hotplugged memory on all architectures), and checking
during "kexec -e" if the selected memory is actually "there", before
trying to write to it.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-27 15:42   ` James Morse
@ 2020-03-30 13:18     ` David Hildenbrand
  0 siblings, 0 replies; 33+ messages in thread
From: David Hildenbrand @ 2020-03-30 13:18 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 27.03.20 16:42, James Morse wrote:
> Hi David,
> 
> On 3/27/20 9:27 AM, David Hildenbrand wrote:
>> On 26.03.20 19:07, James Morse wrote:
>>> arm64 recently queued support for memory hotremove, which led to some
>>> new corner cases for kexec.
>>>
>>> If the kexec segments are loaded for a removable region, that region may
>>> be removed before kexec actually occurs. This causes the first kernel to
>>> lockup when applying the relocations. (I've triggered this on x86 too).
>>>
>>> The first patch adds a memory notifier for kexec so that it can refuse
>>> to allow in-use regions to be taken offline.
> 
>> IIRC other architectures handle that by setting the affected pages
>> PageReserved. Any reason why to not stick to the same?
> 
> Hmm, I didn't spot this. How come core code doesn't do it if its needed?
> 
> Doesn't PG_Reserved prevent the page from being used for regular allocations?
> (or is that only if its done early)
> 
> I prefer the runtime check as the dmesg output gives the user some chance of
> knowing why their memory-offline failed, and doing something about it!

I was confused which memory we are trying to protect. Understood now,
that you are dealing with the target physical memory described during
described during kexec_load.

[...]

> 
>> Also, makedumpfile will check if the
>> sections are still around IIRC.
> 
> Curious. I thought the vmcore was virtually addressed, how does it know which
> linear-map portions correspond to sysfs memory nodes with KASLR?

That's a very interesting question. I remember there was KASLR support
being implemented specifically for that - but I don't know any details.

>> Not sure what you mean by "Unaware kdump from user-space".
> 
> The existing kexec-tools binaries, that (I assume) don't go probing to find out
> if 'System RAM' is removable or not, loading a kdump kernel, along with the
> user-space generated blob that describes the first kernel's memory usage to the
> second kernel.

Finally understood how kexec without kdump works, thanks.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-27 15:39     ` James Morse
@ 2020-03-30 13:23       ` David Hildenbrand
  2020-03-30 17:17         ` James Morse
  0 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2020-03-30 13:23 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	Vitaly Kuznetsov, linux-arm-kernel

>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 0a54ffac8c68..69b03dd7fc74 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -42,6 +42,10 @@
>>>  #include "internal.h"
>>>  #include "shuffle.h"
>>>  
>>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>>> +#endif
>>
>> So I assume changing this for all architectures would result in some
>> user space tool breaking? Are we aware of any?
> 
> Last time we had to touch arm64's /proc/iomem strings I went through debian's
> codesearch for stuff that reads it, kexec-tools was the only thing that parsed
> it in anger. (From memory, the other tools were looking for PCIe windows to do
> firmware flashing..)
> 
> Looking again, having qualifiers on the end of 'System RAM' looks like it could
> confuse 's390-tools's detect_mem_chunks parser.

Good to know, we should find out if this could work.

> 
> It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate set.
> 
> 
>> I do wonder if we should simply change it for all architectures if possible.
> 
> If its possible that would be great. But I suspect that ship has sailed,
> changing it on other architectures could break some fragile parsing code.

I assume any parser has to be prepared for new types showing up.
Otherwise these would not be future proof. The question is if a common
prefix is problematic.

E.g., Use "Hotplugged System RAM" instead of "System RAM (hotplug)"

> 
> I'm wary of changing it on arm64, the only thing that makes it tolerable is that
> memory hot-add was relatively recently merged, and we don't anticipate it being
> widely used until you can remove memory as well.
> 
> Changing it on arm64 is to prevent today's versions of kexec-tools from
> accidentally placing the new kernel in memory that wasn't described at boot.
> This leads to an unhandled exception during boot[0] because the kernel can't
> access itself via the mapping of all memory. (hotpluggable regions are only
> discovered by suitably configured ACPI systems much later)

I want the very same for virtio-mem (initially x86-only, but later open
for all archs). Can also be interesting for Hyper-V. kexec should not
try to use hotplugged memory as kexec target, as memory blocks can be
partially inaccessible.

Of course, I can provide an interface to override the name via
add_memory(), but having it on all architectures handled in a similar
way right from the start would be nicer.


-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
                   ` (4 preceding siblings ...)
  2020-03-27  9:27 ` David Hildenbrand
@ 2020-03-30 13:55 ` Baoquan He
  2020-03-30 17:17   ` James Morse
  2020-03-31  3:38 ` Dave Young
  6 siblings, 1 reply; 33+ messages in thread
From: Baoquan He @ 2020-03-30 13:55 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi James,

On 03/26/20 at 06:07pm, James Morse wrote:
> Hello!
> 
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
> 
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).
> 
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.

I talked about this with Dave Young. Currently, we tend to use
kexec_file_load more in the future since most of its implementation is
in kernel, we can get information about kernel more easilier. For the
kexec kernel loaded into hotpluggable area, we can fix it in
kexec_file_load side, we know the MOVABLE zone's start and end. As for
the old kexec_load, we would like to keep it for back compatibility. At
least in our distros, we have switched to kexec_file_load, will
gradually obsolete kexec_load. So for this one, I suggest avoiding those
MOVZBLE memory region when searching place for kexec kernel.

Not sure if arm64 will still have difficulty.

> 
> 
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
> 
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
> 
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
> 
> 
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.
> 
> 
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.
> 
> Thanks,
> 
> James Morse (3):
>   kexec: Prevent removal of memory in use by a loaded kexec image
>   mm/memory_hotplug: Allow arch override of non boot memory resource
>     names
>   arm64: memory: Give hotplug memory a different resource name
> 
>  arch/arm64/include/asm/memory.h | 11 +++++++
>  kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c             |  6 +++-
>  3 files changed, 72 insertions(+), 1 deletion(-)
> 
> -- 
> 2.25.1
> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-30 13:55 ` Baoquan He
@ 2020-03-30 17:17   ` James Morse
  2020-03-31  3:46     ` Dave Young
  0 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-30 17:17 UTC (permalink / raw)
  To: Baoquan He
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi Baoquan,

On 3/30/20 2:55 PM, Baoquan He wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).
>>
>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.
> 
> I talked about this with Dave Young. Currently, we tend to use
> kexec_file_load more in the future since most of its implementation is
> in kernel, we can get information about kernel more easilier. For the
> kexec kernel loaded into hotpluggable area, we can fix it in
> kexec_file_load side, we know the MOVABLE zone's start and end. As for
> the old kexec_load, we would like to keep it for back compatibility. At
> least in our distros, we have switched to kexec_file_load, will
> gradually obsolete kexec_load.

> So for this one, I suggest avoiding those
> MOVZBLE memory region when searching place for kexec kernel.

How does today's user-space know?


> Not sure if arm64 will still have difficulty.

arm64 added support for kexec_load first, then kexec_file_load. (evidently a
mistake).
kexec_file_load support was only added in the last year or so, I'd hazard most
people using this, are using the regular load kind. (and probably don't know or
care).



Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-30 13:23       ` David Hildenbrand
@ 2020-03-30 17:17         ` James Morse
  0 siblings, 0 replies; 33+ messages in thread
From: James Morse @ 2020-03-30 17:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	Vitaly Kuznetsov, linux-arm-kernel

Hi David,

On 3/30/20 2:23 PM, David Hildenbrand wrote:
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index 0a54ffac8c68..69b03dd7fc74 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -42,6 +42,10 @@
>>>>  #include "internal.h"
>>>>  #include "shuffle.h"
>>>>  
>>>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>>>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>>>> +#endif
>>>
>>> So I assume changing this for all architectures would result in some
>>> user space tool breaking? Are we aware of any?
>>
>> Last time we had to touch arm64's /proc/iomem strings I went through debian's
>> codesearch for stuff that reads it, kexec-tools was the only thing that parsed
>> it in anger. (From memory, the other tools were looking for PCIe windows to do
>> firmware flashing..)
>>
>> Looking again, having qualifiers on the end of 'System RAM' looks like it could
>> confuse 's390-tools's detect_mem_chunks parser.
> 
> Good to know, we should find out if this could work.
> 
>>
>> It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate set.
>>
>>
>>> I do wonder if we should simply change it for all architectures if possible.
>>
>> If its possible that would be great. But I suspect that ship has sailed,
>> changing it on other architectures could break some fragile parsing code.
> 
> I assume any parser has to be prepared for new types showing up.
> Otherwise these would not be future proof. The question is if a common
> prefix is problematic.
> 
> E.g., Use "Hotplugged System RAM" instead of "System RAM (hotplug)"

Aha, I went for a (suffix) because that is what 32bit Arm did for the boot alias.


>> I'm wary of changing it on arm64, the only thing that makes it tolerable is that
>> memory hot-add was relatively recently merged, and we don't anticipate it being
>> widely used until you can remove memory as well.
>>
>> Changing it on arm64 is to prevent today's versions of kexec-tools from
>> accidentally placing the new kernel in memory that wasn't described at boot.
>> This leads to an unhandled exception during boot[0] because the kernel can't
>> access itself via the mapping of all memory. (hotpluggable regions are only
>> discovered by suitably configured ACPI systems much later)

> I want the very same for virtio-mem (initially x86-only, but later open
> for all archs). Can also be interesting for Hyper-V. kexec should not
> try to use hotplugged memory as kexec target, as memory blocks can be
> partially inaccessible.

Great! I assumed these placement requirements would be arm64 specific.


> Of course, I can provide an interface to override the name via
> add_memory(), but having it on all architectures handled in a similar
> way right from the start would be nicer.

I agree having it the same on all architectures would be good.

It sounds like virtio-mem is a better argument for doing this than arm64's
firmware memory description.

I'll have a read, and maybe post something to linux-arch to do this at rc1.
(I assume we'll have a few weeks to make sure arm64 at least uses the same name
if it goes on longer)


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-30 13:13               ` David Hildenbrand
@ 2020-03-30 17:17                 ` James Morse
  2020-03-30 18:14                   ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: James Morse @ 2020-03-30 17:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/30/20 2:13 PM, David Hildenbrand wrote:
>> Adding a sentence about the way kexec load works may help, the first paragraph
>> would read:
>>
>> | Kexec allows user-space to specify the address that the kexec image should be
>> | loaded to. Because this memory may be in use, an image loaded for kexec is not
>> | stored in place, instead its segments are scattered through memory, and are
>> | re-assembled when needed. In the meantime, the target memory may have been
>> | removed.
>>
>> Do you think thats clearer?
> 
> Yes, very much. Maybe add, that the target is described by user space
> during kexec_load() and that user space - right now - parses /proc/iomem
> to find applicable system memory.

(I don't think x86 parses /proc/iomem anymore). I'll repost this patch with that
expanded commit message, once we've agreed this is the right thing to do!


>>> I wonder if we should instead make the "kexec -e" fail. It tries to
>>> touch random system memory.
>>
>> Heh, isn't touching random system memory what kexec does?!
> 
> Having a racy user interface that can trigger kernel crashes feels very
> wrong. We should limit the impact.


>> Its all described to user-space as 'System RAM'. Teaching it to probe
>> /sys/devices/memory/... would require a user-space change.
> 
> I think we should really rename hotplugged memory on all architectures.
> 
> Especially also relevant for virtio-mem/hyper-v balloon, where some
> pieces of (hotplugged )memory blocks are partially unavailable and
> should not be touched - accessing them results in unpredictable behavior
> (e.g., crashes or discarded writes).

I'll need to look into these. I'd assume for KVM that virtio-mem can be brought
back when its accessed ... its just going to be slow.


>>> Will probably need some thought. But it will actually also bail out when
>>> user space passes wrong physical memory addresses, instead of
>>> triple-faulting silently.
>>
>> With this change, the reboot(LINUX_REBOOT_CMD_KEXEC), call would fail. This
>> thing doesn't usually return, so we're likely to trigger error-handling that has
>> never run before.
>>
>> (Last time I debugged one of these, it turned out kexec had taken the network
>> interfaces down, meaning the nfsroot was no longer accessible)
>>
>> How can user-space know whether kexec is going to succeed, or fail like this?
>> Any loaded kexec kernel could secretly be in this broken state.
>>
>> Can user-space know what caused this to become unreliable? (without reading the
>> kernel source)
>>
>>
>> Given kexec can be unloaded by user-space, I think its better to prevent us
>> getting into the broken state, preferably giving the hint that kexec us using
>> that memory. The user can 'kexec -u', then retry removing the memory.
>>
>> I think forbidding the memory-offline is simpler for user-space to deal with.
> 
> I thought about this over the weekend, and I don't think it's the right
> approach.


> 1. It's racy. If memory is getting offlined/unplugged just while user
> space is about to trigger the kexec_load(), you end up with the very
> same triple-fault.

load? How is this different to user-space providing a bogus address?

Sure, user-space may take a nap between parsing /proc/iomem and calling
kexec_load(), but the kernel should reject these as they would never work.

(I can't see where sanity_check_segment_list() considers the platform's memory.
If it doesn't, we should fix it)

Once the image is loaded, and clashes with a request to remove the memory there
are two choices: secretly unload the image, or prevent the memory being taken
offline.


> 2. It's semantically wrong. kexec does not need online memory ("managed
> by the buddy"), but still you disallow offlining memory.

It does need the memory if you want 'kexec -e' to succeed.
If there were any sanity tests, they should have happened at load time.

The memory is effectively in use by the loaded kexec image. User-space told the
kernel to use this memory, you should not be able to then remove it, without
unloading the kexec image first.


Are you saying feeding bogus addresses to kexec_load() is _expected_ to blow up
like this?


> I would really much rather want to see user-space choosing boot memory
> (e.g., renaming hotplugged memory on all architectures), and checking
> during "kexec -e" if the selected memory is actually "there", before
> trying to write to it.

How does 'kexec -e' know where the kexec kernel was loaded? You'd need to pass
something between 'load' and 'exec'. How do you keep existing user-space working
as much as possible?

What do you do if the memory isn't there? User-space just called reboot(), it
would be better to avoid getting into the situation where we have to fail that call.


Solving the bigger problem, would add a 'kexec_it_now' flag to the kexec_load()
call. This would make the window where 'stuff' can change much smaller. Things
changing while user-space sleeps isn't a solvable problem, these would need to
be rejected by sanity tests at load time.


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-30 17:17                 ` James Morse
@ 2020-03-30 18:14                   ` David Hildenbrand
  0 siblings, 0 replies; 33+ messages in thread
From: David Hildenbrand @ 2020-03-30 18:14 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 30.03.20 19:17, James Morse wrote:
> Hi David,
> 
> On 3/30/20 2:13 PM, David Hildenbrand wrote:
>>> Adding a sentence about the way kexec load works may help, the first paragraph
>>> would read:
>>>
>>> | Kexec allows user-space to specify the address that the kexec image should be
>>> | loaded to. Because this memory may be in use, an image loaded for kexec is not
>>> | stored in place, instead its segments are scattered through memory, and are
>>> | re-assembled when needed. In the meantime, the target memory may have been
>>> | removed.
>>>
>>> Do you think thats clearer?
>>
>> Yes, very much. Maybe add, that the target is described by user space
>> during kexec_load() and that user space - right now - parses /proc/iomem
>> to find applicable system memory.
> 
> (I don't think x86 parses /proc/iomem anymore). I'll repost this patch with that
> expanded commit message, once we've agreed this is the right thing to do!

Right, I can see kexec-tools parsing /sys/firmware/memmap first.
Unfortunately, all hotplugged memory (via add_memory()) is indicated
there as System RAM ... including memory added by virtio-mem.

I think we should adapt the type there as well. (in your patch #2)

	firmware_map_add_hotplug(start, start + size, "System RAM");

> 
> 
>>>> I wonder if we should instead make the "kexec -e" fail. It tries to
>>>> touch random system memory.
>>>
>>> Heh, isn't touching random system memory what kexec does?!
>>
>> Having a racy user interface that can trigger kernel crashes feels very
>> wrong. We should limit the impact.
> 
> 
>>> Its all described to user-space as 'System RAM'. Teaching it to probe
>>> /sys/devices/memory/... would require a user-space change.
>>
>> I think we should really rename hotplugged memory on all architectures.
>>
>> Especially also relevant for virtio-mem/hyper-v balloon, where some
>> pieces of (hotplugged )memory blocks are partially unavailable and
>> should not be touched - accessing them results in unpredictable behavior
>> (e.g., crashes or discarded writes).
> 
> I'll need to look into these. I'd assume for KVM that virtio-mem can be brought
> back when its accessed ... its just going to be slow.

Touching unplugged virtio-mem memory can result in unpredictable
behavior. Touching (some) unplugged Hyper-V memory will be handled
similarly AFAIK.

[...]

>> 1. It's racy. If memory is getting offlined/unplugged just while user
>> space is about to trigger the kexec_load(), you end up with the very
>> same triple-fault.
> 
> load? How is this different to user-space providing a bogus address?

I guess it's not different. It's just racy because user space with good
intend could crash the system :)

> 
> Sure, user-space may take a nap between parsing /proc/iomem and calling
> kexec_load(), but the kernel should reject these as they would never work.
> 
> (I can't see where sanity_check_segment_list() considers the platform's memory.
> If it doesn't, we should fix it)

Right, that's what I meant. I was not able to find any sanity checks.
Maybe they are in place but I was not able to spot them.

> 
> Once the image is loaded, and clashes with a request to remove the memory there
> are two choices: secretly unload the image, or prevent the memory being taken
> offline.

Exactly. Or make "kexec -e" fail.

> 
> 
>> 2. It's semantically wrong. kexec does not need online memory ("managed
>> by the buddy"), but still you disallow offlining memory.
> 
> It does need the memory if you want 'kexec -e' to succeed.
> If there were any sanity tests, they should have happened at load time.

Offlining != removing. That's the point I was trying to make. (and we
don't want to block removing of memory in the kernel any other way)

> 
> The memory is effectively in use by the loaded kexec image. User-space told the
> kernel to use this memory, you should not be able to then remove it, without
> unloading the kexec image first.

It's not in use before you do the "kexec -e" IMHO.

> Are you saying feeding bogus addresses to kexec_load() is _expected_ to blow up
> like this?

No, not at all. I think this should be fixed if this is possible.

> 
>> I would really much rather want to see user-space choosing boot memory
>> (e.g., renaming hotplugged memory on all architectures), and checking
>> during "kexec -e" if the selected memory is actually "there", before
>> trying to write to it.
> 
> How does 'kexec -e' know where the kexec kernel was loaded? You'd need to pass
> something between 'load' and 'exec'. How do you keep existing user-space working
> as much as possible?

If we use new types (e.g., "System RAM (hotplugged)"), looks like most
of kexec will continue working (memory will be treated like
RANGE_RESERVED or ignored).

I guess we would still have to teach kexec-tools the new types,
primarily to keep the crash memory ranges from getting detected
properly. (no idea how they are used, will have to take a closer look)

> 
> What do you do if the memory isn't there? User-space just called reboot(), it
> would be better to avoid getting into the situation where we have to fail that call.

In kernel_kexec() we already fail if there is no kernel image loaded, so
we can similarly simply fail if the kernel image cannot be moved to the
target memory IMHO.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
  2020-03-26 18:07 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name James Morse
@ 2020-03-30 19:01   ` David Hildenbrand
  0 siblings, 0 replies; 33+ messages in thread
From: David Hildenbrand @ 2020-03-30 19:01 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	Eric Biederman, Andrew Morton, Will Deacon

On 26.03.20 19:07, James Morse wrote:
> If kexec chooses to place the kernel in a memory region that was
> added after boot, we fail to boot as the kernel is running from a
> location that is not described as memory by the UEFI memory map or
> the original DT.
> 
> To prevent unaware user-space kexec from doing this accidentally,
> give these regions a different name.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> This is a change in behaviour as seen by user-space, because memory hot-add
> has already been merged.
> 
>  arch/arm64/include/asm/memory.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 2be67b232499..ef1686518469 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -166,6 +166,17 @@
>  #define IOREMAP_MAX_ORDER	(PMD_SHIFT)
>  #endif
>  
> +/*
> + * Memory hotplug allows new regions of 'System RAM' to be added to the system.
> + * These aren't described as memory by the UEFI memory map, or DT memory node.
> + * If we kexec from one of these regions, the new kernel boots from a location
> + * that isn't described as RAM.
> + *
> + * Give these resources a different name, so unaware kexec doesn't do this by
> + * accident.
> + */
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"
> +
>  #ifndef __ASSEMBLY__
>  extern u64			vabits_actual;
>  #define PAGE_END		(_PAGE_END(vabits_actual))
> 

(While I am familiar with makedumpfile in the crash kernel, I am not yet
familiar with kexec, so bare with me)


Looking at kexec:arch/arm64/crashdump-arm64.c

load_crashdump_segments() -> crash_get_memory_ranges() ->
kexec_iomem_for_each_line() -> iomem_range_callback()


#define SYSTEM_RAM "System RAM\n"

...
} else if (strncmp(str, SYSTEM_RAM, strlen(SYSTEM_RAM)) == 0) {
	return mem_regions_add(&system_memory_rgns, ...);
}


The hotplugged memory will no longer be detected as a crashdump segment,
consequently (AFAIU) not be described in the elf header, and therefore
also no longer dumped (e.g., by makedumpfile).

I assume you'll have to adapt kexec-tools to still consider this memory
for dumping, correct? Or am I missing something?


-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
                   ` (5 preceding siblings ...)
  2020-03-30 13:55 ` Baoquan He
@ 2020-03-31  3:38 ` Dave Young
  6 siblings, 0 replies; 33+ messages in thread
From: Dave Young @ 2020-03-31  3:38 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi James,
On 03/26/20 at 06:07pm, James Morse wrote:
> Hello!
> 
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
> 
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).

Does a kexec reload work for your case?   If yes then I would suggest to
do it in userspace,  for example have a udev rule to reload kexec if
needed.

Actually we have a rule to restart kdump loading, but not for kexec, it
sounds also need a service to load kexec, and an udev rule to reload for
memory hotplug.

> 
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.
> 
> 
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
> 
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
> 
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
> 
> 
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.
> 
> 
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.
> 
> Thanks,
> 
> James Morse (3):
>   kexec: Prevent removal of memory in use by a loaded kexec image
>   mm/memory_hotplug: Allow arch override of non boot memory resource
>     names
>   arm64: memory: Give hotplug memory a different resource name
> 
>  arch/arm64/include/asm/memory.h | 11 +++++++
>  kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c             |  6 +++-
>  3 files changed, 72 insertions(+), 1 deletion(-)
> 
> -- 
> 2.25.1
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 

Thanks
Dave


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-30 17:17   ` James Morse
@ 2020-03-31  3:46     ` Dave Young
  0 siblings, 0 replies; 33+ messages in thread
From: Dave Young @ 2020-03-31  3:46 UTC (permalink / raw)
  To: James Morse
  Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	kexec, linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi James,
On 03/30/20 at 06:17pm, James Morse wrote:
> Hi Baoquan,
> 
> On 3/30/20 2:55 PM, Baoquan He wrote:
> > On 03/26/20 at 06:07pm, James Morse wrote:
> >> arm64 recently queued support for memory hotremove, which led to some
> >> new corner cases for kexec.
> >>
> >> If the kexec segments are loaded for a removable region, that region may
> >> be removed before kexec actually occurs. This causes the first kernel to
> >> lockup when applying the relocations. (I've triggered this on x86 too).
> >>
> >> The first patch adds a memory notifier for kexec so that it can refuse
> >> to allow in-use regions to be taken offline.
> > 
> > I talked about this with Dave Young. Currently, we tend to use
> > kexec_file_load more in the future since most of its implementation is
> > in kernel, we can get information about kernel more easilier. For the
> > kexec kernel loaded into hotpluggable area, we can fix it in
> > kexec_file_load side, we know the MOVABLE zone's start and end. As for
> > the old kexec_load, we would like to keep it for back compatibility. At
> > least in our distros, we have switched to kexec_file_load, will
> > gradually obsolete kexec_load.
> 
> > So for this one, I suggest avoiding those
> > MOVZBLE memory region when searching place for kexec kernel.
> 
> How does today's user-space know?
> 
> 
> > Not sure if arm64 will still have difficulty.
> 
> arm64 added support for kexec_load first, then kexec_file_load. (evidently a
> mistake).
> kexec_file_load support was only added in the last year or so, I'd hazard most
> people using this, are using the regular load kind. (and probably don't know or
> care).

I agreed that file load is still not widely used,  but in the long run
we should not maintain both of them all the future time.  Especially
when some kernel-userspace interfaces need to be introduced, file load
will have the natural advantage.  We may keep the kexec_load for other
misc usecases, but we can use file load for the major modern
linux-to-linux loading.  I'm not saying we can do it immediately, just
thought we should reduce the duplicate effort and try to avoid hacking if
possible.

Anyway about this particular issue, I wonder if we can just reload with
a udev rule as replied in another mail.

Thanks
Dave


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
  2020-03-27  9:59   ` David Hildenbrand
@ 2020-04-02  5:49   ` Dave Young
  2020-04-02  6:12     ` piliu
  1 sibling, 1 reply; 33+ messages in thread
From: Dave Young @ 2020-04-02  5:49 UTC (permalink / raw)
  To: James Morse
  Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Hari Bathini, Andrew Morton,
	Will Deacon, linux-arm-kernel

On 03/26/20 at 06:07pm, James Morse wrote:
> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
> 
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
> 
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.

Could arm64 use similar way to update DT, or a cooked UEFI maps?

Add pingfan in cc, he said ppc64 update the DT after a memremove thus it
would be good to just redo a kexec load.

Added Pingfan and Hari for comments and corrections.

> 
> Allow an architecture to specify a different name for these hotplug
> regions.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  mm/memory_hotplug.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a54ffac8c68..69b03dd7fc74 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>  
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif
> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>  
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);
> -- 
> 2.25.1
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-04-02  5:49   ` Dave Young
@ 2020-04-02  6:12     ` piliu
  0 siblings, 0 replies; 33+ messages in thread
From: piliu @ 2020-04-02  6:12 UTC (permalink / raw)
  To: Dave Young, James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Hari Bathini, Andrew Morton,
	Will Deacon, linux-arm-kernel



On 04/02/2020 01:49 PM, Dave Young wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
> 
> Could arm64 use similar way to update DT, or a cooked UEFI maps?
> 
> Add pingfan in cc, he said ppc64 update the DT after a memremove thus it
> would be good to just redo a kexec load.
> 
Yes, the memory changes will be observed through device-node under
/proc/device-tree/ (which is for powerpc).

Later if running kexec -l/-p , it can build new dtb with the latest info
from /proc/device-tree
> Added Pingfan and Hari for comments and corrections.
> 
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>>  mm/memory_hotplug.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0a54ffac8c68..69b03dd7fc74 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>  
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
>> +
>>  /*
>>   * online_page_callback contains pointer to current page onlining function.
>>   * Initially it is generic_online_page(). If it is required it could be
>> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>  {
>>  	struct resource *res;
>>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> -	char *resource_name = "System RAM";
>> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>>  
>>  	if (start + size > max_mem_size)
>>  		return ERR_PTR(-E2BIG);
>> -- 
>> 2.25.1
>>
>>
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
>>


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, back to index

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
2020-03-27  0:43   ` Anshuman Khandual
2020-03-27  2:54     ` Baoquan He
2020-03-27 15:46     ` James Morse
2020-03-27  2:34   ` Baoquan He
2020-03-27  9:30   ` David Hildenbrand
2020-03-27 16:56     ` James Morse
2020-03-27 17:06       ` David Hildenbrand
2020-03-27 18:07         ` James Morse
2020-03-27 18:52           ` David Hildenbrand
2020-03-30 13:00             ` James Morse
2020-03-30 13:13               ` David Hildenbrand
2020-03-30 17:17                 ` James Morse
2020-03-30 18:14                   ` David Hildenbrand
2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
2020-03-27  9:59   ` David Hildenbrand
2020-03-27 15:39     ` James Morse
2020-03-30 13:23       ` David Hildenbrand
2020-03-30 17:17         ` James Morse
2020-04-02  5:49   ` Dave Young
2020-04-02  6:12     ` piliu
2020-03-26 18:07 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name James Morse
2020-03-30 19:01   ` David Hildenbrand
2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
2020-03-27 15:40   ` James Morse
2020-03-27  9:27 ` David Hildenbrand
2020-03-27 15:42   ` James Morse
2020-03-30 13:18     ` David Hildenbrand
2020-03-30 13:55 ` Baoquan He
2020-03-30 17:17   ` James Morse
2020-03-31  3:46     ` Dave Young
2020-03-31  3:38 ` Dave Young

Linux-ARM-Kernel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/0 linux-arm-kernel/git/0.git
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/1 linux-arm-kernel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ https://lore.kernel.org/linux-arm-kernel \
		linux-arm-kernel@lists.infradead.org
	public-inbox-index linux-arm-kernel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-arm-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git