Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
@ 2020-03-26 18:07 James Morse
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

Hello!

arm64 recently queued support for memory hotremove, which led to some
new corner cases for kexec.

If the kexec segments are loaded for a removable region, that region may
be removed before kexec actually occurs. This causes the first kernel to
lockup when applying the relocations. (I've triggered this on x86 too).

The first patch adds a memory notifier for kexec so that it can refuse
to allow in-use regions to be taken offline.


This doesn't solve the problem for arm64, where the new kernel must
initially rely on the data structures from the first boot to describe
memory. These don't describe hotpluggable memory.
If kexec places the kernel in one of these regions, it must also provide
a DT that describes the region in which the kernel was mapped as memory.
(and somehow ensure its always present in the future...)

To prevent this from happening accidentally with unaware user-space,
patches two and three allow arm64 to give these regions a different
name.

This is a change in behaviour for arm64 as memory hotadd and hotremove
were added separately.


I haven't tried kdump.
Unaware kdump from user-space probably won't describe the hotplug
regions if the name is different, which saves us from problems if
the memory is no longer present at kdump time, but means the vmcore
is incomplete.


These patches are based on arm64's for-next/core branch, but can all
be merged independently.

Thanks,

James Morse (3):
  kexec: Prevent removal of memory in use by a loaded kexec image
  mm/memory_hotplug: Allow arch override of non boot memory resource
    names
  arm64: memory: Give hotplug memory a different resource name

 arch/arm64/include/asm/memory.h | 11 +++++++
 kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
 mm/memory_hotplug.c             |  6 +++-
 3 files changed, 72 insertions(+), 1 deletion(-)

-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
@ 2020-03-26 18:07 ` James Morse
  2020-03-27  0:43   ` Anshuman Khandual
                     ` (2 more replies)
  2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 19+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

An image loaded for kexec is not stored in place, instead its segments
are scattered through memory, and are re-assembled when needed. In the
meantime, the target memory may have been removed.

Because mm is not aware that this memory is still in use, it allows it
to be removed.

Add a memory notifier to prevent the removal of memory regions that
overlap with a loaded kexec image segment. e.g., when triggered from the
Qemu console:
| kexec_core: memory region in use
| memory memory32: Offline failed.

Signed-off-by: James Morse <james.morse@arm.com>
---
 kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c19c0dad1ebe..ba1d91e868ca 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/fs.h>
 #include <linux/kexec.h>
+#include <linux/memory.h>
 #include <linux/mutex.h>
 #include <linux/list.h>
 #include <linux/highmem.h>
@@ -22,10 +23,12 @@
 #include <linux/elf.h>
 #include <linux/elfcore.h>
 #include <linux/utsname.h>
+#include <linux/notifier.h>
 #include <linux/numa.h>
 #include <linux/suspend.h>
 #include <linux/device.h>
 #include <linux/freezer.h>
+#include <linux/pfn.h>
 #include <linux/pm.h>
 #include <linux/cpu.h>
 #include <linux/uaccess.h>
@@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+/*
+ * If user-space wants to offline memory that is in use by a loaded kexec
+ * image, it should unload the image first.
+ */
+static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
+			 void *data)
+{
+	int rv = NOTIFY_OK, i;
+	struct memory_notify *arg = data;
+	unsigned long pfn = arg->start_pfn;
+	unsigned long nr_segments, sstart, send;
+	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
+
+	might_sleep();
+
+	if (action != MEM_GOING_OFFLINE)
+		return NOTIFY_DONE;
+
+	mutex_lock(&kexec_mutex);
+	if (kexec_image) {
+		nr_segments = kexec_image->nr_segments;
+
+		for (i = 0; i < nr_segments; i++) {
+			sstart = PFN_DOWN(kexec_image->segment[i].mem);
+			send = PFN_UP(kexec_image->segment[i].mem +
+				      kexec_image->segment[i].memsz);
+
+			if ((pfn <= sstart && sstart < end_pfn) ||
+			    (pfn <= send && send < end_pfn)) {
+				pr_warn("Memory region in use\n");
+				rv = NOTIFY_BAD;
+				break;
+			}
+		}
+	}
+	mutex_unlock(&kexec_mutex);
+
+	return rv;
+}
+
+static struct notifier_block mem_remove_nb = {
+	.notifier_call = mem_remove_cb,
+};
+
+static int __init register_mem_remove_cb(void)
+{
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		return register_memory_notifier(&mem_remove_nb);
+
+	return 0;
+}
+device_initcall(register_mem_remove_cb);
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
@ 2020-03-26 18:07 ` James Morse
  2020-03-27  9:59   ` David Hildenbrand
  2020-03-26 18:07 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name James Morse
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

Memory added to the system by hotplug has a 'System RAM' resource created
for it. This is exposed to user-space via /proc/iomem.

This poses problems for kexec on arm64. If kexec decides to place the
kernel in one of these newly onlined regions, the new kernel will find
itself booting from a region not described as memory in the firmware
tables.

Arm64 doesn't have a structure like the e820 memory map that can be
re-written when memory is brought online. Instead arm64 uses the UEFI
memory map, or the memory node from the DT, sometimes both. We never
rewrite these.

Allow an architecture to specify a different name for these hotplug
regions.

Signed-off-by: James Morse <james.morse@arm.com>
---
 mm/memory_hotplug.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a54ffac8c68..69b03dd7fc74 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -42,6 +42,10 @@
 #include "internal.h"
 #include "shuffle.h"
 
+#ifndef MEMORY_HOTPLUG_RES_NAME
+#define MEMORY_HOTPLUG_RES_NAME "System RAM"
+#endif
+
 /*
  * online_page_callback contains pointer to current page onlining function.
  * Initially it is generic_online_page(). If it is required it could be
@@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
 {
 	struct resource *res;
 	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-	char *resource_name = "System RAM";
+	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
 
 	if (start + size > max_mem_size)
 		return ERR_PTR(-E2BIG);
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
  2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
@ 2020-03-26 18:07 ` James Morse
  2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
  2020-03-27  9:27 ` David Hildenbrand
  4 siblings, 0 replies; 19+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon

If kexec chooses to place the kernel in a memory region that was
added after boot, we fail to boot as the kernel is running from a
location that is not described as memory by the UEFI memory map or
the original DT.

To prevent unaware user-space kexec from doing this accidentally,
give these regions a different name.

Signed-off-by: James Morse <james.morse@arm.com>
---
This is a change in behaviour as seen by user-space, because memory hot-add
has already been merged.

 arch/arm64/include/asm/memory.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 2be67b232499..ef1686518469 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -166,6 +166,17 @@
 #define IOREMAP_MAX_ORDER	(PMD_SHIFT)
 #endif
 
+/*
+ * Memory hotplug allows new regions of 'System RAM' to be added to the system.
+ * These aren't described as memory by the UEFI memory map, or DT memory node.
+ * If we kexec from one of these regions, the new kernel boots from a location
+ * that isn't described as RAM.
+ *
+ * Give these resources a different name, so unaware kexec doesn't do this by
+ * accident.
+ */
+#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"
+
 #ifndef __ASSEMBLY__
 extern u64			vabits_actual;
 #define PAGE_END		(_PAGE_END(vabits_actual))
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
@ 2020-03-27  0:43   ` Anshuman Khandual
  2020-03-27  2:54     ` Baoquan He
  2020-03-27 15:46     ` James Morse
  2020-03-27  2:34   ` Baoquan He
  2020-03-27  9:30   ` David Hildenbrand
  2 siblings, 2 replies; 19+ messages in thread
From: Anshuman Khandual @ 2020-03-27  0:43 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Catalin Marinas, Andrew Morton, Bhupesh Sharma, Will Deacon,
	Eric Biederman



On 03/26/2020 11:37 PM, James Morse wrote:
> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
> 
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.

Why the isolation process does not fail when these pages are currently
being used by kexec ?

> 
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.

Yes this is definitely an added protection for these kexec loaded kernels
memory areas from being offlined but I would have expected the preceding
offlining to have failed as well.

> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..ba1d91e868ca 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -12,6 +12,7 @@
>  #include <linux/slab.h>
>  #include <linux/fs.h>
>  #include <linux/kexec.h>
> +#include <linux/memory.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/highmem.h>
> @@ -22,10 +23,12 @@
>  #include <linux/elf.h>
>  #include <linux/elfcore.h>
>  #include <linux/utsname.h>
> +#include <linux/notifier.h>
>  #include <linux/numa.h>
>  #include <linux/suspend.h>
>  #include <linux/device.h>
>  #include <linux/freezer.h>
> +#include <linux/pfn.h>
>  #include <linux/pm.h>
>  #include <linux/cpu.h>
>  #include <linux/uaccess.h>
> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>  
>  void __weak arch_kexec_unprotect_crashkres(void)
>  {}
> +
> +/*
> + * If user-space wants to offline memory that is in use by a loaded kexec
> + * image, it should unload the image first.
> + */

Probably this would need kexec user manual and related system call man pages
update as well.

> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> +			 void *data)
> +{
> +	int rv = NOTIFY_OK, i;
> +	struct memory_notify *arg = data;
> +	unsigned long pfn = arg->start_pfn;
> +	unsigned long nr_segments, sstart, send;
> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> +
> +	might_sleep();

Required ?

> +
> +	if (action != MEM_GOING_OFFLINE)
> +		return NOTIFY_DONE;
> +
> +	mutex_lock(&kexec_mutex);
> +	if (kexec_image) {
> +		nr_segments = kexec_image->nr_segments;
> +
> +		for (i = 0; i < nr_segments; i++) {
> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> +			send = PFN_UP(kexec_image->segment[i].mem +
> +				      kexec_image->segment[i].memsz);
> +
> +			if ((pfn <= sstart && sstart < end_pfn) ||
> +			    (pfn <= send && send < end_pfn)) {
> +				pr_warn("Memory region in use\n");
> +				rv = NOTIFY_BAD;
> +				break;
> +			}
> +		}
> +	}
> +	mutex_unlock(&kexec_mutex);
> +
> +	return rv;

Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead.

> +}
> +
> +static struct notifier_block mem_remove_nb = {
> +	.notifier_call = mem_remove_cb,
> +};
> +
> +static int __init register_mem_remove_cb(void)
> +{
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))

Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE
to reduce the scope as well as final code size when the config is disabled.

> +		return register_memory_notifier(&mem_remove_nb);
> +
> +	return 0;
> +}
> +device_initcall(register_mem_remove_cb);
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
                   ` (2 preceding siblings ...)
  2020-03-26 18:07 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name James Morse
@ 2020-03-27  2:11 ` Baoquan He
  2020-03-27 15:40   ` James Morse
  2020-03-27  9:27 ` David Hildenbrand
  4 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2020-03-27  2:11 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 03/26/20 at 06:07pm, James Morse wrote:
> Hello!
> 
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
> 
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).

Do you mean you use 'kexec -l /boot/vmlinuz-xxxx --initrd ...' to load a
kernel, next you hot remove some memory regions, then you execute
'kexec -e' to trigger kexec reboot?

I may not get the point clearly, but we usually do the loading and
triggering of kexec-ed kernel at the same time. 

> 
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.
> 
> 
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
> 
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
> 
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
> 
> 
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.

Currently, we will monitor udev events of mem hot add/remove, then
reload kdump kernel. That reloading is only update the elfcorehdr,
because crashkernel has to be reserved during 1st kernel bootup. I don't
think this will have problem.

> 
> 
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.
> 
> Thanks,
> 
> James Morse (3):
>   kexec: Prevent removal of memory in use by a loaded kexec image
>   mm/memory_hotplug: Allow arch override of non boot memory resource
>     names
>   arm64: memory: Give hotplug memory a different resource name
> 
>  arch/arm64/include/asm/memory.h | 11 +++++++
>  kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c             |  6 +++-
>  3 files changed, 72 insertions(+), 1 deletion(-)
> 
> -- 
> 2.25.1
> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
  2020-03-27  0:43   ` Anshuman Khandual
@ 2020-03-27  2:34   ` Baoquan He
  2020-03-27  9:30   ` David Hildenbrand
  2 siblings, 0 replies; 19+ messages in thread
From: Baoquan He @ 2020-03-27  2:34 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 03/26/20 at 06:07pm, James Morse wrote:
> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
> 
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.
> 
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.

As I replied to the cover letter, usually we do loading and juming of
kexec-ed kernel at one time. If we expect to do both of them at
different time, I agree we should do something to make thing safer if
someone really want to do since it's allowed, can we do anything with
the existing notifier?

Mem hotplug has got a notifier to notice it will offline a memory
region.
memory_notify(MEM_OFFLINE, &arg);

> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..ba1d91e868ca 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -12,6 +12,7 @@
>  #include <linux/slab.h>
>  #include <linux/fs.h>
>  #include <linux/kexec.h>
> +#include <linux/memory.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/highmem.h>
> @@ -22,10 +23,12 @@
>  #include <linux/elf.h>
>  #include <linux/elfcore.h>
>  #include <linux/utsname.h>
> +#include <linux/notifier.h>
>  #include <linux/numa.h>
>  #include <linux/suspend.h>
>  #include <linux/device.h>
>  #include <linux/freezer.h>
> +#include <linux/pfn.h>
>  #include <linux/pm.h>
>  #include <linux/cpu.h>
>  #include <linux/uaccess.h>
> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>  
>  void __weak arch_kexec_unprotect_crashkres(void)
>  {}
> +
> +/*
> + * If user-space wants to offline memory that is in use by a loaded kexec
> + * image, it should unload the image first.
> + */
> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> +			 void *data)
> +{
> +	int rv = NOTIFY_OK, i;
> +	struct memory_notify *arg = data;
> +	unsigned long pfn = arg->start_pfn;
> +	unsigned long nr_segments, sstart, send;
> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> +
> +	might_sleep();
> +
> +	if (action != MEM_GOING_OFFLINE)
> +		return NOTIFY_DONE;
> +
> +	mutex_lock(&kexec_mutex);
> +	if (kexec_image) {
> +		nr_segments = kexec_image->nr_segments;
> +
> +		for (i = 0; i < nr_segments; i++) {
> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> +			send = PFN_UP(kexec_image->segment[i].mem +
> +				      kexec_image->segment[i].memsz);
> +
> +			if ((pfn <= sstart && sstart < end_pfn) ||
> +			    (pfn <= send && send < end_pfn)) {
> +				pr_warn("Memory region in use\n");
> +				rv = NOTIFY_BAD;
> +				break;
> +			}
> +		}
> +	}
> +	mutex_unlock(&kexec_mutex);
> +
> +	return rv;
> +}
> +
> +static struct notifier_block mem_remove_nb = {
> +	.notifier_call = mem_remove_cb,
> +};
> +
> +static int __init register_mem_remove_cb(void)
> +{
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		return register_memory_notifier(&mem_remove_nb);
> +
> +	return 0;
> +}
> +device_initcall(register_mem_remove_cb);
> -- 
> 2.25.1
> 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27  0:43   ` Anshuman Khandual
@ 2020-03-27  2:54     ` Baoquan He
  2020-03-27 15:46     ` James Morse
  1 sibling, 0 replies; 19+ messages in thread
From: Baoquan He @ 2020-03-27  2:54 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse,
	Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel

On 03/27/20 at 06:13am, Anshuman Khandual wrote:
> 
> 
> On 03/26/2020 11:37 PM, James Morse wrote:
> > An image loaded for kexec is not stored in place, instead its segments
> > are scattered through memory, and are re-assembled when needed. In the
> > meantime, the target memory may have been removed.
> > 
> > Because mm is not aware that this memory is still in use, it allows it
> > to be removed.
> 
> Why the isolation process does not fail when these pages are currently
> being used by kexec ?

That is trick of kexec implementaiton. When loading kexec-ed kernel, it
just reads in the kenrel image in user space, then searches a place
where it's going to put kernel image in the whole system RAM, but won't
put kernel image in the searched region immediately, just book ahead a
room. When you execute 'kexec -e' to trigger kexec jumping, it will copy
kernel image into the booked room. So the booking is only recorded in
kexec's own data structure.

> 
> > 
> > Add a memory notifier to prevent the removal of memory regions that
> > overlap with a loaded kexec image segment. e.g., when triggered from the
> > Qemu console:
> > | kexec_core: memory region in use
> > | memory memory32: Offline failed.
> 
> Yes this is definitely an added protection for these kexec loaded kernels
> memory areas from being offlined but I would have expected the preceding
> offlining to have failed as well.
> 
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > ---
> >  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 56 insertions(+)
> > 
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index c19c0dad1ebe..ba1d91e868ca 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -12,6 +12,7 @@
> >  #include <linux/slab.h>
> >  #include <linux/fs.h>
> >  #include <linux/kexec.h>
> > +#include <linux/memory.h>
> >  #include <linux/mutex.h>
> >  #include <linux/list.h>
> >  #include <linux/highmem.h>
> > @@ -22,10 +23,12 @@
> >  #include <linux/elf.h>
> >  #include <linux/elfcore.h>
> >  #include <linux/utsname.h>
> > +#include <linux/notifier.h>
> >  #include <linux/numa.h>
> >  #include <linux/suspend.h>
> >  #include <linux/device.h>
> >  #include <linux/freezer.h>
> > +#include <linux/pfn.h>
> >  #include <linux/pm.h>
> >  #include <linux/cpu.h>
> >  #include <linux/uaccess.h>
> > @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
> >  
> >  void __weak arch_kexec_unprotect_crashkres(void)
> >  {}
> > +
> > +/*
> > + * If user-space wants to offline memory that is in use by a loaded kexec
> > + * image, it should unload the image first.
> > + */
> 
> Probably this would need kexec user manual and related system call man pages
> update as well.
> 
> > +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> > +			 void *data)
> > +{
> > +	int rv = NOTIFY_OK, i;
> > +	struct memory_notify *arg = data;
> > +	unsigned long pfn = arg->start_pfn;
> > +	unsigned long nr_segments, sstart, send;
> > +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> > +
> > +	might_sleep();
> 
> Required ?
> 
> > +
> > +	if (action != MEM_GOING_OFFLINE)
> > +		return NOTIFY_DONE;
> > +
> > +	mutex_lock(&kexec_mutex);
> > +	if (kexec_image) {
> > +		nr_segments = kexec_image->nr_segments;
> > +
> > +		for (i = 0; i < nr_segments; i++) {
> > +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> > +			send = PFN_UP(kexec_image->segment[i].mem +
> > +				      kexec_image->segment[i].memsz);
> > +
> > +			if ((pfn <= sstart && sstart < end_pfn) ||
> > +			    (pfn <= send && send < end_pfn)) {
> > +				pr_warn("Memory region in use\n");
> > +				rv = NOTIFY_BAD;
> > +				break;
> > +			}
> > +		}
> > +	}
> > +	mutex_unlock(&kexec_mutex);
> > +
> > +	return rv;
> 
> Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead.
> 
> > +}
> > +
> > +static struct notifier_block mem_remove_nb = {
> > +	.notifier_call = mem_remove_cb,
> > +};
> > +
> > +static int __init register_mem_remove_cb(void)
> > +{
> > +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> 
> Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE
> to reduce the scope as well as final code size when the config is disabled.
> 
> > +		return register_memory_notifier(&mem_remove_nb);
> > +
> > +	return 0;
> > +}
> > +device_initcall(register_mem_remove_cb);
> > 
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
                   ` (3 preceding siblings ...)
  2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
@ 2020-03-27  9:27 ` David Hildenbrand
  2020-03-27 15:42   ` James Morse
  4 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2020-03-27  9:27 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	Eric Biederman, Andrew Morton, Will Deacon

On 26.03.20 19:07, James Morse wrote:
> Hello!
> 
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
> 
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).
> 
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.

IIRC other architectures handle that by setting the affected pages
PageReserved. Any reason why to not stick to the same?

> 
> 
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
> 
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
> 
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
> 
> 
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.

Whenever memory is added/removed, kdump.service is to be restarted from
user space, which will fixup the data structures such that kdump will
not try to dump unplugged memory. Also, makedumpfile will check if the
sections are still around IIRC.

Not sure what you mean by "Unaware kdump from user-space".


-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
  2020-03-27  0:43   ` Anshuman Khandual
  2020-03-27  2:34   ` Baoquan He
@ 2020-03-27  9:30   ` David Hildenbrand
  2020-03-27 16:56     ` James Morse
  2 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2020-03-27  9:30 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	Eric Biederman, Andrew Morton, Will Deacon

On 26.03.20 19:07, James Morse wrote:
> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
> 
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.
> 
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..ba1d91e868ca 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -12,6 +12,7 @@
>  #include <linux/slab.h>
>  #include <linux/fs.h>
>  #include <linux/kexec.h>
> +#include <linux/memory.h>
>  #include <linux/mutex.h>
>  #include <linux/list.h>
>  #include <linux/highmem.h>
> @@ -22,10 +23,12 @@
>  #include <linux/elf.h>
>  #include <linux/elfcore.h>
>  #include <linux/utsname.h>
> +#include <linux/notifier.h>
>  #include <linux/numa.h>
>  #include <linux/suspend.h>
>  #include <linux/device.h>
>  #include <linux/freezer.h>
> +#include <linux/pfn.h>
>  #include <linux/pm.h>
>  #include <linux/cpu.h>
>  #include <linux/uaccess.h>
> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>  
>  void __weak arch_kexec_unprotect_crashkres(void)
>  {}
> +
> +/*
> + * If user-space wants to offline memory that is in use by a loaded kexec
> + * image, it should unload the image first.
> + */
> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
> +			 void *data)
> +{
> +	int rv = NOTIFY_OK, i;
> +	struct memory_notify *arg = data;
> +	unsigned long pfn = arg->start_pfn;
> +	unsigned long nr_segments, sstart, send;
> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
> +
> +	might_sleep();
> +
> +	if (action != MEM_GOING_OFFLINE)
> +		return NOTIFY_DONE;
> +
> +	mutex_lock(&kexec_mutex);
> +	if (kexec_image) {
> +		nr_segments = kexec_image->nr_segments;
> +
> +		for (i = 0; i < nr_segments; i++) {
> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
> +			send = PFN_UP(kexec_image->segment[i].mem +
> +				      kexec_image->segment[i].memsz);
> +
> +			if ((pfn <= sstart && sstart < end_pfn) ||
> +			    (pfn <= send && send < end_pfn)) {
> +				pr_warn("Memory region in use\n");
> +				rv = NOTIFY_BAD;
> +				break;
> +			}
> +		}
> +	}
> +	mutex_unlock(&kexec_mutex);
> +
> +	return rv;
> +}
> +
> +static struct notifier_block mem_remove_nb = {
> +	.notifier_call = mem_remove_cb,
> +};
> +
> +static int __init register_mem_remove_cb(void)
> +{
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> +		return register_memory_notifier(&mem_remove_nb);
> +
> +	return 0;
> +}
> +device_initcall(register_mem_remove_cb);
> 

E.g., in kernel/kexec_core.c:kimage_alloc_pages()

"SetPageReserved(pages + i);"

Pages that are reserved cannot get offlined. How are you able to trigger
that before this patch? (where is the allocation path for kexec, which
will not set the pages reserved?)

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
@ 2020-03-27  9:59   ` David Hildenbrand
  2020-03-27 15:39     ` James Morse
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2020-03-27  9:59 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	Eric Biederman, Andrew Morton, Will Deacon

On 26.03.20 19:07, James Morse wrote:
> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
> 
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
> 
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.
> 
> Allow an architecture to specify a different name for these hotplug
> regions.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  mm/memory_hotplug.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a54ffac8c68..69b03dd7fc74 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>  
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif

So I assume changing this for all architectures would result in some
user space tool breaking? Are we aware of any?

I do wonder if we should simply change it for all architectures if possible.


> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>  
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);
> 


-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-27  9:59   ` David Hildenbrand
@ 2020-03-27 15:39     ` James Morse
  0 siblings, 0 replies; 19+ messages in thread
From: James Morse @ 2020-03-27 15:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 9:59 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.


>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0a54ffac8c68..69b03dd7fc74 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>  
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
> 
> So I assume changing this for all architectures would result in some
> user space tool breaking? Are we aware of any?

Last time we had to touch arm64's /proc/iomem strings I went through debian's
codesearch for stuff that reads it, kexec-tools was the only thing that parsed
it in anger. (From memory, the other tools were looking for PCIe windows to do
firmware flashing..)

Looking again, having qualifiers on the end of 'System RAM' looks like it could
confuse 's390-tools's detect_mem_chunks parser.

It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate set.


> I do wonder if we should simply change it for all architectures if possible.

If its possible that would be great. But I suspect that ship has sailed,
changing it on other architectures could break some fragile parsing code.

I'm wary of changing it on arm64, the only thing that makes it tolerable is that
memory hot-add was relatively recently merged, and we don't anticipate it being
widely used until you can remove memory as well.

Changing it on arm64 is to prevent today's versions of kexec-tools from
accidentally placing the new kernel in memory that wasn't described at boot.
This leads to an unhandled exception during boot[0] because the kernel can't
access itself via the mapping of all memory. (hotpluggable regions are only
discovered by suitably configured ACPI systems much later)


Thanks,

James

[0]
| NUMA: NODE_DATA [mem 0x7fdf1780-0x7fdf3fff]
| Unable to handle kernel paging request at virtual address ffff00004230aff8
| Mem abort info:
|   ESR = 0x96000006
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
| Data abort info:
|   ISV = 0, ISS = 0x00000006
|   CM = 0, WnR = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000008181d000
| [ffff00004230aff8] pgd=000000007fff9003, pud=000000007fdf7003,
pmd=0000000000000000
| Internal error: Oops: 96000006 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-rc3-00098-g3f6c690f5dfe
#11618
| Hardware name: linux,dummy-virt (DT)
| pstate: 80400085 (Nzcv daIf +PAN -UAO BTYPE=--)
| pc : vmemmap_pud_populate+0x2c/0xa0
| lr : vmemmap_populate+0x78/0x154

| Call trace:
|  vmemmap_pud_populate+0x2c/0xa0
|  vmemmap_populate+0x78/0x154
|  __populate_section_memmap+0x3c/0x60
|  sparse_init_nid+0x29c/0x414
|  sparse_init+0x154/0x170
|  bootmem_init+0x78/0xdc
|  setup_arch+0x280/0x5d0
|  start_kernel+0x98/0x4f8
| Code: f9469a84 92748e73 8b010e61 cb040033 (f9400261)
| random: get_random_bytes called from print_oops_end_marker+0x34/0x60
with crng_init=0
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill the idle task!
| ---[ end Kernel panic - not syncing: Attempted to kill the idle task!



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
@ 2020-03-27 15:40   ` James Morse
  0 siblings, 0 replies; 19+ messages in thread
From: James Morse @ 2020-03-27 15:40 UTC (permalink / raw)
  To: Baoquan He
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi Baoquan,

On 3/27/20 2:11 AM, Baoquan He wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).

> Do you mean you use 'kexec -l /boot/vmlinuz-xxxx --initrd ...' to load a
> kernel, next you hot remove some memory regions, then you execute
> 'kexec -e' to trigger kexec reboot?

Yes. But to make it more fun, get someone else to trigger the hot-remove behind
your back!


> I may not get the point clearly, but we usually do the loading and
> triggering of kexec-ed kernel at the same time. 

But its two syscalls. Should the second one fail if the memory layout has
changed since the first?

(UEFI does this for exit-boot-services, there is handshake to prove you know
what the current memory map is)


>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.
>>
>>
>> This doesn't solve the problem for arm64, where the new kernel must
>> initially rely on the data structures from the first boot to describe
>> memory. These don't describe hotpluggable memory.
>> If kexec places the kernel in one of these regions, it must also provide
>> a DT that describes the region in which the kernel was mapped as memory.
>> (and somehow ensure its always present in the future...)
>>
>> To prevent this from happening accidentally with unaware user-space,
>> patches two and three allow arm64 to give these regions a different
>> name.
>>
>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>> were added separately.
>>
>>
>> I haven't tried kdump.
>> Unaware kdump from user-space probably won't describe the hotplug
>> regions if the name is different, which saves us from problems if
>> the memory is no longer present at kdump time, but means the vmcore
>> is incomplete.

> Currently, we will monitor udev events of mem hot add/remove, then
> reload kdump kernel. That reloading is only update the elfcorehdr,
> because crashkernel has to be reserved during 1st kernel bootup. I don't
> think this will have problem.

Great. I don't think there is much the kernel can do for the kdump case, so its
good to know the tools already exist for detecting and restarting the kdump load
when the memory layout changes.

For kdump via kexec-file-load, we would need to regenerate the elfcorehdr, I'm
hoping that can be done in core code.


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-27  9:27 ` David Hildenbrand
@ 2020-03-27 15:42   ` James Morse
  0 siblings, 0 replies; 19+ messages in thread
From: James Morse @ 2020-03-27 15:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 9:27 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).
>>
>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.

> IIRC other architectures handle that by setting the affected pages
> PageReserved. Any reason why to not stick to the same?

Hmm, I didn't spot this. How come core code doesn't do it if its needed?

Doesn't PG_Reserved prevent the page from being used for regular allocations?
(or is that only if its done early)

I prefer the runtime check as the dmesg output gives the user some chance of
knowing why their memory-offline failed, and doing something about it!


>> This doesn't solve the problem for arm64, where the new kernel must
>> initially rely on the data structures from the first boot to describe
>> memory. These don't describe hotpluggable memory.
>> If kexec places the kernel in one of these regions, it must also provide
>> a DT that describes the region in which the kernel was mapped as memory.
>> (and somehow ensure its always present in the future...)
>>
>> To prevent this from happening accidentally with unaware user-space,
>> patches two and three allow arm64 to give these regions a different
>> name.
>>
>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>> were added separately.
>>
>>
>> I haven't tried kdump.
>> Unaware kdump from user-space probably won't describe the hotplug
>> regions if the name is different, which saves us from problems if
>> the memory is no longer present at kdump time, but means the vmcore
>> is incomplete.

> Whenever memory is added/removed, kdump.service is to be restarted from
> user space, which will fixup the data structures such that kdump will
> not try to dump unplugged memory.

Cunning.


> Also, makedumpfile will check if the
> sections are still around IIRC.

Curious. I thought the vmcore was virtually addressed, how does it know which
linear-map portions correspond to sysfs memory nodes with KASLR?


> Not sure what you mean by "Unaware kdump from user-space".

The existing kexec-tools binaries, that (I assume) don't go probing to find out
if 'System RAM' is removable or not, loading a kdump kernel, along with the
user-space generated blob that describes the first kernel's memory usage to the
second kernel.

'user-space' here to distinguish all this from kexec_file_load().



Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27  0:43   ` Anshuman Khandual
  2020-03-27  2:54     ` Baoquan He
@ 2020-03-27 15:46     ` James Morse
  1 sibling, 0 replies; 19+ messages in thread
From: James Morse @ 2020-03-27 15:46 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman,
	Andrew Morton, Will Deacon, linux-arm-kernel

Hi Anshuman,

On 3/27/20 12:43 AM, Anshuman Khandual wrote:
> On 03/26/2020 11:37 PM, James Morse wrote:
>> An image loaded for kexec is not stored in place, instead its segments
>> are scattered through memory, and are re-assembled when needed. In the
>> meantime, the target memory may have been removed.
>>
>> Because mm is not aware that this memory is still in use, it allows it
>> to be removed.

> Why the isolation process does not fail when these pages are currently
> being used by kexec ?

Kexec isn't using them right now, but it will once kexec is triggered.
Those physical addresses are held in some internal kexec data structures until
kexec is triggered.


>> Add a memory notifier to prevent the removal of memory regions that
>> overlap with a loaded kexec image segment. e.g., when triggered from the
>> Qemu console:
>> | kexec_core: memory region in use
>> | memory memory32: Offline failed.
> 
> Yes this is definitely an added protection for these kexec loaded kernels
> memory areas from being offlined but I would have expected the preceding
> offlining to have failed as well.

kexec hasn't allocate the memory, part of the regions user-space may specify for
the next kernel may be in use. There is nothing to stop the memory being used in
the meantime.


Any other way of doing this would prevent us saying why it failed.
Like this, the user can spot the 'kexec: Memory region in use message', and
unload kexec.


>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index c19c0dad1ebe..ba1d91e868ca 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void)
>>  
>>  void __weak arch_kexec_unprotect_crashkres(void)
>>  {}
>> +
>> +/*
>> + * If user-space wants to offline memory that is in use by a loaded kexec
>> + * image, it should unload the image first.
>> + */

> Probably this would need kexec user manual and related system call man pages
> update as well.

I can't see anything relevant under Documentation. (kdump yes, kexec no...)


>> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action,
>> +			 void *data)
>> +{
>> +	int rv = NOTIFY_OK, i;
>> +	struct memory_notify *arg = data;
>> +	unsigned long pfn = arg->start_pfn;
>> +	unsigned long nr_segments, sstart, send;
>> +	unsigned long end_pfn = arg->start_pfn + arg->nr_pages;
>> +
>> +	might_sleep();
> 
> Required ?

Habit, and I think best practice. We take a mutex, so might_sleep(), but we also
conditionally return before lockdep would see the mutex. Having this annotation
means a dangerous change to the way this is called triggers a warning without
having to test memory hotplug explicitly.


>> +
>> +	if (action != MEM_GOING_OFFLINE)
>> +		return NOTIFY_DONE;
>> +
>> +	mutex_lock(&kexec_mutex);
>> +	if (kexec_image) {
>> +		nr_segments = kexec_image->nr_segments;
>> +
>> +		for (i = 0; i < nr_segments; i++) {
>> +			sstart = PFN_DOWN(kexec_image->segment[i].mem);
>> +			send = PFN_UP(kexec_image->segment[i].mem +
>> +				      kexec_image->segment[i].memsz);
>> +
>> +			if ((pfn <= sstart && sstart < end_pfn) ||
>> +			    (pfn <= send && send < end_pfn)) {
>> +				pr_warn("Memory region in use\n");
>> +				rv = NOTIFY_BAD;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +	mutex_unlock(&kexec_mutex);
>> +
>> +	return rv;
> 
> Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead.

You'd prefer a mutex_unlock() in the middle of the loop? ... or goto?
(I'm not convinced)

>> +}
>> +
>> +static struct notifier_block mem_remove_nb = {
>> +	.notifier_call = mem_remove_cb,
>> +};
>> +
>> +static int __init register_mem_remove_cb(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
> 
> Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE
> to reduce the scope as well as final code size when the config is disabled.

The compiler is really good at this. "if (false)" means this is all dead code,
and static means its not exported, so the compiler is free to remove it.

Not #ifdef-ing it makes it much more readable, and means the compiler checks its
valid C before it removes it. This avoids weird header include problems that
only show up on some rand-config builds.


Thanks,

James

>> +		return register_memory_notifier(&mem_remove_nb);
>> +
>> +	return 0;
>> +}
>> +device_initcall(register_mem_remove_cb);
>>
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27  9:30   ` David Hildenbrand
@ 2020-03-27 16:56     ` James Morse
  2020-03-27 17:06       ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: James Morse @ 2020-03-27 16:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 9:30 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> An image loaded for kexec is not stored in place, instead its segments
>> are scattered through memory, and are re-assembled when needed. In the
>> meantime, the target memory may have been removed.
>>
>> Because mm is not aware that this memory is still in use, it allows it
>> to be removed.
>>
>> Add a memory notifier to prevent the removal of memory regions that
>> overlap with a loaded kexec image segment. e.g., when triggered from the
>> Qemu console:
>> | kexec_core: memory region in use
>> | memory memory32: Offline failed.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 56 insertions(+)
>>
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index c19c0dad1ebe..ba1d91e868ca 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c

> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
> 
> "SetPageReserved(pages + i);"
> 
> Pages that are reserved cannot get offlined. How are you able to trigger
> that before this patch? (where is the allocation path for kexec, which
> will not set the pages reserved?)

This sets page reserved on the memory it gets back from
alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].

The problem I see is for the target or destination memory once you execute the
image. Once machine_kexec() runs, it tries to write to this, assuming it is
still present...


How can I make the commit message clearer?

're-assembled' and 'target memory' aren't quite cutting it, is there are a
correct term to use?
(destination?)



Thanks,

James


[0] Just to convince myself:
| kimage_alloc_pages+0x30/0x15c
| kimage_alloc_page+0x210/0x7d8
| kimage_load_segment+0x14c/0x8c8
| __arm64_sys_kexec_load+0x4f0/0x720
| do_el0_svc+0x13c/0x3c0
| el0_sync_handler+0x9c/0x3c0
| el0_sync+0x158/0x180

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 16:56     ` James Morse
@ 2020-03-27 17:06       ` David Hildenbrand
  2020-03-27 18:07         ` James Morse
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2020-03-27 17:06 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

On 27.03.20 17:56, James Morse wrote:
> Hi David,
> 
> On 3/27/20 9:30 AM, David Hildenbrand wrote:
>> On 26.03.20 19:07, James Morse wrote:
>>> An image loaded for kexec is not stored in place, instead its segments
>>> are scattered through memory, and are re-assembled when needed. In the
>>> meantime, the target memory may have been removed.
>>>
>>> Because mm is not aware that this memory is still in use, it allows it
>>> to be removed.
>>>
>>> Add a memory notifier to prevent the removal of memory regions that
>>> overlap with a loaded kexec image segment. e.g., when triggered from the
>>> Qemu console:
>>> | kexec_core: memory region in use
>>> | memory memory32: Offline failed.
>>>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> ---
>>>  kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 56 insertions(+)
>>>
>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>> index c19c0dad1ebe..ba1d91e868ca 100644
>>> --- a/kernel/kexec_core.c
>>> +++ b/kernel/kexec_core.c
> 
>> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
>>
>> "SetPageReserved(pages + i);"
>>
>> Pages that are reserved cannot get offlined. How are you able to trigger
>> that before this patch? (where is the allocation path for kexec, which
>> will not set the pages reserved?)
> 
> This sets page reserved on the memory it gets back from
> alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].
> 
> The problem I see is for the target or destination memory once you execute the
> image. Once machine_kexec() runs, it tries to write to this, assuming it is
> still present...

Let's recap

1. You load the image. You allocate memory for e.g., the kexec kernel.
The pages will be marked PG_reserved, so they cannot be offlined.

2. You do the kexec. The kexec kernel will only operate on a reserved
memory region (reserved via e.g., kernel cmdline crashkernel=128M).

Is it that in 2., the reserved memory region (for the crashkernel) could
have been offlined in the meantime?

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 17:06       ` David Hildenbrand
@ 2020-03-27 18:07         ` James Morse
  2020-03-27 18:52           ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: James Morse @ 2020-03-27 18:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi David,

On 3/27/20 5:06 PM, David Hildenbrand wrote:
> On 27.03.20 17:56, James Morse wrote:
>> On 3/27/20 9:30 AM, David Hildenbrand wrote:
>>> On 26.03.20 19:07, James Morse wrote:
>>>> An image loaded for kexec is not stored in place, instead its segments
>>>> are scattered through memory, and are re-assembled when needed. In the
>>>> meantime, the target memory may have been removed.
>>>>
>>>> Because mm is not aware that this memory is still in use, it allows it
>>>> to be removed.
>>>>
>>>> Add a memory notifier to prevent the removal of memory regions that
>>>> overlap with a loaded kexec image segment. e.g., when triggered from the
>>>> Qemu console:
>>>> | kexec_core: memory region in use
>>>> | memory memory32: Offline failed.

>>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>>> index c19c0dad1ebe..ba1d91e868ca 100644
>>>> --- a/kernel/kexec_core.c
>>>> +++ b/kernel/kexec_core.c
>>
>>> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
>>>
>>> "SetPageReserved(pages + i);"
>>>
>>> Pages that are reserved cannot get offlined. How are you able to trigger
>>> that before this patch? (where is the allocation path for kexec, which
>>> will not set the pages reserved?)
>>
>> This sets page reserved on the memory it gets back from
>> alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].
>>
>> The problem I see is for the target or destination memory once you execute the
>> image. Once machine_kexec() runs, it tries to write to this, assuming it is
>> still present...

> Let's recap
> 
> 1. You load the image. You allocate memory for e.g., the kexec kernel.
> The pages will be marked PG_reserved, so they cannot be offlined.
> 
> 2. You do the kexec. The kexec kernel will only operate on a reserved
> memory region (reserved via e.g., kernel cmdline crashkernel=128M).

I think you are merging the kexec and kdump behaviours.
(Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')

For kdump, yes, the new kernel is loaded into the crashkernel reservation, and
confined to it.


For regular kexec, the new kernel can be loaded any where in memory. There might
be a difference with how this works on arm64....

The regular kexec kernel isn't stored in its final location when its loaded, its
relocated there when the image is executed. The target/destination memory may
have been removed in the meantime.

(an example recipe below should clarify this)


> Is it that in 2., the reserved memory region (for the crashkernel) could
> have been offlined in the meantime?

No, for kdump: the crashkernel reservation is PG_reserved, and its not something
mm knows how to move, so that region can't be taken offline.

(On arm64 we additionally prevent the boot-memory from being removed as it is
all described as present by UEFI. The crashkernel reservation would always be
from this type of memory)


This is about a regular kexec, any crashdump reservation is irrelevant.
This kexec kernel is temporarily stored out of line, then relocated when executed.

A recipe so that we're at least on the same terminal! This is on a TX2 running
arm64's for-next/core using Qemu-TCG to emulate x86. (Sorry for the bizarre
config, its because Qemu supports hotremove on x86, but not yet on arm64).


Insert the memory:
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

| root@vm:~# free -m
|               total        used        free      shared  ...
| Mem:            918          52         814           0  ...
| Swap:             0           0           0


Bring it online:
| root@vm:~# cd /sys/devices/system/memory/
| root@vm:/sys/devices/system/memory# for F in memory3*; do echo \
| online_movable > $F/state; done

| Built 1 zonelists, mobility grouping on.  Total pages: 251049
| Policy zone: DMA32

| -bash: echo: write error: Invalid argument
| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:           1942          53        1836           0  ...
| Swap:             0           0           0


Load kexec:
| root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline

Press the Attention button to request removal:

(qemu) device_del dimm1

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Built 1 zonelists, mobility grouping on.  Total pages: 233728
| Policy zone: DMA32

The memory is gone:
| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:            918          89         769           0  ...
| Swap:             0           0           0

Trigger kexec:
| root@vm:/sys/devices/system/memory# kexec -e

[...]

| sd 0:0:0:0: [sda] Synchronizing SCSI cache
| kexec_core: Starting new kernel

... and Qemu restarts the platform firmware instead of proceeding with kexec.
(I assume this is a triple fault)

You can use mem-min and mem-max to control where kexec's user space will place
the memory.


If you apply this patch, the above sequence will fail at the device remove step,
as the physical addresses match the loaded kexec image:

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| kexec_core: Memory region in use
| kexec_core: Memory region in use
| memory memory39: Offline failed.
| Built 1 zonelists, mobility grouping on.  Total pages: 299212
| Policy zone: Normal

| root@vm:/sys/devices/system/memory# free -m
|               total        used        free      shared  ...
| Mem:           1942          90        1793           0    ...
| Swap:             0           0           0

I can't remove the DIMM, because we failed to offline it:

(qemu) object_del mem1
object 'mem1' is in use, can not be deleted

and I can trigger kexec and boot the new kernel.

kexec user-space here comes from debian bullseye. It picked the removable memory
all by itself without any additional arguments.


(a different issue that can be ignored for now: x86 additionally fails to reboot
if I remove memory, even if its not in use by the kexec image. This doesn't
cause qemu to reboot via firmware, I think it dies before the console. It
doesn't happen on arm64. I suspect the memory map is snapshotted and assumed to
still be correct when the image is executed.)


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 18:07         ` James Morse
@ 2020-03-27 18:52           ` David Hildenbrand
  0 siblings, 0 replies; 19+ messages in thread
From: David Hildenbrand @ 2020-03-27 18:52 UTC (permalink / raw)
  To: James Morse
  Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
	linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

>> 2. You do the kexec. The kexec kernel will only operate on a reserved
>> memory region (reserved via e.g., kernel cmdline crashkernel=128M).
> 
> I think you are merging the kexec and kdump behaviours.
> (Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')

Oh, I see - I think your example below clarifies things. Something like
that should go in the cover letter if we end up in this patch being
required :)

(I missed that the problematic part is "random" addresses passed by user
space to the kernel, where it wants data to be loaded to on kexec -e)

> 
> For kdump, yes, the new kernel is loaded into the crashkernel reservation, and
> confined to it.
> 
> 
> For regular kexec, the new kernel can be loaded any where in memory. There might
> be a difference with how this works on arm64....
> 
> The regular kexec kernel isn't stored in its final location when its loaded, its
> relocated there when the image is executed. The target/destination memory may
> have been removed in the meantime.
> 
> (an example recipe below should clarify this)
> 
> 
>> Is it that in 2., the reserved memory region (for the crashkernel) could
>> have been offlined in the meantime?
> 
> No, for kdump: the crashkernel reservation is PG_reserved, and its not something
> mm knows how to move, so that region can't be taken offline.
> 
> (On arm64 we additionally prevent the boot-memory from being removed as it is
> all described as present by UEFI. The crashkernel reservation would always be
> from this type of memory)

Right.

> 
> 
> This is about a regular kexec, any crashdump reservation is irrelevant.
> This kexec kernel is temporarily stored out of line, then relocated when executed.
> 
> A recipe so that we're at least on the same terminal! This is on a TX2 running
> arm64's for-next/core using Qemu-TCG to emulate x86. (Sorry for the bizarre
> config, its because Qemu supports hotremove on x86, but not yet on arm64).
> 
> 
> Insert the memory:
> (qemu) object_add memory-backend-ram,id=mem1,size=1G
> (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
> 
> | root@vm:~# free -m
> |               total        used        free      shared  ...
> | Mem:            918          52         814           0  ...
> | Swap:             0           0           0
> 
> 
> Bring it online:
> | root@vm:~# cd /sys/devices/system/memory/
> | root@vm:/sys/devices/system/memory# for F in memory3*; do echo \
> | online_movable > $F/state; done
> 
> | Built 1 zonelists, mobility grouping on.  Total pages: 251049
> | Policy zone: DMA32
> 
> | -bash: echo: write error: Invalid argument
> | root@vm:/sys/devices/system/memory# free -m
> |               total        used        free      shared  ...
> | Mem:           1942          53        1836           0  ...
> | Swap:             0           0           0
> 
> 
> Load kexec:
> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
> 

I assume this will trigger

kexec_load -> do_kexec_load -> kimage_load_segment ->
kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages

Which will just allocate a bunch of pages and mark them reserved.

Now, AFAIKs, all allocations will be unmovable. So none of the kexec
segment allocations will actually end up on your DIMM (as it is onlined
online_movable).

So, the loaded image (with its segments) from user won't be problematic
and not get placed on your DIMM.


Now, the problematic part is (via man kexec_load) "mem and memsz specify
a physical address range that is the target of the copy."

So the place where the image will be "assembled" at when doing the
reboot. Understood :)

> Press the Attention button to request removal:
> 
> (qemu) device_del dimm1
> 
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Built 1 zonelists, mobility grouping on.  Total pages: 233728
> | Policy zone: DMA32
> 
> The memory is gone:
> | root@vm:/sys/devices/system/memory# free -m
> |               total        used        free      shared  ...
> | Mem:            918          89         769           0  ...
> | Swap:             0           0           0
> 
> Trigger kexec:
> | root@vm:/sys/devices/system/memory# kexec -e
> 
> [...]
> 
> | sd 0:0:0:0: [sda] Synchronizing SCSI cache
> | kexec_core: Starting new kernel
> 
> ... and Qemu restarts the platform firmware instead of proceeding with kexec.
> (I assume this is a triple fault)
> 
> You can use mem-min and mem-max to control where kexec's user space will place
> the memory.
> 
> 
> If you apply this patch, the above sequence will fail at the device remove step,
> as the physical addresses match the loaded kexec image:
> 
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | kexec_core: Memory region in use
> | kexec_core: Memory region in use

Okay, so I assume the kexec userspace tool provided target kernel
addresses for segments that reside on the DIMM.


> | memory memory39: Offline failed.
> | Built 1 zonelists, mobility grouping on.  Total pages: 299212
> | Policy zone: Normal
> 
> | root@vm:/sys/devices/system/memory# free -m
> |               total        used        free      shared  ...
> | Mem:           1942          90        1793           0    ...
> | Swap:             0           0           0
> 
> I can't remove the DIMM, because we failed to offline it:

I wonder if we should instead make the "kexec -e" fail. It tries to
touch random system memory.

Denying to offline MOVABLE memory should be avoided - and what kexec
does here sounds dangerous to me (allowing it to write random system
memory).

Roughly what I am thinking is this:

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ba1d91e868ca..70c39a5307e5 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1135,6 +1135,10 @@ int kernel_kexec(void)
                error = -EINVAL;
                goto Unlock;
        }
+       if (!kexec_image_validate()) {
+               error = -EINVAL;
+               goto Unlock;
+       }

 #ifdef CONFIG_KEXEC_JUMP
        if (kexec_image->preserve_context) {


kexec_image_validate() would go over all segments and validate that the
involved pages are actual valid memory (pfn_to_online_page()).

All we have to do is protect from memory hotplug until we switch to the
new kernel.

Will probably need some thought. But it will actually also bail out when
user space passes wrong physical memory addresses, instead of
triple-faulting silently.

-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-26 18:07 [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use James Morse
2020-03-26 18:07 ` [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image James Morse
2020-03-27  0:43   ` Anshuman Khandual
2020-03-27  2:54     ` Baoquan He
2020-03-27 15:46     ` James Morse
2020-03-27  2:34   ` Baoquan He
2020-03-27  9:30   ` David Hildenbrand
2020-03-27 16:56     ` James Morse
2020-03-27 17:06       ` David Hildenbrand
2020-03-27 18:07         ` James Morse
2020-03-27 18:52           ` David Hildenbrand
2020-03-26 18:07 ` [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names James Morse
2020-03-27  9:59   ` David Hildenbrand
2020-03-27 15:39     ` James Morse
2020-03-26 18:07 ` [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name James Morse
2020-03-27  2:11 ` [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use Baoquan He
2020-03-27 15:40   ` James Morse
2020-03-27  9:27 ` David Hildenbrand
2020-03-27 15:42   ` James Morse

Linux-ARM-Kernel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/0 linux-arm-kernel/git/0.git
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/1 linux-arm-kernel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ https://lore.kernel.org/linux-arm-kernel \
		linux-arm-kernel@lists.infradead.org
	public-inbox-index linux-arm-kernel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-arm-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git