* [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use @ 2020-03-26 18:07 ` James Morse 0 siblings, 0 replies; 264+ messages in thread From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw) To: kexec, linux-mm, linux-arm-kernel Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma, James Morse Hello! arm64 recently queued support for memory hotremove, which led to some new corner cases for kexec. If the kexec segments are loaded for a removable region, that region may be removed before kexec actually occurs. This causes the first kernel to lock up when applying the relocations. (I've triggered this on x86 too.) The first patch adds a memory notifier for kexec so that it can refuse to allow in-use regions to be taken offline. This doesn't solve the problem for arm64, where the new kernel must initially rely on the data structures from the first boot to describe memory. These don't describe hotpluggable memory. If kexec places the kernel in one of these regions, it must also provide a DT that describes the region in which the kernel was mapped as memory (and somehow ensure it's always present in the future...). To prevent this from happening accidentally with unaware user-space, patches two and three allow arm64 to give these regions a different name. This is a change in behaviour for arm64 as memory hotadd and hotremove were added separately. I haven't tried kdump. Unaware kdump from user-space probably won't describe the hotplug regions if the name is different, which saves us from problems if the memory is no longer present at kdump time, but means the vmcore is incomplete. These patches are based on arm64's for-next/core branch, but can all be merged independently. 
Thanks, James Morse (3): kexec: Prevent removal of memory in use by a loaded kexec image mm/memory_hotplug: Allow arch override of non boot memory resource names arm64: memory: Give hotplug memory a different resource name arch/arm64/include/asm/memory.h | 11 +++++++ kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++ mm/memory_hotplug.c | 6 +++- 3 files changed, 72 insertions(+), 1 deletion(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 264+ messages in thread
* [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-26 18:07 ` James Morse @ 2020-03-26 18:07 ` James Morse -1 siblings, 0 replies; 264+ messages in thread From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw) To: kexec, linux-mm, linux-arm-kernel Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma, James Morse An image loaded for kexec is not stored in place, instead its segments are scattered through memory, and are re-assembled when needed. In the meantime, the target memory may have been removed. Because mm is not aware that this memory is still in use, it allows it to be removed. Add a memory notifier to prevent the removal of memory regions that overlap with a loaded kexec image segment. e.g., when triggered from the Qemu console: | kexec_core: memory region in use | memory memory32: Offline failed. Signed-off-by: James Morse <james.morse@arm.com> --- kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index c19c0dad1ebe..ba1d91e868ca 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -12,6 +12,7 @@ #include <linux/slab.h> #include <linux/fs.h> #include <linux/kexec.h> +#include <linux/memory.h> #include <linux/mutex.h> #include <linux/list.h> #include <linux/highmem.h> @@ -22,10 +23,12 @@ #include <linux/elf.h> #include <linux/elfcore.h> #include <linux/utsname.h> +#include <linux/notifier.h> #include <linux/numa.h> #include <linux/suspend.h> #include <linux/device.h> #include <linux/freezer.h> +#include <linux/pfn.h> #include <linux/pm.h> #include <linux/cpu.h> #include <linux/uaccess.h> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) void __weak arch_kexec_unprotect_crashkres(void) {} + +/* + * If user-space wants to offline memory that is in use by a loaded kexec + * image, it should unload the image first. 
+ */ +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, + void *data) +{ + int rv = NOTIFY_OK, i; + struct memory_notify *arg = data; + unsigned long pfn = arg->start_pfn; + unsigned long nr_segments, sstart, send; + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; + + might_sleep(); + + if (action != MEM_GOING_OFFLINE) + return NOTIFY_DONE; + + mutex_lock(&kexec_mutex); + if (kexec_image) { + nr_segments = kexec_image->nr_segments; + + for (i = 0; i < nr_segments; i++) { + sstart = PFN_DOWN(kexec_image->segment[i].mem); + send = PFN_UP(kexec_image->segment[i].mem + + kexec_image->segment[i].memsz); + + if ((pfn <= sstart && sstart < end_pfn) || + (pfn <= send && send < end_pfn)) { + pr_warn("Memory region in use\n"); + rv = NOTIFY_BAD; + break; + } + } + } + mutex_unlock(&kexec_mutex); + + return rv; +} + +static struct notifier_block mem_remove_nb = { + .notifier_call = mem_remove_cb, +}; + +static int __init register_mem_remove_cb(void) +{ + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) + return register_memory_notifier(&mem_remove_nb); + + return 0; +} +device_initcall(register_mem_remove_cb); -- 2.25.1 ^ permalink raw reply related [flat|nested] 264+ messages in thread
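The overlap test in mem_remove_cb() compares only the two end points of each segment against the range going offline. What follows is a minimal userspace sketch, assuming half-open PFN ranges as produced by PFN_DOWN()/PFN_UP() in the patch, of how that test appears to differ from a general interval-overlap check. The helper names pfn_ranges_overlap(), endpoint_test() and overlap_examples() are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: do the half-open PFN ranges [a_start, a_end) and
 * [b_start, b_end) share at least one page frame?
 */
bool pfn_ranges_overlap(unsigned long a_start, unsigned long a_end,
			unsigned long b_start, unsigned long b_end)
{
	return a_start < b_end && b_start < a_end;
}

/*
 * The end-point comparison from the patch, copied here only so the two
 * tests can be compared side by side. [pfn, end_pfn) is the block going
 * offline, [sstart, send) is one kexec segment.
 */
bool endpoint_test(unsigned long pfn, unsigned long end_pfn,
		   unsigned long sstart, unsigned long send)
{
	return (pfn <= sstart && sstart < end_pfn) ||
	       (pfn <= send && send < end_pfn);
}

/* Worked cases for a memory block covering PFNs [100, 200). */
void overlap_examples(void)
{
	/* Segment [50, 300) fully covers the block: a real overlap. */
	assert(pfn_ranges_overlap(100, 200, 50, 300));
	assert(!endpoint_test(100, 200, 50, 300));

	/* Segment [50, 100) ends where the block starts: no shared page. */
	assert(!pfn_ranges_overlap(100, 200, 50, 100));
	assert(endpoint_test(100, 200, 50, 100));

	/* Segment [150, 160) lies inside the block: both tests agree. */
	assert(pfn_ranges_overlap(100, 200, 150, 160));
	assert(endpoint_test(100, 200, 150, 160));
}
```

Under these assumptions, a segment that spans the whole block is not caught by the end-point test, while a segment that ends exactly at the block's first PFN is rejected even though it shares no page with the block. Whether either case matters in practice depends on how segment sizes compare with memory block sizes on a given system.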
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-26 18:07 ` James Morse @ 2020-03-27 0:43 ` Anshuman Khandual -1 siblings, 0 replies; 264+ messages in thread From: Anshuman Khandual @ 2020-03-27 0:43 UTC (permalink / raw) To: James Morse, kexec, linux-mm, linux-arm-kernel Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Bhupesh Sharma On 03/26/2020 11:37 PM, James Morse wrote: > An image loaded for kexec is not stored in place, instead its segments > are scattered through memory, and are re-assembled when needed. In the > meantime, the target memory may have been removed. > > Because mm is not aware that this memory is still in use, it allows it > to be removed. Why the isolation process does not fail when these pages are currently being used by kexec ? > > Add a memory notifier to prevent the removal of memory regions that > overlap with a loaded kexec image segment. e.g., when triggered from the > Qemu console: > | kexec_core: memory region in use > | memory memory32: Offline failed. Yes this is definitely an added protection for these kexec loaded kernels memory areas from being offlined but I would have expected the preceding offlining to have failed as well. 
> > Signed-off-by: James Morse <james.morse@arm.com> > --- > kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 56 insertions(+) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index c19c0dad1ebe..ba1d91e868ca 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -12,6 +12,7 @@ > #include <linux/slab.h> > #include <linux/fs.h> > #include <linux/kexec.h> > +#include <linux/memory.h> > #include <linux/mutex.h> > #include <linux/list.h> > #include <linux/highmem.h> > @@ -22,10 +23,12 @@ > #include <linux/elf.h> > #include <linux/elfcore.h> > #include <linux/utsname.h> > +#include <linux/notifier.h> > #include <linux/numa.h> > #include <linux/suspend.h> > #include <linux/device.h> > #include <linux/freezer.h> > +#include <linux/pfn.h> > #include <linux/pm.h> > #include <linux/cpu.h> > #include <linux/uaccess.h> > @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) > > void __weak arch_kexec_unprotect_crashkres(void) > {} > + > +/* > + * If user-space wants to offline memory that is in use by a loaded kexec > + * image, it should unload the image first. > + */ Probably this would need kexec user manual and related system call man pages update as well. > +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, > + void *data) > +{ > + int rv = NOTIFY_OK, i; > + struct memory_notify *arg = data; > + unsigned long pfn = arg->start_pfn; > + unsigned long nr_segments, sstart, send; > + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; > + > + might_sleep(); Required ? 
> + > + if (action != MEM_GOING_OFFLINE) > + return NOTIFY_DONE; > + > + mutex_lock(&kexec_mutex); > + if (kexec_image) { > + nr_segments = kexec_image->nr_segments; > + > + for (i = 0; i < nr_segments; i++) { > + sstart = PFN_DOWN(kexec_image->segment[i].mem); > + send = PFN_UP(kexec_image->segment[i].mem + > + kexec_image->segment[i].memsz); > + > + if ((pfn <= sstart && sstart < end_pfn) || > + (pfn <= send && send < end_pfn)) { > + pr_warn("Memory region in use\n"); > + rv = NOTIFY_BAD; > + break; > + } > + } > + } > + mutex_unlock(&kexec_mutex); > + > + return rv; Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead. > +} > + > +static struct notifier_block mem_remove_nb = { > + .notifier_call = mem_remove_cb, > +}; > + > +static int __init register_mem_remove_cb(void) > +{ > + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE to reduce the scope as well as final code size when the config is disabled. > + return register_memory_notifier(&mem_remove_nb); > + > + return 0; > +} > +device_initcall(register_mem_remove_cb); > ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-27 0:43 ` Anshuman Khandual @ 2020-03-27 2:54 ` Baoquan He -1 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-03-27 2:54 UTC (permalink / raw) To: Anshuman Khandual Cc: James Morse, kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Bhupesh Sharma On 03/27/20 at 06:13am, Anshuman Khandual wrote: > > > On 03/26/2020 11:37 PM, James Morse wrote: > > An image loaded for kexec is not stored in place, instead its segments > > are scattered through memory, and are re-assembled when needed. In the > > meantime, the target memory may have been removed. > > > > Because mm is not aware that this memory is still in use, it allows it > > to be removed. > > Why the isolation process does not fail when these pages are currently > being used by kexec ? That is a trick of the kexec implementation. When loading a kexec-ed kernel, it just reads in the kernel image in user space, then searches the whole system RAM for a place where it is going to put the kernel image, but it won't put the kernel image in that region immediately; it just books the room ahead of time. When you execute 'kexec -e' to trigger the kexec jump, it copies the kernel image into the booked room. So the booking is only recorded in kexec's own data structures. > > > > > Add a memory notifier to prevent the removal of memory regions that > > overlap with a loaded kexec image segment. e.g., when triggered from the > > Qemu console: > > | kexec_core: memory region in use > > | memory memory32: Offline failed. > > Yes this is definitely an added protection for these kexec loaded kernels > memory areas from being offlined but I would have expected the preceding > offlining to have failed as well. 
> > > > > Signed-off-by: James Morse <james.morse@arm.com> > > --- > > kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 56 insertions(+) > > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index c19c0dad1ebe..ba1d91e868ca 100644 > > --- a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -12,6 +12,7 @@ > > #include <linux/slab.h> > > #include <linux/fs.h> > > #include <linux/kexec.h> > > +#include <linux/memory.h> > > #include <linux/mutex.h> > > #include <linux/list.h> > > #include <linux/highmem.h> > > @@ -22,10 +23,12 @@ > > #include <linux/elf.h> > > #include <linux/elfcore.h> > > #include <linux/utsname.h> > > +#include <linux/notifier.h> > > #include <linux/numa.h> > > #include <linux/suspend.h> > > #include <linux/device.h> > > #include <linux/freezer.h> > > +#include <linux/pfn.h> > > #include <linux/pm.h> > > #include <linux/cpu.h> > > #include <linux/uaccess.h> > > @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) > > > > void __weak arch_kexec_unprotect_crashkres(void) > > {} > > + > > +/* > > + * If user-space wants to offline memory that is in use by a loaded kexec > > + * image, it should unload the image first. > > + */ > > Probably this would need kexec user manual and related system call man pages > update as well. > > > +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, > > + void *data) > > +{ > > + int rv = NOTIFY_OK, i; > > + struct memory_notify *arg = data; > > + unsigned long pfn = arg->start_pfn; > > + unsigned long nr_segments, sstart, send; > > + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; > > + > > + might_sleep(); > > Required ? 
> > > + > > + if (action != MEM_GOING_OFFLINE) > > + return NOTIFY_DONE; > > + > > + mutex_lock(&kexec_mutex); > > + if (kexec_image) { > > + nr_segments = kexec_image->nr_segments; > > + > > + for (i = 0; i < nr_segments; i++) { > > + sstart = PFN_DOWN(kexec_image->segment[i].mem); > > + send = PFN_UP(kexec_image->segment[i].mem + > > + kexec_image->segment[i].memsz); > > + > > + if ((pfn <= sstart && sstart < end_pfn) || > > + (pfn <= send && send < end_pfn)) { > > + pr_warn("Memory region in use\n"); > > + rv = NOTIFY_BAD; > > + break; > > + } > > + } > > + } > > + mutex_unlock(&kexec_mutex); > > + > > + return rv; > > Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead. > > > +} > > + > > +static struct notifier_block mem_remove_nb = { > > + .notifier_call = mem_remove_cb, > > +}; > > + > > +static int __init register_mem_remove_cb(void) > > +{ > > + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) > > Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE > to reduce the scope as well as final code size when the config is disabled. > > > + return register_memory_notifier(&mem_remove_nb); > > + > > + return 0; > > +} > > +device_initcall(register_mem_remove_cb); > > > ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-27 0:43 ` Anshuman Khandual @ 2020-03-27 15:46 ` James Morse -1 siblings, 0 replies; 264+ messages in thread From: James Morse @ 2020-03-27 15:46 UTC (permalink / raw) To: Anshuman Khandual Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Bhupesh Sharma Hi Anshuman, On 3/27/20 12:43 AM, Anshuman Khandual wrote: > On 03/26/2020 11:37 PM, James Morse wrote: >> An image loaded for kexec is not stored in place, instead its segments >> are scattered through memory, and are re-assembled when needed. In the >> meantime, the target memory may have been removed. >> >> Because mm is not aware that this memory is still in use, it allows it >> to be removed. > Why the isolation process does not fail when these pages are currently > being used by kexec ? Kexec isn't using them right now, but it will once kexec is triggered. Those physical addresses are held in some internal kexec data structures until kexec is triggered. >> Add a memory notifier to prevent the removal of memory regions that >> overlap with a loaded kexec image segment. e.g., when triggered from the >> Qemu console: >> | kexec_core: memory region in use >> | memory memory32: Offline failed. > > Yes this is definitely an added protection for these kexec loaded kernels > memory areas from being offlined but I would have expected the preceding > offlining to have failed as well. kexec hasn't allocated the memory; parts of the regions user-space specifies for the next kernel may be in use. There is nothing to stop the memory being used in the meantime. Any other way of doing this would prevent us saying why it failed. This way, the user can spot the 'kexec: Memory region in use' message and unload the kexec image. 
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >> index c19c0dad1ebe..ba1d91e868ca 100644 >> --- a/kernel/kexec_core.c >> +++ b/kernel/kexec_core.c >> @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) >> >> void __weak arch_kexec_unprotect_crashkres(void) >> {} >> + >> +/* >> + * If user-space wants to offline memory that is in use by a loaded kexec >> + * image, it should unload the image first. >> + */ > Probably this would need kexec user manual and related system call man pages > update as well. I can't see anything relevant under Documentation. (kdump yes, kexec no...) >> +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, >> + void *data) >> +{ >> + int rv = NOTIFY_OK, i; >> + struct memory_notify *arg = data; >> + unsigned long pfn = arg->start_pfn; >> + unsigned long nr_segments, sstart, send; >> + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; >> + >> + might_sleep(); > > Required ? Habit, and I think best practice. We take a mutex, so might_sleep(), but we also conditionally return before lockdep would see the mutex. Having this annotation means a dangerous change to the way this is called triggers a warning without having to test memory hotplug explicitly. >> + >> + if (action != MEM_GOING_OFFLINE) >> + return NOTIFY_DONE; >> + >> + mutex_lock(&kexec_mutex); >> + if (kexec_image) { >> + nr_segments = kexec_image->nr_segments; >> + >> + for (i = 0; i < nr_segments; i++) { >> + sstart = PFN_DOWN(kexec_image->segment[i].mem); >> + send = PFN_UP(kexec_image->segment[i].mem + >> + kexec_image->segment[i].memsz); >> + >> + if ((pfn <= sstart && sstart < end_pfn) || >> + (pfn <= send && send < end_pfn)) { >> + pr_warn("Memory region in use\n"); >> + rv = NOTIFY_BAD; >> + break; >> + } >> + } >> + } >> + mutex_unlock(&kexec_mutex); >> + >> + return rv; > > Variable 'rv' is redundant, should use NOTIFY_[BAD|OK] directly instead. You'd prefer a mutex_unlock() in the middle of the loop? ... or goto? 
(I'm not convinced) >> +} >> + >> +static struct notifier_block mem_remove_nb = { >> + .notifier_call = mem_remove_cb, >> +}; >> + >> +static int __init register_mem_remove_cb(void) >> +{ >> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) > > Should not all these new code here be wrapped with CONFIG_MEMORY_HOTREMOVE > to reduce the scope as well as final code size when the config is disabled. The compiler is really good at this. "if (false)" means this is all dead code, and static means its not exported, so the compiler is free to remove it. Not #ifdef-ing it makes it much more readable, and means the compiler checks its valid C before it removes it. This avoids weird header include problems that only show up on some rand-config builds. Thanks, James >> + return register_memory_notifier(&mem_remove_nb); >> + >> + return 0; >> +} >> +device_initcall(register_mem_remove_cb); >> > ^ permalink raw reply [flat|nested] 264+ messages in thread
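The overlap test in the quoted mem_remove_cb() compares the endpoints of each segment against the range being offlined. Below is a minimal user-space sketch of the same comparison written for half-open PFN ranges. The names ranges_overlap() and endpoint_test() are hypothetical and not part of the patch; endpoint_test() reproduces the condition as quoted so the two can be compared. As quoted, the condition appears to miss a segment that fully encloses the offlined range, and because 'send' comes from PFN_UP() (one past the last page) it appears to flag a segment that merely ends where the range begins.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not in the patch): do the half-open ranges
 * [a_start, a_end) and [b_start, b_end) share at least one PFN?
 */
static bool ranges_overlap(unsigned long a_start, unsigned long a_end,
                           unsigned long b_start, unsigned long b_end)
{
    /* Both ranges must be well formed for the comparison to make sense. */
    assert(a_start <= a_end && b_start <= b_end);

    return a_start < b_end && b_start < a_end;
}

/*
 * The endpoint comparison as quoted in the patch, kept for comparison.
 * [pfn, end_pfn) is the range going offline, [sstart, send) the segment.
 */
static bool endpoint_test(unsigned long pfn, unsigned long end_pfn,
                          unsigned long sstart, unsigned long send)
{
    return (pfn <= sstart && sstart < end_pfn) ||
           (pfn <= send && send < end_pfn);
}
```

With an offlined range of [100, 200), a segment at [50, 300) overlaps it but has neither endpoint inside the range, while a segment at [50, 100) only touches it. The half-open form handles both cases with a single comparison per side.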
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-26 18:07 ` James Morse @ 2020-03-27 2:34 ` Baoquan He -1 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-03-27 2:34 UTC (permalink / raw) To: James Morse Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma On 03/26/20 at 06:07pm, James Morse wrote: > An image loaded for kexec is not stored in place, instead its segments > are scattered through memory, and are re-assembled when needed. In the > meantime, the target memory may have been removed. > > Because mm is not aware that this memory is still in use, it allows it > to be removed. > > Add a memory notifier to prevent the removal of memory regions that > overlap with a loaded kexec image segment. e.g., when triggered from the > Qemu console: > | kexec_core: memory region in use > | memory memory32: Offline failed. As I replied to the cover letter, usually we do loading and jumping of the kexec-ed kernel at one time. If we expect to do both of them at different times, I agree we should do something to make things safer if someone really wants to do that, since it's allowed. Can we do anything with the existing notifier? Mem hotplug has got a notifier to notice it will offline a memory region. 
memory_notify(MEM_OFFLINE, &arg); > > Signed-off-by: James Morse <james.morse@arm.com> > --- > kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 56 insertions(+) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index c19c0dad1ebe..ba1d91e868ca 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -12,6 +12,7 @@ > #include <linux/slab.h> > #include <linux/fs.h> > #include <linux/kexec.h> > +#include <linux/memory.h> > #include <linux/mutex.h> > #include <linux/list.h> > #include <linux/highmem.h> > @@ -22,10 +23,12 @@ > #include <linux/elf.h> > #include <linux/elfcore.h> > #include <linux/utsname.h> > +#include <linux/notifier.h> > #include <linux/numa.h> > #include <linux/suspend.h> > #include <linux/device.h> > #include <linux/freezer.h> > +#include <linux/pfn.h> > #include <linux/pm.h> > #include <linux/cpu.h> > #include <linux/uaccess.h> > @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) > > void __weak arch_kexec_unprotect_crashkres(void) > {} > + > +/* > + * If user-space wants to offline memory that is in use by a loaded kexec > + * image, it should unload the image first. 
> + */ > +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, > + void *data) > +{ > + int rv = NOTIFY_OK, i; > + struct memory_notify *arg = data; > + unsigned long pfn = arg->start_pfn; > + unsigned long nr_segments, sstart, send; > + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; > + > + might_sleep(); > + > + if (action != MEM_GOING_OFFLINE) > + return NOTIFY_DONE; > + > + mutex_lock(&kexec_mutex); > + if (kexec_image) { > + nr_segments = kexec_image->nr_segments; > + > + for (i = 0; i < nr_segments; i++) { > + sstart = PFN_DOWN(kexec_image->segment[i].mem); > + send = PFN_UP(kexec_image->segment[i].mem + > + kexec_image->segment[i].memsz); > + > + if ((pfn <= sstart && sstart < end_pfn) || > + (pfn <= send && send < end_pfn)) { > + pr_warn("Memory region in use\n"); > + rv = NOTIFY_BAD; > + break; > + } > + } > + } > + mutex_unlock(&kexec_mutex); > + > + return rv; > +} > + > +static struct notifier_block mem_remove_nb = { > + .notifier_call = mem_remove_cb, > +}; > + > +static int __init register_mem_remove_cb(void) > +{ > + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) > + return register_memory_notifier(&mem_remove_nb); > + > + return 0; > +} > +device_initcall(register_mem_remove_cb); > -- > 2.25.1 > > ^ permalink raw reply [flat|nested] 264+ messages in thread
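On the choice of event: in the memory hotplug notifier chain, MEM_GOING_OFFLINE is sent before the range is offlined and a callback can refuse it, whereas MEM_OFFLINE is sent once the memory has already gone offline. The following is a toy user-space model of that ordering, not kernel code. The event names, return codes, toy_chain[] and pinned_cb() are all made up for illustration, and the pinned PFN is an arbitrary assumption.

```c
#include <stddef.h>

/* Hypothetical events and return codes; these are not the kernel's values. */
enum toy_event { TOY_GOING_OFFLINE, TOY_OFFLINE };
enum toy_ret   { TOY_DONE, TOY_OK, TOY_BAD };

typedef int (*toy_cb_t)(int event, unsigned long start_pfn,
                        unsigned long nr_pages);

/* Call every callback in order; stop at the first one that objects. */
static int toy_call_chain(const toy_cb_t *cbs, size_t n, int event,
                          unsigned long start_pfn, unsigned long nr_pages)
{
    for (size_t i = 0; i < n; i++) {
        if (cbs[i](event, start_pfn, nr_pages) == TOY_BAD)
            return TOY_BAD;
    }
    return TOY_OK;
}

/* Returns 0 if the range was offlined, -1 if a callback vetoed it. */
static int toy_offline(const toy_cb_t *cbs, size_t n,
                       unsigned long start_pfn, unsigned long nr_pages)
{
    /* Only this first notification can still stop the operation. */
    if (toy_call_chain(cbs, n, TOY_GOING_OFFLINE, start_pfn, nr_pages) == TOY_BAD)
        return -1;

    /* ... the range would be taken offline here ... */

    /* Informational only: the result is ignored because it is too late. */
    toy_call_chain(cbs, n, TOY_OFFLINE, start_pfn, nr_pages);
    return 0;
}

/* Example callback: refuse any range that contains PFN 4096. */
static int pinned_cb(int event, unsigned long start_pfn,
                     unsigned long nr_pages)
{
    if (event != TOY_GOING_OFFLINE)
        return TOY_DONE;

    if (start_pfn <= 4096 && 4096 < start_pfn + nr_pages)
        return TOY_BAD;

    return TOY_OK;
}

static const toy_cb_t toy_chain[] = { pinned_cb };
```

In this model a callback that only listened for TOY_OFFLINE could log the event but could not prevent it, which is why the quoted patch returns NOTIFY_DONE for everything except MEM_GOING_OFFLINE.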
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-26 18:07 ` James Morse @ 2020-03-27 9:30 ` David Hildenbrand -1 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-03-27 9:30 UTC (permalink / raw) To: James Morse, kexec, linux-mm, linux-arm-kernel Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma On 26.03.20 19:07, James Morse wrote: > An image loaded for kexec is not stored in place, instead its segments > are scattered through memory, and are re-assembled when needed. In the > meantime, the target memory may have been removed. > > Because mm is not aware that this memory is still in use, it allows it > to be removed. > > Add a memory notifier to prevent the removal of memory regions that > overlap with a loaded kexec image segment. e.g., when triggered from the > Qemu console: > | kexec_core: memory region in use > | memory memory32: Offline failed. > > Signed-off-by: James Morse <james.morse@arm.com> > --- > kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 56 insertions(+) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index c19c0dad1ebe..ba1d91e868ca 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -12,6 +12,7 @@ > #include <linux/slab.h> > #include <linux/fs.h> > #include <linux/kexec.h> > +#include <linux/memory.h> > #include <linux/mutex.h> > #include <linux/list.h> > #include <linux/highmem.h> > @@ -22,10 +23,12 @@ > #include <linux/elf.h> > #include <linux/elfcore.h> > #include <linux/utsname.h> > +#include <linux/notifier.h> > #include <linux/numa.h> > #include <linux/suspend.h> > #include <linux/device.h> > #include <linux/freezer.h> > +#include <linux/pfn.h> > #include <linux/pm.h> > #include <linux/cpu.h> > #include <linux/uaccess.h> > @@ -1219,3 +1222,56 @@ void __weak arch_kexec_protect_crashkres(void) > > void __weak arch_kexec_unprotect_crashkres(void) > {} > + > 
+/* > + * If user-space wants to offline memory that is in use by a loaded kexec > + * image, it should unload the image first. > + */ > +static int mem_remove_cb(struct notifier_block *nb, unsigned long action, > + void *data) > +{ > + int rv = NOTIFY_OK, i; > + struct memory_notify *arg = data; > + unsigned long pfn = arg->start_pfn; > + unsigned long nr_segments, sstart, send; > + unsigned long end_pfn = arg->start_pfn + arg->nr_pages; > + > + might_sleep(); > + > + if (action != MEM_GOING_OFFLINE) > + return NOTIFY_DONE; > + > + mutex_lock(&kexec_mutex); > + if (kexec_image) { > + nr_segments = kexec_image->nr_segments; > + > + for (i = 0; i < nr_segments; i++) { > + sstart = PFN_DOWN(kexec_image->segment[i].mem); > + send = PFN_UP(kexec_image->segment[i].mem + > + kexec_image->segment[i].memsz); > + > + if ((pfn <= sstart && sstart < end_pfn) || > + (pfn <= send && send < end_pfn)) { > + pr_warn("Memory region in use\n"); > + rv = NOTIFY_BAD; > + break; > + } > + } > + } > + mutex_unlock(&kexec_mutex); > + > + return rv; > +} > + > +static struct notifier_block mem_remove_nb = { > + .notifier_call = mem_remove_cb, > +}; > + > +static int __init register_mem_remove_cb(void) > +{ > + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) > + return register_memory_notifier(&mem_remove_nb); > + > + return 0; > +} > +device_initcall(register_mem_remove_cb); > E.g., in kernel/kexec_core.c:kimage_alloc_pages() "SetPageReserved(pages + i);" Pages that are reserved cannot get offlined. How are you able to trigger that before this patch? (where is the allocation path for kexec, which will not set the pages reserved?) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-27 9:30 ` David Hildenbrand @ 2020-03-27 16:56 ` James Morse -1 siblings, 0 replies; 264+ messages in thread From: James Morse @ 2020-03-27 16:56 UTC (permalink / raw) To: David Hildenbrand Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma Hi David, On 3/27/20 9:30 AM, David Hildenbrand wrote: > On 26.03.20 19:07, James Morse wrote: >> An image loaded for kexec is not stored in place, instead its segments >> are scattered through memory, and are re-assembled when needed. In the >> meantime, the target memory may have been removed. >> >> Because mm is not aware that this memory is still in use, it allows it >> to be removed. >> >> Add a memory notifier to prevent the removal of memory regions that >> overlap with a loaded kexec image segment. e.g., when triggered from the >> Qemu console: >> | kexec_core: memory region in use >> | memory memory32: Offline failed. >> >> Signed-off-by: James Morse <james.morse@arm.com> >> --- >> kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 56 insertions(+) >> >> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >> index c19c0dad1ebe..ba1d91e868ca 100644 >> --- a/kernel/kexec_core.c >> +++ b/kernel/kexec_core.c > E.g., in kernel/kexec_core.c:kimage_alloc_pages() > > "SetPageReserved(pages + i);" > > Pages that are reserved cannot get offlined. How are you able to trigger > that before this patch? (where is the allocation path for kexec, which > will not set the pages reserved?) This sets page reserved on the memory it gets back from alloc_pages() in kimage_alloc_pages(). This is when you load the image[0]. The problem I see is for the target or destination memory once you execute the image. Once machine_kexec() runs, it tries to write to this, assuming it is still present... 
How can I make the commit message clearer? 're-assembled' and 'target memory' aren't quite cutting it; is there a correct term to use? (destination?) Thanks, James [0] Just to convince myself: | kimage_alloc_pages+0x30/0x15c | kimage_alloc_page+0x210/0x7d8 | kimage_load_segment+0x14c/0x8c8 | __arm64_sys_kexec_load+0x4f0/0x720 | do_el0_svc+0x13c/0x3c0 | el0_sync_handler+0x9c/0x3c0 | el0_sync+0x158/0x180 ^ permalink raw reply [flat|nested] 264+ messages in thread
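The load/execute distinction described above can be pictured with a toy user-space model. This is only a sketch under stated assumptions: the page size, the toy_ram[] array, the toy_present[] flags, struct toy_entry and toy_relocate() are all hypothetical and do not correspond to kexec's internal data structures. In particular, the presence check stands in for what goes wrong on real hardware, where the copy simply targets memory that is no longer there.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NR_PAGES  8

/* Toy "physical memory" and a per-page flag saying whether it still exists. */
static unsigned char toy_ram[TOY_NR_PAGES][TOY_PAGE_SIZE];
static bool toy_present[TOY_NR_PAGES] = {
    true, true, true, true, true, true, true, true
};

/*
 * Recorded at load time: the page where the data is staged (src) and the
 * page where it must end up when the image is executed (dst).
 */
struct toy_entry {
    unsigned int src_page;
    unsigned int dst_page;
};

/* Staged in pages 6 and 7, destined for pages 1 and 2. */
static const struct toy_entry toy_image[] = { { 6, 1 }, { 7, 2 } };

/*
 * Execute step: copy every staged page to its destination.
 * Returns 0 on success, -1 if a destination page has been removed.
 */
static int toy_relocate(const struct toy_entry *e, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        /* Stand-in for the failure: nothing performs this check for real. */
        if (!toy_present[e[i].dst_page])
            return -1;

        memcpy(toy_ram[e[i].dst_page], toy_ram[e[i].src_page],
               TOY_PAGE_SIZE);
    }
    return 0;
}
```

In this picture, protecting the staging pages (6 and 7) at load time says nothing about pages 1 and 2. Removing page 2 between load and execute leaves the recorded entry pointing at memory that no longer exists.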
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-27 16:56 ` James Morse @ 2020-03-27 17:06 ` David Hildenbrand -1 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-03-27 17:06 UTC (permalink / raw) To: James Morse Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma On 27.03.20 17:56, James Morse wrote: > Hi David, > > On 3/27/20 9:30 AM, David Hildenbrand wrote: >> On 26.03.20 19:07, James Morse wrote: >>> An image loaded for kexec is not stored in place, instead its segments >>> are scattered through memory, and are re-assembled when needed. In the >>> meantime, the target memory may have been removed. >>> >>> Because mm is not aware that this memory is still in use, it allows it >>> to be removed. >>> >>> Add a memory notifier to prevent the removal of memory regions that >>> overlap with a loaded kexec image segment. e.g., when triggered from the >>> Qemu console: >>> | kexec_core: memory region in use >>> | memory memory32: Offline failed. >>> >>> Signed-off-by: James Morse <james.morse@arm.com> >>> --- >>> kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++++++++++++++ >>> 1 file changed, 56 insertions(+) >>> >>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c >>> index c19c0dad1ebe..ba1d91e868ca 100644 >>> --- a/kernel/kexec_core.c >>> +++ b/kernel/kexec_core.c > >> E.g., in kernel/kexec_core.c:kimage_alloc_pages() >> >> "SetPageReserved(pages + i);" >> >> Pages that are reserved cannot get offlined. How are you able to trigger >> that before this patch? (where is the allocation path for kexec, which >> will not set the pages reserved?) > > This sets page reserved on the memory it gets back from > alloc_pages() in kimage_alloc_pages(). This is when you load the image[0]. > > The problem I see is for the target or destination memory once you execute the > image. 
Once machine_kexec() runs, it tries to write to this, assuming it is > still present... Let's recap 1. You load the image. You allocate memory for e.g., the kexec kernel. The pages will be marked PG_reserved, so they cannot be offlined. 2. You do the kexec. The kexec kernel will only operate on a reserved memory region (reserved via e.g., kernel cmdline crashkernel=128M). Is it that in 2., the reserved memory region (for the crashkernel) could have been offlined in the meantime? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 17:06 ` David Hildenbrand
@ 2020-03-27 18:07 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-03-27 18:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
      Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi David,

On 3/27/20 5:06 PM, David Hildenbrand wrote:
> On 27.03.20 17:56, James Morse wrote:
>> On 3/27/20 9:30 AM, David Hildenbrand wrote:
>>> On 26.03.20 19:07, James Morse wrote:
>>>> An image loaded for kexec is not stored in place, instead its segments
>>>> are scattered through memory, and are re-assembled when needed. In the
>>>> meantime, the target memory may have been removed.
>>>>
>>>> Because mm is not aware that this memory is still in use, it allows it
>>>> to be removed.
>>>>
>>>> Add a memory notifier to prevent the removal of memory regions that
>>>> overlap with a loaded kexec image segment. e.g., when triggered from the
>>>> Qemu console:
>>>> | kexec_core: memory region in use
>>>> | memory memory32: Offline failed.
>>>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>>>> index c19c0dad1ebe..ba1d91e868ca 100644
>>>> --- a/kernel/kexec_core.c
>>>> +++ b/kernel/kexec_core.c
>>
>>> E.g., in kernel/kexec_core.c:kimage_alloc_pages()
>>>
>>> "SetPageReserved(pages + i);"
>>>
>>> Pages that are reserved cannot get offlined. How are you able to trigger
>>> that before this patch? (where is the allocation path for kexec, which
>>> will not set the pages reserved?)
>>
>> This sets page reserved on the memory it gets back from
>> alloc_pages() in kimage_alloc_pages(). This is when you load the image[0].
>>
>> The problem I see is for the target or destination memory once you execute the
>> image. Once machine_kexec() runs, it tries to write to this, assuming it is
>> still present...

> Let's recap
>
> 1. You load the image. You allocate memory for e.g., the kexec kernel.
>    The pages will be marked PG_reserved, so they cannot be offlined.
>
> 2. You do the kexec. The kexec kernel will only operate on a reserved
>    memory region (reserved via e.g., kernel cmdline crashkernel=128M).

I think you are merging the kexec and kdump behaviours.
(Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')

For kdump, yes, the new kernel is loaded into the crashkernel reservation, and
confined to it.


For regular kexec, the new kernel can be loaded anywhere in memory. There might
be a difference with how this works on arm64....

The regular kexec kernel isn't stored in its final location when it's loaded;
it's relocated there when the image is executed. The target/destination memory
may have been removed in the meantime.

(an example recipe below should clarify this)


> Is it that in 2., the reserved memory region (for the crashkernel) could
> have been offlined in the meantime?

No, for kdump: the crashkernel reservation is PG_reserved, and it's not
something mm knows how to move, so that region can't be taken offline.

(On arm64 we additionally prevent the boot-memory from being removed as it is
all described as present by UEFI. The crashkernel reservation would always be
from this type of memory)


This is about a regular kexec, any crashdump reservation is irrelevant.
This kexec kernel is temporarily stored out of line, then relocated when
executed.

A recipe so that we're at least on the same terminal! This is on a TX2 running
arm64's for-next/core using Qemu-TCG to emulate x86. (Sorry for the bizarre
config, it's because Qemu supports hotremove on x86, but not yet on arm64).


Insert the memory:
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1

| root@vm:~# free -m
|          total    used    free    shared ...
| Mem:       918      52     814         0 ...
| Swap:        0       0       0


Bring it online:
| root@vm:~# cd /sys/devices/system/memory/
| root@vm:/sys/devices/system/memory# for F in memory3*; do echo \
|                                     online_movable > $F/state; done

| Built 1 zonelists, mobility grouping on. Total pages: 251049
| Policy zone: DMA32

| -bash: echo: write error: Invalid argument
| root@vm:/sys/devices/system/memory# free -m
|          total    used    free    shared ...
| Mem:      1942      53    1836         0 ...
| Swap:        0       0       0


Load kexec:
| root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline


Press the Attention button to request removal:

(qemu) device_del dimm1

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Built 1 zonelists, mobility grouping on. Total pages: 233728
| Policy zone: DMA32

The memory is gone:
| root@vm:/sys/devices/system/memory# free -m
|          total    used    free    shared ...
| Mem:       918      89     769         0 ...
| Swap:        0       0       0

Trigger kexec:
| root@vm:/sys/devices/system/memory# kexec -e

[...]

| sd 0:0:0:0: [sda] Synchronizing SCSI cache
| kexec_core: Starting new kernel

... and Qemu restarts the platform firmware instead of proceeding with kexec.
(I assume this is a triple fault)

You can use mem-min and mem-max to control where kexec's user space will place
the memory.


If you apply this patch, the above sequence will fail at the device remove
step, as the physical addresses match the loaded kexec image:

| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| Offlined Pages 32768
| kexec_core: Memory region in use
| kexec_core: Memory region in use
| memory memory39: Offline failed.
| Built 1 zonelists, mobility grouping on. Total pages: 299212
| Policy zone: Normal

| root@vm:/sys/devices/system/memory# free -m
|          total    used    free    shared ...
| Mem:      1942      90    1793         0 ...
| Swap:        0       0       0

I can't remove the DIMM, because we failed to offline it:

(qemu) object_del mem1
object 'mem1' is in use, can not be deleted

and I can trigger kexec and boot the new kernel.

kexec user-space here comes from debian bullseye. It picked the removable
memory all by itself without any additional arguments.

(a different issue that can be ignored for now: x86 additionally fails to
reboot if I remove memory, even if it's not in use by the kexec image. This
doesn't cause qemu to reboot via firmware, I think it dies before the console.
It doesn't happen on arm64. I suspect the memory map is snapshotted and assumed
to still be correct when the image is executed.)


Thanks,

James

^ permalink raw reply [flat|nested] 264+ messages in thread
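
For illustration, the check that makes the offline fail in the recipe above comes down to a range-overlap test between the destination of each loaded segment and the range being offlined. The following is a minimal user-space sketch of that decision only, not the code from the patch: `struct seg_dest`, `pfn_ranges_overlap()` and `offline_allowed()` are hypothetical names, and the real notifier would work on the kimage's segments inside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one kexec segment's destination, in page frames. */
struct seg_dest {
	unsigned long start_pfn;	/* first destination page frame */
	unsigned long nr_pages;		/* number of destination pages */
};

/* True if the half-open ranges [a, a + a_len) and [b, b + b_len) share a page. */
static bool pfn_ranges_overlap(unsigned long a, unsigned long a_len,
			       unsigned long b, unsigned long b_len)
{
	return a < b + b_len && b < a + a_len;
}

/*
 * Model of the decision a going-offline notifier would make: refuse
 * (return false) if any segment destination overlaps the range that is
 * about to be offlined.
 */
static bool offline_allowed(const struct seg_dest *segs, size_t nr_segs,
			    unsigned long off_start, unsigned long off_pages)
{
	assert(off_pages > 0);	/* an empty offline request makes no sense */

	for (size_t i = 0; i < nr_segs; i++) {
		if (pfn_ranges_overlap(segs[i].start_pfn, segs[i].nr_pages,
				       off_start, off_pages))
			return false;
	}
	return true;
}
```

With a segment destined for pfns 0x48000..0x48003, a request to offline pfn 0x48003 is refused while one for pfn 0x48004 is allowed, which matches the behaviour where only the memory blocks that actually hold a destination address fail to go offline.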
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 18:07 ` James Morse
@ 2020-03-27 18:52 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-03-27 18:52 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
      Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

>> 2. You do the kexec. The kexec kernel will only operate on a reserved
>>    memory region (reserved via e.g., kernel cmdline crashkernel=128M).
>
> I think you are merging the kexec and kdump behaviours.
> (Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')

Oh, I see - I think your example below clarifies things. Something like
that should go in the cover letter if we end up with this patch being
required :)

(I missed that the problematic part is "random" addresses passed by user
space to the kernel, where it wants data to be loaded to on kexec -e)

>
> For kdump, yes, the new kernel is loaded into the crashkernel reservation, and
> confined to it.
>
>
> For regular kexec, the new kernel can be loaded anywhere in memory. There might
> be a difference with how this works on arm64....
>
> The regular kexec kernel isn't stored in its final location when it's loaded;
> it's relocated there when the image is executed. The target/destination memory
> may have been removed in the meantime.
>
> (an example recipe below should clarify this)
>
>
>> Is it that in 2., the reserved memory region (for the crashkernel) could
>> have been offlined in the meantime?
>
> No, for kdump: the crashkernel reservation is PG_reserved, and it's not
> something mm knows how to move, so that region can't be taken offline.
>
> (On arm64 we additionally prevent the boot-memory from being removed as it is
> all described as present by UEFI. The crashkernel reservation would always be
> from this type of memory)

Right.

>
>
> This is about a regular kexec, any crashdump reservation is irrelevant.
> This kexec kernel is temporarily stored out of line, then relocated when
> executed.
>
> A recipe so that we're at least on the same terminal! This is on a TX2 running
> arm64's for-next/core using Qemu-TCG to emulate x86. (Sorry for the bizarre
> config, it's because Qemu supports hotremove on x86, but not yet on arm64).
>
>
> Insert the memory:
> (qemu) object_add memory-backend-ram,id=mem1,size=1G
> (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
>
> | root@vm:~# free -m
> |          total    used    free    shared ...
> | Mem:       918      52     814         0 ...
> | Swap:        0       0       0
>
>
> Bring it online:
> | root@vm:~# cd /sys/devices/system/memory/
> | root@vm:/sys/devices/system/memory# for F in memory3*; do echo \
> |                                     online_movable > $F/state; done
>
> | Built 1 zonelists, mobility grouping on. Total pages: 251049
> | Policy zone: DMA32
>
> | -bash: echo: write error: Invalid argument
> | root@vm:/sys/devices/system/memory# free -m
> |          total    used    free    shared ...
> | Mem:      1942      53    1836         0 ...
> | Swap:        0       0       0
>
>
> Load kexec:
> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
>

I assume this will trigger

kexec_load -> do_kexec_load -> kimage_load_segment ->
kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages

Which will just allocate a bunch of pages and mark them reserved.

Now, AFAICS, all allocations will be unmovable. So none of the kexec
segment allocations will actually end up on your DIMM (as it is onlined
online_movable).

So, the loaded image (with its segments) from user space won't be
problematic and won't get placed on your DIMM.


Now, the problematic part is (via man kexec_load) "mem and memsz specify
a physical address range that is the target of the copy."

So the place where the image will be "assembled" at when doing the
reboot. Understood :)

> Press the Attention button to request removal:
>
> (qemu) device_del dimm1
>
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Built 1 zonelists, mobility grouping on. Total pages: 233728
> | Policy zone: DMA32
>
> The memory is gone:
> | root@vm:/sys/devices/system/memory# free -m
> |          total    used    free    shared ...
> | Mem:       918      89     769         0 ...
> | Swap:        0       0       0
>
> Trigger kexec:
> | root@vm:/sys/devices/system/memory# kexec -e
>
> [...]
>
> | sd 0:0:0:0: [sda] Synchronizing SCSI cache
> | kexec_core: Starting new kernel
>
> ... and Qemu restarts the platform firmware instead of proceeding with kexec.
> (I assume this is a triple fault)
>
> You can use mem-min and mem-max to control where kexec's user space will place
> the memory.
>
>
> If you apply this patch, the above sequence will fail at the device remove
> step, as the physical addresses match the loaded kexec image:
>
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | Offlined Pages 32768
> | kexec_core: Memory region in use
> | kexec_core: Memory region in use

Okay, so I assume the kexec userspace tool provided target kernel
addresses for segments that reside on the DIMM.

> | memory memory39: Offline failed.
> | Built 1 zonelists, mobility grouping on. Total pages: 299212
> | Policy zone: Normal
>
> | root@vm:/sys/devices/system/memory# free -m
> |          total    used    free    shared ...
> | Mem:      1942      90    1793         0 ...
> | Swap:        0       0       0
>
> I can't remove the DIMM, because we failed to offline it:

I wonder if we should instead make the "kexec -e" fail. It tries to
touch random system memory. Refusing to offline MOVABLE memory should be
avoided - and what kexec does here sounds dangerous to me (allowing it
to write random system memory).

Roughly what I am thinking is this:

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ba1d91e868ca..70c39a5307e5 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1135,6 +1135,10 @@ int kernel_kexec(void)
 		error = -EINVAL;
 		goto Unlock;
 	}
+	if (!kexec_image_validate()) {
+		error = -EINVAL;
+		goto Unlock;
+	}

 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {


kexec_image_validate() would go over all segments and validate that the
involved pages are actual valid memory (pfn_to_online_page()).

All we have to do is protect from memory hotplug until we switch to the
new kernel.

Will probably need some thought. But it will actually also bail out when
user space passes wrong physical memory addresses, instead of
triple-faulting silently.

--
Thanks,

David / dhildenb

^ permalink raw reply related [flat|nested] 264+ messages in thread
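
To make the proposal above concrete, here is a rough user-space model of what a kexec_image_validate()-style walk could look like. It is only a sketch under stated assumptions, not kernel code: `struct seg_target`, `image_targets_online()`, `toy_pfn_online()` and `MODEL_PAGE_SHIFT` are hypothetical names, and the callback stands in for a `pfn_to_online_page()` check. It also ignores the locking against concurrent hotplug that the message says would still be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for a kexec segment's destination (mem, memsz). */
struct seg_target {
	unsigned long mem;	/* physical start address of the destination */
	unsigned long memsz;	/* size of the destination in bytes */
};

/* Hypothetical callback standing in for "pfn_to_online_page(pfn) != NULL". */
typedef bool (*pfn_online_fn)(unsigned long pfn);

/* Toy memory map for the example: only pfns below 0x40000 (1 GiB) are online. */
static bool toy_pfn_online(unsigned long pfn)
{
	return pfn < 0x40000;
}

/* Return true only if every destination page of every segment is online. */
static bool image_targets_online(const struct seg_target *segs, size_t nr_segs,
				 pfn_online_fn is_online)
{
	assert(is_online != NULL);

	for (size_t i = 0; i < nr_segs; i++) {
		if (segs[i].memsz == 0)
			continue;	/* nothing to copy, nothing to check */

		unsigned long first = segs[i].mem >> MODEL_PAGE_SHIFT;
		unsigned long last = (segs[i].mem + segs[i].memsz - 1) >> MODEL_PAGE_SHIFT;

		for (unsigned long pfn = first; pfn <= last; pfn++) {
			if (!is_online(pfn))
				return false;
		}
	}
	return true;
}
```

In this model a segment targeting the last online page passes, while one that straddles into the first offline page fails, so the caller could return -EINVAL before any relocation is attempted.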
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-27 18:52 ` David Hildenbrand
@ 2020-03-30 13:00 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-03-30 13:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
      Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi David,

On 3/27/20 6:52 PM, David Hildenbrand wrote:
>>> 2. You do the kexec. The kexec kernel will only operate on a reserved
>>>    memory region (reserved via e.g., kernel cmdline crashkernel=128M).
>>
>> I think you are merging the kexec and kdump behaviours.
>> (Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image')
>
> Oh, I see - I think your example below clarifies things. Something like
> that should go in the cover letter if we end up with this patch being
> required :)

Do you mean the commit message? I think it's far too long...

Adding a sentence about the way kexec load works may help, the first paragraph
would read:

| Kexec allows user-space to specify the address that the kexec image should be
| loaded to. Because this memory may be in use, an image loaded for kexec is not
| stored in place, instead its segments are scattered through memory, and are
| re-assembled when needed. In the meantime, the target memory may have been
| removed.

Do you think that's clearer?


> (I missed that the problematic part is "random" addresses passed by user
> space to the kernel, where it wants data to be loaded to on kexec -e)

[...]

>> Load kexec:
>> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline
>>
>
> I assume this will trigger
>
> kexec_load -> do_kexec_load -> kimage_load_segment ->
> kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages
>
> Which will just allocate a bunch of pages and mark them reserved.
>
> Now, AFAICS, all allocations will be unmovable. So none of the kexec
> segment allocations will actually end up on your DIMM (as it is onlined
> online_movable).
>
> So, the loaded image (with its segments) from user space won't be
> problematic and won't get placed on your DIMM.
>
>
> Now, the problematic part is (via man kexec_load) "mem and memsz specify
> a physical address range that is the target of the copy."
>
> So the place where the image will be "assembled" at when doing the
> reboot. Understood :)

Yup.

[...]

> I wonder if we should instead make the "kexec -e" fail. It tries to
> touch random system memory.

Heh, isn't touching random system memory what kexec does?!

It's all described to user-space as 'System RAM'. Teaching it to probe
/sys/devices/system/memory/... would require a user-space change.


> Refusing to offline MOVABLE memory should be avoided - and what kexec
> does here sounds dangerous to me (allowing it to write random system
> memory).

> Roughly what I am thinking is this:
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index ba1d91e868ca..70c39a5307e5 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1135,6 +1135,10 @@ int kernel_kexec(void)
>  		error = -EINVAL;
>  		goto Unlock;
>  	}
> +	if (!kexec_image_validate()) {
> +		error = -EINVAL;
> +		goto Unlock;
> +	}
>
>  #ifdef CONFIG_KEXEC_JUMP
>  	if (kexec_image->preserve_context) {
>
>
> kexec_image_validate() would go over all segments and validate that the
> involved pages are actual valid memory (pfn_to_online_page()).
>
> All we have to do is protect from memory hotplug until we switch to the
> new kernel.

(migrate_to_reboot_cpu() can sleep), I think you'd end up with something like
this patch, but only while kexec_in_progress. I don't think letting kexec fail
if the events occur in a different order is good for user-space.


> Will probably need some thought. But it will actually also bail out when
> user space passes wrong physical memory addresses, instead of
> triple-faulting silently.

With this change, the reboot(LINUX_REBOOT_CMD_KEXEC) call would fail. This
thing doesn't usually return, so we're likely to trigger error-handling that
has never run before.

(Last time I debugged one of these, it turned out kexec had taken the network
interfaces down, meaning the nfsroot was no longer accessible)

How can user-space know whether kexec is going to succeed, or fail like this?
Any loaded kexec kernel could secretly be in this broken state.

Can user-space know what caused this to become unreliable? (without reading the
kernel source)


Given kexec can be unloaded by user-space, I think it's better to prevent us
getting into the broken state, preferably giving the hint that kexec is using
that memory. The user can 'kexec -u', then retry removing the memory.

I think forbidding the memory-offline is simpler for user-space to deal with.


Thanks,

James

^ permalink raw reply [flat|nested] 264+ messages in thread
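
As a side note on the 'System RAM' description mentioned above: a user-space loader that picks destinations from /proc/iomem effectively filters lines of the form "start-end : name" by resource name. The following is a minimal sketch of that filter, assuming the usual /proc/iomem line layout; `iomem_line_is_system_ram()` is a hypothetical helper, not taken from kexec-tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line of the form "  start-end : name".
 * On success, store the range and return true if the name is exactly
 * "System RAM". Any other name, or a malformed line, returns false.
 */
static bool iomem_line_is_system_ram(const char *line,
				     unsigned long long *start,
				     unsigned long long *end)
{
	char name[64];

	/* %63[^\n] reads the rest of the line, spaces included. */
	if (sscanf(line, " %llx-%llx : %63[^\n]", start, end, name) != 3)
		return false;

	return strcmp(name, "System RAM") == 0;
}
```

Because the comparison is an exact string match, a region that the kernel reports under any other resource name is skipped by a filter like this, which is why renaming hotplugged memory keeps unaware user-space away from it.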
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-03-30 13:00 ` James Morse 0 siblings, 0 replies; 264+ messages in thread From: James Morse @ 2020-03-30 13:00 UTC (permalink / raw) To: David Hildenbrand Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel Hi David, On 3/27/20 6:52 PM, David Hildenbrand wrote: >>> 2. You do the kexec. The kexec kernel will only operate on a reserved >>> memory region (reserved via e.g., kernel cmdline crashkernel=128M). >> >> I think you are merging the kexec and kdump behaviours. >> (Wrong terminology? The things behind 'kexec -l Image' and 'kexec -p Image') > > Oh, I see - I think your example below clarifies things. Something like > that should go in the cover letter if we end up in this patch being > required :) Do you mean the commit message? I think its far too long... Adding a sentence about the way kexec load works may help, the first paragraph would read: | Kexec allows user-space to specify the address that the kexec image should be | loaded to. Because this memory may be in use, an image loaded for kexec is not | stored in place, instead its segments are scattered through memory, and are | re-assembled when needed. In the meantime, the target memory may have been | removed. Do you think thats clearer? > (I missed that the problematic part is "random" addresses passed by user > space to the kernel, where it wants data to be loaded to on kexec -e) [...] >> Load kexec: >> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline >> > > I assume this will trigger > > kexec_load -> do_kexec_load -> kimage_load_segment -> > kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages > > Which will just allocate a bunch of pages and mark them reserved. > > Now, AFAIKs, all allocations will be unmovable. 
So none of the kexec > segment allocations will actually end up on your DIMM (as it is onlined > online_movable). > > So, the loaded image (with its segments) from user won't be problematic > and not get placed on your DIMM. > > > Now, the problematic part is (via man kexec_load) "mem and memsz specify > a physical address range that is the target of the copy." > > So the place where the image will be "assembled" at when doing the > reboot. Understood :) Yup. [...] > I wonder if we should instead make the "kexec -e" fail. It tries to > touch random system memory. Heh, isn't touching random system memory what kexec does?! Its all described to user-space as 'System RAM'. Teaching it to probe /sys/devices/memory/... would require a user-space change. > Denying to offline MOVABLE memory should be avoided - and what kexec > does here sounds dangerous to me (allowing it to write random system > memory). > Roughly what I am thinking is this: > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index ba1d91e868ca..70c39a5307e5 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -1135,6 +1135,10 @@ int kernel_kexec(void) > error = -EINVAL; > goto Unlock; > } > + if (!kexec_image_validate()) { > + error = -EINVAL; > + goto Unlock; > + } > > #ifdef CONFIG_KEXEC_JUMP > if (kexec_image->preserve_context) { > > > kexec_image_validate() would go over all segments and validate that the > involved pages are actual valid memory (pfn_to_online_page()). > > All we have to do is protect from memory hotplug until we switch to the > new kernel. (migrate_to_reboot_cpu() can sleep), I think you'd end up with something like this patch, but only while kexec_in_progress. I don't think letting kexec fail if the events occur in a different order is good for user-space. > Will probably need some thought. But it will actually also bail out when > user space passes wrong physical memory addresses, instead of > triple-faulting silently. 
With this change, the reboot(LINUX_REBOOT_CMD_KEXEC) call would fail. This thing doesn't usually return, so we're likely to trigger error-handling that has never run before. (Last time I debugged one of these, it turned out kexec had taken the network interfaces down, meaning the nfsroot was no longer accessible) How can user-space know whether kexec is going to succeed, or fail like this? Any loaded kexec kernel could secretly be in this broken state. Can user-space know what caused this to become unreliable? (without reading the kernel source) Given kexec can be unloaded by user-space, I think it's better to prevent us getting into the broken state, preferably giving the hint that kexec is using that memory. The user can 'kexec -u', then retry removing the memory. I think forbidding the memory-offline is simpler for user-space to deal with. Thanks, James ^ permalink raw reply [flat|nested] 264+ messages in thread
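The exec-time check discussed in this message (walk every segment and confirm its destination is still present) can be modelled in user space. What follows is only a minimal sketch under stated assumptions: `struct seg`, `struct ram_range`, `range_present()` and `image_validate()` are hypothetical names invented for illustration, not kernel types or functions, and the "present memory" list stands in for whatever the kernel would consult (the thread suggests pfn_to_online_page()). As a simplification, a segment that straddles two adjacent ranges is rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not kernel types. */
struct seg {
    uint64_t mem;    /* physical start of the segment's destination */
    uint64_t memsz;  /* length of the destination in bytes */
};

struct ram_range {
    uint64_t start;  /* inclusive */
    uint64_t end;    /* exclusive */
};

/* True if [start, end) lies entirely inside a single present range. */
static bool range_present(uint64_t start, uint64_t end,
                          const struct ram_range *ram, size_t nr_ram)
{
    for (size_t i = 0; i < nr_ram; i++)
        if (start >= ram[i].start && end <= ram[i].end)
            return true;
    return false;
}

/* Model of the proposed check: every destination must still be present. */
static bool image_validate(const struct seg *segs, size_t nr_segs,
                           const struct ram_range *ram, size_t nr_ram)
{
    for (size_t i = 0; i < nr_segs; i++) {
        uint64_t end = segs[i].mem + segs[i].memsz;

        if (end < segs[i].mem)  /* destination wraps around: reject */
            return false;
        if (!range_present(segs[i].mem, end, ram, nr_ram))
            return false;
    }
    return true;
}
```

In this model a caller would refuse to proceed when image_validate() returns false, which corresponds to the -EINVAL path in the quoted diff. It does not address the objection raised in the reply, namely that user-space has no way to learn in advance that a loaded image has become invalid.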
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-30 13:00 ` James Morse @ 2020-03-30 13:13 ` David Hildenbrand -1 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-03-30 13:13 UTC (permalink / raw) To: James Morse Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma > Adding a sentence about the way kexec load works may help, the first paragraph > would read: > > | Kexec allows user-space to specify the address that the kexec image should be > | loaded to. Because this memory may be in use, an image loaded for kexec is not > | stored in place, instead its segments are scattered through memory, and are > | re-assembled when needed. In the meantime, the target memory may have been > | removed. > > Do you think thats clearer? Yes, very much. Maybe add, that the target is described by user space during kexec_load() and that user space - right now - parses /proc/iomem to find applicable system memory. > [...] > >>> Load kexec: >>> | root@vm:/sys/devices/system/memory# kexec -l /root/bzImage --reuse-cmdline >>> >> >> I assume this will trigger >> >> kexec_load -> do_kexec_load -> kimage_load_segment -> >> kimage_load_normal_segment -> kimage_alloc_page -> kimage_alloc_pages >> >> Which will just allocate a bunch of pages and mark them reserved. >> >> Now, AFAIKs, all allocations will be unmovable. So none of the kexec >> segment allocations will actually end up on your DIMM (as it is onlined >> online_movable). >> >> So, the loaded image (with its segments) from user won't be problematic >> and not get placed on your DIMM. >> >> >> Now, the problematic part is (via man kexec_load) "mem and memsz specify >> a physical address range that is the target of the copy." >> >> So the place where the image will be "assembled" at when doing the >> reboot. Understood :) > > Yup. > > [...] 
> >> I wonder if we should instead make the "kexec -e" fail. It tries to >> touch random system memory. > > Heh, isn't touching random system memory what kexec does?! Having a racy user interface that can trigger kernel crashes feels very wrong. We should limit the impact. > > Its all described to user-space as 'System RAM'. Teaching it to probe > /sys/devices/memory/... would require a user-space change. I think we should really rename hotplugged memory on all architectures. Especially also relevant for virtio-mem/hyper-v balloon, where some pieces of (hotplugged )memory blocks are partially unavailable and should not be touched - accessing them results in unpredictable behavior (e.g., crashes or discarded writes). [...] >> Will probably need some thought. But it will actually also bail out when >> user space passes wrong physical memory addresses, instead of >> triple-faulting silently. > > With this change, the reboot(LINUX_REBOOT_CMD_KEXEC), call would fail. This > thing doesn't usually return, so we're likely to trigger error-handling that has > never run before. > > (Last time I debugged one of these, it turned out kexec had taken the network > interfaces down, meaning the nfsroot was no longer accessible) > > How can user-space know whether kexec is going to succeed, or fail like this? > Any loaded kexec kernel could secretly be in this broken state. > > Can user-space know what caused this to become unreliable? (without reading the > kernel source) > > > Given kexec can be unloaded by user-space, I think its better to prevent us > getting into the broken state, preferably giving the hint that kexec us using > that memory. The user can 'kexec -u', then retry removing the memory. > > I think forbidding the memory-offline is simpler for user-space to deal with. I thought about this over the weekend, and I don't think it's the right approach. 1. It's racy. 
If memory is getting offlined/unplugged just while user space is about to trigger the kexec_load(), you end up with the very same triple-fault. 2. It's semantically wrong. kexec does not need online memory ("managed by the buddy"), but still you disallow offlining memory. I would much rather see user-space choosing boot memory (e.g., renaming hotplugged memory on all architectures), and a check during "kexec -e" that the selected memory is actually "there", before trying to write to it. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-30 13:13 ` David Hildenbrand @ 2020-03-30 17:17 ` James Morse -1 siblings, 0 replies; 264+ messages in thread From: James Morse @ 2020-03-30 17:17 UTC (permalink / raw) To: David Hildenbrand Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma Hi David, On 3/30/20 2:13 PM, David Hildenbrand wrote: >> Adding a sentence about the way kexec load works may help, the first paragraph >> would read: >> >> | Kexec allows user-space to specify the address that the kexec image should be >> | loaded to. Because this memory may be in use, an image loaded for kexec is not >> | stored in place, instead its segments are scattered through memory, and are >> | re-assembled when needed. In the meantime, the target memory may have been >> | removed. >> >> Do you think thats clearer? > > Yes, very much. Maybe add, that the target is described by user space > during kexec_load() and that user space - right now - parses /proc/iomem > to find applicable system memory. (I don't think x86 parses /proc/iomem anymore). I'll repost this patch with that expanded commit message, once we've agreed this is the right thing to do! >>> I wonder if we should instead make the "kexec -e" fail. It tries to >>> touch random system memory. >> >> Heh, isn't touching random system memory what kexec does?! > > Having a racy user interface that can trigger kernel crashes feels very > wrong. We should limit the impact. >> Its all described to user-space as 'System RAM'. Teaching it to probe >> /sys/devices/memory/... would require a user-space change. > > I think we should really rename hotplugged memory on all architectures. 
> > Especially also relevant for virtio-mem/hyper-v balloon, where some > pieces of (hotplugged )memory blocks are partially unavailable and > should not be touched - accessing them results in unpredictable behavior > (e.g., crashes or discarded writes). I'll need to look into these. I'd assume for KVM that virtio-mem can be brought back when its accessed ... its just going to be slow. >>> Will probably need some thought. But it will actually also bail out when >>> user space passes wrong physical memory addresses, instead of >>> triple-faulting silently. >> >> With this change, the reboot(LINUX_REBOOT_CMD_KEXEC), call would fail. This >> thing doesn't usually return, so we're likely to trigger error-handling that has >> never run before. >> >> (Last time I debugged one of these, it turned out kexec had taken the network >> interfaces down, meaning the nfsroot was no longer accessible) >> >> How can user-space know whether kexec is going to succeed, or fail like this? >> Any loaded kexec kernel could secretly be in this broken state. >> >> Can user-space know what caused this to become unreliable? (without reading the >> kernel source) >> >> >> Given kexec can be unloaded by user-space, I think its better to prevent us >> getting into the broken state, preferably giving the hint that kexec us using >> that memory. The user can 'kexec -u', then retry removing the memory. >> >> I think forbidding the memory-offline is simpler for user-space to deal with. > > I thought about this over the weekend, and I don't think it's the right > approach. > 1. It's racy. If memory is getting offlined/unplugged just while user > space is about to trigger the kexec_load(), you end up with the very > same triple-fault. load? How is this different to user-space providing a bogus address? Sure, user-space may take a nap between parsing /proc/iomem and calling kexec_load(), but the kernel should reject these as they would never work. 
(I can't see where sanity_check_segment_list() considers the platform's memory. If it doesn't, we should fix it) Once the image is loaded, and clashes with a request to remove the memory there are two choices: secretly unload the image, or prevent the memory being taken offline. > 2. It's semantically wrong. kexec does not need online memory ("managed > by the buddy"), but still you disallow offlining memory. It does need the memory if you want 'kexec -e' to succeed. If there were any sanity tests, they should have happened at load time. The memory is effectively in use by the loaded kexec image. User-space told the kernel to use this memory, you should not be able to then remove it, without unloading the kexec image first. Are you saying feeding bogus addresses to kexec_load() is _expected_ to blow up like this? > I would really much rather want to see user-space choosing boot memory > (e.g., renaming hotplugged memory on all architectures), and checking > during "kexec -e" if the selected memory is actually "there", before > trying to write to it. How does 'kexec -e' know where the kexec kernel was loaded? You'd need to pass something between 'load' and 'exec'. How do you keep existing user-space working as much as possible? What do you do if the memory isn't there? User-space just called reboot(), it would be better to avoid getting into the situation where we have to fail that call. Solving the bigger problem, would add a 'kexec_it_now' flag to the kexec_load() call. This would make the window where 'stuff' can change much smaller. Things changing while user-space sleeps isn't a solvable problem, these would need to be rejected by sanity tests at load time. Thanks, James ^ permalink raw reply [flat|nested] 264+ messages in thread
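The alternative argued for in this message (veto an offline request while a loaded image targets that memory) reduces to an overlap test between the range being offlined and each segment's destination. A minimal user-space sketch, assuming hypothetical names throughout: `struct kseg`, `ranges_overlap()` and `offline_allowed()` are invented for illustration and are not taken from the patch, which hooks a memory notifier in kernel/kexec_core.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one loaded segment's destination. */
struct kseg {
    uint64_t mem;    /* physical start of the destination */
    uint64_t memsz;  /* length in bytes */
};

/* Half-open ranges [a_start, a_end) and [b_start, b_end) intersect. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}

/*
 * Model of the veto: offlining [start, start + size) is allowed only if
 * no loaded segment is destined for any part of that range.
 */
static bool offline_allowed(uint64_t start, uint64_t size,
                            const struct kseg *segs, size_t nr_segs)
{
    for (size_t i = 0; i < nr_segs; i++)
        if (ranges_overlap(start, start + size,
                           segs[i].mem, segs[i].mem + segs[i].memsz))
            return false;
    return true;
}
```

With no image loaded (nr_segs == 0) every request is allowed, which matches the suggestion that the user can 'kexec -u' and then retry removing the memory.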
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-03-30 17:17 ` James Morse @ 2020-03-30 18:14 ` David Hildenbrand -1 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-03-30 18:14 UTC (permalink / raw) To: James Morse Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma On 30.03.20 19:17, James Morse wrote: > Hi David, > > On 3/30/20 2:13 PM, David Hildenbrand wrote: >>> Adding a sentence about the way kexec load works may help, the first paragraph >>> would read: >>> >>> | Kexec allows user-space to specify the address that the kexec image should be >>> | loaded to. Because this memory may be in use, an image loaded for kexec is not >>> | stored in place, instead its segments are scattered through memory, and are >>> | re-assembled when needed. In the meantime, the target memory may have been >>> | removed. >>> >>> Do you think thats clearer? >> >> Yes, very much. Maybe add, that the target is described by user space >> during kexec_load() and that user space - right now - parses /proc/iomem >> to find applicable system memory. > > (I don't think x86 parses /proc/iomem anymore). I'll repost this patch with that > expanded commit message, once we've agreed this is the right thing to do! Right, I can see kexec-tools parsing /sys/firmware/memmap first. Unfortunately, all hotplugged memory (via add_memory()) is indicated there as System RAM ... including memory added by virtio-mem. I think we should adapt the type there as well. (in your patch #2) firmware_map_add_hotplug(start, start + size, "System RAM"); > > >>>> I wonder if we should instead make the "kexec -e" fail. It tries to >>>> touch random system memory. >>> >>> Heh, isn't touching random system memory what kexec does?! >> >> Having a racy user interface that can trigger kernel crashes feels very >> wrong. We should limit the impact. 
> > >>> Its all described to user-space as 'System RAM'. Teaching it to probe >>> /sys/devices/memory/... would require a user-space change. >> >> I think we should really rename hotplugged memory on all architectures. >> >> Especially also relevant for virtio-mem/hyper-v balloon, where some >> pieces of (hotplugged )memory blocks are partially unavailable and >> should not be touched - accessing them results in unpredictable behavior >> (e.g., crashes or discarded writes). > > I'll need to look into these. I'd assume for KVM that virtio-mem can be brought > back when its accessed ... its just going to be slow. Touching unplugged virtio-mem memory can result in unpredictable behavior. Touching (some) unplugged Hyper-V memory will be handled similarly AFAIK. [...] >> 1. It's racy. If memory is getting offlined/unplugged just while user >> space is about to trigger the kexec_load(), you end up with the very >> same triple-fault. > > load? How is this different to user-space providing a bogus address? I guess it's not different. It's just racy because user space with good intend could crash the system :) > > Sure, user-space may take a nap between parsing /proc/iomem and calling > kexec_load(), but the kernel should reject these as they would never work. > > (I can't see where sanity_check_segment_list() considers the platform's memory. > If it doesn't, we should fix it) Right, that's what I meant. I was not able to find any sanity checks. Maybe they are in place but I was not able to spot them. > > Once the image is loaded, and clashes with a request to remove the memory there > are two choices: secretly unload the image, or prevent the memory being taken > offline. Exactly. Or make "kexec -e" fail. > > >> 2. It's semantically wrong. kexec does not need online memory ("managed >> by the buddy"), but still you disallow offlining memory. > > It does need the memory if you want 'kexec -e' to succeed. 
> If there were any sanity tests, they should have happened at load time. Offlining != removing. That's the point I was trying to make. (and we don't want to block removing of memory in the kernel any other way) > > The memory is effectively in use by the loaded kexec image. User-space told the > kernel to use this memory, you should not be able to then remove it, without > unloading the kexec image first. It's not in use before you do the "kexec -e" IMHO. > Are you saying feeding bogus addresses to kexec_load() is _expected_ to blow up > like this? No, not at all. I think this should be fixed if this is possible. > >> I would really much rather want to see user-space choosing boot memory >> (e.g., renaming hotplugged memory on all architectures), and checking >> during "kexec -e" if the selected memory is actually "there", before >> trying to write to it. > > How does 'kexec -e' know where the kexec kernel was loaded? You'd need to pass > something between 'load' and 'exec'. How do you keep existing user-space working > as much as possible? If we use new types (e.g., "System RAM (hotplugged)"), looks like most of kexec will continue working (memory will be treated like RANGE_RESERVED or ignored). I guess we would still have to teach kexec-tools the new types, primarily to keep the crash memory ranges from getting detected properly. (no idea how they are used, will have to take a closer look) > > What do you do if the memory isn't there? User-space just called reboot(), it > would be better to avoid getting into the situation where we have to fail that call. In kernel_kexec() we already fail if there is no kernel image loaded, so we can similarly simply fail if the kernel image cannot be moved to the target memory IMHO. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
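The effect of renaming hotplugged memory can be illustrated with a parser for /proc/iomem-style lines. This is a sketch under assumptions, not a description of what kexec-tools actually does: it assumes a parser that matches the resource name exactly, so that only "System RAM" is treated as usable and an entry named "System RAM (hotplugged)" (the example name from this message) is skipped. The function name `is_boot_ram()` is hypothetical.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "start-end : name" line. Returns true, and fills *start and
 * *end, only when the name is exactly "System RAM". Assumption: exact
 * matching; a prefix match would still accept the renamed entries.
 */
static bool is_boot_ram(const char *line, uint64_t *start, uint64_t *end)
{
    static const char want[] = "System RAM";
    int n = -1;  /* stays -1 if the " : " separator is never matched */

    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %n", start, end, &n) != 2 ||
        n < 0)
        return false;

    const char *name = line + n;
    size_t len = strcspn(name, "\n");  /* ignore the trailing newline */

    return len == strlen(want) && strncmp(name, want, len) == 0;
}
```

A parser written this way keeps working unmodified after the rename and simply never offers hotplugged ranges as a load target, which is the behaviour hoped for above. Whether existing user-space matches names this strictly is exactly the open question about teaching kexec-tools the new types.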
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-30 18:14 ` David Hildenbrand
  (?)
@ 2020-04-10 19:10 ` Andrew Morton
  -1 siblings, 0 replies; 264+ messages in thread
From: Andrew Morton @ 2020-04-10 19:10 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: James Morse, kexec, linux-mm, linux-arm-kernel, Eric Biederman,
    Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

It's unclear (to me) what is the status of this patchset. But it does appear that
a new version can be expected?

^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-10 19:10 ` Andrew Morton
  (?)
@ 2020-04-11 3:44 ` Baoquan He
  -1 siblings, 0 replies; 264+ messages in thread
From: Baoquan He @ 2020-04-11 3:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, James Morse, kexec, linux-mm, linux-arm-kernel,
    Eric Biederman, Catalin Marinas, Will Deacon, Anshuman Khandual,
    Bhupesh Sharma

On 04/10/20 at 12:10pm, Andrew Morton wrote:
> It's unclear (to me) what is the status of this patchset. But it does appear that
> a new version can be expected?

As we discussed in the thread replying to the cover letter, the idea of
this patchset is not good, because we tend to use kexec_file_load more
and improve/enhance it in the future, and gradually obsolete the old
kexec_load interface, which is what this patchset is trying to fix. And
the issue James spotted is very much a corner case; we have suggested
another, easier way to avoid it: adding a systemd service that loads
kexec and monitors memory add/remove uevents, just as we have done for
kdump loading. Bhupesh is working on adding such a service in Fedora and
testing it, and will put it into RHEL too if nobody objects.

Thanks
Baoquan

^ permalink raw reply [flat|nested] 264+ messages in thread
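[Editorial sketch] The service suggested above would have to decide which uevents warrant reloading the kexec image. The following is a minimal sketch of that decision only, assuming the kernel's raw uevent payload layout ("action@devpath" followed by NUL-separated KEY=VALUE pairs) and assuming memory block devices report SUBSYSTEM=memory. The function names and the chosen set of actions are illustrative; nothing here is taken from the Fedora service mentioned in the message.

```python
# Actions on memory block devices that this sketch treats as "layout changed".
# Which actions matter is a policy choice; this set is an assumption.
RELOAD_ACTIONS = {"add", "remove", "online", "offline"}


def parse_uevent(payload: bytes) -> dict:
    """Split a raw kernel uevent into its KEY=VALUE fields.

    The first NUL-terminated string is the "action@devpath" header and is
    skipped; the remaining strings are parsed as KEY=VALUE.
    """
    fields = {}
    for part in payload.split(b"\0")[1:]:
        if b"=" in part:
            key, _, value = part.partition(b"=")
            fields[key.decode()] = value.decode()
    return fields


def needs_kexec_reload(fields: dict) -> bool:
    """True if the event describes a memory block being added/removed/on/offlined."""
    return fields.get("SUBSYSTEM") == "memory" and fields.get("ACTION") in RELOAD_ACTIONS


# Hypothetical example payloads.
MEMORY_EVENT = (b"offline@/devices/system/memory/memory32\0"
                b"ACTION=offline\0DEVPATH=/devices/system/memory/memory32\0"
                b"SUBSYSTEM=memory\0SEQNUM=1234\0")
BLOCK_EVENT = (b"add@/devices/virtual/block/loop0\0"
               b"ACTION=add\0DEVPATH=/devices/virtual/block/loop0\0"
               b"SUBSYSTEM=block\0SEQNUM=1235\0")
```

A real service would then unload and reload the image; note that this still leaves a window between the event and the reload, which is the race raised earlier in the thread.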
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-11 3:44 ` Baoquan He
  (?)
@ 2020-04-11 9:30 ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 264+ messages in thread
From: Russell King - ARM Linux admin @ 2020-04-11 9:30 UTC (permalink / raw)
  To: Baoquan He
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas, Bhupesh Sharma,
    Anshuman Khandual, kexec, linux-mm, James Morse, Eric Biederman,
    Will Deacon, linux-arm-kernel

On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote:
> Because We tend to use kexec_file_load more and improve/enhance it in the
> future, and gradually obsolete the old kexec_load interface which this
> patchset is trying to fix on.

That's not going to happen; 32-bit ARM kexec uses the kexec_load
interface rather than the kexec_file_load version, and I see no one
with any interest in changing that - and there are users of the former.

I don't see how it's possible to convert 32-bit ARM kexec to the
kexec_file_load interface - this assumes that all you have are the
kernel, initrd, and commandline, but on 32-bit ARM kexec, we have
kernel, initrd and the dtb blob which the user can specify.

So, if we wanted to obsolete the kexec_load interface, _first_ there
needs to be a way to provide users with the existing functionality
they have already in place on 32-bit ARM - otherwise we're looking
at a userspace regression. Especially as kexec_file_load takes
precedence on some distro patched versions of the kexec tool,
irrespective of which interface the user requests of the tool.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

^ permalink raw reply [flat|nested] 264+ messages in thread
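[Editorial sketch] The interface difference described above comes down to what the caller hands the kernel: kexec_load() takes a caller-built array of segments (user buffer, length, physical destination, destination size), so user space can place a dtb as just another segment, whereas kexec_file_load() takes a kernel fd, an initrd fd and a command line. The sketch below only models the segment array with ctypes and does not invoke any syscall. The physical addresses, payloads, 4 KiB page size and helper names are assumptions made for illustration, not values from the thread.

```python
import ctypes

PAGE_SIZE = 4096  # assumption: 4 KiB pages


class KexecSegment(ctypes.Structure):
    """Mirrors struct kexec_segment as described in kexec_load(2)."""
    _fields_ = [
        ("buf", ctypes.c_void_p),    # buffer in user space
        ("bufsz", ctypes.c_size_t),  # length of that buffer
        ("mem", ctypes.c_void_p),    # physical destination address
        ("memsz", ctypes.c_size_t),  # size of the destination region
    ]


def make_segment(payload: bytes, phys_addr: int):
    """Build one segment; memsz is rounded up to a whole number of pages."""
    buf = ctypes.create_string_buffer(payload, len(payload))
    memsz = -(-len(payload) // PAGE_SIZE) * PAGE_SIZE
    seg = KexecSegment(ctypes.cast(buf, ctypes.c_void_p), len(payload), phys_addr, memsz)
    return seg, buf  # the caller must keep `buf` alive while `seg` is in use


def build_segment_array(items):
    """items: iterable of (payload, phys_addr). Returns (ctypes array, buffers)."""
    built = [make_segment(payload, addr) for payload, addr in items]
    segs = [seg for seg, _ in built]
    keepalive = [buf for _, buf in built]
    return (KexecSegment * len(segs))(*segs), keepalive


# Hypothetical kernel, initrd and dtb images with made-up load addresses.
SEGMENTS, BUFFERS = build_segment_array([
    (b"K" * 5000, 0x40008000),
    (b"I" * 100, 0x48000000),
    (b"D" * 300, 0x4A000000),
])
```

Because the destination addresses are chosen in user space, nothing in this structure tells the kernel whether the target range is boot memory or hotpluggable memory, which is why the thread keeps returning to /proc/iomem naming.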
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-11 9:30 ` Russell King - ARM Linux admin
  (?)
@ 2020-04-11 9:58 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-04-11 9:58 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Baoquan He, Andrew Morton, David Hildenbrand, Catalin Marinas,
    Bhupesh Sharma, Anshuman Khandual, kexec, linux-mm, James Morse,
    Eric Biederman, Will Deacon, linux-arm-kernel

> On 11.04.2020 at 11:40, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:
>
> On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote:
>> Because We tend to use kexec_file_load more and improve/enhance it in the
>> future, and gradually obsolete the old kexec_load interface which this
>> patchset is trying to fix on.
>
> That's not going to happen; 32-bit ARM kexec uses the kexec_load
> interface rather than the kexec_file_load version, and I see no one
> with any interest in changing that - and there's users of the former.
>
> I don't see how it's possible to convert 32-bit ARM kexec to the
> kexec_file_load interface - this assumes that all you have are the
> kernel, initrd, and commandline, but on 32-bit ARM kexec, we have
> kernel, initrd and the dtb blob which the user can specify.
>
> So, if we wanted to obsolete the kexec_load interface, _first_ there
> needs to be a way to provide users with the existing functionality
> they have already in place on 32-bit ARM - otherwise we're looking
> at a userspace regression. Especially as kexec_file_load takes
> precedence on some distro patched versions of the kexec tool,
> irrespective of which interface the user requests of the tool.
>

On 32bit architectures we usually don't really care about memory
hotplug. So we could deprecate it only for 64bit architectures AFAIKS.

> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
>

^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-11 9:30 ` Russell King - ARM Linux admin
  (?)
@ 2020-04-12 5:35 ` Baoquan He
  -1 siblings, 0 replies; 264+ messages in thread
From: Baoquan He @ 2020-04-12 5:35 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Andrew Morton, David Hildenbrand, Catalin Marinas, Bhupesh Sharma,
    Anshuman Khandual, kexec, linux-mm, James Morse, Eric Biederman,
    Will Deacon, linux-arm-kernel

On 04/11/20 at 10:30am, Russell King - ARM Linux admin wrote:
> On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote:
> > Because We tend to use kexec_file_load more and improve/enhance it in the
> > future, and gradually obsolete the old kexec_load interface which this
> > patchset is trying to fix on.
>
> That's not going to happen; 32-bit ARM kexec uses the kexec_load
> interface rather than the kexec_file_load version, and I see no one
> with any interest in changing that - and there's users of the former.
>
> I don't see how it's possible to convert 32-bit ARM kexec to the
> kexec_file_load interface - this assumes that all you have are the
> kernel, initrd, and commandline, but on 32-bit ARM kexec, we have
> kernel, initrd and the dtb blob which the user can specify.

Well, I understand what you said about 32-bit ARM only supporting
kexec_load. That's why I said we tend to obsolete it 'GRADUALLY'. It is
the existing users of kexec_load, and the ARCHes which only have
kexec_load, that force us to move to kexec_file_load gradually.

Compared with kexec_load, kexec_file_load has only one disadvantage:
some ARCHes only have kexec_load. Otherwise, kexec_file_load benefits
kexec/kdump development and maintenance very much. The loading job of
kexec_file_load is mostly done in the kernel, so we can get whatever
kernel information we want very conveniently to do anything needed.
For the kexec_load interface, the loading job is mostly done in
userspace: we have to export kernel information to procfs, sysfs, etc.,
then parse it in kexec-tools, and finally pass it to the kernel part of
kexec loading.

Gradual obsoleting means we may only add features/improvements/
enhancements to kexec_file_load. And if a bug fix is needed for both
kexec_load and kexec_file_load, and the fix is very complicated, we may
only fix it in kexec_file_load too. ARCHes that don't have the
kexec_file_load interface are encouraged to add it, by porting the user
space part to the kernel as x86/s390/arm64 have done.

Surely, it doesn't mean we don't fix critical/blocker bugs in
kexec_load loading. We still try to, we are just not so eager. In
existing product environments where kexec_load is used, just keep using
it. Do we bother to change it to kexec_file_load, e.g. in our RHEL7
distros? Certainly not. But in our new products, we will change to the
kexec_file_load interface. I guess this is similar for arm64. The
advantages and benefits were described in the 2nd paragraph.

As for 32-bit ARM, is it like the old products, where we have many
in-use systems deployed in customers' laboratories? I wonder whether ARM
continues designing new 32-bit ARM CPUs, and whether some companies
continue producing tons of 32-bit ARM CPUs. If yes, I think we need to
continue taking care of kexec_load if 32-bit ARM can't convert to
kexec_file_load. If not, it may not be a barrier when we consider
converting kexec_load to kexec_file_load in other ARCHes. We just need
to keep using it, and try to fix critical/blocker bugs in the kexec_load
interface if encountered.

Finally, coming back to this patchset itself, the issue James spotted
is not so critical, I would say. When I do kexec jumping, I do the
loading first, then trigger the jump. I can think of the case where
people load the kexec-ed kernel, then do something else, and later
trigger the kexec jump. These are not necessary steps. As Dave and I
replied to James in the cover-letter thread, we can add a systemd
service for kexec loading that monitors hotplug uevents and reloads the
image if any hot remove happened. This is quite easy to do; I don't see
any problem with it, or why we shouldn't do it like this. That is my
personal opinion; please tell me if I missed anything.

>
> So, if we wanted to obsolete the kexec_load interface, _first_ there
> needs to be a way to provide users with the existing functionality
> they have already in place on 32-bit ARM - otherwise we're looking
> at a userspace regression. Especially as kexec_file_load takes
> precedence on some distro patched versions of the kexec tool,
> irrespective of which interface the user requests of the tool.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
>

^ permalink raw reply [flat|nested] 264+ messages in thread
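[Editorial sketch] Whether a hot-remove matters to a loaded image depends on whether the removed memory block overlaps any loaded segment, which is also the test a kernel-side notifier would need. The sketch below is illustrative only. It assumes that memory block N covers the physical range [N * block_size, (N + 1) * block_size) and that the block size is read as a hex string (as found in /sys/devices/system/memory/block_size_bytes). The segment list, block numbers and function names are made up for the example.

```python
def parse_block_size(text: str) -> int:
    """block_size_bytes is assumed to be a hex string without a 0x prefix."""
    return int(text.strip(), 16)


def memory_block_range(block_id: int, block_size: int):
    """Half-open physical range [start, end) assumed for memory block `block_id`."""
    start = block_id * block_size
    return start, start + block_size


def overlapping_segments(segments, block_id: int, block_size: int):
    """Return the (mem, memsz) segments that intersect the given memory block."""
    b_start, b_end = memory_block_range(block_id, block_size)
    return [(mem, memsz) for mem, memsz in segments
            if mem < b_end and b_start < mem + memsz]


# Hypothetical values: 128 MiB blocks, one segment in block 32, one in low memory.
BLOCK_SIZE = parse_block_size("8000000\n")
LOADED = [(0x100200000, 0x10000), (0x40000000, 0x200000)]
```

A reload (or a refusal to offline) would only be needed when `overlapping_segments()` returns a non-empty list for the block being removed.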
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 5:35 ` Baoquan He (?) @ 2020-04-12 8:08 ` Russell King - ARM Linux admin -1 siblings, 0 replies; 264+ messages in thread From: Russell King - ARM Linux admin @ 2020-04-12 8:08 UTC (permalink / raw) To: Baoquan He Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On Sun, Apr 12, 2020 at 01:35:07PM +0800, Baoquan He wrote: > On 04/11/20 at 10:30am, Russell King - ARM Linux admin wrote: > > On Sat, Apr 11, 2020 at 11:44:14AM +0800, Baoquan He wrote: > > > Because We tend to use kexec_file_load more and improve/enhance it in the > > > future, and gradually obsolete the old kexec_load interface which this > > > patchset is trying to fix on. > > > > That's not going to happen; 32-bit ARM kexec uses the kexec_load > > interface rather than the kexec_file_load version, and I see no one > > with any interest in changing that - and there's users of the former. > > > > I don't see how it's possible to convert 32-bit ARM kexec to the > > kexec_file_load interface - this assumes that all you have are the > > kernel, initrd, and commandline, but on 32-bit ARM kexec, we have > > kernel, initrd and the dtb blob which the user can specify. > > Well, I understand what you said about 32-bit ARM support with only > kexec_old support thing. That's why I said we tend to obsolete it > 'GRADUALLY'. It's the existing users who are using kexec_load, and the > ARCHes which only has kexec_load, make us have to transfer to > kexec_file_load gradually. > > Comparing with kexec_load, kexec_file_load has only one disadvantage, > that is some ARCHes only have kexec_load. Otherwise, kexec_file_load > benefits kexec/kdump developping/maintaining very much. 
The loading job > of kexec_file_load is mostly done in kernel, we can get whatever we > want about kernel information very conveniently to do anything needed. > For the kexec_load interface, the loading job is mostly done in > userspace, we have to export kernel information to procfs, sysfs, etc, > then parse them in kexec_tools, finally passed it to kernel part of > kexec loading. > > The gradual obsoleting means we may only add > feature/improvement/enhancement to kexec_file_load. And if a bug fix is > needed for both kexec_load and kexec_file_load, and the fix is very > complicated, we may only fix it in kexec_file_load too. Kexec_file_load > interface is suggested to add if does't have, just port user space part > to kernel as x86/s390/arm64 have done. > > Surely, it doesn't mean we don't fix the critical/blocker bug with > kexec_load loading. We still try to do, just are not so eager. In the > existing product environment, the kexec_load is used, just keep using > it. Do we bother to change it to kexec_file_load, e.g in our RHEL7 > distros? Certainly not. But in our new product, we will change to use > kexec_file_load interface. I guess this is similar with arm64. The > advantage and benefit have been told in the 2nd paragraph. > > > As for 32-bit ARM, is it like the old product, we have many in-use systems > deployed in customers' laboratory? Wondering if ARM continues designing > new 32-bit ARM cpu, and some companies continue producing tons of 32-bit ARM > cpus. If yes, I think we need continue taking care of kexec_load if > 32-bit ARM can't convert to kexec_file_load. If not, it may be not a > barrier when we consider converting kexec_load to kexec_file_load in > other ARCHes. We just need keep using it, try to fix those critical/blocker > bug in kexec_load interface if encountered. > > Finally, comning back to this patchset itself, the issue James spotted > is not so ciritical, I would say. 
When I do kexec jumping, I will do > loading firstly, then trigge jumping. I can think of the case that > people may load kexec-ed kernel, then do something else, later she/he > triggers the kexec jumping. These are not necessary steps. As Dave and I > replied to James in the cover-letter thread, adding a systemd service of > kexec loading, monitor hotplug uevent, reload it if any hot remove > happened. This is quite easy to do, I don't see any problem with it, and > why we don't do like this. > > My personal opinion, please tell if I miss anything. All that opinion and hand waving about the benefits of the new interface is totally irrelevant for 32-bit ARM for the reasons I stated in my email to which you replied. Gradual obsolescence or not, the file interface can't be supported on 32-bit ARM as-is - it is totally inadequate and inferior as an API compared to the functionality we have with plain kexec_load. Without that point addressed, kexec_file_load is meaningless for 32-bit ARM. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 8:08 ` Russell King - ARM Linux admin (?) @ 2020-04-12 19:52 ` Eric W. Biederman -1 siblings, 0 replies; 264+ messages in thread From: Eric W. Biederman @ 2020-04-12 19:52 UTC (permalink / raw) To: Russell King - ARM Linux admin Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel The only benefit of kexec_file_load is that it is simple enough from a kernel perspective that signatures can be checked. kexec_load in every other respect is the more capable and functional interface. It makes no sense to get rid of it. It does make sense to reload a loaded kernel on memory hotplug. That is simple and easy. If we are going to handle something in the kernel, it should simply be an automated unloading of the kernel on memory hotplug. I think it would be irresponsible to deprecate kexec_load on any platform. I also suspect that kexec_file_load could be taught to copy the dtb on arm32 if someone wants to deal with signatures. We definitely cannot even think of deprecating kexec_load until every architecture that supports it also supports kexec_file_load and everyone is happy with that interface. That is Linus's no-regression rule. Eric
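The user-space approach raised in this thread (watch hotplug uevents, reload the kexec image when memory goes away) can be sketched roughly as follows. This is only an illustration under stated assumptions, not what any distro ships: the function names `parse_uevent` and `needs_kexec_reload` are hypothetical, the payload is assumed to be the usual kernel uevent layout (an `ACTION@DEVPATH` header followed by NUL-separated `KEY=VALUE` fields), and treating only `offline`/`remove` events on the `memory` subsystem as reload triggers is a policy choice made for this sketch. Receiving the events (netlink socket or udev rule) and actually re-running the kexec load are left out.

```python
def parse_uevent(raw: bytes) -> dict:
    """Split a uevent payload into a dict of its KEY=VALUE fields.

    Assumes NUL-separated fields; the leading "ACTION@DEVPATH" header
    contains no '=' and is therefore skipped.
    """
    fields = {}
    for part in raw.split(b"\0"):
        key, sep, value = part.partition(b"=")
        if sep:
            fields[key.decode()] = value.decode()
    return fields


def needs_kexec_reload(fields: dict) -> bool:
    """Hypothetical policy: reload when a memory block is offlined or removed."""
    return (
        fields.get("SUBSYSTEM") == "memory"
        and fields.get("ACTION") in {"offline", "remove"}
    )


if __name__ == "__main__":
    sample = (
        b"offline@/devices/system/memory/memory32\0"
        b"ACTION=offline\0"
        b"DEVPATH=/devices/system/memory/memory32\0"
        b"SUBSYSTEM=memory\0"
    )
    print(needs_kexec_reload(parse_uevent(sample)))
```

A service built on this would still race with a kexec triggered between the hot remove and the reload, which is the window the kernel-side notifier in patch 1 is meant to close.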
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 19:52 ` Eric W. Biederman (?) @ 2020-04-12 20:37 ` Bhupesh SHARMA -1 siblings, 0 replies; 264+ messages in thread From: Bhupesh SHARMA @ 2020-04-12 20:37 UTC (permalink / raw) To: Eric W. Biederman Cc: Russell King - ARM Linux admin, Baoquan He, David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On Mon, Apr 13, 2020 at 1:26 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > > The only benefit of kexec_file_load is that it is simple enough from a > kernel perspective that signatures can be checked. > > kexec_load in every other respect is the more capable and functional > interface. It makes no sense to get rid of it. > > It does make sense to reload with a loaded kernel on memory hotplug. > That is simple and easy. If we are going to handle something in the > kernel it should simple an automated unloading of the kernel on memory > hotplug. > > > I think it would be irresponsible to deprecate kexec_load on any > platform. > > I also suspect that kexec_file_load could be taught to copy the dtb > on arm32 if someone wants to deal with signatures. > > We definitely can not even think of deprecating kexec_load until > architecture that supports it also supports kexec_file_load and everyone > is happy with that interface. That is Linus's no regression rule. TBH, I have seen several active users of kexec_load in arm32 environments and we have been trying to help them with kexec issues on arm32 in the recent past as well. So, I agree with Eric's view that deprecating this in favour of kexec_file_load will probably break these existing environments. I tried to do some work at the start of this year to add kexec_file_load support for arm32 in my spare cycles, but I gave up as the arm32 hardware had broken firmware and couldn't boot the latest upstream kernel.
Maybe I will try to find some spare cycles in the coming days to do it. But I think since kexec_load is an important interface on these arm32 boards for supporting existing kexec-based bootloaders, we should continue supporting it until kexec_file_load is supported/mature enough for arm32. Thanks, Bhupesh
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-12 19:52 ` Eric W. Biederman (?) @ 2020-04-13 2:37 ` Baoquan He -1 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-13 2:37 UTC (permalink / raw) To: Eric W. Biederman Cc: Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > > The only benefit of kexec_file_load is that it is simple enough from a > kernel perspective that signatures can be checked. We don't have this restriction any more since the commit below: commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE") With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both secure boot and legacy systems for kexec/kdump. Being simple enough is enough to attract us and convince us to use it instead. And kexec_file_load has been in use for several years on systems with secure boot, since it was added in 2014, on x86_64. > > kexec_load in every other respect is the more capable and functional > interface. It makes no sense to get rid of it. > > It does make sense to reload with a loaded kernel on memory hotplug. > That is simple and easy. If we are going to handle something in the > kernel it should simple an automated unloading of the kernel on memory > hotplug. > > > I think it would be irresponsible to deprecate kexec_load on any > platform. > > I also suspect that kexec_file_load could be taught to copy the dtb > on arm32 if someone wants to deal with signatures. > > We definitely can not even think of deprecating kexec_load until > architecture that supports it also supports kexec_file_load and everyone > is happy with that interface. That is Linus's no regression rule. I should have picked a milder word than 'obsolete' to express our tendency and describe our plan.
Even though I added 'gradually', it seems that didn't help much. I didn't mean to say 'deprecate' at all when I replied. The situation and trend as I understand them for kexec_load and kexec_file_load are: 1) ARCHes which don't have kexec_file_load yet are encouraged to add support for it, just as x86_64, arm64 and s390 have done; 2) kexec_file_load is the suggested interface, and will take precedence over kexec_load in the future, if both are supported in one ARCH. 3) kexec_load keeps being used by ARCHes without kexec_file_load support, and for backward compatibility by ARCHes with kexec_file_load support. For 1) and 2), I think the reason is obvious, as Eric said: kexec_file_load is simple enough. And currently, whenever we get a bug report, we may need to fix it twice, for kexec_load and for kexec_file_load. If kexec_file_load is made the default, e.g. on x86_64, we will fix it in kernel space only, for kexec_file_load. This is what I meant by 'obsolete gradually'. I think arm64 and s390 will do this too. The exception is a critical/blocker bug in kexec_load that breaks the old kexec_load interface in old products. For 3), people can still use kexec_load and develop/fix for it, if kexec_file_load is not supported. But 32-bit arm should be a different case, more like i386: we will leave it as is, and fix anything which could break it. But do people really expect us to improve it or add features to it? E.g. in this patchset, for the mem hotplug issue James raised, I assume James is focusing on arm64 and x86_64, but not 32-bit arm. As DavidH commented in another reply, people don't even agree to continue supporting memory hotplug on 32-bit systems. We once made an effort to fix a memory hotplug bug on i386 with a patch, but people would rather mark it as BROKEN.
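Points 2) and 3) above amount to "prefer kexec_file_load where the kernel has it, otherwise keep using kexec_load". A minimal sketch of that fallback order, under loud assumptions: `load_image`, `file_load` and `legacy_load` are hypothetical names invented here, the two callables stand in for whatever actually issues the syscalls, and "not supported" is assumed to surface as an `OSError` with `errno.ENOSYS` (what an unimplemented syscall returns). It is not how kexec-tools is implemented.

```python
import errno


def load_image(file_load, legacy_load) -> str:
    """Try the file-based loader first; fall back only if it is unimplemented.

    Both arguments are hypothetical zero-argument callables that perform the
    load and raise OSError on failure. Returns the name of the interface used.
    """
    try:
        file_load()
        return "kexec_file_load"
    except OSError as exc:
        # Any error other than "syscall not implemented" is a real failure
        # (e.g. a rejected signature) and must not be papered over.
        if exc.errno != errno.ENOSYS:
            raise
    legacy_load()
    return "kexec_load"
```

Falling back only on ENOSYS is a deliberate choice in this sketch: silently switching interfaces after, say, a signature failure would hide the distinction Russell raises earlier in the thread, where the tool picks an interface irrespective of what the user asked for.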
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-13 2:37 ` Baoquan He 0 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-13 2:37 UTC (permalink / raw) To: Eric W. Biederman Cc: David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > > The only benefit of kexec_file_load is that it is simple enough from a > kernel perspective that signatures can be checked. We don't have this restriction any more with below commit: commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE") With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both secure boot or legacy system for kexec/kdump. Being simple enough is enough to astract and convince us to use it instead. And kexec_file_load has been in use for several years on systems with secure boot, since added in 2014, on x86_64. > > kexec_load in every other respect is the more capable and functional > interface. It makes no sense to get rid of it. > > It does make sense to reload with a loaded kernel on memory hotplug. > That is simple and easy. If we are going to handle something in the > kernel it should simple an automated unloading of the kernel on memory > hotplug. > > > I think it would be irresponsible to deprecate kexec_load on any > platform. > > I also suspect that kexec_file_load could be taught to copy the dtb > on arm32 if someone wants to deal with signatures. > > We definitely can not even think of deprecating kexec_load until > architecture that supports it also supports kexec_file_load and everyone > is happy with that interface. That is Linus's no regression rule. I should pick a milder word to express our tendency and tell our plan then 'obsolete'. Even though I added 'gradually', seems it doesn't help much. 
I didn't mean to say 'deprecate' at all when replied. The situation and trend I understand about kexec_load and kexec_file_load are: 1) Supporting kexec_file_load is suggested to add in ARCHes which don't have yet, just as x86_64, arm64 and s390 have done; 2) kexec_file_load is suggested to use, and take precedence over kexec_load in the future, if both are supported in one ARCH. 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, and by ARCHes for back compatibility w/ kexec_file_load support. For 1) and 2), I think the reason is obvious as Eric said, kexec_file_load is simple enough. And currently, whenever we got a bug report, we may need fix them twice, for kexec_load and kexec_file_load. If kexec_file_load is made by default, e.g on x86_64, we will change it in kernel space only, for kexec_file_load. This is what I meant about 'obsolete gradually'. I think for arm64, s390, they will do these too. Unless there's some critical/blocker bug in kexec_load, to corrupt the old kexec_load interface in old product. For 3), people can still use kexec_load and develop/fix for it, if no kexec_file_load supported. But 32-bit arm should be a different one, more like i386, we will leave it as is, and fix anything which could break it. But people really expects to improve or add feature to it? E.g in this patchset, the mem hotplug issue James raised, I assume James is focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in another reply, people even don't agree to continue supporting memory hotplug on 32-bit system. We ever took effort to fix a memory hotplug bug on i386 with a patch, but people would rather set it as BROKEN. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-13 2:37 ` Baoquan He
@ 2020-04-13 13:15 ` Eric W. Biederman
From: Eric W. Biederman @ 2020-04-13 13:15 UTC
To: Baoquan He
Cc: Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel

Baoquan He <bhe@redhat.com> writes:

> On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
>>
>> The only benefit of kexec_file_load is that it is simple enough from a
>> kernel perspective that signatures can be checked.
>
> We don't have this restriction any more with below commit:
>
> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
> and KEXEC_SIG_FORCE")
>
> With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
> secure boot and legacy systems for kexec/kdump. Being simple enough is
> enough to attract and convince us to use it instead. And kexec_file_load
> has been in use for several years on systems with secure boot, since it
> was added in 2014, on x86_64.

No. Actually kexec_file_load is the less capable and less flexible
interface, which is why it is appropriate for signature verification.

>> kexec_load in every other respect is the more capable and functional
>> interface. It makes no sense to get rid of it.
>>
>> It does make sense to reload with a loaded kernel on memory hotplug.
>> That is simple and easy. If we are going to handle something in the
>> kernel it should simply be an automated unloading of the kernel on
>> memory hotplug.
>>
>>
>> I think it would be irresponsible to deprecate kexec_load on any
>> platform.
>>
>> I also suspect that kexec_file_load could be taught to copy the dtb
>> on arm32 if someone wants to deal with signatures.
>>
>> We definitely can not even think of deprecating kexec_load until
>> every architecture that supports it also supports kexec_file_load and
>> everyone is happy with that interface. That is Linus's no regression rule.
>
> I should pick a milder word to express our tendency and tell our plan
> than 'obsolete'. Even though I added 'gradually', it doesn't seem to help
> much. I didn't mean to say 'deprecate' at all when I replied.
>
> The situation and trend as I understand them for kexec_load and
> kexec_file_load are:
>
> 1) Adding kexec_file_load support is suggested for ARCHes which don't
> have it yet, just as x86_64, arm64 and s390 have done;
>
> 2) kexec_file_load is the suggested interface, and will take precedence
> over kexec_load in the future, if both are supported in one ARCH.

The deep problem is that kexec_file_load is distinctly less expressive
than kexec_load.

> 3) kexec_load keeps being used by ARCHes w/o kexec_file_load support,
> and for backward compatibility by ARCHes w/ kexec_file_load support.
>
> For 1) and 2), I think the reason is obvious: as Eric said,
> kexec_file_load is simple enough. And currently, whenever we get a bug
> report, we may need to fix it twice, for kexec_load and kexec_file_load.
> If kexec_file_load is made the default, e.g. on x86_64, we will change it
> in kernel space only, for kexec_file_load. This is what I meant by
> 'obsolete gradually'. I think arm64 and s390 will do this too.
> Unless there's some critical/blocker bug in kexec_load that corrupts the
> old kexec_load interface in an old product.

Maybe. The code that kexec_file_load sucked into the kernel is quite
stable and rarely needs changes except during a port of kexec to
another architecture.

Last I looked the real maintenance effort of kexec and kexec on panic was
in the drivers. So I don't think we can use maintenance to do anything.

> For 3), people can still use kexec_load and develop/fix it, if
> kexec_file_load is not supported. But 32-bit arm should be a different
> case, more like i386: we will leave it as is, and fix anything which could
> break it. But do people really expect to improve it or add features to it?
> E.g. in this patchset, the mem hotplug issue James raised, I assume James
> is focusing on arm64 and x86_64, but not 32-bit arm. As DavidH commented in
> another reply, people don't even agree to continue supporting memory
> hotplug on 32-bit systems. We once made an effort to fix a memory hotplug
> bug on i386 with a patch, but people would rather set it as BROKEN.

For memory hotplug just reload. Userspace already gets good events.

We should not expect anything except a panic kernel to be loaded over a
memory hotplug event. The kexec on panic code should actually be loaded
in a location that we don't relinquish if asked for it.

Quite frankly at this point I would love to see the signature fad die,
which would allow us to remove kexec_file_load.

I still have not seen the signature code used anywhere except by people
anticipating trouble. Given that Microsoft has already directly signed
a malicious bootloader (not in the Linux ecosystem), I don't even know
if any of the reasons for having kexec_file_load are legitimate.

If someone wants to do the work and ensure that everything that is
possible to load with kexec_load (kernels supporting the multi-boot
protocol, etc.) is also possible to load with kexec_file_load, then we
can consider deprecating kexec_load.

I think it took me about 15 years to remove the sysctl system call and
it only ever had about 10 users. If you want to go through that kind of
work to make certain there are no more users and that everything they
could do with the old interface is doable with the new interface then
please be my guest. Until then we need to fully support kexec_load.

Eric
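The "just reload" approach suggested above can be sketched from the user-space side as follows. This is a minimal illustration, not what any particular distribution ships: it assumes the standard kernel uevent layout (an `ACTION@DEVPATH` header followed by NUL-separated `KEY=VALUE` pairs) and that memory block state changes arrive with `SUBSYSTEM=memory` and `ACTION=online` or `ACTION=offline`. The function names and the kernel/initrd paths are hypothetical, and the kexec command line is only constructed, never executed.

```python
def parse_uevent(payload: bytes) -> dict:
    """Parse a raw kernel uevent into a dict of its KEY=VALUE fields.
    The leading 'ACTION@DEVPATH' header has no '=' and is skipped."""
    fields = {}
    for part in payload.split(b"\0"):
        if b"=" in part:
            key, _, value = part.decode(errors="replace").partition("=")
            fields[key] = value
    return fields


def needs_kexec_reload(fields: dict) -> bool:
    """True when a memory block changed state, i.e. when a previously
    loaded kexec image may describe memory that is no longer accurate."""
    return (fields.get("SUBSYSTEM") == "memory"
            and fields.get("ACTION") in {"online", "offline"})


def reload_command(kernel: str, initrd: str) -> list:
    """Build (but do not run) a kexec-tools invocation that loads a fresh
    image. The caller would unload the stale image first with 'kexec -u'."""
    return ["kexec", "-l", kernel, "--initrd=" + initrd, "--reuse-cmdline"]


if __name__ == "__main__":
    event = (b"offline@/devices/system/memory/memory32\0"
             b"ACTION=offline\0"
             b"DEVPATH=/devices/system/memory/memory32\0"
             b"SUBSYSTEM=memory\0")
    if needs_kexec_reload(parse_uevent(event)):
        # Hypothetical paths, for illustration only.
        print(" ".join(reload_command("/boot/vmlinuz", "/boot/initrd.img")))
```

In practice the same decision is often expressed as a udev rule that matches the memory subsystem and restarts the service responsible for loading the kexec or kdump image; the sketch only shows the matching logic.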
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-13 13:15 ` Eric W. Biederman
@ 2020-04-13 23:01 ` Andrew Morton
From: Andrew Morton @ 2020-04-13 23:01 UTC
To: Eric W. Biederman
Cc: Baoquan He, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel

On Mon, 13 Apr 2020 08:15:23 -0500 ebiederm@xmission.com (Eric W. Biederman) wrote:

> > For 3), people can still use kexec_load and develop/fix for it, if no
> > kexec_file_load supported. But 32-bit arm should be a different one,
> > more like i386, we will leave it as is, and fix anything which could
> > break it. But people really expects to improve or add feature to it? E.g
> > in this patchset, the mem hotplug issue James raised, I assume James is
> > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
> > another reply, people even don't agree to continue supporting memory
> > hotplug on 32-bit system. We ever took effort to fix a memory hotplug
> > bug on i386 with a patch, but people would rather set it as BROKEN.
>
> For memory hotplug just reload. Userspace already gets good events.
>
> We should not expect anything except a panic kernel to be loaded over a
> memory hotplug event. The kexec on panic code should actually be loaded
> in a location that we don't relinquish if asked for it.

Is that a nack for James's patchset?
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-13 23:01 ` Andrew Morton
@ 2020-04-14 6:13 ` Eric W. Biederman
From: Eric W. Biederman @ 2020-04-14 6:13 UTC
To: Andrew Morton
Cc: Baoquan He, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel

Andrew Morton <akpm@linux-foundation.org> writes:

> On Mon, 13 Apr 2020 08:15:23 -0500 ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> > For 3), people can still use kexec_load and develop/fix for it, if no
>> > kexec_file_load supported. But 32-bit arm should be a different one,
>> > more like i386, we will leave it as is, and fix anything which could
>> > break it. But people really expects to improve or add feature to it? E.g
>> > in this patchset, the mem hotplug issue James raised, I assume James is
>> > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
>> > another reply, people even don't agree to continue supporting memory
>> > hotplug on 32-bit system. We ever took effort to fix a memory hotplug
>> > bug on i386 with a patch, but people would rather set it as BROKEN.
>>
>> For memory hotplug just reload. Userspace already gets good events.
>>
>> We should not expect anything except a panic kernel to be loaded over a
>> memory hotplug event. The kexec on panic code should actually be loaded
>> in a location that we don't relinquish if asked for it.
>
> Is that a nack for James's patchset?

I have just read the end of the thread and I have the sense that the
patchset had already been rejected. I will see if I can go back and
read the beginning.

I was mostly reacting to the idea that you could stop maintaining an
interface that people are actively using because there is a newer
interface.

Eric
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 6:13 ` Eric W. Biederman 0 siblings, 0 replies; 264+ messages in thread From: Eric W. Biederman @ 2020-04-14 6:13 UTC (permalink / raw) To: Andrew Morton Cc: Baoquan He, David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Will Deacon, linux-arm-kernel Andrew Morton <akpm@linux-foundation.org> writes: > On Mon, 13 Apr 2020 08:15:23 -0500 ebiederm@xmission.com (Eric W. Biederman) wrote: > >> > For 3), people can still use kexec_load and develop/fix for it, if no >> > kexec_file_load supported. But 32-bit arm should be a different one, >> > more like i386, we will leave it as is, and fix anything which could >> > break it. But people really expects to improve or add feature to it? E.g >> > in this patchset, the mem hotplug issue James raised, I assume James is >> > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in >> > another reply, people even don't agree to continue supporting memory >> > hotplug on 32-bit system. We ever took effort to fix a memory hotplug >> > bug on i386 with a patch, but people would rather set it as BROKEN. >> >> For memory hotplug just reload. Userspace already gets good events. >> >> We should not expect anything except a panic kernel to be loaded over a >> memory hotplug event. The kexec on panic code should actually be loaded >> in a location that we don't reliquish if asked for it. > > Is that a nack for James's patchset? I have just read the end of the thread and I have the sense that the patchset had already been rejected. I will see if I can go back and read the beginning. I was mostly reacting to the idea that you could stop maintaining an interface that people are actively using because there is a newer interface. 
Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-13 13:15 ` Eric W. Biederman
@ 2020-04-14  6:40 ` Baoquan He
From: Baoquan He @ 2020-04-14 6:40 UTC
To: Eric W. Biederman
Cc: Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas,
    Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse,
    Andrew Morton, Will Deacon, linux-arm-kernel

On 04/13/20 at 08:15am, Eric W. Biederman wrote:
> Baoquan He <bhe@redhat.com> writes:
>
> > On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
> >>
> >> The only benefit of kexec_file_load is that it is simple enough from a
> >> kernel perspective that signatures can be checked.
> >
> > We don't have this restriction any more with below commit:
> >
> > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
> > and KEXEC_SIG_FORCE")
> >
> > With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
> > secure boot or legacy system for kexec/kdump. Being simple enough is
> > enough to attract and convince us to use it instead. And kexec_file_load
> > has been in use for several years on systems with secure boot, since
> > added in 2014, on x86_64.
>
> No. Actually kexec_file_load is the less capable interface, and less
> flexible interface. Which is why it is appropriate for signature
> verification.

Well, everyone has a stance and a corresponding view. You may have a
wider view from long-time maintenance and an upstream position, and think
kexec_file_load is horrible. But I can only see it from our work as
front-line engineers who maintain/develop kexec/kdump in RHEL, and we
think kexec_file_load is easier to maintain.

Except, of course, for support of multiple kernel image formats. Whether
it is kexec_load or kexec_file_load, e.g. on x86_64, we only support
bzImage, which is what the kernel build produces by default. We have no
way to support it in our distros and add it to kexec_file_load.

[RFC PATCH] x86/boot: make ELF kernel multiboot-able
https://lkml.org/lkml/2017/2/15/654

>
> >> kexec_load in every other respect is the more capable and functional
> >> interface. It makes no sense to get rid of it.
> >>
> >> It does make sense to reload with a loaded kernel on memory hotplug.
> >> That is simple and easy. If we are going to handle something in the
> >> kernel it should simply be an automated unloading of the kernel on
> >> memory hotplug.
> >>
> >>
> >> I think it would be irresponsible to deprecate kexec_load on any
> >> platform.
> >>
> >> I also suspect that kexec_file_load could be taught to copy the dtb
> >> on arm32 if someone wants to deal with signatures.
> >>
> >> We definitely can not even think of deprecating kexec_load until
> >> architecture that supports it also supports kexec_file_load and everyone
> >> is happy with that interface. That is Linus's no regression rule.
> >
> > I should pick a milder word to express our tendency and tell our plan
> > than 'obsolete'. Even though I added 'gradually', seems it doesn't help
> > much. I didn't mean to say 'deprecate' at all when replied.
> >
> > The situation and trend I understand about kexec_load and kexec_file_load
> > are:
> >
> > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't
> > have yet, just as x86_64, arm64 and s390 have done;
> >
> > 2) kexec_file_load is suggested to use, and take precedence over
> > kexec_load in the future, if both are supported in one ARCH.
>
> The deep problem is that kexec_file_load is distinctly less expressive
> than kexec_load.
>
> > 3) Kexec_load is kept being used by ARCHes w/o kexec_file_load support,
> > and by ARCHes for back compatibility w/ kexec_file_load support.
> >
> > For 1) and 2), I think the reason is obvious as Eric said,
> > kexec_file_load is simple enough. And currently, whenever we got a bug
> > report, we may need fix them twice, for kexec_load and kexec_file_load.
> > If kexec_file_load is made by default, e.g on x86_64, we will change it
> > in kernel space only, for kexec_file_load. This is what I meant about
> > 'obsolete gradually'. I think for arm64, s390, they will do these too.
> > Unless there's some critical/blocker bug in kexec_load, to corrupt the
> > old kexec_load interface in old product.
>
> Maybe. The code that kexec_file_load sucked into the kernel is quite
> stable and rarely needs changes except during a port of kexec to
> another architecture.
>
> Last I looked the real maintenance effort of kexec and kexec on panic was
> in the drivers. So I don't think we can use maintenance to do anything.

Not sure I got it. But if you check Lianbo's patches, a lot of effort
went into making SEV work well with kexec_file_load. And we have switched
to kexec_file_load by default on x86_64 in the newly published Fedora
release. Before this, Lianbo investigated and did many experiments to
make sure the switch is safe. We finally made this decision. Next we will
make the switch in the Enterprise distros. Once these are proven safe, we
will suggest that customers use kexec_file_load for kexec rebooting too.
In the future, we will only care about kexec_file_load if everything goes
well. But as I have explained repeatedly, only caring about
kexec_file_load means we will leave kexec_load as is; we will not add new
features or improvement patches for it.

commit 6a20bd54473e11011bf2b47efb52d0759d412854
Author: Lianbo Jiang <lijiang@redhat.com>
Date:   Thu Jan 16 13:47:35 2020 +0800

    kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default

>
> > For 3), people can still use kexec_load and develop/fix for it, if no
> > kexec_file_load supported. But 32-bit arm should be a different one,
> > more like i386, we will leave it as is, and fix anything which could
> > break it. But people really expects to improve or add feature to it? E.g.
> > in this patchset, the mem hotplug issue James raised, I assume James is
> > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
> > another reply, people even don't agree to continue supporting memory
> > hotplug on 32-bit system. We ever took effort to fix a memory hotplug
> > bug on i386 with a patch, but people would rather set it as BROKEN.
>
> For memory hotplug just reload. Userspace already gets good events.

kexec_file_load is easy to maintain, and this is an example. Locking the
hotplug area where the kexec-ed kernel is targeted, as this patchset
does, is obviously not right. We can't disable memory hotplug just
because a kexec-ed kernel is loaded ahead of time.

Reloading is also not a good fix. If the kexec-ed kernel is targeted at a
movable area, reloading can avoid corruption on kexec reboot if that area
is hot-removed. But if that area is not removed, placing the kernel in
the hotpluggable area will turn that area into an unmovable zone. Unless
we decide not to support memory hotplug in the kexec-ed kernel, I guess
it's very hard. Kexec rebooting is now supported in our distros, the big
cloud providers are deploying Linux in guests, and bugs about kexec
reboot failures have been reported. They need memory hotplug to
increase/decrease memory.

The root cause is that the kexec-ed kernel is targeted at a hotpluggable
memory region. Just avoiding the movable area can fix it. In
kexec_file_load(), checking for, or picking only, unmovable regions for
the kernel/initrd in locate_mem_hole_callback() can fix it. Whether a
page's or pageblock's zone is movable is easy to know. This fix doesn't
need to bother other components.

>
> We should not expect anything except a panic kernel to be loaded over a
> memory hotplug event. The kexec on panic code should actually be loaded
> in a location that we don't relinquish if asked for it.
>
> Quite frankly at this point I would love to see the signature fad die,
> which would allow us to remove kexec_file_load. I still have not seen
> the signature code used anywhere except by people anticipating trouble.
>
> Given that Microsoft has already directly signed a malicious bootloader.
> (Not in the Linux ecosystem). I don't even know if any of the reasons
> for having kexec_file_load are legitimate.
>
>
> If someone wants to do the work and ensure everything that is possible
> to load with kexec_load is possible to load with kexec_file_load.
> Kernels supporting the multi-boot protocol etc. Then we can consider
> deprecating kexec_load.
>
>
> I think it took me about 15 years to remove the sysctl system call and
> it only ever had about 10 users. If you want to go through that kind of
> work to make certain there are no more users and that everything they
> could do with the old interface is doable with the new interface then
> please be my guest. Until then we need to fully support kexec_load.

I want to clarify again: we have no plan to deprecate kexec_load. We just
plan to use kexec_file_load more in our distros, for both legacy systems
and systems with secure boot.

Eric, I am glad you shared your opinion about kexec_file_load. Without
the discussion in this thread, we might not have known it. So I have one
question. It seems kexec_file_load will continue to exist, and the ARCHes
our distros support (x86_64, s390, ppc, arm64) all have kexec_file_load.
Do you object to us continuing to use kexec_file_load, for signature
verification and normal kexec/kdump booting? Or do you plan to deprecate
kexec_file_load?
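The suggestion in this message, to place the kernel/initrd only in unmovable regions when searching for a memory hole, can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel's locate_mem_hole_callback(): struct mem_range, its movable flag, and locate_unmovable_hole() are hypothetical names, and the sketch assumes the caller already knows which ranges are movable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one RAM range; not a kernel structure. */
struct mem_range {
    uint64_t start;   /* first byte of the range (inclusive) */
    uint64_t end;     /* one past the last byte (exclusive) */
    bool movable;     /* true if the range may be hot-removed */
};

/* Round addr up to a multiple of align (align is a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Walk the ranges in array order and return the first aligned address
 * inside an unmovable range that can hold `size` bytes. Returns
 * UINT64_MAX when no unmovable range is large enough.
 */
static uint64_t locate_unmovable_hole(const struct mem_range *ranges,
                                      size_t n, uint64_t size,
                                      uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    for (size_t i = 0; i < n; i++) {
        uint64_t base;

        if (ranges[i].movable)
            continue;   /* never place the image in removable memory */

        base = align_up(ranges[i].start, align);
        if (base < ranges[i].start || base >= ranges[i].end)
            continue;   /* alignment overflowed or left the range */
        if (ranges[i].end - base >= size)
            return base;
    }
    return UINT64_MAX;
}
```

With ranges such as a small unmovable block, a large movable block and a second unmovable block, a request too big for the first block skips the movable one and lands in the third, which is the behaviour the message argues for.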
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-14  6:40 ` Baoquan He
@ 2020-04-14  6:51 ` Baoquan He
From: Baoquan He @ 2020-04-14 6:51 UTC
To: Eric W. Biederman
Cc: Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas,
    Bhupesh Sharma, David Hildenbrand, kexec, linux-mm, James Morse,
    Andrew Morton, Will Deacon, linux-arm-kernel

On 04/14/20 at 02:40pm, Baoquan He wrote:
> On 04/13/20 at 08:15am, Eric W. Biederman wrote:
> > Baoquan He <bhe@redhat.com> writes:
> >
> > > On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
> > >>
> > >> The only benefit of kexec_file_load is that it is simple enough from a
> > >> kernel perspective that signatures can be checked.
> > >
> > > We don't have this restriction any more with below commit:
> > >
> > > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
> > > and KEXEC_SIG_FORCE")
> > >
> > > With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
> > > secure boot or legacy system for kexec/kdump. Being simple enough is
> > > enough to attract and convince us to use it instead. And kexec_file_load
> > > has been in use for several years on systems with secure boot, since
> > > added in 2014, on x86_64.
> >
> > No. Actually kexec_file_load is the less capable interface, and less
> > flexible interface. Which is why it is appropriate for signature
> > verification.
>
> Well, everyone has a stance and the corresponding view. You could have
> wider view from long time maintenance and in upstream position, and think
> kexec_file_load is horrible. But I can only see from our work as a front
> line engineer to maintain/develop kexec/kdump in RHEL, and think
> kexec_file_load is easier to maintain.
>
> Surely except of multiple kernel image format support. No matter it is
> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage.
> This is produced from kernel building by default. We have no way to
> support it in our distros and add it into kexec_file_load.
>
> [RFC PATCH] x86/boot: make ELF kernel multiboot-able
> https://lkml.org/lkml/2017/2/15/654
>
> >
> > >> kexec_load in every other respect is the more capable and functional
> > >> interface. It makes no sense to get rid of it.
> > >>
> > >> It does make sense to reload with a loaded kernel on memory hotplug.
> > >> That is simple and easy. If we are going to handle something in the
> > >> kernel it should simply be an automated unloading of the kernel on memory
> > >> hotplug.
> > >>
> > >>
> > >> I think it would be irresponsible to deprecate kexec_load on any
> > >> platform.
> > >>
> > >> I also suspect that kexec_file_load could be taught to copy the dtb
> > >> on arm32 if someone wants to deal with signatures.
> > >>
> > >> We definitely can not even think of deprecating kexec_load until
> > >> architecture that supports it also supports kexec_file_load and everyone
> > >> is happy with that interface. That is Linus's no regression rule.
> > >
> > > I should pick a milder word to express our tendency and tell our plan
> > > than 'obsolete'. Even though I added 'gradually', seems it doesn't help
> > > much. I didn't mean to say 'deprecate' at all when replied.
> > >
> > > The situation and trend I understand about kexec_load and kexec_file_load
> > > are:
> > >
> > > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't
> > > have yet, just as x86_64, arm64 and s390 have done;
> > >
> > > 2) kexec_file_load is suggested to use, and take precedence over
> > > kexec_load in the future, if both are supported in one ARCH.
> >
> > The deep problem is that kexec_file_load is distinctly less expressive
> > than kexec_load.
> >
> > > 3) Kexec_load is kept being used by ARCHes w/o kexec_file_load support,
> > > and by ARCHes for back compatibility w/ kexec_file_load support.
> > >
> > > For 1) and 2), I think the reason is obvious as Eric said,
> > > kexec_file_load is simple enough. And currently, whenever we got a bug
> > > report, we may need fix them twice, for kexec_load and kexec_file_load.
> > > If kexec_file_load is made by default, e.g on x86_64, we will change it
> > > in kernel space only, for kexec_file_load. This is what I meant about
> > > 'obsolete gradually'. I think for arm64, s390, they will do these too.
> > > Unless there's some critical/blocker bug in kexec_load, to corrupt the
> > > old kexec_load interface in old product.
> >
> > Maybe. The code that kexec_file_load sucked into the kernel is quite
> > stable and rarely needs changes except during a port of kexec to
> > another architecture.
> >
> > Last I looked the real maintenance effort of kexec and kexec on panic was
> > in the drivers. So I don't think we can use maintenance to do anything.
>
> Not sure if I got it. But if check Lianbo's patches, a lot of effort has
> been taken to make SEV work well on kexec_file_load. And we have
> switched to use kexec_file_load in the newly published Fedora release
> on x86_64 by default. Before this, Lianbo has investigated and done many
> experiments to make sure the switching is safe. We finally made this
> decision. Next we will do the switch in Enterprise distros. Once these
> are proved safe, we will suggest customers to use kexec_file_load for
> kexec rebooting too. In the future, we will only care about
> kexec_file_load if everything is going well. But as I have explained
> repeatedly, only caring about kexec_file_load means we will leave
> kexec_load as is, we will not add new feature or improvement patches
> for it.
>
> commit 6a20bd54473e11011bf2b47efb52d0759d412854
> Author: Lianbo Jiang <lijiang@redhat.com>
> Date: Thu Jan 16 13:47:35 2020 +0800
>
>     kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default
>
> >
> > > For 3), people can still use kexec_load and develop/fix for it, if no
> > > kexec_file_load supported. But 32-bit arm should be a different one,
> > > more like i386, we will leave it as is, and fix anything which could
> > > break it. But people really expects to improve or add feature to it? E.g
> > > in this patchset, the mem hotplug issue James raised, I assume James is
> > > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
> > > another reply, people even don't agree to continue supporting memory
> > > hotplug on 32-bit system. We ever took effort to fix a memory hotplug
> > > bug on i386 with a patch, but people would rather set it as BROKEN.
> >
> > For memory hotplug just reload. Userspace already gets good events.
>
> Kexec_file_load is easy to maintain. This is an example.
>
> Lock the hotplug area where kexec-ed kernel is targeted in this patchset,
> it's obviously not right. We can't disable memory hotplug just because
> kexec-ed kernel is loaded ahead of time.
>
> Reloading is also not a good fix. Kexec-ed kernel is targeted at a
> movable area, reloading can avoid kexec rebooting corruption if that
> area is hot removed. But if that area is not removed, locating kernel
> into the hotpluggable area will change the area into unmovable zone.

Here I mean if kexec kernel is targeted at a hotpluggable memory region,
after kexec rebooting, that region will become unmovable. People can't
hot remove it in kexec-ed kernel.

> Unless we decide to not support memory hotplug in kexec-ed kernel, I
> guess it's very hard. Now in our distros kexec rebooting has been
> supported, the big cloud providers are deploying linux in guest, bugs on
> kexec reboot failure has been reported. They need the memory hotplug to
> increase/decrease memory.
>
> The root cause is kexec-ed kernel is targeted at hotpluggable memory
> region. Just avoiding the movable area can fix it. In kexec_file_load(),
> just checking or picking those unmovable region to put kernel/initrd in
> function locate_mem_hole_callback() can fix it. The page or pageblock's
> zone is movable or not, it's easy to know. This fix doesn't need to
> bother other component.
>
> >
> > We should not expect anything except a panic kernel to be loaded over a
> > memory hotplug event. The kexec on panic code should actually be loaded
> > in a location that we don't relinquish if asked for it.
> >
> > Quite frankly at this point I would love to see the signature fad die,
> > which would allow us to remove kexec_file_load. I still have not seen
> > the signature code used anywhere except by people anticipating trouble.
> >
> > Given that Microsoft has already directly signed a malicious bootloader.
> > (Not in the Linux ecosystem). I don't even know if any of the reasons
> > for having kexec_file_load are legitimate.
> >
> >
> > If someone wants to do the work and ensure everything that is possible
> > to load with kexec_load is possible to load with kexec_file_load.
> > Kernels supporting the multi-boot protocol etc. Then we can consider
> > deprecating kexec_load.
> >
> >
> > I think it took me about 15 years to remove the sysctl system call and
> > it only ever had about 10 users. If you want to go through that kind of
> > work to make certain there are no more users and that everything they
> > could do with the old interface is doable with the new interface then
> > please be my guest. Until then we need to fully support kexec_load.
>
> I want to clarify again, we have no plan to deprecate kexec_load.
> We just plan to use kexec_file_load more in our distros, for both legacy
> system or system with secure boot.
>
> Eric, I am glad to see you told your opinion about kexec_file_load.
> Without the discussion in this thread, we may not know it. So I have one
> question, seems kexec_file_load will continue existing, the ARCHes our
> distros is supporting, x86_64, s390, ppc, arm64, all have kexec_file_load,
> do you object us to continue using kexec_file_load, for signature
> verification and normal kexec/kdump booting? Or you plan to deprecate
> kexec_file_load?
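As a rough illustration of the placement idea quoted above (avoid movable regions when choosing where the kernel and initrd go), here is a minimal user-space sketch in C. It is not kernel code and does not claim to show what locate_mem_hole_callback() does: struct mem_range, its movable flag, align_up() and find_unmovable_hole() are hypothetical names invented for this example, and the search is bottom-up only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one RAM range; not a kernel structure. */
struct mem_range {
    uint64_t start;   /* first byte of the range */
    uint64_t end;     /* last byte of the range (inclusive) */
    bool movable;     /* assumption: true if the range may be hot-removed */
};

/* Round x up to the next multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Walk the ranges in order and pick the first aligned address inside a
 * non-movable range that can hold `size` bytes. Movable ranges are skipped
 * entirely. Returns true and stores the address in *out on success.
 */
static bool find_unmovable_hole(const struct mem_range *ranges, size_t n,
                                uint64_t size, uint64_t align, uint64_t *out)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    if (size == 0)
        return false;

    for (size_t i = 0; i < n; i++) {
        uint64_t base;

        if (ranges[i].movable)
            continue;                   /* never place an image here */

        base = align_up(ranges[i].start, align);
        if (base < ranges[i].start)     /* alignment wrapped around */
            continue;
        if (base > ranges[i].end)
            continue;
        if (ranges[i].end - base < size - 1)
            continue;                   /* range too small after aligning */

        *out = base;
        return true;
    }
    return false;
}
```

Given three ranges where the first is too small and the second is movable, the function returns an aligned address inside the third; if every range is movable it returns false, which a loader would presumably treat as "no suitable hole" rather than falling back to movable memory.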
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-14  6:40 ` Baoquan He
@ 2020-04-14  8:00 ` David Hildenbrand
From: David Hildenbrand @ 2020-04-14 8:00 UTC
To: Baoquan He, Eric W. Biederman
Cc: Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas,
    Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton,
    Will Deacon, linux-arm-kernel

On 14.04.20 08:40, Baoquan He wrote:
> On 04/13/20 at 08:15am, Eric W. Biederman wrote:
>> Baoquan He <bhe@redhat.com> writes:
>>
>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
>>>>
>>>> The only benefit of kexec_file_load is that it is simple enough from a
>>>> kernel perspective that signatures can be checked.
>>>
>>> We don't have this restriction any more with below commit:
>>>
>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
>>> and KEXEC_SIG_FORCE")
>>>
>>> With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
>>> secure boot or legacy system for kexec/kdump. Being simple enough is
>>> enough to attract and convince us to use it instead. And kexec_file_load
>>> has been in use for several years on systems with secure boot, since
>>> added in 2014, on x86_64.
>>
>> No. Actually kexec_file_load is the less capable interface, and less
>> flexible interface. Which is why it is appropriate for signature
>> verification.
>
> Well, everyone has a stance and the corresponding view. You could have
> wider view from long time maintenance and in upstream position, and think
> kexec_file_load is horrible. But I can only see from our work as a front
> line engineer to maintain/develop kexec/kdump in RHEL, and think
> kexec_file_load is easier to maintain.
>
> Surely except of multiple kernel image format support. No matter it is
> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage.
> This is produced from kernel building by default. We have no way to
> support it in our distros and add it into kexec_file_load.
>
> [RFC PATCH] x86/boot: make ELF kernel multiboot-able
> https://lkml.org/lkml/2017/2/15/654
>
>>
>>>> kexec_load in every other respect is the more capable and functional
>>>> interface. It makes no sense to get rid of it.
>>>>
>>>> It does make sense to reload with a loaded kernel on memory hotplug.
>>>> That is simple and easy. If we are going to handle something in the
>>>> kernel it should simply be an automated unloading of the kernel on memory
>>>> hotplug.
>>>>
>>>>
>>>> I think it would be irresponsible to deprecate kexec_load on any
>>>> platform.
>>>>
>>>> I also suspect that kexec_file_load could be taught to copy the dtb
>>>> on arm32 if someone wants to deal with signatures.
>>>>
>>>> We definitely can not even think of deprecating kexec_load until
>>>> architecture that supports it also supports kexec_file_load and everyone
>>>> is happy with that interface. That is Linus's no regression rule.
>>>
>>> I should pick a milder word to express our tendency and tell our plan
>>> than 'obsolete'. Even though I added 'gradually', seems it doesn't help
>>> much. I didn't mean to say 'deprecate' at all when replied.
>>>
>>> The situation and trend I understand about kexec_load and kexec_file_load
>>> are:
>>>
>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't
>>> have yet, just as x86_64, arm64 and s390 have done;
>>>
>>> 2) kexec_file_load is suggested to use, and take precedence over
>>> kexec_load in the future, if both are supported in one ARCH.
>>
>> The deep problem is that kexec_file_load is distinctly less expressive
>> than kexec_load.
>>
>>> 3) Kexec_load is kept being used by ARCHes w/o kexec_file_load support,
>>> and by ARCHes for back compatibility w/ kexec_file_load support.
>>>
>>> For 1) and 2), I think the reason is obvious as Eric said,
>>> kexec_file_load is simple enough. And currently, whenever we got a bug
>>> report, we may need fix them twice, for kexec_load and kexec_file_load.
>>> If kexec_file_load is made by default, e.g on x86_64, we will change it
>>> in kernel space only, for kexec_file_load. This is what I meant about
>>> 'obsolete gradually'. I think for arm64, s390, they will do these too.
>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the
>>> old kexec_load interface in old product.
>>
>> Maybe. The code that kexec_file_load sucked into the kernel is quite
>> stable and rarely needs changes except during a port of kexec to
>> another architecture.
>>
>> Last I looked the real maintenance effort of kexec and kexec on panic was
>> in the drivers. So I don't think we can use maintenance to do anything.
>
> Not sure if I got it. But if check Lianbo's patches, a lot of effort has
> been taken to make SEV work well on kexec_file_load. And we have
> switched to use kexec_file_load in the newly published Fedora release
> on x86_64 by default. Before this, Lianbo has investigated and done many
> experiments to make sure the switching is safe. We finally made this
> decision. Next we will do the switch in Enterprise distros. Once these
> are proved safe, we will suggest customers to use kexec_file_load for
> kexec rebooting too. In the future, we will only care about
> kexec_file_load if everything is going well. But as I have explained
> repeatedly, only caring about kexec_file_load means we will leave
> kexec_load as is, we will not add new feature or improvement patches
> for it.
>
> commit 6a20bd54473e11011bf2b47efb52d0759d412854
> Author: Lianbo Jiang <lijiang@redhat.com>
> Date: Thu Jan 16 13:47:35 2020 +0800
>
>     kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default
>
>>
>>> For 3), people can still use kexec_load and develop/fix for it, if no
>>> kexec_file_load supported. But 32-bit arm should be a different one,
>>> more like i386, we will leave it as is, and fix anything which could
>>> break it. But people really expects to improve or add feature to it? E.g
>>> in this patchset, the mem hotplug issue James raised, I assume James is
>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
>>> another reply, people even don't agree to continue supporting memory
>>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug
>>> bug on i386 with a patch, but people would rather set it as BROKEN.
>>
>> For memory hotplug just reload. Userspace already gets good events.
>
> Kexec_file_load is easy to maintain. This is an example.
>
> Lock the hotplug area where kexec-ed kernel is targeted in this patchset,
> it's obviously not right. We can't disable memory hotplug just because
> kexec-ed kernel is loaded ahead of time.
>
> Reloading is also not a good fix. Kexec-ed kernel is targeted at a
> movable area, reloading can avoid kexec rebooting corruption if that
> area is hot removed. But if that area is not removed, locating kernel
> into the hotpluggable area will change the area into unmovable zone.
> Unless we decide to not support memory hotplug in kexec-ed kernel, I
> guess it's very hard. Now in our distros kexec rebooting has been
> supported, the big cloud providers are deploying linux in guest, bugs on
> kexec reboot failure has been reported. They need the memory hotplug to
> increase/decrease memory.
>
> The root cause is kexec-ed kernel is targeted at hotpluggable memory
> region. Just avoiding the movable area can fix it. In kexec_file_load(),
> just checking or picking those unmovable region to put kernel/initrd in
> function locate_mem_hole_callback() can fix it. The page or pageblock's
> zone is movable or not, it's easy to know. This fix doesn't need to
> bother other component.

I don't fully agree.
E.g., just because memory is onlined to ZONE_NORMAL does not imply that it cannot get offlined and removed e.g., this is heavily used on ppc64, with 16MB sections. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
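The objection above can be made concrete with a small user-space model. This is only a sketch under stated assumptions, not code from the patch set or from the kernel: `struct mem_range`, `enum zone_kind`, `pick_by_zone()` and `pick_non_removable()` are hypothetical names invented for illustration, and the real hole-finding logic used by kexec_file_load() is not modelled. It compares a zone-only filter with a filter that asks whether a range can be removed at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a memory range; this is NOT a kernel structure. */
enum zone_kind { ZONE_KIND_NORMAL, ZONE_KIND_MOVABLE };

struct mem_range {
	uint64_t start;       /* first byte of the range */
	uint64_t size;        /* length in bytes */
	enum zone_kind zone;  /* zone the range was onlined to */
	bool removable;       /* true if the range may be offlined and removed */
};

/*
 * Zone-only policy: accept the first range that is large enough and is
 * not in the movable zone. Returns NULL if nothing fits.
 */
const struct mem_range *pick_by_zone(const struct mem_range *ranges,
				     size_t count, uint64_t need)
{
	assert(ranges != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (ranges[i].zone != ZONE_KIND_MOVABLE && ranges[i].size >= need)
			return &ranges[i];
	}
	return NULL;
}

/*
 * Removability policy: accept the first range that is large enough and
 * cannot be removed, whatever zone it was onlined to.
 */
const struct mem_range *pick_non_removable(const struct mem_range *ranges,
					   size_t count, uint64_t need)
{
	assert(ranges != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (!ranges[i].removable && ranges[i].size >= need)
			return &ranges[i];
	}
	return NULL;
}
```

With a hot-added 16MB range that was onlined to the normal zone listed first, the zone-only policy returns that range even though it can still be offlined, while the removability policy skips it. How the kernel would learn the `removable` property for a given range is exactly the open question in this thread, so the flag here is an assumption of the model.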
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: Baoquan He @ 2020-04-14 9:22 UTC
To: David Hildenbrand
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev

On 04/14/20 at 10:00am, David Hildenbrand wrote:
> On 14.04.20 08:40, Baoquan He wrote:
> > On 04/13/20 at 08:15am, Eric W. Biederman wrote:
> >> Baoquan He <bhe@redhat.com> writes:
> >>
> >>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
> >>>>
> >>>> The only benefit of kexec_file_load is that it is simple enough from a
> >>>> kernel perspective that signatures can be checked.
> >>>
> >>> We don't have this restriction any more with the commit below:
> >>>
> >>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
> >>> and KEXEC_SIG_FORCE")
> >>>
> >>> With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
> >>> secure boot and legacy systems for kexec/kdump. Being simple enough is
> >>> enough to attract and convince us to use it instead. And kexec_file_load
> >>> has been in use for several years on systems with secure boot, since
> >>> it was added in 2014, on x86_64.
> >>
> >> No. Actually kexec_file_load is the less capable interface, and less
> >> flexible interface. Which is why it is appropriate for signature
> >> verification.
> >
> > Well, everyone has a stance and the corresponding view. You could have
> > a wider view from long-time maintenance and an upstream position, and think
> > kexec_file_load is horrible. But I can only see from our work as a front
> > line engineer to maintain/develop kexec/kdump in RHEL, and think
> > kexec_file_load is easier to maintain.
> >
> > Except, of course, for multiple kernel image format support. Whether it is
> > kexec_load or kexec_file_load, e.g. on x86_64, we only support bzImage.
> > This is produced from kernel building by default. We have no way to
> > support it in our distros and add it into kexec_file_load.
> >
> > [RFC PATCH] x86/boot: make ELF kernel multiboot-able
> > https://lkml.org/lkml/2017/2/15/654
> >
> >>
> >>>> kexec_load in every other respect is the more capable and functional
> >>>> interface. It makes no sense to get rid of it.
> >>>>
> >>>> It does make sense to reload with a loaded kernel on memory hotplug.
> >>>> That is simple and easy. If we are going to handle something in the
> >>>> kernel it should simply be an automated unloading of the kernel on
> >>>> memory hotplug.
> >>>>
> >>>>
> >>>> I think it would be irresponsible to deprecate kexec_load on any
> >>>> platform.
> >>>>
> >>>> I also suspect that kexec_file_load could be taught to copy the dtb
> >>>> on arm32 if someone wants to deal with signatures.
> >>>>
> >>>> We definitely can not even think of deprecating kexec_load until
> >>>> every architecture that supports it also supports kexec_file_load and
> >>>> everyone is happy with that interface. That is Linus's no regression rule.
> >>>
> >>> I should pick a milder word than 'obsolete' to express our tendency and
> >>> tell our plan. Even though I added 'gradually', it seems it doesn't help
> >>> much. I didn't mean to say 'deprecate' at all when I replied.
> >>>
> >>> The situation and trend I understand about kexec_load and kexec_file_load
> >>> are:
> >>>
> >>> 1) Adding kexec_file_load support is suggested for ARCHes which don't
> >>> have it yet, just as x86_64, arm64 and s390 have done;
> >>>
> >>> 2) kexec_file_load is suggested to use, and take precedence over
> >>> kexec_load in the future, if both are supported in one ARCH.
> >>
> >> The deep problem is that kexec_file_load is distinctly less expressive
> >> than kexec_load.
> >>
> >>> 3) kexec_load is kept in use by ARCHes w/o kexec_file_load support,
> >>> and by ARCHes w/ kexec_file_load support for backward compatibility.
> >>>
> >>> For 1) and 2), I think the reason is obvious as Eric said,
> >>> kexec_file_load is simple enough. And currently, whenever we get a bug
> >>> report, we may need to fix it twice, for kexec_load and kexec_file_load.
> >>> If kexec_file_load is made the default, e.g. on x86_64, we will change it
> >>> in kernel space only, for kexec_file_load. This is what I meant by
> >>> 'obsolete gradually'. I think arm64 and s390 will do this too.
> >>> Unless there's some critical/blocker bug in kexec_load, to corrupt the
> >>> old kexec_load interface in old product.
> >>
> >> Maybe. The code that kexec_file_load sucked into the kernel is quite
> >> stable and rarely needs changes except during a port of kexec to
> >> another architecture.
> >>
> >> Last I looked the real maintenance effort of kexec and kexec on panic was
> >> in the drivers. So I don't think we can use maintenance to do anything.
> >
> > Not sure if I got it. But if you check Lianbo's patches, a lot of effort has
> > been taken to make SEV work well on kexec_file_load. And we have
> > switched to use kexec_file_load in the newly published Fedora release
> > on x86_64 by default. Before this, Lianbo investigated and did many
> > experiments to make sure the switch is safe. We finally made this
> > decision. Next we will do the switch in Enterprise distros. Once these
> > are proved safe, we will suggest customers use kexec_file_load for
> > kexec rebooting too. In the future, we will only care about
> > kexec_file_load if everything is going well. But as I have explained
> > repeatedly, only caring about kexec_file_load means we will leave
> > kexec_load as is, we will not add new feature or improvement patches
> > for it.
> >
> > commit 6a20bd54473e11011bf2b47efb52d0759d412854
> > Author: Lianbo Jiang <lijiang@redhat.com>
> > Date: Thu Jan 16 13:47:35 2020 +0800
> >
> > kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default
> >
> >>
> >>> For 3), people can still use kexec_load and develop/fix for it, if
> >>> kexec_file_load is not supported. But 32-bit arm should be a different one,
> >>> more like i386, we will leave it as is, and fix anything which could
> >>> break it. But do people really expect to improve or add features to it? E.g.
> >>> in this patchset, the mem hotplug issue James raised, I assume James is
> >>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
> >>> another reply, people don't even agree to continue supporting memory
> >>> hotplug on 32-bit systems. We once took the effort to fix a memory hotplug
> >>> bug on i386 with a patch, but people would rather set it as BROKEN.
> >>
> >> For memory hotplug just reload. Userspace already gets good events.
> >
> > kexec_file_load is easy to maintain. This is an example.
> >
> > Locking the hotplug area where the kexec-ed kernel is targeted, as in this
> > patchset, is obviously not right. We can't disable memory hotplug just
> > because a kexec-ed kernel is loaded ahead of time.
> >
> > Reloading is also not a good fix. Kexec-ed kernel is targeted at a
> > movable area, reloading can avoid kexec rebooting corruption if that
> > area is hot removed. But if that area is not removed, locating kernel
> > into the hotpluggable area will change the area into unmovable zone.
> > Unless we decide to not support memory hotplug in kexec-ed kernel, I
> > guess it's very hard. Now in our distros kexec rebooting has been
> > supported, the big cloud providers are deploying linux in guest, bugs on
> > kexec reboot failure have been reported. They need the memory hotplug to
> > increase/decrease memory.
> >
> > The root cause is kexec-ed kernel is targeted at hotpluggable memory
> > region. Just avoiding the movable area can fix it. In kexec_file_load(),
> > just checking or picking those unmovable region to put kernel/initrd in
> > function locate_mem_hole_callback() can fix it. The page or pageblock's
> > zone is movable or not, it's easy to know. This fix doesn't need to
> > bother other component.
>
> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL
> does not imply that it cannot get offlined and removed; this is
> heavily used on ppc64, with 16MB sections.

Really? I only know that there are two kinds of memory hotplug on ppc, but
I don't know the details. So in this case, is there any flag or other way to
know that those memory blocks are hotpluggable? I am curious how kernel data
is kept out of this area. Or does ppc just freely use it for kernel data or
user space data, and then try to migrate it on hot remove?
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 9:22 ` Baoquan He 0 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-14 9:22 UTC (permalink / raw) To: David Hildenbrand Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/14/20 at 10:00am, David Hildenbrand wrote: > On 14.04.20 08:40, Baoquan He wrote: > > On 04/13/20 at 08:15am, Eric W. Biederman wrote: > >> Baoquan He <bhe@redhat.com> writes: > >> > >>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > >>>> > >>>> The only benefit of kexec_file_load is that it is simple enough from a > >>>> kernel perspective that signatures can be checked. > >>> > >>> We don't have this restriction any more with below commit: > >>> > >>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > >>> and KEXEC_SIG_FORCE") > >>> > >>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > >>> secure boot or legacy system for kexec/kdump. Being simple enough is > >>> enough to astract and convince us to use it instead. And kexec_file_load > >>> has been in use for several years on systems with secure boot, since > >>> added in 2014, on x86_64. > >> > >> No. Actaully kexec_file_load is the less capable interface, and less > >> flexible interface. Which is why it is appropriate for signature > >> verification. > > > > Well, everyone has a stance and the corresponding view. You could have > > wider view from long time maintenance and in upstrem position, and think > > kexec_file_load is horrible. But I can only see from our work as a front > > line engineer to maintain/develop kexec/kdump in RHEL, and think > > kexec_file_load is easier to maintain. > > > > Surely except of multiple kernel image format support. 
No matter it is > > kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. > > This is produced from kerel building by default. We have no way to > > support it in our distros and add it into kexec_file_load. > > > > [RFC PATCH] x86/boot: make ELF kernel multiboot-able > > https://lkml.org/lkml/2017/2/15/654 > > > >> > >>>> kexec_load in every other respect is the more capable and functional > >>>> interface. It makes no sense to get rid of it. > >>>> > >>>> It does make sense to reload with a loaded kernel on memory hotplug. > >>>> That is simple and easy. If we are going to handle something in the > >>>> kernel it should simple an automated unloading of the kernel on memory > >>>> hotplug. > >>>> > >>>> > >>>> I think it would be irresponsible to deprecate kexec_load on any > >>>> platform. > >>>> > >>>> I also suspect that kexec_file_load could be taught to copy the dtb > >>>> on arm32 if someone wants to deal with signatures. > >>>> > >>>> We definitely can not even think of deprecating kexec_load until > >>>> architecture that supports it also supports kexec_file_load and everyone > >>>> is happy with that interface. That is Linus's no regression rule. > >>> > >>> I should pick a milder word to express our tendency and tell our plan > >>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help > >>> much. I didn't mean to say 'deprecate' at all when replied. > >>> > >>> The situation and trend I understand about kexec_load and kexec_file_load > >>> are: > >>> > >>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > >>> have yet, just as x86_64, arm64 and s390 have done; > >>> > >>> 2) kexec_file_load is suggested to use, and take precedence over > >>> kexec_load in the future, if both are supported in one ARCH. > >> > >> The deep problem is that kexec_file_load is distinctly less expressive > >> than kexec_load. 
> >> > >>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > >>> and by ARCHes for back compatibility w/ kexec_file_load support. > >>> > >>> For 1) and 2), I think the reason is obvious as Eric said, > >>> kexec_file_load is simple enough. And currently, whenever we got a bug > >>> report, we may need fix them twice, for kexec_load and kexec_file_load. > >>> If kexec_file_load is made by default, e.g on x86_64, we will change it > >>> in kernel space only, for kexec_file_load. This is what I meant about > >>> 'obsolete gradually'. I think for arm64, s390, they will do these too. > >>> Unless there's some critical/blocker bug in kexec_load, to corrupt the > >>> old kexec_load interface in old product. > >> > >> Maybe. The code that kexec_file_load sucked into the kernel is quite > >> stable and rarely needs changes except during a port of kexec to > >> another architecture. > >> > >> Last I looked the real maintenance effor of kexec and kexec on panic was > >> in the drivers. So I don't think we can use maintenance to do anything. > > > > Not sure if I got it. But if check Lianbo's patches, a lot of effort has > > been taken to make SEV work well on kexec_file_load. And we have > > switched to use kexec_file_load in the newly published Fedora release > > on x86_64 by default. Before this, Lianbo has investigated and done many > > experiments to make sure the switching is safe. We finally made this > > decision. Next we will do the switch in Enterprise distros. Once these > > are proved safe, we will suggest customers to use kexec_file_load for > > kexec rebooting too. In the future, we will only care about > > kexec_file_load if everying is going well. But as I have explained > > repeatedly, only caring about kexec_file_load means we will leave > > kexec_load as is, we will not add new feature or improvement patches > > for it. 
> > > > commit 6a20bd54473e11011bf2b47efb52d0759d412854 > > Author: Lianbo Jiang <lijiang@redhat.com> > > Date: Thu Jan 16 13:47:35 2020 +0800 > > > > kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default > > > >> > >>> For 3), people can still use kexec_load and develop/fix for it, if no > >>> kexec_file_load supported. But 32-bit arm should be a different one, > >>> more like i386, we will leave it as is, and fix anything which could > >>> break it. But people really expects to improve or add feature to it? E.g > >>> in this patchset, the mem hotplug issue James raised, I assume James is > >>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > >>> another reply, people even don't agree to continue supporting memory > >>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug > >>> bug on i386 with a patch, but people would rather set it as BROKEN. > >> > >> For memory hotplug just reload. Userspace already gets good events. > > > > Kexec_file_load is easy to maintain. This is an example. > > > > Lock the hotplug area where kexed-ed kernel is targeted in this patchset, > > it's obviously not right. We can't disable memory hotplug just because > > kexec-ed kernel is loaded ahead of time. > > > > Reloading is also not a good fix. Kexec-ed kernel is targeted at a > > movable area, reloading can avoid kexec rebooting corruption if that > > area is hot removed. But if that area is not removed, locating kernel > > into the hotpluggable area will change the area into ummovable zone. > > Unless we decide to not support memory hotplug in kexec-ed kernel, I > > guess it's very hard. Now in our distros kexec rebooting has been > > supported, the big cloud providers are deploying linux in guest, bugs on > > kexec reboot failure has been reported. They need the memory hotplug to > > increase/decrease memory. > > > > The root cause is kexec-ed kernel is targeted at hotpluggable memory > > region. 
Just avoiding the movable area can fix it. In kexec_file_load(), > > just checking or picking those unmovable region to put kernel/initrd in > > function locate_mem_hole_callback() can fix it. The page or pageblock's > > zone is movable or not, it's easy to know. This fix doesn't need to > > bother other component. > > I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > does not imply that it cannot get offlined and removed e.g., this is > heavily used on ppc64, with 16MB sections. Really? I only know that there are two kinds of memory hotplug on ppc, but I don't know the details. So in this case, is there any flag or other way to know that those memory blocks are hotpluggable? I am curious how kernel data is kept out of this area. Or does ppc just freely use it for kernel data or user space data, and then try to migrate it on hot remove? ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 9:22 ` Baoquan He (?) (?) @ 2020-04-14 9:37 ` David Hildenbrand -1 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-14 9:37 UTC (permalink / raw) To: Baoquan He Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev On 14.04.20 11:22, Baoquan He wrote: > On 04/14/20 at 10:00am, David Hildenbrand wrote: >> On 14.04.20 08:40, Baoquan He wrote: >>> On 04/13/20 at 08:15am, Eric W. Biederman wrote: >>>> Baoquan He <bhe@redhat.com> writes: >>>> >>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: >>>>>> >>>>>> The only benefit of kexec_file_load is that it is simple enough from a >>>>>> kernel perspective that signatures can be checked. >>>>> >>>>> We don't have this restriction any more with below commit: >>>>> >>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG >>>>> and KEXEC_SIG_FORCE") >>>>> >>>>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both >>>>> secure boot or legacy system for kexec/kdump. Being simple enough is >>>>> enough to astract and convince us to use it instead. And kexec_file_load >>>>> has been in use for several years on systems with secure boot, since >>>>> added in 2014, on x86_64. >>>> >>>> No. Actaully kexec_file_load is the less capable interface, and less >>>> flexible interface. Which is why it is appropriate for signature >>>> verification. >>> >>> Well, everyone has a stance and the corresponding view. You could have >>> wider view from long time maintenance and in upstrem position, and think >>> kexec_file_load is horrible. But I can only see from our work as a front >>> line engineer to maintain/develop kexec/kdump in RHEL, and think >>> kexec_file_load is easier to maintain. 
>>> >>> Surely except of multiple kernel image format support. No matter it is >>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. >>> This is produced from kerel building by default. We have no way to >>> support it in our distros and add it into kexec_file_load. >>> >>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able >>> https://lkml.org/lkml/2017/2/15/654 >>> >>>> >>>>>> kexec_load in every other respect is the more capable and functional >>>>>> interface. It makes no sense to get rid of it. >>>>>> >>>>>> It does make sense to reload with a loaded kernel on memory hotplug. >>>>>> That is simple and easy. If we are going to handle something in the >>>>>> kernel it should simple an automated unloading of the kernel on memory >>>>>> hotplug. >>>>>> >>>>>> >>>>>> I think it would be irresponsible to deprecate kexec_load on any >>>>>> platform. >>>>>> >>>>>> I also suspect that kexec_file_load could be taught to copy the dtb >>>>>> on arm32 if someone wants to deal with signatures. >>>>>> >>>>>> We definitely can not even think of deprecating kexec_load until >>>>>> architecture that supports it also supports kexec_file_load and everyone >>>>>> is happy with that interface. That is Linus's no regression rule. >>>>> >>>>> I should pick a milder word to express our tendency and tell our plan >>>>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help >>>>> much. I didn't mean to say 'deprecate' at all when replied. >>>>> >>>>> The situation and trend I understand about kexec_load and kexec_file_load >>>>> are: >>>>> >>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't >>>>> have yet, just as x86_64, arm64 and s390 have done; >>>>> >>>>> 2) kexec_file_load is suggested to use, and take precedence over >>>>> kexec_load in the future, if both are supported in one ARCH. >>>> >>>> The deep problem is that kexec_file_load is distinctly less expressive >>>> than kexec_load. 
>>>> >>>>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, >>>>> and by ARCHes for back compatibility w/ kexec_file_load support. >>>>> >>>>> For 1) and 2), I think the reason is obvious as Eric said, >>>>> kexec_file_load is simple enough. And currently, whenever we got a bug >>>>> report, we may need fix them twice, for kexec_load and kexec_file_load. >>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it >>>>> in kernel space only, for kexec_file_load. This is what I meant about >>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too. >>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the >>>>> old kexec_load interface in old product. >>>> >>>> Maybe. The code that kexec_file_load sucked into the kernel is quite >>>> stable and rarely needs changes except during a port of kexec to >>>> another architecture. >>>> >>>> Last I looked the real maintenance effor of kexec and kexec on panic was >>>> in the drivers. So I don't think we can use maintenance to do anything. >>> >>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has >>> been taken to make SEV work well on kexec_file_load. And we have >>> switched to use kexec_file_load in the newly published Fedora release >>> on x86_64 by default. Before this, Lianbo has investigated and done many >>> experiments to make sure the switching is safe. We finally made this >>> decision. Next we will do the switch in Enterprise distros. Once these >>> are proved safe, we will suggest customers to use kexec_file_load for >>> kexec rebooting too. In the future, we will only care about >>> kexec_file_load if everying is going well. But as I have explained >>> repeatedly, only caring about kexec_file_load means we will leave >>> kexec_load as is, we will not add new feature or improvement patches >>> for it. 
>>> >>> commit 6a20bd54473e11011bf2b47efb52d0759d412854 >>> Author: Lianbo Jiang <lijiang@redhat.com> >>> Date: Thu Jan 16 13:47:35 2020 +0800 >>> >>> kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default >>> >>>> >>>>> For 3), people can still use kexec_load and develop/fix for it, if no >>>>> kexec_file_load supported. But 32-bit arm should be a different one, >>>>> more like i386, we will leave it as is, and fix anything which could >>>>> break it. But people really expects to improve or add feature to it? E.g >>>>> in this patchset, the mem hotplug issue James raised, I assume James is >>>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in >>>>> another reply, people even don't agree to continue supporting memory >>>>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug >>>>> bug on i386 with a patch, but people would rather set it as BROKEN. >>>> >>>> For memory hotplug just reload. Userspace already gets good events. >>> >>> Kexec_file_load is easy to maintain. This is an example. >>> >>> Lock the hotplug area where kexed-ed kernel is targeted in this patchset, >>> it's obviously not right. We can't disable memory hotplug just because >>> kexec-ed kernel is loaded ahead of time. >>> >>> Reloading is also not a good fix. Kexec-ed kernel is targeted at a >>> movable area, reloading can avoid kexec rebooting corruption if that >>> area is hot removed. But if that area is not removed, locating kernel >>> into the hotpluggable area will change the area into ummovable zone. >>> Unless we decide to not support memory hotplug in kexec-ed kernel, I >>> guess it's very hard. Now in our distros kexec rebooting has been >>> supported, the big cloud providers are deploying linux in guest, bugs on >>> kexec reboot failure has been reported. They need the memory hotplug to >>> increase/decrease memory. >>> >>> The root cause is kexec-ed kernel is targeted at hotpluggable memory >>> region. 
Just avoiding the movable area can fix it. In kexec_file_load(), >>> just checking or picking those unmovable region to put kernel/initrd in >>> function locate_mem_hole_callback() can fix it. The page or pageblock's >>> zone is movable or not, it's easy to know. This fix doesn't need to >>> bother other component. >> >> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL >> does not imply that it cannot get offlined and removed e.g., this is >> heavily used on ppc64, with 16MB sections. > > Really? I just know there are two kinds of mem hoplug in ppc, but don't > know the details. So in this case, is there any flag or a way to know > those memory block are hotpluggable? I am curious how those kernel data > is avoided to be put in this area. Or ppc just freely uses it for kernel > data or user space data, then try to migrate when hot remove? See arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() Under DLPAR, memory can be removed at LMB granularity, which is usually 16MB (== a single section on ppc64). DLPAR directly onlines all hotplugged memory (LMBs) from the kernel using device_online(), which places it in ZONE_NORMAL. When trying to remove memory, it simply scans for offlineable 16MB memory blocks (== section == LMB), then offlines and removes them. There is no need for the movable zone and all the issues that come with it. Now, the interesting question is: can we have LMBs added during boot (not via add_memory()) that will later be removed via remove_memory()? IIRC, we had BUGs related to that, so I think yes. If a section contains no unmovable allocations (after boot), it can get removed. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
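The removal scan described in the message above (walk the 16MB blocks, take the ones that can be offlined, stop once enough have been removed) can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: struct mem_block, block_is_offlineable() and remove_by_count() are hypothetical names invented for this illustration, and the code is not taken from, and does not claim to match, dlpar_memory_remove_by_count() in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one 16MB memory block (== section == LMB on ppc64). */
struct mem_block {
	bool online;        /* block is currently online */
	bool has_unmovable; /* block holds allocations that cannot be migrated */
};

/* In this model a block can be offlined only if it is online and
 * everything inside it can be moved elsewhere. */
static bool block_is_offlineable(const struct mem_block *b)
{
	return b->online && !b->has_unmovable;
}

/*
 * Scan all blocks and take up to 'want' offlineable ones.
 * Returns how many blocks were actually removed, which may be
 * fewer than 'want' if not enough offlineable blocks exist.
 */
static size_t remove_by_count(struct mem_block *blocks, size_t nr, size_t want)
{
	size_t removed = 0;

	assert(blocks != NULL || nr == 0);

	for (size_t i = 0; i < nr && removed < want; i++) {
		if (!block_is_offlineable(&blocks[i]))
			continue;
		/* Stands in for "offline, then remove" in this toy model. */
		blocks[i].online = false;
		removed++;
	}
	return removed;
}
```

The point the model tries to capture is that nothing here depends on which zone a block was onlined to: a block in ZONE_NORMAL is a removal candidate as long as it happens to hold no unmovable allocations, which is why checking only for the movable zone would not be enough to keep a kexec image out of removable memory.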
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 9:37 ` David Hildenbrand (?) (?) @ 2020-04-14 14:39 ` Baoquan He -1 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-14 14:39 UTC (permalink / raw) To: David Hildenbrand Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu On 04/14/20 at 11:37am, David Hildenbrand wrote: > On 14.04.20 11:22, Baoquan He wrote: > > On 04/14/20 at 10:00am, David Hildenbrand wrote: > >> On 14.04.20 08:40, Baoquan He wrote: > >>> On 04/13/20 at 08:15am, Eric W. Biederman wrote: > >>>> Baoquan He <bhe@redhat.com> writes: > >>>> > >>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > >>>>>> > >>>>>> The only benefit of kexec_file_load is that it is simple enough from a > >>>>>> kernel perspective that signatures can be checked. > >>>>> > >>>>> We don't have this restriction any more with below commit: > >>>>> > >>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > >>>>> and KEXEC_SIG_FORCE") > >>>>> > >>>>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > >>>>> secure boot or legacy system for kexec/kdump. Being simple enough is > >>>>> enough to astract and convince us to use it instead. And kexec_file_load > >>>>> has been in use for several years on systems with secure boot, since > >>>>> added in 2014, on x86_64. > >>>> > >>>> No. Actaully kexec_file_load is the less capable interface, and less > >>>> flexible interface. Which is why it is appropriate for signature > >>>> verification. > >>> > >>> Well, everyone has a stance and the corresponding view. You could have > >>> wider view from long time maintenance and in upstrem position, and think > >>> kexec_file_load is horrible. 
But I can only see from our work as a front > >>> line engineer to maintain/develop kexec/kdump in RHEL, and think > >>> kexec_file_load is easier to maintain. > >>> > >>> Surely except of multiple kernel image format support. No matter it is > >>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. > >>> This is produced from kerel building by default. We have no way to > >>> support it in our distros and add it into kexec_file_load. > >>> > >>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able > >>> https://lkml.org/lkml/2017/2/15/654 > >>> > >>>> > >>>>>> kexec_load in every other respect is the more capable and functional > >>>>>> interface. It makes no sense to get rid of it. > >>>>>> > >>>>>> It does make sense to reload with a loaded kernel on memory hotplug. > >>>>>> That is simple and easy. If we are going to handle something in the > >>>>>> kernel it should simple an automated unloading of the kernel on memory > >>>>>> hotplug. > >>>>>> > >>>>>> > >>>>>> I think it would be irresponsible to deprecate kexec_load on any > >>>>>> platform. > >>>>>> > >>>>>> I also suspect that kexec_file_load could be taught to copy the dtb > >>>>>> on arm32 if someone wants to deal with signatures. > >>>>>> > >>>>>> We definitely can not even think of deprecating kexec_load until > >>>>>> architecture that supports it also supports kexec_file_load and everyone > >>>>>> is happy with that interface. That is Linus's no regression rule. > >>>>> > >>>>> I should pick a milder word to express our tendency and tell our plan > >>>>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help > >>>>> much. I didn't mean to say 'deprecate' at all when replied. 
> >>>>> > >>>>> The situation and trend I understand about kexec_load and kexec_file_load > >>>>> are: > >>>>> > >>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > >>>>> have yet, just as x86_64, arm64 and s390 have done; > >>>>> > >>>>> 2) kexec_file_load is suggested to use, and take precedence over > >>>>> kexec_load in the future, if both are supported in one ARCH. > >>>> > >>>> The deep problem is that kexec_file_load is distinctly less expressive > >>>> than kexec_load. > >>>> > >>>>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > >>>>> and by ARCHes for back compatibility w/ kexec_file_load support. > >>>>> > >>>>> For 1) and 2), I think the reason is obvious as Eric said, > >>>>> kexec_file_load is simple enough. And currently, whenever we got a bug > >>>>> report, we may need fix them twice, for kexec_load and kexec_file_load. > >>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it > >>>>> in kernel space only, for kexec_file_load. This is what I meant about > >>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too. > >>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the > >>>>> old kexec_load interface in old product. > >>>> > >>>> Maybe. The code that kexec_file_load sucked into the kernel is quite > >>>> stable and rarely needs changes except during a port of kexec to > >>>> another architecture. > >>>> > >>>> Last I looked the real maintenance effor of kexec and kexec on panic was > >>>> in the drivers. So I don't think we can use maintenance to do anything. > >>> > >>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has > >>> been taken to make SEV work well on kexec_file_load. And we have > >>> switched to use kexec_file_load in the newly published Fedora release > >>> on x86_64 by default. Before this, Lianbo has investigated and done many > >>> experiments to make sure the switching is safe. 
We finally made this > >>> decision. Next we will do the switch in Enterprise distros. Once these > >>> are proved safe, we will suggest that customers use kexec_file_load for > >>> kexec rebooting too. In the future, we will only care about > >>> kexec_file_load if everything is going well. But as I have explained > >>> repeatedly, only caring about kexec_file_load means we will leave > >>> kexec_load as is, we will not add new feature or improvement patches > >>> for it. > >>> > >>> commit 6a20bd54473e11011bf2b47efb52d0759d412854 > >>> Author: Lianbo Jiang <lijiang@redhat.com> > >>> Date: Thu Jan 16 13:47:35 2020 +0800 > >>> > >>> kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default > >>> > >>>> > >>>>> For 3), people can still use kexec_load and develop/fix for it, if > >>>>> kexec_file_load is not supported. But 32-bit arm should be a different one, > >>>>> more like i386, we will leave it as is, and fix anything which could > >>>>> break it. But do people really expect improvements or new features for it? E.g. > >>>>> in this patchset, the mem hotplug issue James raised, I assume James is > >>>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > >>>>> another reply, people don't even agree to continue supporting memory > >>>>> hotplug on 32-bit systems. We once made an effort to fix a memory hotplug > >>>>> bug on i386 with a patch, but people would rather set it as BROKEN. > >>>> > >>>> For memory hotplug just reload. Userspace already gets good events. > >>> > >>> Kexec_file_load is easy to maintain. This is an example. > >>> > >>> Locking the hotplug area where the kexec-ed kernel is targeted, as this patchset does, > >>> is obviously not right. We can't disable memory hotplug just because > >>> a kexec-ed kernel is loaded ahead of time. > >>> > >>> Reloading is also not a good fix. If the kexec-ed kernel is targeted at a > >>> movable area, reloading can avoid kexec rebooting corruption if that > >>> area is hot removed.
But if that area is not removed, locating the kernel > >>> in the hotpluggable area will change the area into an unmovable zone. > >>> Unless we decide to not support memory hotplug in the kexec-ed kernel, I > >>> guess it's very hard. Now in our distros kexec rebooting has been > >>> supported, the big cloud providers are deploying linux in guests, and bugs on > >>> kexec reboot failure have been reported. They need memory hotplug to > >>> increase/decrease memory. > >>> > >>> The root cause is that the kexec-ed kernel is targeted at a hotpluggable memory > >>> region. Just avoiding the movable area can fix it. In kexec_file_load(), > >>> just checking or picking those unmovable regions to put kernel/initrd in > >>> function locate_mem_hole_callback() can fix it. Whether the page or pageblock's > >>> zone is movable or not is easy to know. This fix doesn't need to > >>> bother other components. > >> > >> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > >> does not imply that it cannot get offlined and removed e.g., this is > >> heavily used on ppc64, with 16MB sections. > > > > Really? I just know there are two kinds of mem hotplug in ppc, but don't > > know the details. So in this case, is there any flag or a way to know > > those memory blocks are hotpluggable? I am curious how kernel data > > is kept out of this area. Or does ppc just freely use it for kernel > > data or user space data, then try to migrate when hot removing? > > See > arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > > Under DLPAR, it can remove memory in LMB granularity, which is usually > 16MB (== single section on ppc64). DLPAR will directly online all > hotplugged memory (LMBs) from the kernel using device_online(), which > will go to ZONE_NORMAL. > > When trying to remove memory, it simply scans for offlineable 16MB > memory blocks (==section == LMB), offlines and removes them. No need for > the movable zone and all the involved issues.
Yes, this is a different one, thanks for pointing it out. It sounds like the balloon driver on virt platforms, doesn't it? As you said, keeping the kexec kernel out of the movable zone can't solve this DLPAR case. > > Now, the interesting question is, can we have LMBs added during boot > (not via add_memory()), that will later be removed via remove_memory(). > IIRC, we had BUGs related to that, so I think yes. If a section contains > no unmovable allocations (after boot), it can get removed. I do want to ask this question. If we can add the LMB into system RAM, then reloading kexec can solve it. Another, better way is to add a common function that filters out the movable zone when searching for a position for the kexec kernel, and to use an arch specific function to filter out DLPAR memory blocks for ppc only. Over there, we can simply use for_each_drmem_lmb() to do that.
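The filtering idea in the reply above (a common check that skips the movable zone, plus an arch specific check for memory such as DLPAR LMBs) could be sketched roughly as follows. This is a self-contained userspace model, not kernel code and not what the patchset does: struct mem_range, locate_hole(), the arch_removable_fn hook and example_lmb_removable() are all hypothetical names made up for illustration. It does not call locate_mem_hole_callback() or for_each_drmem_lmb(), whose real interfaces are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one RAM range; not a kernel structure. */
struct mem_range {
	uint64_t start;   /* inclusive */
	uint64_t end;     /* exclusive */
	bool movable;     /* range is assumed to lie in a movable zone */
};

/*
 * Hypothetical arch hook: returns true if [start, end) overlaps memory
 * the architecture may remove later (e.g. a DLPAR LMB on ppc).
 */
typedef bool (*arch_removable_fn)(uint64_t start, uint64_t end);

static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Return the start address for a segment of `size` bytes, or UINT64_MAX
 * if no range qualifies. Simplification: only the aligned start of each
 * range is tried; a real search would also slide within the range.
 */
uint64_t locate_hole(const struct mem_range *r, size_t n, uint64_t size,
		     uint64_t align, arch_removable_fn arch_removable)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t start;

		/* Common filter: never place the image in a movable range. */
		if (r[i].movable)
			continue;

		start = align_up(r[i].start, align);
		/* start < r[i].start catches wrap-around in align_up(). */
		if (start < r[i].start || start >= r[i].end)
			continue;
		if (r[i].end - start < size)
			continue;

		/* Arch filter: skip memory the arch says can go away. */
		if (arch_removable && arch_removable(start, start + size))
			continue;

		return start;
	}
	return UINT64_MAX;
}

/* Example hook: pretend the 16MB block at 0x1000000 is a removable LMB. */
bool example_lmb_removable(uint64_t start, uint64_t end)
{
	const uint64_t lmb_start = 0x1000000, lmb_end = 0x2000000;

	return start < lmb_end && end > lmb_start;
}
```

Passing NULL for the hook models an architecture with no extra restrictions, so only the movable-zone filter applies. Keeping the arch check behind a callback mirrors the split suggested above: the common code stays generic and only ppc would need to supply an LMB-aware filter.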
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 14:39 ` Baoquan He 0 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-14 14:39 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/14/20 at 11:37am, David Hildenbrand wrote: > On 14.04.20 11:22, Baoquan He wrote: > > On 04/14/20 at 10:00am, David Hildenbrand wrote: > >> On 14.04.20 08:40, Baoquan He wrote: > >>> On 04/13/20 at 08:15am, Eric W. Biederman wrote: > >>>> Baoquan He <bhe@redhat.com> writes: > >>>> > >>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: > >>>>>> > >>>>>> The only benefit of kexec_file_load is that it is simple enough from a > >>>>>> kernel perspective that signatures can be checked. > >>>>> > >>>>> We don't have this restriction any more with below commit: > >>>>> > >>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG > >>>>> and KEXEC_SIG_FORCE") > >>>>> > >>>>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both > >>>>> secure boot or legacy system for kexec/kdump. Being simple enough is > >>>>> enough to astract and convince us to use it instead. And kexec_file_load > >>>>> has been in use for several years on systems with secure boot, since > >>>>> added in 2014, on x86_64. > >>>> > >>>> No. Actaully kexec_file_load is the less capable interface, and less > >>>> flexible interface. Which is why it is appropriate for signature > >>>> verification. > >>> > >>> Well, everyone has a stance and the corresponding view. You could have > >>> wider view from long time maintenance and in upstrem position, and think > >>> kexec_file_load is horrible. 
But I can only see from our work as a front > >>> line engineer to maintain/develop kexec/kdump in RHEL, and think > >>> kexec_file_load is easier to maintain. > >>> > >>> Surely except of multiple kernel image format support. No matter it is > >>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. > >>> This is produced from kerel building by default. We have no way to > >>> support it in our distros and add it into kexec_file_load. > >>> > >>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able > >>> https://lkml.org/lkml/2017/2/15/654 > >>> > >>>> > >>>>>> kexec_load in every other respect is the more capable and functional > >>>>>> interface. It makes no sense to get rid of it. > >>>>>> > >>>>>> It does make sense to reload with a loaded kernel on memory hotplug. > >>>>>> That is simple and easy. If we are going to handle something in the > >>>>>> kernel it should simple an automated unloading of the kernel on memory > >>>>>> hotplug. > >>>>>> > >>>>>> > >>>>>> I think it would be irresponsible to deprecate kexec_load on any > >>>>>> platform. > >>>>>> > >>>>>> I also suspect that kexec_file_load could be taught to copy the dtb > >>>>>> on arm32 if someone wants to deal with signatures. > >>>>>> > >>>>>> We definitely can not even think of deprecating kexec_load until > >>>>>> architecture that supports it also supports kexec_file_load and everyone > >>>>>> is happy with that interface. That is Linus's no regression rule. > >>>>> > >>>>> I should pick a milder word to express our tendency and tell our plan > >>>>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help > >>>>> much. I didn't mean to say 'deprecate' at all when replied. 
> >>>>> > >>>>> The situation and trend I understand about kexec_load and kexec_file_load > >>>>> are: > >>>>> > >>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't > >>>>> have yet, just as x86_64, arm64 and s390 have done; > >>>>> > >>>>> 2) kexec_file_load is suggested to use, and take precedence over > >>>>> kexec_load in the future, if both are supported in one ARCH. > >>>> > >>>> The deep problem is that kexec_file_load is distinctly less expressive > >>>> than kexec_load. > >>>> > >>>>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, > >>>>> and by ARCHes for back compatibility w/ kexec_file_load support. > >>>>> > >>>>> For 1) and 2), I think the reason is obvious as Eric said, > >>>>> kexec_file_load is simple enough. And currently, whenever we got a bug > >>>>> report, we may need fix them twice, for kexec_load and kexec_file_load. > >>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it > >>>>> in kernel space only, for kexec_file_load. This is what I meant about > >>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too. > >>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the > >>>>> old kexec_load interface in old product. > >>>> > >>>> Maybe. The code that kexec_file_load sucked into the kernel is quite > >>>> stable and rarely needs changes except during a port of kexec to > >>>> another architecture. > >>>> > >>>> Last I looked the real maintenance effor of kexec and kexec on panic was > >>>> in the drivers. So I don't think we can use maintenance to do anything. > >>> > >>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has > >>> been taken to make SEV work well on kexec_file_load. And we have > >>> switched to use kexec_file_load in the newly published Fedora release > >>> on x86_64 by default. Before this, Lianbo has investigated and done many > >>> experiments to make sure the switching is safe. 
We finally made this > >>> decision. Next we will do the switch in Enterprise distros. Once these > >>> are proved safe, we will suggest customers to use kexec_file_load for > >>> kexec rebooting too. In the future, we will only care about > >>> kexec_file_load if everying is going well. But as I have explained > >>> repeatedly, only caring about kexec_file_load means we will leave > >>> kexec_load as is, we will not add new feature or improvement patches > >>> for it. > >>> > >>> commit 6a20bd54473e11011bf2b47efb52d0759d412854 > >>> Author: Lianbo Jiang <lijiang@redhat.com> > >>> Date: Thu Jan 16 13:47:35 2020 +0800 > >>> > >>> kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default > >>> > >>>> > >>>>> For 3), people can still use kexec_load and develop/fix for it, if no > >>>>> kexec_file_load supported. But 32-bit arm should be a different one, > >>>>> more like i386, we will leave it as is, and fix anything which could > >>>>> break it. But people really expects to improve or add feature to it? E.g > >>>>> in this patchset, the mem hotplug issue James raised, I assume James is > >>>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in > >>>>> another reply, people even don't agree to continue supporting memory > >>>>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug > >>>>> bug on i386 with a patch, but people would rather set it as BROKEN. > >>>> > >>>> For memory hotplug just reload. Userspace already gets good events. > >>> > >>> Kexec_file_load is easy to maintain. This is an example. > >>> > >>> Lock the hotplug area where kexed-ed kernel is targeted in this patchset, > >>> it's obviously not right. We can't disable memory hotplug just because > >>> kexec-ed kernel is loaded ahead of time. > >>> > >>> Reloading is also not a good fix. Kexec-ed kernel is targeted at a > >>> movable area, reloading can avoid kexec rebooting corruption if that > >>> area is hot removed. 
But if that area is not removed, locating kernel > >>> into the hotpluggable area will change the area into ummovable zone. > >>> Unless we decide to not support memory hotplug in kexec-ed kernel, I > >>> guess it's very hard. Now in our distros kexec rebooting has been > >>> supported, the big cloud providers are deploying linux in guest, bugs on > >>> kexec reboot failure has been reported. They need the memory hotplug to > >>> increase/decrease memory. > >>> > >>> The root cause is kexec-ed kernel is targeted at hotpluggable memory > >>> region. Just avoiding the movable area can fix it. In kexec_file_load(), > >>> just checking or picking those unmovable region to put kernel/initrd in > >>> function locate_mem_hole_callback() can fix it. The page or pageblock's > >>> zone is movable or not, it's easy to know. This fix doesn't need to > >>> bother other component. > >> > >> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > >> does not imply that it cannot get offlined and removed e.g., this is > >> heavily used on ppc64, with 16MB sections. > > > > Really? I just know there are two kinds of mem hoplug in ppc, but don't > > know the details. So in this case, is there any flag or a way to know > > those memory block are hotpluggable? I am curious how those kernel data > > is avoided to be put in this area. Or ppc just freely uses it for kernel > > data or user space data, then try to migrate when hot remove? > > See > arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > > Under DLAPR, it can remove memory in LMB granularity, which is usually > 16MB (== single section on ppc64). DLPAR will directly online all > hotplugged memory (LMBs) from the kernel using device_online(), which > will go to ZONE_NORMAL. > > When trying to remove memory, it simply scans for offlineable 16MB > memory blocks (==section == LMB), offlines and removes them. No need for > the movable zone and all the involved issues. 
Yes, this is a different one, thanks for pointing it out. It sounds like
a balloon driver on a virt platform, doesn't it?

Avoiding putting the kexec kernel into the movable zone can't solve this
DLPAR case, as you said.

>
> Now, the interesting question is, can we have LMBs added during boot
> (not via add_memory()), that will later be removed via remove_memory().
> IIRC, we had BUGs related to that, so I think yes. If a section contains
> no unmovable allocations (after boot), it can get removed.

I do want to ask this question. If we can add LMB into system RAM, then
reloading kexec can solve it.

Another, better way is to add a common function that filters out the
movable zone when searching for a position for the kexec kernel, and to
use an arch-specific function to filter out DLPAR memory blocks for ppc
only. There, we can simply use for_each_drmem_lmb() to do that.
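The "common filter plus arch-specific filter" idea can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct mem_block, common_block_ok(), ppc_block_ok() and find_kexec_block() are hypothetical names invented here, not kernel interfaces, and the real search (locate_mem_hole_callback(), for_each_drmem_lmb()) works on different data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory block; not a kernel structure. */
struct mem_block {
	uint64_t start;      /* first byte of the block */
	uint64_t size;       /* length in bytes */
	bool movable_zone;   /* block was onlined to the movable zone */
	bool removable_lmb;  /* arch reports the block as a removable LMB */
};

/* Hypothetical arch hook: returns true if the block may hold a kexec image. */
typedef bool (*arch_block_filter)(const struct mem_block *b);

/* Common filter (assumption): never place an image in the movable zone. */
bool common_block_ok(const struct mem_block *b)
{
	return !b->movable_zone;
}

/* ppc-only filter (assumption): also skip blocks that DLPAR could remove. */
bool ppc_block_ok(const struct mem_block *b)
{
	return !b->removable_lmb;
}

/*
 * Return the index of the first block that is large enough and passes
 * the common filter and the optional arch filter, or -1 if none does.
 */
int find_kexec_block(const struct mem_block *blocks, size_t n,
		     uint64_t need, arch_block_filter arch_ok)
{
	assert(blocks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (blocks[i].size < need)
			continue;
		if (!common_block_ok(&blocks[i]))
			continue;
		if (arch_ok && !arch_ok(&blocks[i]))
			continue;
		return (int)i;
	}
	return -1;
}
```

Passing NULL as the arch filter models an architecture that only needs the common movable-zone check; ppc would pass its own hook so that ZONE_NORMAL blocks which can still be removed are skipped as well.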
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 14:39 ` Baoquan He (?) (?) @ 2020-04-14 14:49 ` David Hildenbrand -1 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-14 14:49 UTC (permalink / raw) To: Baoquan He Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu On 14.04.20 16:39, Baoquan He wrote: > On 04/14/20 at 11:37am, David Hildenbrand wrote: >> On 14.04.20 11:22, Baoquan He wrote: >>> On 04/14/20 at 10:00am, David Hildenbrand wrote: >>>> On 14.04.20 08:40, Baoquan He wrote: >>>>> On 04/13/20 at 08:15am, Eric W. Biederman wrote: >>>>>> Baoquan He <bhe@redhat.com> writes: >>>>>> >>>>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: >>>>>>>> >>>>>>>> The only benefit of kexec_file_load is that it is simple enough from a >>>>>>>> kernel perspective that signatures can be checked. >>>>>>> >>>>>>> We don't have this restriction any more with below commit: >>>>>>> >>>>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG >>>>>>> and KEXEC_SIG_FORCE") >>>>>>> >>>>>>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both >>>>>>> secure boot or legacy system for kexec/kdump. Being simple enough is >>>>>>> enough to astract and convince us to use it instead. And kexec_file_load >>>>>>> has been in use for several years on systems with secure boot, since >>>>>>> added in 2014, on x86_64. >>>>>> >>>>>> No. Actaully kexec_file_load is the less capable interface, and less >>>>>> flexible interface. Which is why it is appropriate for signature >>>>>> verification. >>>>> >>>>> Well, everyone has a stance and the corresponding view. You could have >>>>> wider view from long time maintenance and in upstrem position, and think >>>>> kexec_file_load is horrible. 
But I can only see from our work as a front >>>>> line engineer to maintain/develop kexec/kdump in RHEL, and think >>>>> kexec_file_load is easier to maintain. >>>>> >>>>> Surely except of multiple kernel image format support. No matter it is >>>>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. >>>>> This is produced from kerel building by default. We have no way to >>>>> support it in our distros and add it into kexec_file_load. >>>>> >>>>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able >>>>> https://lkml.org/lkml/2017/2/15/654 >>>>> >>>>>> >>>>>>>> kexec_load in every other respect is the more capable and functional >>>>>>>> interface. It makes no sense to get rid of it. >>>>>>>> >>>>>>>> It does make sense to reload with a loaded kernel on memory hotplug. >>>>>>>> That is simple and easy. If we are going to handle something in the >>>>>>>> kernel it should simple an automated unloading of the kernel on memory >>>>>>>> hotplug. >>>>>>>> >>>>>>>> >>>>>>>> I think it would be irresponsible to deprecate kexec_load on any >>>>>>>> platform. >>>>>>>> >>>>>>>> I also suspect that kexec_file_load could be taught to copy the dtb >>>>>>>> on arm32 if someone wants to deal with signatures. >>>>>>>> >>>>>>>> We definitely can not even think of deprecating kexec_load until >>>>>>>> architecture that supports it also supports kexec_file_load and everyone >>>>>>>> is happy with that interface. That is Linus's no regression rule. >>>>>>> >>>>>>> I should pick a milder word to express our tendency and tell our plan >>>>>>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help >>>>>>> much. I didn't mean to say 'deprecate' at all when replied. 
>>>>>>> >>>>>>> The situation and trend I understand about kexec_load and kexec_file_load >>>>>>> are: >>>>>>> >>>>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't >>>>>>> have yet, just as x86_64, arm64 and s390 have done; >>>>>>> >>>>>>> 2) kexec_file_load is suggested to use, and take precedence over >>>>>>> kexec_load in the future, if both are supported in one ARCH. >>>>>> >>>>>> The deep problem is that kexec_file_load is distinctly less expressive >>>>>> than kexec_load. >>>>>> >>>>>>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, >>>>>>> and by ARCHes for back compatibility w/ kexec_file_load support. >>>>>>> >>>>>>> For 1) and 2), I think the reason is obvious as Eric said, >>>>>>> kexec_file_load is simple enough. And currently, whenever we got a bug >>>>>>> report, we may need fix them twice, for kexec_load and kexec_file_load. >>>>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it >>>>>>> in kernel space only, for kexec_file_load. This is what I meant about >>>>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too. >>>>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the >>>>>>> old kexec_load interface in old product. >>>>>> >>>>>> Maybe. The code that kexec_file_load sucked into the kernel is quite >>>>>> stable and rarely needs changes except during a port of kexec to >>>>>> another architecture. >>>>>> >>>>>> Last I looked the real maintenance effor of kexec and kexec on panic was >>>>>> in the drivers. So I don't think we can use maintenance to do anything. >>>>> >>>>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has >>>>> been taken to make SEV work well on kexec_file_load. And we have >>>>> switched to use kexec_file_load in the newly published Fedora release >>>>> on x86_64 by default. Before this, Lianbo has investigated and done many >>>>> experiments to make sure the switching is safe. 
We finally made this >>>>> decision. Next we will do the switch in Enterprise distros. Once these >>>>> are proved safe, we will suggest customers to use kexec_file_load for >>>>> kexec rebooting too. In the future, we will only care about >>>>> kexec_file_load if everying is going well. But as I have explained >>>>> repeatedly, only caring about kexec_file_load means we will leave >>>>> kexec_load as is, we will not add new feature or improvement patches >>>>> for it. >>>>> >>>>> commit 6a20bd54473e11011bf2b47efb52d0759d412854 >>>>> Author: Lianbo Jiang <lijiang@redhat.com> >>>>> Date: Thu Jan 16 13:47:35 2020 +0800 >>>>> >>>>> kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default >>>>> >>>>>> >>>>>>> For 3), people can still use kexec_load and develop/fix for it, if no >>>>>>> kexec_file_load supported. But 32-bit arm should be a different one, >>>>>>> more like i386, we will leave it as is, and fix anything which could >>>>>>> break it. But people really expects to improve or add feature to it? E.g >>>>>>> in this patchset, the mem hotplug issue James raised, I assume James is >>>>>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in >>>>>>> another reply, people even don't agree to continue supporting memory >>>>>>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug >>>>>>> bug on i386 with a patch, but people would rather set it as BROKEN. >>>>>> >>>>>> For memory hotplug just reload. Userspace already gets good events. >>>>> >>>>> Kexec_file_load is easy to maintain. This is an example. >>>>> >>>>> Lock the hotplug area where kexed-ed kernel is targeted in this patchset, >>>>> it's obviously not right. We can't disable memory hotplug just because >>>>> kexec-ed kernel is loaded ahead of time. >>>>> >>>>> Reloading is also not a good fix. Kexec-ed kernel is targeted at a >>>>> movable area, reloading can avoid kexec rebooting corruption if that >>>>> area is hot removed. 
But if that area is not removed, locating kernel >>>>> into the hotpluggable area will change the area into ummovable zone. >>>>> Unless we decide to not support memory hotplug in kexec-ed kernel, I >>>>> guess it's very hard. Now in our distros kexec rebooting has been >>>>> supported, the big cloud providers are deploying linux in guest, bugs on >>>>> kexec reboot failure has been reported. They need the memory hotplug to >>>>> increase/decrease memory. >>>>> >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(), >>>>> just checking or picking those unmovable region to put kernel/initrd in >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's >>>>> zone is movable or not, it's easy to know. This fix doesn't need to >>>>> bother other component. >>>> >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL >>>> does not imply that it cannot get offlined and removed e.g., this is >>>> heavily used on ppc64, with 16MB sections. >>> >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't >>> know the details. So in this case, is there any flag or a way to know >>> those memory block are hotpluggable? I am curious how those kernel data >>> is avoided to be put in this area. Or ppc just freely uses it for kernel >>> data or user space data, then try to migrate when hot remove? >> >> See >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() >> >> Under DLAPR, it can remove memory in LMB granularity, which is usually >> 16MB (== single section on ppc64). DLPAR will directly online all >> hotplugged memory (LMBs) from the kernel using device_online(), which >> will go to ZONE_NORMAL. >> >> When trying to remove memory, it simply scans for offlineable 16MB >> memory blocks (==section == LMB), offlines and removes them. No need for >> the movable zone and all the involved issues. 
>
> Yes, this is a different one, thanks for pointing it out. It sounds like
> balloon driver in virt platform, doesn't it?

With DLPAR there is a hypervisor involved (which manages the actual HW
DIMMs), so yes.

>
> Avoiding to put kexec kernel into movable zone can't solve this DLPAR
> case as you said.
>
>>
>> Now, the interesting question is, can we have LMBs added during boot
>> (not via add_memory()), that will later be removed via remove_memory().
>> IIRC, we had BUGs related to that, so I think yes. If a section contains
>> no unmovable allocations (after boot), it can get removed.
>
> I do want to ask this question. If we can add LMB into system RAM, then
> reload kexec can solve it.
>
> Another better way is adding a common function to filter out the
> movable zone when search position for kexec kernel, use an arch-specific
> function to filter out DLPAR memory blocks for ppc only. Over there,
> we can simply use for_each_drmem_lmb() to do that.

I was thinking about something similar. Maybe something like a notifier
that can be used to test if selected memory can be used for kexec images.
It would apply to

- arm64, to filter out all hot-added memory (IIRC, only boot memory can
  be used).
- powerpc, to filter out all LMBs that can be removed (assuming not all
  memory corresponds to LMBs that can be removed, otherwise we're in
  trouble ... :) )
- virtio-mem, to filter out all memory it added.
- hyper-v, to filter out partially backed memory blocks (esp. the last
  memory block it added and only partially backed by memory).

This would make it work for kexec_file_load(); however, I do wonder how
we would want to approach that from userspace kexec-tools when handling
it from kexec_load().

--
Thanks,

David / dhildenb
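The notifier idea above could look roughly like a chain of veto callbacks. The following is a minimal userspace sketch, not the kernel's notifier API: filter_chain, filter_chain_register(), kexec_range_usable() and reject_overlap() are hypothetical names chosen for illustration, and the fixed-size array stands in for whatever registration mechanism would really be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FILTERS 8

/* A filter returns true if [start, start + size) may hold a kexec image. */
typedef bool (*kexec_mem_filter)(uint64_t start, uint64_t size, void *data);

struct filter_entry {
	kexec_mem_filter fn;
	void *data;          /* private data handed back to fn */
};

/* Hypothetical registry; arm64, powerpc, virtio-mem etc. would each add one. */
struct filter_chain {
	struct filter_entry entries[MAX_FILTERS];
	size_t count;
};

/* Append a filter; returns false if the chain is already full. */
bool filter_chain_register(struct filter_chain *c, kexec_mem_filter fn,
			   void *data)
{
	assert(c != NULL && fn != NULL);

	if (c->count == MAX_FILTERS)
		return false;
	c->entries[c->count].fn = fn;
	c->entries[c->count].data = data;
	c->count++;
	return true;
}

/* A range is usable only if no registered filter vetoes it. */
bool kexec_range_usable(const struct filter_chain *c, uint64_t start,
			uint64_t size)
{
	for (size_t i = 0; i < c->count; i++) {
		if (!c->entries[i].fn(start, size, c->entries[i].data))
			return false;
	}
	return true;
}

/* Example region a subsystem wants excluded, e.g. memory it hot-added. */
struct region {
	uint64_t start;
	uint64_t size;
};

/* Example filter: veto any candidate range that overlaps the region. */
bool reject_overlap(uint64_t start, uint64_t size, void *data)
{
	const struct region *r = data;
	/* Half-open intervals overlap iff each starts before the other ends. */
	bool overlaps = start < r->start + r->size && r->start < start + size;

	return !overlaps;
}
```

Each subsystem registers its own veto, and the image placement code asks the chain before accepting a candidate range. This only covers in-kernel placement as done for kexec_file_load(); it says nothing about how kexec-tools would learn the same constraints for kexec_load().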
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 14:49 ` David Hildenbrand 0 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-14 14:49 UTC (permalink / raw) To: Baoquan He Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 14.04.20 16:39, Baoquan He wrote: > On 04/14/20 at 11:37am, David Hildenbrand wrote: >> On 14.04.20 11:22, Baoquan He wrote: >>> On 04/14/20 at 10:00am, David Hildenbrand wrote: >>>> On 14.04.20 08:40, Baoquan He wrote: >>>>> On 04/13/20 at 08:15am, Eric W. Biederman wrote: >>>>>> Baoquan He <bhe@redhat.com> writes: >>>>>> >>>>>>> On 04/12/20 at 02:52pm, Eric W. Biederman wrote: >>>>>>>> >>>>>>>> The only benefit of kexec_file_load is that it is simple enough from a >>>>>>>> kernel perspective that signatures can be checked. >>>>>>> >>>>>>> We don't have this restriction any more with below commit: >>>>>>> >>>>>>> commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG >>>>>>> and KEXEC_SIG_FORCE") >>>>>>> >>>>>>> With KEXEC_SIG_FORCE not set, we can use kexec_load_file to cover both >>>>>>> secure boot or legacy system for kexec/kdump. Being simple enough is >>>>>>> enough to astract and convince us to use it instead. And kexec_file_load >>>>>>> has been in use for several years on systems with secure boot, since >>>>>>> added in 2014, on x86_64. >>>>>> >>>>>> No. Actaully kexec_file_load is the less capable interface, and less >>>>>> flexible interface. Which is why it is appropriate for signature >>>>>> verification. >>>>> >>>>> Well, everyone has a stance and the corresponding view. You could have >>>>> wider view from long time maintenance and in upstrem position, and think >>>>> kexec_file_load is horrible. 
But I can only see from our work as a front >>>>> line engineer to maintain/develop kexec/kdump in RHEL, and think >>>>> kexec_file_load is easier to maintain. >>>>> >>>>> Surely except of multiple kernel image format support. No matter it is >>>>> kexec_load and kexec_file_load, e.g in x86_64, we only support bzImage. >>>>> This is produced from kerel building by default. We have no way to >>>>> support it in our distros and add it into kexec_file_load. >>>>> >>>>> [RFC PATCH] x86/boot: make ELF kernel multiboot-able >>>>> https://lkml.org/lkml/2017/2/15/654 >>>>> >>>>>> >>>>>>>> kexec_load in every other respect is the more capable and functional >>>>>>>> interface. It makes no sense to get rid of it. >>>>>>>> >>>>>>>> It does make sense to reload with a loaded kernel on memory hotplug. >>>>>>>> That is simple and easy. If we are going to handle something in the >>>>>>>> kernel it should simple an automated unloading of the kernel on memory >>>>>>>> hotplug. >>>>>>>> >>>>>>>> >>>>>>>> I think it would be irresponsible to deprecate kexec_load on any >>>>>>>> platform. >>>>>>>> >>>>>>>> I also suspect that kexec_file_load could be taught to copy the dtb >>>>>>>> on arm32 if someone wants to deal with signatures. >>>>>>>> >>>>>>>> We definitely can not even think of deprecating kexec_load until >>>>>>>> architecture that supports it also supports kexec_file_load and everyone >>>>>>>> is happy with that interface. That is Linus's no regression rule. >>>>>>> >>>>>>> I should pick a milder word to express our tendency and tell our plan >>>>>>> then 'obsolete'. Even though I added 'gradually', seems it doesn't help >>>>>>> much. I didn't mean to say 'deprecate' at all when replied. 
>>>>>>> >>>>>>> The situation and trend I understand about kexec_load and kexec_file_load >>>>>>> are: >>>>>>> >>>>>>> 1) Supporting kexec_file_load is suggested to add in ARCHes which don't >>>>>>> have yet, just as x86_64, arm64 and s390 have done; >>>>>>> >>>>>>> 2) kexec_file_load is suggested to use, and take precedence over >>>>>>> kexec_load in the future, if both are supported in one ARCH. >>>>>> >>>>>> The deep problem is that kexec_file_load is distinctly less expressive >>>>>> than kexec_load. >>>>>> >>>>>>> 3) Kexec_load is kept being used by ARCHes w/o kexc_file_load support, >>>>>>> and by ARCHes for back compatibility w/ kexec_file_load support. >>>>>>> >>>>>>> For 1) and 2), I think the reason is obvious as Eric said, >>>>>>> kexec_file_load is simple enough. And currently, whenever we got a bug >>>>>>> report, we may need fix them twice, for kexec_load and kexec_file_load. >>>>>>> If kexec_file_load is made by default, e.g on x86_64, we will change it >>>>>>> in kernel space only, for kexec_file_load. This is what I meant about >>>>>>> 'obsolete gradually'. I think for arm64, s390, they will do these too. >>>>>>> Unless there's some critical/blocker bug in kexec_load, to corrupt the >>>>>>> old kexec_load interface in old product. >>>>>> >>>>>> Maybe. The code that kexec_file_load sucked into the kernel is quite >>>>>> stable and rarely needs changes except during a port of kexec to >>>>>> another architecture. >>>>>> >>>>>> Last I looked the real maintenance effor of kexec and kexec on panic was >>>>>> in the drivers. So I don't think we can use maintenance to do anything. >>>>> >>>>> Not sure if I got it. But if check Lianbo's patches, a lot of effort has >>>>> been taken to make SEV work well on kexec_file_load. And we have >>>>> switched to use kexec_file_load in the newly published Fedora release >>>>> on x86_64 by default. Before this, Lianbo has investigated and done many >>>>> experiments to make sure the switching is safe. 
We finally made this >>>>> decision. Next we will do the switch in Enterprise distros. Once these >>>>> are proved safe, we will suggest customers to use kexec_file_load for >>>>> kexec rebooting too. In the future, we will only care about >>>>> kexec_file_load if everying is going well. But as I have explained >>>>> repeatedly, only caring about kexec_file_load means we will leave >>>>> kexec_load as is, we will not add new feature or improvement patches >>>>> for it. >>>>> >>>>> commit 6a20bd54473e11011bf2b47efb52d0759d412854 >>>>> Author: Lianbo Jiang <lijiang@redhat.com> >>>>> Date: Thu Jan 16 13:47:35 2020 +0800 >>>>> >>>>> kdump-lib: switch to the kexec_file_load() syscall on x86_64 by default >>>>> >>>>>> >>>>>>> For 3), people can still use kexec_load and develop/fix for it, if no >>>>>>> kexec_file_load supported. But 32-bit arm should be a different one, >>>>>>> more like i386, we will leave it as is, and fix anything which could >>>>>>> break it. But people really expects to improve or add feature to it? E.g >>>>>>> in this patchset, the mem hotplug issue James raised, I assume James is >>>>>>> focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in >>>>>>> another reply, people even don't agree to continue supporting memory >>>>>>> hotplug on 32-bit system. We ever took effort to fix a memory hotplug >>>>>>> bug on i386 with a patch, but people would rather set it as BROKEN. >>>>>> >>>>>> For memory hotplug just reload. Userspace already gets good events. >>>>> >>>>> Kexec_file_load is easy to maintain. This is an example. >>>>> >>>>> Lock the hotplug area where kexed-ed kernel is targeted in this patchset, >>>>> it's obviously not right. We can't disable memory hotplug just because >>>>> kexec-ed kernel is loaded ahead of time. >>>>> >>>>> Reloading is also not a good fix. Kexec-ed kernel is targeted at a >>>>> movable area, reloading can avoid kexec rebooting corruption if that >>>>> area is hot removed. 
But if that area is not removed, locating kernel >>>>> into the hotpluggable area will change the area into ummovable zone. >>>>> Unless we decide to not support memory hotplug in kexec-ed kernel, I >>>>> guess it's very hard. Now in our distros kexec rebooting has been >>>>> supported, the big cloud providers are deploying linux in guest, bugs on >>>>> kexec reboot failure has been reported. They need the memory hotplug to >>>>> increase/decrease memory. >>>>> >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(), >>>>> just checking or picking those unmovable region to put kernel/initrd in >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's >>>>> zone is movable or not, it's easy to know. This fix doesn't need to >>>>> bother other component. >>>> >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL >>>> does not imply that it cannot get offlined and removed e.g., this is >>>> heavily used on ppc64, with 16MB sections. >>> >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't >>> know the details. So in this case, is there any flag or a way to know >>> those memory block are hotpluggable? I am curious how those kernel data >>> is avoided to be put in this area. Or ppc just freely uses it for kernel >>> data or user space data, then try to migrate when hot remove? >> >> See >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() >> >> Under DLAPR, it can remove memory in LMB granularity, which is usually >> 16MB (== single section on ppc64). DLPAR will directly online all >> hotplugged memory (LMBs) from the kernel using device_online(), which >> will go to ZONE_NORMAL. >> >> When trying to remove memory, it simply scans for offlineable 16MB >> memory blocks (==section == LMB), offlines and removes them. No need for >> the movable zone and all the involved issues. 
> > Yes, this is a different one, thanks for pointing it out. It sounds like > balloon driver in virt platform, doesn't it? With DLPAR there is a hypervisor involved (which manages the actual HW DIMMs), so yes. > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > case as you said. > >> >> Now, the interesting question is, can we have LMBs added during boot >> (not via add_memory()), that will later be removed via remove_memory(). >> IIRC, we had BUGs related to that, so I think yes. If a section contains >> no unmovable allocations (after boot), it can get removed. > > I do want to ask this question. If we can add LMB into system RAM, then > reload kexec can solve it. > > Another better way is adding a common function to filter out the > movable zone when search position for kexec kernel, use an arch specific > function to filter out DLPAR memory blocks for ppc only. Over there, > we can simply use for_each_drmem_lmb() to do that. I was thinking about something similar. Maybe something like a notifier that can be used to test if selected memory can be used for kexec images. It would apply to - arm64 and filter out all hotadded memory (IIRC, only boot memory can be used). - powerpc to filter out all LMBs that can be removed (assuming not all memory corresponds to LMBs that can be removed, otherwise we're in trouble ... :) ) - virtio-mem to filter out all memory it added. - hyper-v to filter out partially backed memory blocks (esp. the last memory block it added and only partially backed it by memory). This would make it work for kexec_file_load(), however, I do wonder how we would want to approach that from userspace kexec-tools when handling it from kexec_load(). -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image 2020-04-14 14:49 ` David Hildenbrand (?) (?) @ 2020-04-15 2:35 ` Baoquan He -1 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-15 2:35 UTC (permalink / raw) To: David Hildenbrand Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu On 04/14/20 at 04:49pm, David Hildenbrand wrote: > >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory > >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(), > >>>>> just checking or picking those unmovable region to put kernel/initrd in > >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's > >>>>> zone is movable or not, it's easy to know. This fix doesn't need to > >>>>> bother other component. > >>>> > >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > >>>> does not imply that it cannot get offlined and removed e.g., this is > >>>> heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel data > >>> is avoided to be put in this area. Or ppc just freely uses it for kernel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (== single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. 
> >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (==section == LMB), offlines and removes them. No need for > >> the movable zone and all the involved issues. > > > > Yes, this is a different one, thanks for pointing it out. It sounds like > > balloon driver in virt platform, doesn't it? > > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. > > > > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. > > > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory(). > >> IIRC, we had BUGs related to that, so I think yes. If a section contains > >> no unmovable allocations (after boot), it can get removed. > > > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it. > > > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use an arch specific > > function to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. > > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to > > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64?
> - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). > > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work first, since this is only the first step toward making the kexec-ed kernel not break memory hotplug. After kexec rebooting, KASLR may place the kernel in a hotpluggable area too. ^ permalink raw reply [flat|nested] 264+ messages in thread
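[Editorial sketch, not part of the thread] The three steps listed in the message above (a common picker, providers such as DLPAR or a balloon driver registering themselves, and the picker asking each of them whether a range may hold a kexec image) can be illustrated in plain userspace C. This is only a model of the idea under discussion: it does not use the kernel's real notifier chain API, and every name in it (kexec_mem_filter_register, kexec_mem_range_usable, kexec_pick_range, removable_block_veto) is hypothetical and does not appear in the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: return true if [start, end) must NOT hold a kexec image. */
typedef bool (*kexec_mem_veto_fn)(uint64_t start, uint64_t end, void *priv);

struct kexec_mem_filter {
	kexec_mem_veto_fn veto;
	void *priv;
};

struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

#define MAX_FILTERS 8
static struct kexec_mem_filter filters[MAX_FILTERS];
static size_t nr_filters;

/* Step 2: a provider (DLPAR, balloon, virtio-mem, ...) registers a veto callback. */
int kexec_mem_filter_register(kexec_mem_veto_fn veto, void *priv)
{
	if (!veto || nr_filters == MAX_FILTERS)
		return -1;
	filters[nr_filters].veto = veto;
	filters[nr_filters].priv = priv;
	nr_filters++;
	return 0;
}

/* Step 3: ask every registered provider; a single veto rejects the range. */
bool kexec_mem_range_usable(uint64_t start, uint64_t end)
{
	assert(start < end);
	for (size_t i = 0; i < nr_filters; i++)
		if (filters[i].veto(start, end, filters[i].priv))
			return false;
	return true;
}

/*
 * Step 1: the common picker. Walks the candidate ranges in order and returns
 * the index of the first one that is large enough and not vetoed, or -1.
 */
int kexec_pick_range(const struct mem_range *cand, size_t n, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (cand[i].end <= cand[i].start)
			continue;	/* skip empty or malformed ranges */
		if (cand[i].end - cand[i].start < size)
			continue;	/* too small for the image */
		if (kexec_mem_range_usable(cand[i].start, cand[i].end))
			return (int)i;
	}
	return -1;
}

/* Example provider: vetoes any range overlapping one removable block (priv). */
bool removable_block_veto(uint64_t start, uint64_t end, void *priv)
{
	const struct mem_range *blk = priv;

	return start < blk->end && blk->start < end;
}
```

A provider would call kexec_mem_filter_register(removable_block_veto, &block) once per removable block, and the loader would call kexec_pick_range() over its candidate holes. A veto-style callback is used here because any single provider objecting is enough to rule a range out; this covers only the kexec_file_load() side, and the kexec_load() question raised in the thread is left open.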
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-15 2:35 ` Baoquan He 0 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-15 2:35 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/14/20 at 04:49pm, David Hildenbrand wrote: > >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory > >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(), > >>>>> just checking or picking those unmovable region to put kernel/initrd in > >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's > >>>>> zone is movable or not, it's easy to know. This fix doesn't need to > >>>>> bother other component. > >>>> > >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > >>>> does not imply that it cannot get offlined and removed e.g., this is > >>>> heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel data > >>> is avoided to be put in this area. Or ppc just freely uses it for kernel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (== single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. 
> >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (==section == LMB), offlines and removes them. No need for > >> the movable zone and all the involved issues. > > > > Yes, this is a different one, thanks for pointing it out. It sounds like > > balloon driver in virt platform, doesn't it? > > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. > > > > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. > > > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory(). > >> IIRC, we had BUGs related to that, so I think yes. If a section contains > >> no unmovable allocations (after boot), it can get removed. > > > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it. > > > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use a arch specific > > funciton to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. > > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to > > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64? 
> - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). > > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work firstly. Since this work is only first step to make kexec-ed kernel not break memory hotplug. After kexec rebooting, the KASLR may locate kernel into hotpluggable area too. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-15 2:35 ` Baoquan He 0 siblings, 0 replies; 264+ messages in thread From: Baoquan He @ 2020-04-15 2:35 UTC (permalink / raw) To: David Hildenbrand Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 04/14/20 at 04:49pm, David Hildenbrand wrote: > >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory > >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(), > >>>>> just checking or picking those unmovable region to put kernel/initrd in > >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's > >>>>> zone is movable or not, it's easy to know. This fix doesn't need to > >>>>> bother other component. > >>>> > >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL > >>>> does not imply that it cannot get offlined and removed e.g., this is > >>>> heavily used on ppc64, with 16MB sections. > >>> > >>> Really? I just know there are two kinds of mem hoplug in ppc, but don't > >>> know the details. So in this case, is there any flag or a way to know > >>> those memory block are hotpluggable? I am curious how those kernel data > >>> is avoided to be put in this area. Or ppc just freely uses it for kernel > >>> data or user space data, then try to migrate when hot remove? > >> > >> See > >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count() > >> > >> Under DLAPR, it can remove memory in LMB granularity, which is usually > >> 16MB (== single section on ppc64). DLPAR will directly online all > >> hotplugged memory (LMBs) from the kernel using device_online(), which > >> will go to ZONE_NORMAL. 
> >> > >> When trying to remove memory, it simply scans for offlineable 16MB > >> memory blocks (==section == LMB), offlines and removes them. No need for > >> the movable zone and all the involved issues. > > > > Yes, this is a different one, thanks for pointing it out. It sounds like > > balloon driver in virt platform, doesn't it? > > With DLPAR there is a hypervisor involved (which manages the actual HW > DIMMs), so yes. > > > > > Avoiding to put kexec kernel into movable zone can't solve this DLPAR > > case as you said. > > > >> > >> Now, the interesting question is, can we have LMBs added during boot > >> (not via add_memory()), that will later be removed via remove_memory(). > >> IIRC, we had BUGs related to that, so I think yes. If a section contains > >> no unmovable allocations (after boot), it can get removed. > > > > I do want to ask this question. If we can add LMB into system RAM, then > > reload kexec can solve it. > > > > Another better way is adding a common function to filter out the > > movable zone when search position for kexec kernel, use a arch specific > > funciton to filter out DLPAR memory blocks for ppc only. Over there, > > we can simply use for_each_drmem_lmb() to do that. > > I was thinking about something similar. Maybe something like a notifier > that can be used to test if selected memory can be used for kexec Not sure if I get the notifier idea clearly. If you mean 1) Add a common function to pick memory in unmovable zone; 2) Let DLPAR, balloon register with notifier; 3) In the common function, ask notified part to check if the picked unmovable memory is available for locating kexec kernel; Sounds doable to me, and not complicated. > images. It would apply to > > - arm64 and filter out all hotadded memory (IIRC, only boot memory can > be used). Do you mean hot added memory after boot can't be recognized and added into system RAM on arm64? 
> - powerpc to filter out all LMBs that can be removed (assuming not all > memory corresponds to LMBs that can be removed, otherwise we're in > trouble ... :) ) > - virtio-mem to filter out all memory it added. > - hyper-v to filter out partially backed memory blocks (esp. the last > memory block it added and only partially backed it by memory). > > This would make it work for kexec_file_load(), however, I do wonder how > we would want to approach that from userspace kexec-tools when handling > it from kexec_load(). Let's make kexec_file_load work firstly. Since this work is only first step to make kexec-ed kernel not break memory hotplug. After kexec rebooting, the KASLR may locate kernel into hotpluggable area too. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: Baoquan He @ 2020-04-15 2:35 UTC
To: David Hildenbrand
Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel

On 04/14/20 at 04:49pm, David Hildenbrand wrote:
> >>>>> The root cause is kexec-ed kernel is targeted at hotpluggable memory
> >>>>> region. Just avoiding the movable area can fix it. In kexec_file_load(),
> >>>>> just checking or picking those unmovable region to put kernel/initrd in
> >>>>> function locate_mem_hole_callback() can fix it. The page or pageblock's
> >>>>> zone is movable or not, it's easy to know. This fix doesn't need to
> >>>>> bother other component.
> >>>>
> >>>> I don't fully agree. E.g., just because memory is onlined to ZONE_NORMAL
> >>>> does not imply that it cannot get offlined and removed e.g., this is
> >>>> heavily used on ppc64, with 16MB sections.
> >>>
> >>> Really? I just know there are two kinds of mem hotplug in ppc, but don't
> >>> know the details. So in this case, is there any flag or a way to know
> >>> those memory block are hotpluggable? I am curious how those kernel data
> >>> is avoided to be put in this area. Or ppc just freely uses it for kernel
> >>> data or user space data, then try to migrate when hot remove?
> >>
> >> See
> >> arch/powerpc/platforms/pseries/hotplug-memory.c:dlpar_memory_remove_by_count()
> >>
> >> Under DLPAR, it can remove memory in LMB granularity, which is usually
> >> 16MB (== single section on ppc64). DLPAR will directly online all
> >> hotplugged memory (LMBs) from the kernel using device_online(), which
> >> will go to ZONE_NORMAL.
> >>
> >> When trying to remove memory, it simply scans for offlineable 16MB
> >> memory blocks (==section == LMB), offlines and removes them. No need for
> >> the movable zone and all the involved issues.
> >
> > Yes, this is a different one, thanks for pointing it out. It sounds like
> > balloon driver in virt platform, doesn't it?
>
> With DLPAR there is a hypervisor involved (which manages the actual HW
> DIMMs), so yes.
>
> >
> > Avoiding to put kexec kernel into movable zone can't solve this DLPAR
> > case as you said.
> >
> >>
> >> Now, the interesting question is, can we have LMBs added during boot
> >> (not via add_memory()), that will later be removed via remove_memory().
> >> IIRC, we had BUGs related to that, so I think yes. If a section contains
> >> no unmovable allocations (after boot), it can get removed.
> >
> > I do want to ask this question. If we can add LMB into system RAM, then
> > reload kexec can solve it.
> >
> > Another better way is adding a common function to filter out the
> > movable zone when search position for kexec kernel, use an arch specific
> > function to filter out DLPAR memory blocks for ppc only. Over there,
> > we can simply use for_each_drmem_lmb() to do that.
>
> I was thinking about something similar. Maybe something like a notifier
> that can be used to test if selected memory can be used for kexec

Not sure if I get the notifier idea clearly. If you mean

1) Add a common function to pick memory in unmovable zone;
2) Let DLPAR, balloon register with notifier;
3) In the common function, ask notified part to check if the picked
   unmovable memory is available for locating kexec kernel;

Sounds doable to me, and not complicated.

> images. It would apply to
>
> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
>   be used).

Do you mean hot added memory after boot can't be recognized and added
into system RAM on arm64?

> - powerpc to filter out all LMBs that can be removed (assuming not all
>   memory corresponds to LMBs that can be removed, otherwise we're in
>   trouble ... :) )
> - virtio-mem to filter out all memory it added.
> - hyper-v to filter out partially backed memory blocks (esp. the last
>   memory block it added and only partially backed it by memory).
>
> This would make it work for kexec_file_load(), however, I do wonder how
> we would want to approach that from userspace kexec-tools when handling
> it from kexec_load().

Let's make kexec_file_load work firstly. Since this work is only first
step to make kexec-ed kernel not break memory hotplug. After kexec
rebooting, the KASLR may locate kernel into hotpluggable area too.
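
[Editor's sketch] The three steps proposed above (a common picking function, subsystems registering a filter, and the common function consulting those filters) can be illustrated in plain userspace C. This is only a minimal sketch under stated assumptions: `register_kexec_range_filter()`, `kexec_range_usable()`, `kexec_pick_range()` and `demo_lmb_filter()` are hypothetical names invented for illustration, not kernel APIs, and a fixed array stands in for a real notifier chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns false if [start, end) may be removed later. */
typedef bool (*kexec_range_ok_fn)(uint64_t start, uint64_t end);

#define MAX_FILTERS 8
static kexec_range_ok_fn filters[MAX_FILTERS];
static size_t nr_filters;

/* Step 2: DLPAR, a balloon driver, virtio-mem, ... would register here. */
void register_kexec_range_filter(kexec_range_ok_fn fn)
{
	assert(nr_filters < MAX_FILTERS);
	filters[nr_filters++] = fn;
}

/* Step 3: a range is usable only if no registered filter vetoes it. */
bool kexec_range_usable(uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < nr_filters; i++)
		if (!filters[i](start, end))
			return false;
	return true;
}

struct mem_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * Step 1: common picking function. Returns the start of the first
 * candidate range that is large enough and not vetoed, or UINT64_MAX.
 */
uint64_t kexec_pick_range(const struct mem_range *ram, size_t n, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (ram[i].end - ram[i].start < size)
			continue;
		if (kexec_range_usable(ram[i].start, ram[i].start + size))
			return ram[i].start;
	}
	return UINT64_MAX;
}

/* Example filter: pretend [0x40000000, 0x50000000) is removable memory. */
bool demo_lmb_filter(uint64_t start, uint64_t end)
{
	const uint64_t lmb_start = 0x40000000ULL, lmb_end = 0x50000000ULL;

	return end <= lmb_start || start >= lmb_end;
}
```

In this sketch the picker itself knows nothing about DLPAR or ballooning; each technology only answers "may this range go away?", which mirrors the division of labour discussed in the thread.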
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: David Hildenbrand @ 2020-04-16 13:31 UTC
To: Baoquan He
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

> Not sure if I get the notifier idea clearly. If you mean
>
> 1) Add a common function to pick memory in unmovable zone;

Not strictly required IMHO. But, minor detail.

> 2) Let DLPAR, balloon register with notifier;

Yeah, or virtio-mem, or any other technology that adds/removes memory
dynamically.

> 3) In the common function, ask notified part to check if the picked
>    unmovable memory is available for locating kexec kernel;

Yeah.

>
> Sounds doable to me, and not complicated.
>
>> images. It would apply to
>>
>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
>>   be used).
>
> Do you mean hot added memory after boot can't be recognized and added
> into system RAM on arm64?

See patch #3 of this patch set, which wants to avoid placing kexec
binaries on hotplugged memory. But I have no idea what the current plan
regarding arm64 is (this thread exploded :) ).

I would assume that we don't want to place kexec images on any
hotplugged (or rather: hot(un)pluggable) memory - on any architecture.

>
>
>> - powerpc to filter out all LMBs that can be removed (assuming not all
>>   memory corresponds to LMBs that can be removed, otherwise we're in
>>   trouble ... :) )
>> - virtio-mem to filter out all memory it added.
>> - hyper-v to filter out partially backed memory blocks (esp. the last
>>   memory block it added and only partially backed it by memory).
>>
>> This would make it work for kexec_file_load(), however, I do wonder how
>> we would want to approach that from userspace kexec-tools when handling
>> it from kexec_load().
>
> Let's make kexec_file_load work firstly. Since this work is only first
> step to make kexec-ed kernel not break memory hotplug. After kexec
> rebooting, the KASLR may locate kernel into hotpluggable area too.

Can you elaborate how that would work?

--
Thanks,

David / dhildenb
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: Baoquan He @ 2020-04-16 14:02 UTC
To: David Hildenbrand
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

On 04/16/20 at 03:31pm, David Hildenbrand wrote:
> > Not sure if I get the notifier idea clearly. If you mean
> >
> > 1) Add a common function to pick memory in unmovable zone;
>
> Not strictly required IMHO. But, minor detail.
>
> > 2) Let DLPAR, balloon register with notifier;
>
> Yeah, or virtio-mem, or any other technology that adds/removes memory
> dynamically.
>
> > 3) In the common function, ask notified part to check if the picked
> >    unmovable memory is available for locating kexec kernel;
>
> Yeah.

These may not be needed, please see the comment below.

> >
> > Sounds doable to me, and not complicated.
> >
> >> images. It would apply to
> >>
> >> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
> >>   be used).
> >
> > Do you mean hot added memory after boot can't be recognized and added
> > into system RAM on arm64?
>
> See patch #3 of this patch set, which wants to avoid placing kexec
> binaries on hotplugged memory. But I have no idea what the current plan
> regarding arm64 is (this thread exploded :) ).
>
> I would assume that we don't want to place kexec images on any
> hotplugged (or rather: hot(un)pluggable) memory - on any architecture.

Yes, noticed that and James replied to DaveY.

Later, when I was considering making a draft patch to do the picking of
memory from normal zone, and add a notifier, as we discussed above, I
suddenly realized that kexec_file_load doesn't have this issue. It
traverses system RAM bottom up to get an available region to put
kernel/initrd/boot_param, etc. I can't think of a system where its
low memory could be unavailable.

> >
> >
> >> - powerpc to filter out all LMBs that can be removed (assuming not all
> >>   memory corresponds to LMBs that can be removed, otherwise we're in
> >>   trouble ... :) )
> >> - virtio-mem to filter out all memory it added.
> >> - hyper-v to filter out partially backed memory blocks (esp. the last
> >>   memory block it added and only partially backed it by memory).
> >>
> >> This would make it work for kexec_file_load(), however, I do wonder how
> >> we would want to approach that from userspace kexec-tools when handling
> >> it from kexec_load().
> >
> > Let's make kexec_file_load work firstly. Since this work is only first
> > step to make kexec-ed kernel not break memory hotplug. After kexec
> > rebooting, the KASLR may locate kernel into hotpluggable area too.
>
> Can you elaborate how that would work?

Well, boot memory can be hotpluggable or not after boot; such regions are
marked in the UEFI tables. The current kexec doesn't save them and pass
them to the 2nd kernel. When the kexec kernel boots up, it needs to read
them and avoid randomizing the kernel into them.
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: David Hildenbrand @ 2020-04-16 14:09 UTC
To: Baoquan He
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

>>> Sounds doable to me, and not complicated.
>>>
>>>> images. It would apply to
>>>>
>>>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
>>>>   be used).
>>>
>>> Do you mean hot added memory after boot can't be recognized and added
>>> into system RAM on arm64?
>>
>> See patch #3 of this patch set, which wants to avoid placing kexec
>> binaries on hotplugged memory. But I have no idea what the current plan
>> regarding arm64 is (this thread exploded :) ).
>>
>> I would assume that we don't want to place kexec images on any
>> hotplugged (or rather: hot(un)pluggable) memory - on any architecture.
>
> Yes, noticed that and James replied to DaveY.
>
> Later, when I was considering to make a draft patch to do the picking of
> memory from normal zone, and add a notifier, as we discussed at above, I
> suddenly realized that kexec_file_load doesn't have this issue. It
> traverse system RAM bottom up to get an available region to put
> kernel/initrd/boot_param, etc. I can't think of a system where its
> low memory could be unavailable.

kexec_walk_memblock() has the option for "kbuf->top_down". Only
kexec_walk_resources() seems to ignore it.

So I think in case of memblocks (e.g., arm64), this still applies?

>>
>>>
>>>
>>>> - powerpc to filter out all LMBs that can be removed (assuming not all
>>>>   memory corresponds to LMBs that can be removed, otherwise we're in
>>>>   trouble ... :) )
>>>> - virtio-mem to filter out all memory it added.
>>>> - hyper-v to filter out partially backed memory blocks (esp. the last
>>>>   memory block it added and only partially backed it by memory).
>>>>
>>>> This would make it work for kexec_file_load(), however, I do wonder how
>>>> we would want to approach that from userspace kexec-tools when handling
>>>> it from kexec_load().
>>>
>>> Let's make kexec_file_load work firstly. Since this work is only first
>>> step to make kexec-ed kernel not break memory hotplug. After kexec
>>> rebooting, the KASLR may locate kernel into hotpluggable area too.
>>
>> Can you elaborate how that would work?
>
> Well, boot memory can be hotplugged or not after boot, they are marked
> in uefi tables, the current kexec doesn't save and pass them into 2nd
> kernel, when kexec kernel bootup, it need read them and avoid them to
> randomize kernel into.

What about e.g., memory hotplugged by ACPI? I would assume that the
kexec kernel will not make use of that (IOW, detect it) until the ACPI
driver comes up and re-detects + adds that memory.

Or how would that machinery work in case we have a DIMM hotplugged via ACPI?

--
Thanks,

David / dhildenb
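
[Editor's sketch] The difference between a bottom-up and a top-down search, which is the point of the kbuf->top_down remark above, can be shown with a small userspace C model. This is a sketch under stated assumptions, not the kernel's kexec_walk_memblock(): `find_hole()`, `struct ram_range` and `NO_HOLE` are hypothetical names, and the model ignores alignment, reserved regions and address limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ram_range {
	uint64_t start;
	uint64_t end;	/* exclusive; array is sorted by ascending address */
};

#define NO_HOLE UINT64_MAX

/*
 * Return the load address for an image of `size` bytes.
 * Bottom-up: lowest range that fits, placed at its start.
 * Top-down:  highest range that fits, placed at its end.
 */
uint64_t find_hole(const struct ram_range *ram, size_t n,
		   uint64_t size, bool top_down)
{
	assert(size > 0);

	for (size_t k = 0; k < n; k++) {
		/* Visit ranges from the top or from the bottom. */
		size_t i = top_down ? n - 1 - k : k;

		if (ram[i].end - ram[i].start < size)
			continue;
		return top_down ? ram[i].end - size : ram[i].start;
	}
	return NO_HOLE;
}
```

With two ranges, a bottom-up search settles in low memory, while a top-down search lands at the end of the highest range. If hot-added memory sits at the highest addresses (an assumption that depends on the platform), a top-down walk is the case where the "low memory is always available" argument would no longer hold.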
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-16 14:09 ` David Hildenbrand 0 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-16 14:09 UTC (permalink / raw) To: Baoquan He Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel >>> Sounds doable to me, and not complicated. >>> >>>> images. It would apply to >>>> >>>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can >>>> be used). >>> >>> Do you mean hot added memory after boot can't be recognized and added >>> into system RAM on arm64? >> >> See patch #3 of this patch set, which wants to avoid placing kexec >> binaries on hotplugged memory. But I have no idea what the current plan >> regarding arm64 is (this thread exploded :) ). >> >> I would assume that we don't want to place kexec images on any >> hotplugged (or rather: hot(un)pluggable) memory - on any architecture. > > Yes, noticed that and James replied to DaveY. > > Later, when I was considering to make a draft patch to do the picking of > memory from normal zone, and add a notifier, as we discussed at above, I > suddenly realized that kexec_file_load doesn't have this issue. It > traverse system RAM bottom up to get an available region to put > kernel/initrd/boot_param, etc. I can't think of a system where its > low memory could be unavailable. kexec_walk_memblock() has the option for "kbuf->top_down". Only kexec_walk_resources() seems to ignore it. So I think in case of memblocks (e.g., arm64), this still applies? >> >>> >>> >>>> - powerpc to filter out all LMBs that can be removed (assuming not all >>>> memory corresponds to LMBs that can be removed, otherwise we're in >>>> trouble ... :) ) >>>> - virtio-mem to filter out all memory it added. >>>> - hyper-v to filter out partially backed memory blocks (esp. 
the last >>>> memory block it added and only partially backed it by memory). >>>> >>>> This would make it work for kexec_file_load(), however, I do wonder how >>>> we would want to approach that from userspace kexec-tools when handling >>>> it from kexec_load(). >>> >>> Let's make kexec_file_load work firstly. Since this work is only first >>> step to make kexec-ed kernel not break memory hotplug. After kexec >>> rebooting, the KASLR may locate kernel into hotpluggable area too. >> >> Can you elaborate how that would work? > > Well, boot memory can be hotplugged or not after boot, they are marked > in uefi tables, the current kexec doesn't save and pass them into 2nd > kenrel, when kexec kernel bootup, it need read them and avoid them to > randomize kernel into. What about e.g., memory hotplugged by ACPI? I would assume, that the kexec kernel will not make use of that (IOW detected that) until the ACPI driver comes up and re-detects + adds that memory. Or how would that machinery work in case we have a DIMM hotplugged via ACPI? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-16 14:09 ` David Hildenbrand
  (?)
  (?)
@ 2020-04-16 14:36 ` Baoquan He
  -1 siblings, 0 replies; 264+ messages in thread
From: Baoquan He @ 2020-04-16 14:36 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

On 04/16/20 at 04:09pm, David Hildenbrand wrote:
> >>> Sounds doable to me, and not complicated.
> >>>
> >>>> images. It would apply to
> >>>>
> >>>> - arm64 and filter out all hotadded memory (IIRC, only boot memory can
> >>>> be used).
> >>>
> >>> Do you mean hot added memory after boot can't be recognized and added
> >>> into system RAM on arm64?
> >>
> >> See patch #3 of this patch set, which wants to avoid placing kexec
> >> binaries on hotplugged memory. But I have no idea what the current plan
> >> regarding arm64 is (this thread exploded :) ).
> >>
> >> I would assume that we don't want to place kexec images on any
> >> hotplugged (or rather: hot(un)pluggable) memory - on any architecture.
> >
> > Yes, noticed that and James replied to DaveY.
> >
> > Later, when I was considering to make a draft patch to do the picking of
> > memory from normal zone, and add a notifier, as we discussed at above, I
> > suddenly realized that kexec_file_load doesn't have this issue. It
> > traverse system RAM bottom up to get an available region to put
> > kernel/initrd/boot_param, etc. I can't think of a system where its
> > low memory could be unavailable.
>
> kexec_walk_memblock() has the option for "kbuf->top_down". Only
> kexec_walk_resources() seems to ignore it.

Yeah, that top-down searching is done within a low memory area that has
already been found. That is, we first search bottom up for an available
region, then place the kernel top down inside that region. The reason is
that our iomem resources are linked in a singly linked list, so we can
only search bottom up efficiently.

kexec_load does the real top-down searching, so the kernel is put at the
top of system RAM. I once posted patches to support top-down searching
for kexec_file_load too, since QE and customers are often confused by
this difference when debugging.

Andrew may remember this; he suggested that I change the singly linked
list to a doubly linked list for iomem resources, then do the top-down
searching for kexec_file_load. I tried with some effort, but the change
touched too much code, and I finally gave up.

http://archive.lwn.net:8080/devicetree/20180718024944.577-1-bhe@redhat.com/

I can see that top-down searching for kexec avoids the heavily used low
memory region, especially under 4G, which is needed for DMA, various
firmware reservations, etc. And kexec customers/QE are used to it. I can
change kexec_file_load to top down too in a simple way if people really
complain about it. But for now, bottom up does not seem bad either.

>
> So I think in case of memblocks (e.g., arm64), this still applies?

Yeah, aren't you trying to remove it? I haven't read your patches
carefully, so maybe I got it wrong. And arm64 doesn't even support
recording hot-added memory in firmware; it seems not quite ready. Won't
they change that design in the future?

>
> >>
> >>>
> >>>
> >>>> - powerpc to filter out all LMBs that can be removed (assuming not all
> >>>> memory corresponds to LMBs that can be removed, otherwise we're in
> >>>> trouble ... :) )
> >>>> - virtio-mem to filter out all memory it added.
> >>>> - hyper-v to filter out partially backed memory blocks (esp. the last
> >>>> memory block it added and only partially backed it by memory).
> >>>>
> >>>> This would make it work for kexec_file_load(), however, I do wonder how
> >>>> we would want to approach that from userspace kexec-tools when handling
> >>>> it from kexec_load().
> >>>
> >>> Let's make kexec_file_load work firstly. Since this work is only first
> >>> step to make kexec-ed kernel not break memory hotplug. After kexec
> >>> rebooting, the KASLR may locate kernel into hotpluggable area too.
> >>
> >> Can you elaborate how that would work?
> >
> > Well, boot memory can be hotplugged or not after boot, they are marked
> > in uefi tables, the current kexec doesn't save and pass them into 2nd
> > kernel, when kexec kernel bootup, it need read them and avoid them to
> > randomize kernel into.
>
> What about e.g., memory hotplugged by ACPI? I would assume, that the
> kexec kernel will not make use of that (IOW detected that) until the
> ACPI driver comes up and re-detects + adds that memory.
>
> Or how would that machinery work in case we have a DIMM hotplugged via ACPI?

The ACPI SRAT is embedded in EFI, so the RSDP pointer needs to be read
out. If we don't pass the EFI, it won't get the SRAT table correctly, if
I remember correctly. Yeah, I remember that a KVM guest can get memory
hotplugged with ACPI only; this won't happen on bare metal though. I need
to check carefully. I have been using a KVM guest with UEFI firmware
recently.

^ permalink raw reply [flat|nested] 264+ messages in thread
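The two placement strategies described in the message above can be sketched in userspace C. This is only an illustrative model under stated assumptions: `struct ram_range`, `fit_at_top()`, `place_walk_bottom_up()` and `place_walk_top_down()` are hypothetical names invented here, not the kernel's kexec_walk_resources()/kexec_walk_memblock() code or anything from kexec-tools. The first function walks the free ranges in ascending order and places the buffer at the top of the first range that fits (the behaviour described for kexec_file_load); the second walks the ranges in descending order (the behaviour described for kexec_load).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one free "System RAM" range, [start, end). */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

#define NO_FIT UINT64_MAX

/* Place a buffer at the top of one range, aligned down; NO_FIT if it does not fit. */
static uint64_t fit_at_top(const struct ram_range *r, uint64_t size, uint64_t align)
{
	uint64_t addr;

	if (r->end < r->start || r->end - r->start < size)
		return NO_FIT;
	addr = (r->end - size) & ~(align - 1);
	return addr >= r->start ? addr : NO_FIT;
}

/*
 * Model of the described kexec_file_load behaviour: walk the ranges bottom up
 * (ascending array order), then place top down inside the first range that fits.
 */
static uint64_t place_walk_bottom_up(const struct ram_range *r, size_t n,
				     uint64_t size, uint64_t align)
{
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t addr = fit_at_top(&r[i], size, align);

		if (addr != NO_FIT)
			return addr;
	}
	return NO_FIT;
}

/*
 * Model of the described kexec_load behaviour: walk the ranges top down
 * (descending array order), so the result lands near the top of RAM.
 */
static uint64_t place_walk_top_down(const struct ram_range *r, size_t n,
				    uint64_t size, uint64_t align)
{
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
	for (size_t i = n; i-- > 0; ) {
		uint64_t addr = fit_at_top(&r[i], size, align);

		if (addr != NO_FIT)
			return addr;
	}
	return NO_FIT;
}
```

With a singly linked list only the ascending walk is cheap, which is the constraint mentioned above; the array used here hides that cost and is purely for illustration.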
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-16 14:36 ` Baoquan He
  (?)
  (?)
@ 2020-04-16 14:47 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-04-16 14:47 UTC (permalink / raw)
To: Baoquan He, Andrew Morton
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

>> kexec_walk_memblock() has the option for "kbuf->top_down". Only
>> kexec_walk_resources() seems to ignore it.
>
> Yeah, that top down searching is done in a found low mem area. Means
> firstly search an available region bottom up, then put kernel top down
> in that region. The reason is our iomem res is linked with singly linked
> list. So we can only search bottom up efficiently.
>
> kexec_load is doing the real top down searching, so kernel will be put
> at the top of system ram. I ever tried to change it to support top down
> searching for kexec_file_load too with patches, since QE and customers
> are often confused with this difference when debugging.
>
> Andrew may remember this, he suggested me to change the singly linked list
> to doubly linked list for iomem res, then do the top down searching for
> kexec_file_load. I tried with some effort, the change introduced too much
> code change, I just gave up finally.

Well, at least right now this seems to be the right approach (hotplug),
lol :)

>
> http://archive.lwn.net:8080/devicetree/20180718024944.577-1-bhe@redhat.com/
>
> I can see that top down searching for kexec can avoid the highly used
> low memory region, esp under 4G, for dma, kinds of firmware reserving,
> etc. And customers/QE of kexec get used to it. I can change kexec_file_load
> to top down too with a simple way if people really complain it. But now,
> seems bottom up is not bad too.

Ah, I understand the problem. Maybe a simple "optimization" would be to
start searching bottom-up from e.g., 2GB/4GB first. If nothing is found,
search bottom-up from 0-2GB/4GB, etc.

>
>>
>> So I think in case of memblocks (e.g., arm64), this still applies?
>
> Yeah, aren't you trying to remove it? I haven't read your patches
> carefully, maybe I got it wrong. And arm64 even can't support the hot added

For arm64 we're still creating memblocks for hotplugged memory, but I
guess it's not too hard to stop doing that.

> memory being able to recorded into firmware, seems it's not so ready,
> won't they change that design in the future?

It seems to be incomplete, yes. No idea if it's fixable, I'm no arm64
expert ...

>>>>>> - powerpc to filter out all LMBs that can be removed (assuming not all
>>>>>> memory corresponds to LMBs that can be removed, otherwise we're in
>>>>>> trouble ... :) )
>>>>>> - virtio-mem to filter out all memory it added.
>>>>>> - hyper-v to filter out partially backed memory blocks (esp. the last
>>>>>> memory block it added and only partially backed it by memory).
>>>>>>
>>>>>> This would make it work for kexec_file_load(), however, I do wonder how
>>>>>> we would want to approach that from userspace kexec-tools when handling
>>>>>> it from kexec_load().
>>>>>
>>>>> Let's make kexec_file_load work firstly. Since this work is only first
>>>>> step to make kexec-ed kernel not break memory hotplug. After kexec
>>>>> rebooting, the KASLR may locate kernel into hotpluggable area too.
>>>>
>>>> Can you elaborate how that would work?
>>>
>>> Well, boot memory can be hotplugged or not after boot, they are marked
>>> in uefi tables, the current kexec doesn't save and pass them into 2nd
>>> kernel, when kexec kernel bootup, it need read them and avoid them to
>>> randomize kernel into.
>>
>> What about e.g., memory hotplugged by ACPI? I would assume, that the
>> kexec kernel will not make use of that (IOW detected that) until the
>> ACPI driver comes up and re-detects + adds that memory.
>>
>> Or how would that machinery work in case we have a DIMM hotplugged via ACPI?
>
> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
> pass the efi, it won't get the SRAT table correctly, if I remember
> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> ACPI only, this won't happen on bare metal though. Need check carefully.
> I have been using kvm guest with uefi firmware recently.

Yeah, I can imagine that bare metal is different. KVM only uses ACPI.

I'm also asking because of virtio-mem. Memory added via virtio-mem is not
part of any EFI tables whatsoever. So I assume the kexec kernel will not
detect it automatically (good!), and will instead load the virtio-mem
driver and let it add memory back to the system.

I should probably play with kexec and virtio-mem once I have some spare
cycles ... to find out what's broken and needs to be addressed :)

--
Thanks,

David / dhildenb

^ permalink raw reply [flat|nested] 264+ messages in thread
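The two-pass "optimization" suggested in the message above (search bottom up starting from a boundary such as 2GB/4GB first, then fall back to the area below it) could look roughly like the following userspace sketch. Everything here is an assumption made for illustration: `struct ram_range`, `find_bottom_up()` and `find_prefer_high()` are hypothetical names, alignment is ignored, and this is not code from the kernel or from any posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one free "System RAM" range, [start, end). */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

#define NO_FIT UINT64_MAX

/*
 * Bottom-up search restricted to the window [lo, hi): each range is clipped
 * to the window and the lowest address with room for 'size' bytes is returned.
 * Ranges are assumed to be sorted by ascending start address.
 */
static uint64_t find_bottom_up(const struct ram_range *r, size_t n,
			       uint64_t lo, uint64_t hi, uint64_t size)
{
	assert(size > 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t s = r[i].start > lo ? r[i].start : lo;
		uint64_t e = r[i].end < hi ? r[i].end : hi;

		if (e > s && e - s >= size)
			return s;
	}
	return NO_FIT;
}

/*
 * First pass: bottom up from 'boundary' (e.g. 4GB) to keep low memory free.
 * Second pass: only if nothing was found, bottom up from 0 to 'boundary'.
 */
static uint64_t find_prefer_high(const struct ram_range *r, size_t n,
				 uint64_t boundary, uint64_t size)
{
	uint64_t addr = find_bottom_up(r, n, boundary, UINT64_MAX, size);

	if (addr != NO_FIT)
		return addr;
	return find_bottom_up(r, n, 0, boundary, size);
}
```

One limitation of this simple clipping: a buffer that would only fit by straddling the boundary is rejected by both passes, so a real implementation might add a final unrestricted pass.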
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: David Hildenbrand @ 2020-04-16 14:47 UTC
To: Baoquan He, Andrew Morton
Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Will Deacon, linux-arm-kernel

>> kexec_walk_memblock() has the option for "kbuf->top_down". Only
>> kexec_walk_resources() seems to ignore it.
>
> Yeah, that top down searching is done in a found low mem area. Means
> firstly search an available region bottom up, then put kernel top down
> in that region. The reason is our iomem res is linked with singly linked
> list. So we can only search bottom up efficiently.
>
> kexec_load is doing the real top down searching, so kernel will be put
> at the top of system ram. I ever tried to change it to support top down
> searching for kexec_file_load too with patches, since QE and customers
> are often confused with this difference when debugging.
>
> Andrew may remember this, he suggested me to change the singly linked list
> to doubly linked list for iomem res, then do the top down searching for
> kexec_file_load. I tried with some effort, the change introduced too much
> code change, I just gave up finally.

Well, at least right now this seems to be the right approach (hotplug), lol :)

>
> http://archive.lwn.net:8080/devicetree/20180718024944.577-1-bhe@redhat.com/
>
> I can see that top down searching for kexec can avoid the highly used
> low memory region, esp under 4G, for dma, kinds of firmware reserving,
> etc. And customers/QE of kexec get used to it. I can change kexec_file_load
> to top down too with a simple way if people really complain it. But now,
> seems bottom up is not bad too.

Ah, I understand the problem. Maybe a simple "optimization" would be to
start searching bottom-up from e.g., 2GB/4GB first. If nothing was found,
search bottom-up from 0-2GB/4GB etc.

>
>>
>> So I think in case of memblocks (e.g., arm64), this still applies?
>
> Yeah, aren't you trying to remove it? I haven't read your patches
> carefully, maybe I got it wrong. And arm64 even can't support the hot added

For arm64 we're still creating memblocks for hotplugged memory, but I
guess it's not too hard to stop doing that.

> memory being able to recorded into firmware, seems it's not so ready,
> won't they change that design in the future?

It seems to be incomplete, yes. No idea if it's fixable, I'm no arm64
expert ...

>>>>>> - powerpc to filter out all LMBs that can be removed (assuming not all
>>>>>> memory corresponds to LMBs that can be removed, otherwise we're in
>>>>>> trouble ... :) )
>>>>>> - virtio-mem to filter out all memory it added.
>>>>>> - hyper-v to filter out partially backed memory blocks (esp. the last
>>>>>> memory block it added and only partially backed it by memory).
>>>>>>
>>>>>> This would make it work for kexec_file_load(), however, I do wonder how
>>>>>> we would want to approach that from userspace kexec-tools when handling
>>>>>> it from kexec_load().
>>>>>
>>>>> Let's make kexec_file_load work firstly. Since this work is only first
>>>>> step to make kexec-ed kernel not break memory hotplug. After kexec
>>>>> rebooting, the KASLR may locate kernel into hotpluggable area too.
>>>>
>>>> Can you elaborate how that would work?
>>>
>>> Well, boot memory can be hotplugged or not after boot, they are marked
>>> in uefi tables, the current kexec doesn't save and pass them into 2nd
>>> kernel, when kexec kernel bootup, it need read them and avoid them to
>>> randomize kernel into.
>>
>> What about e.g., memory hotplugged by ACPI? I would assume, that the
>> kexec kernel will not make use of that (IOW detected that) until the
>> ACPI driver comes up and re-detects + adds that memory.
>>
>> Or how would that machinery work in case we have a DIMM hotplugged via ACPI?
>
> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
> pass the efi, it won't get the SRAT table correctly, if I remember
> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> ACPI only, this won't happen on bare metal though. Need check carefully.
> I have been using kvm guest with uefi firmware recently.

Yeah, I can imagine that bare metal is different. kvm only uses ACPI.

I'm also asking because of virtio-mem. Memory added via virtio-mem is
not part of any efi tables or whatsoever. So I assume the kexec kernel
will not detect it automatically (good!), instead load the virtio-mem
driver and let it add memory back to the system.

I should probably play with kexec and virtio-mem once I have some spare
cycles ... to find out what's broken and needs to be addressed :)

--
Thanks,

David / dhildenb
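The two-pass "optimization" suggested above (search bottom-up from 2GB/4GB first, then fall back to low memory) could be sketched roughly as follows. This is only an illustration under stated assumptions, not kernel code: the names `find_bottom_up` and `place_segment`, the `(start, end)` range representation with an exclusive end, and the 4 GiB default are all hypothetical, alignment is ignored, and the fallback is a plain bottom-up search from address 0.

```python
from typing import List, Optional, Tuple

GiB = 1 << 30

# A free range is (start, end) with an exclusive end address (hypothetical model).
Range = Tuple[int, int]


def find_bottom_up(ranges: List[Range], size: int, floor: int) -> Optional[int]:
    """Return the lowest address >= floor where `size` bytes fit, else None."""
    for start, end in sorted(ranges):
        candidate = max(start, floor)
        if candidate + size <= end:
            return candidate
    return None


def place_segment(ranges: List[Range], size: int,
                  preferred_floor: int = 4 * GiB) -> Optional[int]:
    """First try above `preferred_floor`; if that fails, retry from address 0."""
    addr = find_bottom_up(ranges, size, preferred_floor)
    if addr is None:
        # Nothing fits above the preferred floor, so fall back to low memory.
        addr = find_bottom_up(ranges, size, 0)
    return addr
```

The list is still walked in one direction only, so a scheme like this would not need the doubly linked resource list discussed above; whether it would be acceptable for kexec_file_load is a separate question.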
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: David Hildenbrand @ 2020-04-21 13:29 UTC
To: Baoquan He, Andrew Morton
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

>> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
>> pass the efi, it won't get the SRAT table correctly, if I remember
>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>> ACPI only, this won't happen on bare metal though. Need check carefully.
>> I have been using kvm guest with uefi firmware recently.
>
> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>
> I'm also asking because of virtio-mem. Memory added via virtio-mem is
> not part of any efi tables or whatsoever. So I assume the kexec kernel
> will not detect it automatically (good!), instead load the virtio-mem
> driver and let it add memory back to the system.
>
> I should probably play with kexec and virtio-mem once I have some spare
> cycles ... to find out what's broken and needs to be addressed :)

FWIW, I just gave virtio-mem and kexec/kdump a try.

a) kdump seems to work. Memory added by virtio-mem is getting dumped.
The kexec kernel only uses memory in the crash region. The virtio-mem
driver properly bails out due to is_kdump_kernel().

b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
to get placed on virtio-mem memory (pure luck due to the left-to-right
search). Memory added by virtio-mem is not getting added to the e820
map. Once the virtio-mem driver comes back up in the kexec kernel, the
right memory is readded.

c) "kexec -c -l" does not work properly. All memory added by virtio-mem
is added to the e820 map, which is wrong. Memory that should not be
touched will be touched by the kexec kernel. I assume kexec-tools just
goes ahead and adds anything it can find in /proc/iomem (or
/sys/firmware/memmap/) to the e820 map of the new kernel.

Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
similarly added to the e820 map and, therefore, won't be able to be
onlined MOVABLE easily.

At least for virtio-mem, I would either have to

a) Not support "kexec -c -l". A viable option if we would be planning on
not supporting it either way in the long term. I could block this
in-kernel somehow eventually.

b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
indicating it in /proc/iomem in a special way ("System RAM
(hotplugged)"/"System RAM (virtio-mem)").

Baoquan, any opinion on that?

--
Thanks,

David / dhildenb
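To illustrate option b) above, here is a rough userspace sketch of how a tool that only picks up top-level /proc/iomem entries named exactly "System RAM" would skip a region carrying one of the proposed special names. It is an assumption-laden illustration, not kexec-tools code: `parse_iomem` and `usable_ram` are hypothetical helper names, and how kexec-tools really matches resource names is not shown in this thread.

```python
from typing import List, Tuple

Entry = Tuple[int, int, str]  # (start, end inclusive, resource name)


def parse_iomem(text: str) -> List[Entry]:
    """Parse top-level lines of /proc/iomem-style text ("start-end : name")."""
    entries = []
    for line in text.splitlines():
        # Indented lines are child resources (e.g. "Kernel code"); skip them.
        if not line or line[0].isspace():
            continue
        span, sep, name = line.partition(" : ")
        if not sep:
            continue
        start_s, _, end_s = span.partition("-")
        entries.append((int(start_s, 16), int(end_s, 16), name.strip()))
    return entries


def usable_ram(entries: List[Entry], wanted: str = "System RAM") -> List[Tuple[int, int]]:
    """Keep only ranges whose name matches `wanted` exactly.

    A region named "System RAM (virtio-mem)" does not match and is left alone.
    """
    return [(start, end) for start, end, name in entries if name == wanted]
```

With this kind of exact-name matching, renaming the resource is enough to keep an unaware tool away from the region; a tool that matches by prefix would still need to be taught about the new names.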
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
From: David Hildenbrand @ 2020-04-21 13:57 UTC
To: Baoquan He, Andrew Morton
Cc: Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

On 21.04.20 15:29, David Hildenbrand wrote:
>>> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
>>> pass the efi, it won't get the SRAT table correctly, if I remember
>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>> ACPI only, this won't happen on bare metal though. Need check carefully.
>>> I have been using kvm guest with uefi firmware recently.
>>
>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>
>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>> not part of any efi tables or whatsoever. So I assume the kexec kernel
>> will not detect it automatically (good!), instead load the virtio-mem
>> driver and let it add memory back to the system.
>>
>> I should probably play with kexec and virtio-mem once I have some spare
>> cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().
>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.
>
> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel. I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.
>
>
> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.
>
> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").

I just realized that *not* creating /sys/firmware/memmap/ entries for
virtio-mem memory seems to be the right thing to do.

--
Thanks,

David / dhildenb
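For context on the /sys/firmware/memmap/ remark, a tool that builds its memory map from that directory only ever sees ranges that have an entry there, so memory without an entry is simply invisible to it. The sketch below assumes the usual layout of one numbered subdirectory per range with `start`, `end` and `type` files holding hex addresses and a type string; `read_firmware_memmap` is a hypothetical helper name, and this is not how kexec-tools is actually implemented.

```python
from pathlib import Path
from typing import List, Tuple


def read_firmware_memmap(root: str = "/sys/firmware/memmap") -> List[Tuple[int, int, str]]:
    """Collect (start, end, type) from every entry directory below `root`."""
    entries = []
    for entry in Path(root).iterdir():
        if not entry.is_dir():
            continue
        # start/end are hex strings such as "0x100000"; type is e.g. "System RAM".
        start = int((entry / "start").read_text().strip(), 16)
        end = int((entry / "end").read_text().strip(), 16)
        kind = (entry / "type").read_text().strip()
        entries.append((start, end, kind))
    # Sort by start address so callers get a bottom-up view of memory.
    return sorted(entries)
```

Passing a different `root` makes it easy to try the function against a copied or synthetic directory tree rather than the live sysfs.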
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-21 13:57 ` David Hildenbrand 0 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-21 13:57 UTC (permalink / raw) To: Baoquan He, Andrew Morton Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Will Deacon, linux-arm-kernel On 21.04.20 15:29, David Hildenbrand wrote: >>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>> pass the efi, it won't get the SRAT table correctly, if I remember >>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>> ACPI only, this won't happen on bare metal though. Need check carefully. >>> I have been using kvm guest with uefi firmwire recently. >> >> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >> >> I'm also asking because of virtio-mem. Memory added via virtio-mem is >> not part of any efi tables or whatsoever. So I assume the kexec kernel >> will not detect it automatically (good!), instead load the virtio-mem >> driver and let it add memory back to the system. >> >> I should probably play with kexec and virtio-mem once I have some spare >> cycles ... to find out what's broken and needs to be addressed :) > > FWIW, I just gave virtio-mem and kexec/kdump a try. > > a) kdump seems to work. Memory added by virtio-mem is getting dumped. > The kexec kernel only uses memory in the crash region. The virtio-mem > driver properly bails out due to is_kdump_kernel(). > > b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > to get placed on virtio-mem memory (pure luck due to the left-to-right > search). Memory added by virtio-mem is not getting added to the e820 > map. Once the virtio-mem driver comes back up in the kexec kernel, the > right memory is readded. > > c) "kexec -c -l" does not work properly. 
All memory added by virtio-mem > is added to the e820 map, which is wrong. Memory that should not be > touched will be touched by the kexec kernel. I assume kexec-tools just > goes ahead and adds anything it can find in /proc/iomem (or > /sys/firmware/memmap/) to the e820 map of the new kernel. > > Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is > similarly added to the e820 map and, therefore, won't be able to be > onlined MOVABLE easily. > > > At least for virtio-mem, I would either have to > a) Not support "kexec -c -l". A viable option if we would be planning on > not supporting it either way in the long term. I could block this > in-kernel somehow eventually. > > b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by > indicating it in /proc/iomem in a special way ("System RAM > (hotplugged)"/"System RAM (virtio-mem)"). I just realized, that *not* creating /sys/firmware/memmap/ entries for virtio-mem memory seems to be the right thing to do. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-21 13:57 ` David Hildenbrand 0 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-21 13:57 UTC (permalink / raw) To: Baoquan He, Andrew Morton Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Will Deacon, linux-arm-kernel On 21.04.20 15:29, David Hildenbrand wrote: >>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>> pass the efi, it won't get the SRAT table correctly, if I remember >>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>> ACPI only, this won't happen on bare metal though. Need check carefully. >>> I have been using kvm guest with uefi firmwire recently. >> >> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >> >> I'm also asking because of virtio-mem. Memory added via virtio-mem is >> not part of any efi tables or whatsoever. So I assume the kexec kernel >> will not detect it automatically (good!), instead load the virtio-mem >> driver and let it add memory back to the system. >> >> I should probably play with kexec and virtio-mem once I have some spare >> cycles ... to find out what's broken and needs to be addressed :) > > FWIW, I just gave virtio-mem and kexec/kdump a try. > > a) kdump seems to work. Memory added by virtio-mem is getting dumped. > The kexec kernel only uses memory in the crash region. The virtio-mem > driver properly bails out due to is_kdump_kernel(). > > b) "kexec -s -l" seems to work fine. For now, the kernel does not seem > to get placed on virtio-mem memory (pure luck due to the left-to-right > search). Memory added by virtio-mem is not getting added to the e820 > map. Once the virtio-mem driver comes back up in the kexec kernel, the > right memory is readded. > > c) "kexec -c -l" does not work properly. 
All memory added by virtio-mem > is added to the e820 map, which is wrong. Memory that should not be > touched will be touched by the kexec kernel. I assume kexec-tools just > goes ahead and adds anything it can find in /proc/iomem (or > /sys/firmware/memmap/) to the e820 map of the new kernel. > > Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is > similarly added to the e820 map and, therefore, won't be able to be > onlined MOVABLE easily. > > > At least for virtio-mem, I would either have to > a) Not support "kexec -c -l". A viable option if we would be planning on > not supporting it either way in the long term. I could block this > in-kernel somehow eventually. > > b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by > indicating it in /proc/iomem in a special way ("System RAM > (hotplugged)"/"System RAM (virtio-mem)"). I just realized, that *not* creating /sys/firmware/memmap/ entries for virtio-mem memory seems to be the right thing to do. -- Thanks, David / dhildenb _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-21 13:29 ` David Hildenbrand
@ 2020-04-21 13:59 ` Eric W. Biederman
From: Eric W. Biederman @ 2020-04-21 13:59 UTC
To: David Hildenbrand
Cc: Baoquan He, Andrew Morton, Russell King - ARM Linux admin,
	Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

David Hildenbrand <david@redhat.com> writes:

>>> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
>>> pass the efi, it won't get the SRAT table correctly, if I remember
>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>> ACPI only, this won't happen on bare metal though. Need check carefully.
>>> I have been using kvm guest with uefi firmware recently.
>>
>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>
>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>> not part of any efi tables or whatsoever. So I assume the kexec kernel
>> will not detect it automatically (good!), instead load the virtio-mem
>> driver and let it add memory back to the system.
>>
>> I should probably play with kexec and virtio-mem once I have some spare
>> cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().
>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.

This sounds like a bug.

> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel. I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.

This sounds like correct behavior to me. If you add memory to the
system it is treated as memory to the system.

If we need to make it a special kind of memory with special rules we can
have some kind of special marking for the memory. But hotplugged is not
in itself a sufficient criterion to say don't use this as normal memory.

If I take a huge server and I plug in an extra dimm it is just memory.

For a similarly huge server I might want to have memory that the system
booted with unpluggable, in case hardware error reporting notices
a dimm generating a lot of memory errors.

Now perhaps virtualization needs a special tier of memory that should
only be used for cases where the memory is easily movable.

I am not familiar with virtio-mem but my skim of the initial design
is that virtio-mem was not designed to be such a special tier of memory.
Perhaps something has changed?
https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html

> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.

No.

> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").

How does the kernel memory allocator treat this memory?

The logic is simple. If the kernel memory allocator treats that memory as
ordinary memory available for all uses it should be presented as
ordinary memory available for all uses.

If the kernel memory allocator treats that memory as special memory
only available for uses that we can easily free later and give back to
the system, AKA it is special and not ordinary memory, we should mark
it as such.

Eric

p.s. Please excuse me for jumping in, I may be missing some important
context, but what I read when I saw this message in my inbox just seemed
very wrong.
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-21 13:59 ` Eric W. Biederman
@ 2020-04-21 14:30 ` David Hildenbrand
From: David Hildenbrand @ 2020-04-21 14:30 UTC
To: Eric W. Biederman
Cc: Baoquan He, Andrew Morton, Russell King - ARM Linux admin,
	Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm,
	James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>> search). Memory added by virtio-mem is not getting added to the e820
>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>> right memory is readded.
>
> This sounds like a bug.

This is how virtio-mem wants its memory to get handled.

>
>> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
>> is added to the e820 map, which is wrong. Memory that should not be
>> touched will be touched by the kexec kernel. I assume kexec-tools just
>> goes ahead and adds anything it can find in /proc/iomem (or
>> /sys/firmware/memmap/) to the e820 map of the new kernel.
>>
>> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
>> similarly added to the e820 map and, therefore, won't be able to be
>> onlined MOVABLE easily.
>
> This sounds like correct behavior to me. If you add memory to the
> system it is treated as memory to the system.

Yeah, I would agree if we are talking about DIMMs, but this memory is
special. It's added via a paravirtualized interface and will contain
holes, especially after unplug. While memory in these holes can usually
be read, it should not be written. More on that below.

>
> If we need to make it a special kind of memory with special rules we can
> have some kind of special marking for the memory. But hotplugged is not
> in itself a sufficient criterion to say don't use this as normal memory.

Agreed. It is special, though.

>
> If I take a huge server and I plug in an extra dimm it is just memory.

Agreed.

[...]

>
> Now perhaps virtualization needs a special tier of memory that should
> only be used for cases where the memory is easily movable.
>
> I am not familiar with virtio-mem but my skim of the initial design
> is that virtio-mem was not designed to be such a special tier of memory.
> Perhaps something has changed?
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html

Yes, a lot changed. See
https://lkml.kernel.org/r/20200311171422.10484-1-david@redhat.com
for the latest-greatest design overview.

>
>> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
>> indicating it in /proc/iomem in a special way ("System RAM
>> (hotplugged)"/"System RAM (virtio-mem)").
>
> How does the kernel memory allocator treat this memory?

So what virtio-mem does is add memory sections on demand and populate
within these sections the requested amount of memory. E.g., if 64MB are
requested, it will add a 128MB section/resource but only make the first
64MB accessible (via the hypervisor) and only give the first 64MB to the
buddy.

This way of adding memory is similar to what the Xen and Hyper-V balloon
drivers do when hotplugging memory.

When requested to plug more memory, it might go ahead and make (parts
of) the remaining 64MB accessible and give them to the buddy. In case it
cannot "fill any holes", it will add a new section.

When requested to unplug memory, it will try to remove part of the added
(here 64MB) memory from the buddy and tell the hypervisor about it.

So, it has some similarity to ballooning in virtual environments.
However, it manages its own device memory only and can therefore give
better guarantees and detect malicious guests.

Right now, I think the right approach would be to not create
/sys/firmware/memmap entries for memory added by virtio-mem.

[...]

>
> p.s. Please excuse me for jumping in, I may be missing some important
> context, but what I read when I saw this message in my inbox just seemed
> very wrong.

Yeah, still, thanks for having a look. Please let me know if you need
more information.

--
Thanks,

David / dhildenb
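The plug/unplug bookkeeping described above can be modelled with a toy simulation. It is a sketch under stated assumptions, not the virtio-mem driver: the class name `ToyVirtioMem`, the 4MB block granularity, and the order in which blocks are unplugged are all invented for illustration. Only the 128MB section size and the "fill holes first, then add a section" rule come from the message.

```python
MB = 1 << 20
SECTION_SIZE = 128 * MB  # section size used in the example in the message
BLOCK_SIZE = 4 * MB      # assumed plug granularity, chosen only for illustration


class ToyVirtioMem:
    """Toy model: sections are added whole, but only plugged blocks reach the buddy."""

    def __init__(self):
        # One list of booleans per section; True means the block is plugged.
        self.sections = []

    def plugged_bytes(self):
        return sum(sum(section) for section in self.sections) * BLOCK_SIZE

    def plug(self, size):
        blocks = size // BLOCK_SIZE
        # First fill holes in sections that already exist.
        for section in self.sections:
            for i, plugged in enumerate(section):
                if blocks and not plugged:
                    section[i] = True
                    blocks -= 1
        # Only add a new section once no holes are left.
        while blocks:
            section = [False] * (SECTION_SIZE // BLOCK_SIZE)
            take = min(blocks, len(section))
            section[:take] = [True] * take
            blocks -= take
            self.sections.append(section)

    def unplug(self, size):
        blocks = size // BLOCK_SIZE
        # Assumed policy: take blocks from the highest addresses first.
        for section in reversed(self.sections):
            for i in reversed(range(len(section))):
                if blocks and section[i]:
                    section[i] = False
                    blocks -= 1
```

Plugging 64MB on a fresh `ToyVirtioMem()` yields one 128MB section with half of its blocks plugged, matching the example in the message. After a later unplug, the section still exists as a resource but contains unplugged blocks. Those are the holes that a kexec kernel must not write to if the whole section is handed to it as ordinary RAM.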
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-21 14:30 ` David Hildenbrand 0 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-21 14:30 UTC (permalink / raw) To: Eric W. Biederman Cc: piliu, Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel >> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem >> to get placed on virtio-mem memory (pure luck due to the left-to-right >> search). Memory added by virtio-mem is not getting added to the e820 >> map. Once the virtio-mem driver comes back up in the kexec kernel, the >> right memory is readded. > > This sounds like a bug. This is how virtio-mem wants its memory to get handled. > >> c) "kexec -c -l" does not work properly. All memory added by virtio-mem >> is added to the e820 map, which is wrong. Memory that should not be >> touched will be touched by the kexec kernel. I assume kexec-tools just >> goes ahead and adds anything it can find in /proc/iomem (or >> /sys/firmware/memmap/) to the e820 map of the new kernel. >> >> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is >> similarly added to the e820 map and, therefore, won't be able to be >> onlined MOVABLE easily. > > This sounds like correct behavior to me. If you add memory to the > system it is treated as memory to the system. Yeah, I would agree if we are talking about DIMMs, but this memory is special. It's added via a paravirtualized interface and will contain holes, especially after unplug. While memory in these holes can usually be read, it should not be written. More on that below. > > If we need to make it a special kind of memory with special rules we can > have some kind of special marking for the memory. But hotplugged is not > in itself a sufficient criteria to say don't use this as normal memory. Agreed. 
It is special, though. > > If take a huge server and I plug in an extra dimm it is just memory. Agreed. [...] > > Now perhaps virtualization needs a special tier of memory that should > only be used for cases where the memory is easily movable. > > I am not familiar with virtio-mem but my skim of the initial design > is that virtio-mem was not designed to be such a special tier of memory. > Perhaps something has changed? > https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html Yes, a lot changed. See https://lkml.kernel.org/r/20200311171422.10484-1-david@redhat.com for the latest-greatest design overview. > >> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by >> indicating it in /proc/iomem in a special way ("System RAM >> (hotplugged)"/"System RAM (virtio-mem)"). > > How does the kernel memory allocator treat this memory? So what virtio-mem does is add memory sections on demand and populate within these sections the requested amount of memory. E.g., if 64MB are requested, it will add a 128MB section/resource but only make the first 64MB accessible (via the hypervisor) and only give the first 64MB to the buddy. This way of adding memory is similar to what XEN and hypver-v balloon drivers do when hotplugging memory. When requested to plug more memory, it might go ahead and make (parts of) the remaining 64MB accessible and give them to the buddy. In case it cannot "fill any holes", it will add a new section. When requested to unplug memory, it will try to remove memory from the added (here 64MB) memory from the buddy and tell the hypervisor about it. So, it has some similarity to ballooning in virtual environment, however, it manages its own device memory only and can therefore give better guarantees and detect malicious guests. Right now, I think the right approach would be to not create /sys/firmware/memmap entries from memory virtio-mem added. [...] > > p.s. 
Please excuse me for jumping in I may be missing some important > context, but what I read when I saw this message in my inbox just seemed > very wrong. Yeah, still, thanks for having a look. Please let me know if you need more information. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-21 13:29 ` David Hildenbrand
@ 2020-04-22  9:17 ` Baoquan He
From: Baoquan He @ 2020-04-22 9:17 UTC
To: David Hildenbrand
Cc: Andrew Morton, Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

On 04/21/20 at 03:29pm, David Hildenbrand wrote:
> >> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
> >> pass the efi, it won't get the SRAT table correctly, if I remember
> >> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> >> ACPI only, this won't happen on bare metal though. Need check carefully.
> >> I have been using kvm guest with uefi firmware recently.
> >
> > Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
> >
> > I'm also asking because of virtio-mem. Memory added via virtio-mem is
> > not part of any efi tables or whatsoever. So I assume the kexec kernel
> > will not detect it automatically (good!), instead load the virtio-mem
> > driver and let it add memory back to the system.
> >
> > I should probably play with kexec and virtio-mem once I have some spare
> > cycles ... to find out what's broken and needs to be addressed :)
>
> FWIW, I just gave virtio-mem and kexec/kdump a try.
>
> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> The kexec kernel only uses memory in the crash region. The virtio-mem
> driver properly bails out due to is_kdump_kernel().

Right, kdump is not impacted by memory added later.

>
> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> to get placed on virtio-mem memory (pure luck due to the left-to-right
> search). Memory added by virtio-mem is not getting added to the e820
> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> right memory is readded.

kexec_file_load behaves just as you tested. It doesn't collect memory
added later into e820 because it uses e820_table_kexec directly to pass
e820 to the kexec-ed kernel. However, e820_table_kexec is only updated
during the boot stage. I tried hot adding a DIMM after boot: the kexec-ed
kernel doesn't have it in e820 during bootup, but it is recognized and
added during ACPI scanning. I think we should update e820_table_kexec on
memory hot add/remove, at least for DIMMs. Not sure if DLPAR, virtio-mem
and balloon memory need to be added into e820_table_kexec too, and if
this is expected behaviour.

But whatever we do, it won't impact kexec file loading, because of the
bottom-up search strategy. Just adding them into e820_table_kexec will
make it consistent with a cold reboot, which recognizes them and puts
them into e820 during bootup.

>
> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
> is added to the e820 map, which is wrong. Memory that should not be
> touched will be touched by the kexec kernel. I assume kexec-tools just
> goes ahead and adds anything it can find in /proc/iomem (or
> /sys/firmware/memmap/) to the e820 map of the new kernel.
>
> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
> similarly added to the e820 map and, therefore, won't be able to be
> onlined MOVABLE easily.

Yes, kexec_load will read memory regions from /sys/firmware/memmap/ or
/proc/iomem. Making it right seems a little harder: we can export them
to /proc/iomem or /sys/firmware/memmap/ marked as 'hotplug', but which
zone they belong to is not easy to tell.

We are proactively and widely testing kexec_file_load on x86_64, s390
and arm64 by adding test cases into CKI.

>
>
> At least for virtio-mem, I would either have to
> a) Not support "kexec -c -l". A viable option if we would be planning on
> not supporting it either way in the long term. I could block this
> in-kernel somehow eventually.
>
> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
> indicating it in /proc/iomem in a special way ("System RAM
> (hotplugged)"/"System RAM (virtio-mem)").
>
> Baoquan, any opinion on that?
>
> --
> Thanks,
>
> David / dhildenb
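Option b) quoted above relies on user space telling plain "System RAM" apart from a specially named resource. A minimal sketch of that check on a single /proc/iomem line follows. It is not taken from kexec-tools: the struct `iomem_entry` and the helpers `parse_iomem_line()` and `iomem_entry_usable()` are hypothetical names, and the "System RAM (virtio-mem)" string is only the naming proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One top-level /proc/iomem entry of the form "start-end : name". */
struct iomem_entry {
    unsigned long long start;
    unsigned long long end;
    char name[64];
};

/* Parse one line; returns false for nested (indented) or malformed lines. */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
    int consumed = 0;
    size_t len;

    assert(line != NULL && e != NULL);

    /* Child resources are indented; only top-level entries are wanted. */
    if (line[0] == ' ')
        return false;

    if (sscanf(line, "%llx-%llx : %n", &e->start, &e->end, &consumed) != 2 ||
        consumed == 0)
        return false;

    /* Copy the resource name without the trailing newline. */
    len = strcspn(line + consumed, "\n");
    if (len >= sizeof(e->name))
        len = sizeof(e->name) - 1;
    memcpy(e->name, line + consumed, len);
    e->name[len] = '\0';
    return true;
}

/*
 * Exact match on purpose: "System RAM (virtio-mem)" or
 * "System RAM (hotplugged)" must not be passed on as usable memory.
 */
static bool iomem_entry_usable(const struct iomem_entry *e)
{
    return strcmp(e->name, "System RAM") == 0;
}
```

With an exact name match, a tool that is unaware of the new names would skip such regions by default, which is the property option b) is after. A prefix match on "System RAM" would defeat it.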
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22  9:17 ` Baoquan He
@ 2020-04-22  9:24 ` David Hildenbrand
From: David Hildenbrand @ 2020-04-22 9:24 UTC
To: Baoquan He
Cc: Andrew Morton, Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

On 22.04.20 11:17, Baoquan He wrote:
> On 04/21/20 at 03:29pm, David Hildenbrand wrote:
>>>> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
>>>> pass the efi, it won't get the SRAT table correctly, if I remember
>>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>>> ACPI only, this won't happen on bare metal though. Need check carefully.
>>>> I have been using kvm guest with uefi firmware recently.
>>>
>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>>
>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>>> not part of any efi tables or whatsoever. So I assume the kexec kernel
>>> will not detect it automatically (good!), instead load the virtio-mem
>>> driver and let it add memory back to the system.
>>>
>>> I should probably play with kexec and virtio-mem once I have some spare
>>> cycles ... to find out what's broken and needs to be addressed :)
>>
>> FWIW, I just gave virtio-mem and kexec/kdump a try.
>>
>> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
>> The kexec kernel only uses memory in the crash region. The virtio-mem
>> driver properly bails out due to is_kdump_kernel().
>
> Right, kdump is not impacted by memory added later.
>
>>
>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>> search). Memory added by virtio-mem is not getting added to the e820
>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>> right memory is readded.
>
> kexec_file_load behaves just as you tested. It doesn't collect memory
> added later into e820 because it uses e820_table_kexec directly to pass
> e820 to the kexec-ed kernel. However, e820_table_kexec is only updated
> during the boot stage. I tried hot adding a DIMM after boot: the kexec-ed
> kernel doesn't have it in e820 during bootup, but it is recognized and
> added during ACPI scanning. I think we should update e820_table_kexec on
> memory hot add/remove, at least for DIMMs. Not sure if DLPAR, virtio-mem
> and balloon memory need to be added into e820_table_kexec too, and if
> this is expected behaviour.
>
> But whatever we do, it won't impact kexec file loading, because of the
> bottom-up search strategy. Just adding them into e820_table_kexec will
> make it consistent with a cold reboot, which recognizes them and puts
> them into e820 during bootup.

Yeah, I think whatever a cold-booted kernel will see is what the
kexec-ed kernel should see. Not more, not less.

Regarding virtio-mem: Not in e820 on cold-boot.
Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
IIRC. I think on real HW it can be different.

--
Thanks,

David / dhildenb
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-22 9:24 ` David Hildenbrand 0 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-22 9:24 UTC (permalink / raw) To: Baoquan He Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 22.04.20 11:17, Baoquan He wrote: > On 04/21/20 at 03:29pm, David Hildenbrand wrote: >>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>>> pass the efi, it won't get the SRAT table correctly, if I remember >>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>>> ACPI only, this won't happen on bare metal though. Need check carefully. >>>> I have been using kvm guest with uefi firmwire recently. >>> >>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >>> >>> I'm also asking because of virtio-mem. Memory added via virtio-mem is >>> not part of any efi tables or whatsoever. So I assume the kexec kernel >>> will not detect it automatically (good!), instead load the virtio-mem >>> driver and let it add memory back to the system. >>> >>> I should probably play with kexec and virtio-mem once I have some spare >>> cycles ... to find out what's broken and needs to be addressed :) >> >> FWIW, I just gave virtio-mem and kexec/kdump a try. >> >> a) kdump seems to work. Memory added by virtio-mem is getting dumped. >> The kexec kernel only uses memory in the crash region. The virtio-mem >> driver properly bails out due to is_kdump_kernel(). > > Right, kdump is not impacted later added memory. > >> >> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem >> to get placed on virtio-mem memory (pure luck due to the left-to-right >> search). Memory added by virtio-mem is not getting added to the e820 >> map. 
Once the virtio-mem driver comes back up in the kexec kernel, the >> right memory is readded. > > kexec_file_load just behaves as you tested. It doesn't collect later > added memory to e820 because it uses e820_table_kexec directly to pass > e820 to kexec-ed kernel. However, this e820_table_kexec is only updated > during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel > doesn't have it in e820 during bootup, but it's recoginized and added > when ACPI scanning. I think we should update e820_table_kexec when hot > add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, > balloon will need be added into e820_table_kexec too, and if this is > expected behaviour. > > But whatever we do, it won't impact the kexec file_loading, because of > the searching strategy bottom up. Just adding them into e820_table_kexec > will make it consistent with cold reboot which get recognizes and get > them into e820 during bootup. Yeah, I think whatever a cold-booted kernel will see is what kexec-ed kernel should see. Not more, not less. Regarding virtio-mem: Not in e820 on cold-boot. Regarding DIMMs: DIMMs under KVM will never show up in the e820 map IIRC. I think on real HW it can be different. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22  9:24 ` David Hildenbrand
  (?)
  (?)
@ 2020-04-22  9:57 ` Baoquan He
  -1 siblings, 0 replies; 264+ messages in thread
From: Baoquan He @ 2020-04-22  9:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Eric W. Biederman, Russell King - ARM Linux admin,
      Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
      linux-mm, James Morse, Will Deacon, linux-arm-kernel,
      linuxppc-dev, piliu

On 04/22/20 at 11:24am, David Hildenbrand wrote:
> On 22.04.20 11:17, Baoquan He wrote:
> > On 04/21/20 at 03:29pm, David Hildenbrand wrote:
> >>>> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
> >>>> pass the efi, it won't get the SRAT table correctly, if I remember
> >>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> >>>> ACPI only, this won't happen on bare metal though. Need check carefully.
> >>>> I have been using kvm guest with uefi firmware recently.
> >>>
> >>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
> >>>
> >>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
> >>> not part of any efi tables or whatsoever. So I assume the kexec kernel
> >>> will not detect it automatically (good!), instead load the virtio-mem
> >>> driver and let it add memory back to the system.
> >>>
> >>> I should probably play with kexec and virtio-mem once I have some spare
> >>> cycles ... to find out what's broken and needs to be addressed :)
> >>
> >> FWIW, I just gave virtio-mem and kexec/kdump a try.
> >>
> >> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> >> The kexec kernel only uses memory in the crash region. The virtio-mem
> >> driver properly bails out due to is_kdump_kernel().
> >
> > Right, kdump is not impacted by later added memory.
> >
> >>
> >> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> >> to get placed on virtio-mem memory (pure luck due to the left-to-right
> >> search). Memory added by virtio-mem is not getting added to the e820
> >> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> >> right memory is readded.
> >
> > kexec_file_load just behaves as you tested. It doesn't collect later
> > added memory to e820 because it uses e820_table_kexec directly to pass
> > e820 to kexec-ed kernel. However, this e820_table_kexec is only updated
> > during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel
> > doesn't have it in e820 during bootup, but it's recognized and added
> > when ACPI scanning. I think we should update e820_table_kexec when hot
> > add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem,
> > balloon will need be added into e820_table_kexec too, and if this is
> > expected behaviour.
> >
> > But whatever we do, it won't impact the kexec file_loading, because of
> > the searching strategy bottom up. Just adding them into e820_table_kexec
> > will make it consistent with cold reboot which recognizes them and gets
> > them into e820 during bootup.
>
> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed
> kernel should see. Not more, not less.
>
> Regarding virtio-mem: Not in e820 on cold-boot.
> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
> IIRC. I think on real HW it can be different.

Yeah, DIMMs under KVM won't show up in the e820 map. This is not a feature
of QEMU/KVM, though, but a defect of it. I once asked Igor, who is a developer
of the QEMU/KVM guest in this area, why we don't make the kvm guest recognize
hotpluggable DIMMs and add them into the e820 map. He said he had tried to do
it, but this would corrupt the guest on HyperV, so he had to revert the
commit on qemu. So I think we can leave it for now for both real HW and
kvm, or update the e820_table_kexec to include added DIMMs for both real
HW and KVM. I hope one day the KVM devs will find a way to conquer the defect
on HyperV and make the e820 map consistent with bare metal. After all, a
kvm guest is trying to imitate real HW for the most part.

Anyway, I will think about updating e820_table_kexec, and see if we can
do something about it.

^ permalink raw reply [flat|nested] 264+ messages in thread
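The bottom-up search argument made in the message above can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: `struct toy_range`, `toy_find_bottom_up()` and `toy_snapshot_count()` are hypothetical names invented for this illustration and are not kernel APIs, and the model assumes hot-added memory sits at higher addresses than boot memory. It shows a first-fit walk from the lowest address, and a boot-time snapshot (standing in for a table that is only filled at boot) that does not describe ranges added later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: one usable RAM range [start, end), kept sorted by address. */
struct toy_range {
    uint64_t start;
    uint64_t end;       /* exclusive */
    int hotplugged;     /* hypothetical flag: 1 if added after boot */
};

/*
 * First-fit, bottom-up search: walk the ranges from the lowest address
 * and return the index of the first one that can hold `size` bytes,
 * or -1 if none can.
 */
int toy_find_bottom_up(const struct toy_range *r, size_t n, uint64_t size)
{
    for (size_t i = 0; i < n; i++) {
        assert(r[i].end >= r[i].start);   /* ranges must be well-formed */
        if (r[i].end - r[i].start >= size)
            return (int)i;
    }
    return -1;
}

/*
 * Number of ranges a boot-time snapshot would describe: only memory
 * that was present at boot, not ranges hot-added afterwards.
 */
size_t toy_snapshot_count(const struct toy_range *r, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (!r[i].hotplugged)
            count++;
    return count;
}
```

With a map holding one boot range low in the address space and one hot-added range above 4 GiB, a small request lands in the boot range, and the hot-added range is chosen only when nothing lower fits. That matches the "pure luck" observation quoted above: nothing in a first-fit search forbids the hot-added range, it is simply reached last.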
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22  9:57 ` Baoquan He
  (?)
  (?)
@ 2020-04-22 10:05 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-04-22 10:05 UTC (permalink / raw)
  To: Baoquan He
  Cc: Andrew Morton, Eric W. Biederman, Russell King - ARM Linux admin,
      Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec,
      linux-mm, James Morse, Will Deacon, linux-arm-kernel,
      linuxppc-dev, piliu

On 22.04.20 11:57, Baoquan He wrote:
> On 04/22/20 at 11:24am, David Hildenbrand wrote:
>> On 22.04.20 11:17, Baoquan He wrote:
>>> On 04/21/20 at 03:29pm, David Hildenbrand wrote:
>>>>>> ACPI SRAT is embedded into efi, need read out the rsdp pointer. If we don't
>>>>>> pass the efi, it won't get the SRAT table correctly, if I remember
>>>>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
>>>>>> ACPI only, this won't happen on bare metal though. Need check carefully.
>>>>>> I have been using kvm guest with uefi firmware recently.
>>>>>
>>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
>>>>>
>>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
>>>>> not part of any efi tables or whatsoever. So I assume the kexec kernel
>>>>> will not detect it automatically (good!), instead load the virtio-mem
>>>>> driver and let it add memory back to the system.
>>>>>
>>>>> I should probably play with kexec and virtio-mem once I have some spare
>>>>> cycles ... to find out what's broken and needs to be addressed :)
>>>>
>>>> FWIW, I just gave virtio-mem and kexec/kdump a try.
>>>>
>>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
>>>> The kexec kernel only uses memory in the crash region. The virtio-mem
>>>> driver properly bails out due to is_kdump_kernel().
>>>
>>> Right, kdump is not impacted by later added memory.
>>>
>>>>
>>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>>>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>>>> search). Memory added by virtio-mem is not getting added to the e820
>>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>>>> right memory is readded.
>>>
>>> kexec_file_load just behaves as you tested. It doesn't collect later
>>> added memory to e820 because it uses e820_table_kexec directly to pass
>>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updated
>>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel
>>> doesn't have it in e820 during bootup, but it's recognized and added
>>> when ACPI scanning. I think we should update e820_table_kexec when hot
>>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem,
>>> balloon will need be added into e820_table_kexec too, and if this is
>>> expected behaviour.
>>>
>>> But whatever we do, it won't impact the kexec file_loading, because of
>>> the searching strategy bottom up. Just adding them into e820_table_kexec
>>> will make it consistent with cold reboot which recognizes them and gets
>>> them into e820 during bootup.
>>
>> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed
>> kernel should see. Not more, not less.
>>
>> Regarding virtio-mem: Not in e820 on cold-boot.
>> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
>> IIRC. I think on real HW it can be different.
>
> Yeah, DIMMs under KVM won't show up in the e820 map. This is not a feature
> of QEMU/KVM, though, but a defect of it. I once asked Igor, who is a developer
> of the QEMU/KVM guest in this area, why we don't make the kvm guest recognize
> hotpluggable DIMMs and add them into the e820 map. He said he had tried to do
> it, but this would corrupt the guest on HyperV, so he had to revert the

Yeah, I remember that this had to be reverted due to something breaking.
But OTOH, it allows us to online coldplugged DIMMs online_movable
easily, so I'd say it's even a feature (although it does not behave like
the real HW we have). I use this extensively when testing memory
hot(un)plug via coldplugged DIMMs.

I do wonder if there is real HW where this is also the case.

> commit on qemu. So I think we can leave it for now for both real HW and
> kvm, or update the e820_table_kexec to include added DIMMs for both real
> HW and KVM. I hope one day the KVM devs will find a way to conquer the defect
> on HyperV and make the e820 map consistent with bare metal. After all, a
> kvm guest is trying to imitate real HW for the most part.
>
> Anyway, I will think about updating e820_table_kexec, and see if we can
> do something about it.

Yeah, for DIMMs on real HW it might definitely make sense. We might be
able to hook into updates of /sys/firmware/memmap on memory add/remove.

--
Thanks,

David / dhildenb

^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-22 10:05 ` David Hildenbrand 0 siblings, 0 replies; 264+ messages in thread From: David Hildenbrand @ 2020-04-22 10:05 UTC (permalink / raw) To: Baoquan He Cc: piliu, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, linuxppc-dev, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Eric W. Biederman, Andrew Morton, Will Deacon, linux-arm-kernel On 22.04.20 11:57, Baoquan He wrote: > On 04/22/20 at 11:24am, David Hildenbrand wrote: >> On 22.04.20 11:17, Baoquan He wrote: >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote: >>>>>> ACPI SRAT is embeded into efi, need read out the rsdp pointer. If we don't >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember >>>>>> correctly. Yeah, I remeber kvm guest can get memory hotplugged with >>>>>> ACPI only, this won't happen on bare metal though. Need check carefully. >>>>>> I have been using kvm guest with uefi firmwire recently. >>>>> >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI. >>>>> >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is >>>>> not part of any efi tables or whatsoever. So I assume the kexec kernel >>>>> will not detect it automatically (good!), instead load the virtio-mem >>>>> driver and let it add memory back to the system. >>>>> >>>>> I should probably play with kexec and virtio-mem once I have some spare >>>>> cycles ... to find out what's broken and needs to be addressed :) >>>> >>>> FWIW, I just gave virtio-mem and kexec/kdump a try. >>>> >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped. >>>> The kexec kernel only uses memory in the crash region. The virtio-mem >>>> driver properly bails out due to is_kdump_kernel(). >>> >>> Right, kdump is not impacted later added memory. >>> >>>> >>>> b) "kexec -s -l" seems to work fine. 
For now, the kernel does not seem >>>> to get placed on virtio-mem memory (pure luck due to the left-to-right >>>> search). Memory added by virtio-mem is not getting added to the e820 >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the >>>> right memory is readded. >>> >>> kexec_file_load just behaves as you tested. It doesn't collect later >>> added memory to e820 because it uses e820_table_kexec directly to pass >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updated >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel >>> doesn't have it in e820 during bootup, but it's recoginized and added >>> when ACPI scanning. I think we should update e820_table_kexec when hot >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem, >>> balloon will need be added into e820_table_kexec too, and if this is >>> expected behaviour. >>> >>> But whatever we do, it won't impact the kexec file_loading, because of >>> the searching strategy bottom up. Just adding them into e820_table_kexec >>> will make it consistent with cold reboot which get recognizes and get >>> them into e820 during bootup. >> >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed >> kernel should see. Not more, not less. >> >> Regarding virtio-mem: Not in e820 on cold-boot. >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map >> IIRC. I think on real HW it can be different. > > Yeah, DIMMs under KVM won't show up in e820 map. While this is not feature > of QEMU/KVM, but a defect of it. I ever asked Igor who is developer of > QEMU/KVM guest in this area, why we don't make kvm guest recognize > hotpluggable DIMM and add it into e820 map, he said he had tried to make > it, but this will corrupt guest on HyperV. So he had to revert the Yeah, I remember that this had to be reverted due to something breaking. 
But OTOH, it allows us to online coldplugged DIMMs online_movable easily, so I'd say it's even a feature (although, does not behave like real HW we have). I use this extensively when testing memory hot(un)plug via coldplugged DIMMs. I do wonder if there is real HW, where this is also the case. > commit on qemu. So I think we can leave it for now for both real HW and > kvm, or update the e820_table_kexec to include added DIMM for both real > HW and KVM. I hope one day KVM dev will find a way to conquer the defect > on HyperV and make the e820map consistent with bare metal. After all, > kvm guest is trying to imitate real HW for the most part. > > Anyway, I will think about the e820_table_kexec updating. See if we can > do something about it. Yeah, for DIMMs on real HW it might definitely make sense. We might be able to hook into updates of /sys/firmware/memmap on memory add/remove. -- Thanks, David / dhildenb _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22 10:05 ` David Hildenbrand
@ 2020-04-22 10:36 ` Baoquan He
From: Baoquan He @ 2020-04-22 10:36 UTC
To: David Hildenbrand
Cc: Andrew Morton, Eric W. Biederman, Russell King - ARM Linux admin, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, James Morse, Will Deacon, linux-arm-kernel, linuxppc-dev, piliu

On 04/22/20 at 12:05pm, David Hildenbrand wrote:
> On 22.04.20 11:57, Baoquan He wrote:
> > On 04/22/20 at 11:24am, David Hildenbrand wrote:
> >> On 22.04.20 11:17, Baoquan He wrote:
> >>> On 04/21/20 at 03:29pm, David Hildenbrand wrote:
> >>>>>> ACPI SRAT is embedded into efi, need to read out the rsdp pointer. If we don't
> >>>>>> pass the efi, it won't get the SRAT table correctly, if I remember
> >>>>>> correctly. Yeah, I remember kvm guest can get memory hotplugged with
> >>>>>> ACPI only, this won't happen on bare metal though. Need to check carefully.
> >>>>>> I have been using kvm guest with uefi firmware recently.
> >>>>>
> >>>>> Yeah, I can imagine that bare metal is different. kvm only uses ACPI.
> >>>>>
> >>>>> I'm also asking because of virtio-mem. Memory added via virtio-mem is
> >>>>> not part of any efi tables or whatsoever. So I assume the kexec kernel
> >>>>> will not detect it automatically (good!), instead load the virtio-mem
> >>>>> driver and let it add memory back to the system.
> >>>>>
> >>>>> I should probably play with kexec and virtio-mem once I have some spare
> >>>>> cycles ... to find out what's broken and needs to be addressed :)
> >>>>
> >>>> FWIW, I just gave virtio-mem and kexec/kdump a try.
> >>>>
> >>>> a) kdump seems to work. Memory added by virtio-mem is getting dumped.
> >>>> The kexec kernel only uses memory in the crash region. The virtio-mem
> >>>> driver properly bails out due to is_kdump_kernel().
> >>>
> >>> Right, kdump is not impacted by later added memory.
> >>>
> >>>>
> >>>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
> >>>> to get placed on virtio-mem memory (pure luck due to the left-to-right
> >>>> search). Memory added by virtio-mem is not getting added to the e820
> >>>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
> >>>> right memory is readded.
> >>>
> >>> kexec_file_load just behaves as you tested. It doesn't collect later
> >>> added memory to e820 because it uses e820_table_kexec directly to pass
> >>> e820 to kexec-ed kernel. However, this e820_table_kexec is only updated
> >>> during boot stage. I tried hot adding DIMM after boot, kexec-ed kernel
> >>> doesn't have it in e820 during bootup, but it's recognized and added
> >>> when ACPI scanning. I think we should update e820_table_kexec when hot
> >>> add/remove memory, at least for DIMM. Not sure if DLPAR, virtio-mem,
> >>> balloon will need to be added into e820_table_kexec too, and if this is
> >>> expected behaviour.
> >>>
> >>> But whatever we do, it won't impact the kexec file_loading, because of
> >>> the searching strategy bottom up. Just adding them into e820_table_kexec
> >>> will make it consistent with cold reboot, which recognizes them and gets
> >>> them into e820 during bootup.
> >>
> >> Yeah, I think whatever a cold-booted kernel will see is what kexec-ed
> >> kernel should see. Not more, not less.
> >>
> >> Regarding virtio-mem: Not in e820 on cold-boot.
> >> Regarding DIMMs: DIMMs under KVM will never show up in the e820 map
> >> IIRC. I think on real HW it can be different.
> >
> > Yeah, DIMMs under KVM won't show up in e820 map. But this is not a feature
> > of QEMU/KVM, it is a defect of it. I once asked Igor, who is the developer of
> > QEMU/KVM guest in this area, why we don't make kvm guest recognize
> > hotpluggable DIMM and add it into e820 map. He said he had tried to make
> > it, but this will corrupt guest on HyperV. So he had to revert the
>
> Yeah, I remember that this had to be reverted due to something breaking.
> But OTOH, it allows us to online coldplugged DIMMs online_movable
> easily, so I'd say it's even a feature (although, does not behave like
> real HW we have).
>
> I use this extensively when testing memory hot(un)plug via coldplugged
> DIMMs.
>
> I do wonder if there is real HW, where this is also the case.

None that I know of. Hotplug on real HW includes two parts; boot memory
being hotpluggable is the more flexible one. It allows people to replace
a bad DIMM. And you can see the boot-stage code has been adjusted a lot
for this purpose; at that time, people hadn't thought about kvm guests.

>
> > commit on qemu. So I think we can leave it for now for both real HW and
> > kvm, or update the e820_table_kexec to include added DIMM for both real
> > HW and KVM. I hope one day KVM dev will find a way to conquer the defect
> > on HyperV and make the e820map consistent with bare metal. After all,
> > kvm guest is trying to imitate real HW for the most part.
> >
> > Anyway, I will think about the e820_table_kexec updating. See if we can
> > do something about it.
>
> Yeah, for DIMMs on real HW it might definitely make sense. We might be
> able to hook into updates of /sys/firmware/memmap on memory add/remove.
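The "left-to-right" / "bottom up" search mentioned above can be illustrated with a small sketch. This is a simplified model rather than the kernel's actual kexec_file placement code; struct mem_range, find_bottom_up() and NO_FIT are names assumed here for illustration. It shows why a loaded image tends to land in boot memory: ranges are walked from the lowest address upwards, and hot-added memory usually sits at higher addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not a kernel API. */
#define NO_FIT UINT64_MAX

struct mem_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Walk the ranges from the lowest address upwards and return the first
 * aligned base where 'size' bytes fit, or NO_FIT if nothing fits.
 * 'r' must be sorted by ascending start; 'align' must be a power of two.
 */
uint64_t find_bottom_up(const struct mem_range *r, size_t n,
			uint64_t size, uint64_t align)
{
	for (size_t i = 0; i < n; i++) {
		/* Round the start of this range up to the alignment. */
		uint64_t base = (r[i].start + align - 1) & ~(align - 1);

		if (base < r[i].start)	/* rounding wrapped around */
			continue;
		if (base <= r[i].end && r[i].end - base >= size)
			return base;
	}
	return NO_FIT;
}
```

With a large low-memory range the search never reaches a hot-added range at 4 GiB, which matches the "pure luck" observation quoted above. If low memory is too small for the image, the same search returns an address inside the hot-added range, which is exactly the case this patch series is concerned about.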
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-13 13:15 ` Eric W. Biederman
@ 2020-04-14  9:16 ` Dave Young
From: Dave Young @ 2020-04-14 9:16 UTC
To: Eric W. Biederman
Cc: Baoquan He, David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel

On 04/13/20 at 08:15am, Eric W. Biederman wrote:
> Baoquan He <bhe@redhat.com> writes:
>
> > On 04/12/20 at 02:52pm, Eric W. Biederman wrote:
> >>
> >> The only benefit of kexec_file_load is that it is simple enough from a
> >> kernel perspective that signatures can be checked.
> >
> > We don't have this restriction any more with below commit:
> >
> > commit 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG
> > and KEXEC_SIG_FORCE")
> >
> > With KEXEC_SIG_FORCE not set, we can use kexec_file_load to cover both
> > secure boot or legacy system for kexec/kdump. Being simple enough is
> > enough to attract and convince us to use it instead. And kexec_file_load
> > has been in use for several years on systems with secure boot, since
> > added in 2014, on x86_64.
>
> No. Actually kexec_file_load is the less capable interface, and less
> flexible interface. Which is why it is appropriate for signature
> verification.

I agree that the user space design is more flexible, but for the common
use case of loading a bzImage (say x86 as an example), kexec_file_load
is good enough.

We could make other potential improvements based on kexec_file_load.
For example, we could use it to do some early kdump loading, e.g. try to
load an attached kdump kernel immediately once the crashkernel memory
gets reserved.

>
> >> kexec_load in every other respect is the more capable and functional
> >> interface. It makes no sense to get rid of it.

We are not removing kexec_load at all; it is indeed helpful in many
cases, as all agreed. But if we have a bug reported for both, we may fix
kexec_file_load first because it is usually easier, and we do not need
to worry too much about old kernel and new kernel compatibility.

For example, with the recent breakage we found in the efi path,
kexec_file_load just works after the efi cleanup, but kexec_load depends
on the ABI we added, so we must fix it as below:
https://lore.kernel.org/linux-efi/20200410135644.GB6772@dhcp-128-65.nay.redhat.com/

> >>
> >> It does make sense to reload with a loaded kernel on memory hotplug.
> >> That is simple and easy. If we are going to handle something in the
> >> kernel it should simply be an automated unloading of the kernel on memory
> >> hotplug.
> >>
> >>
> >> I think it would be irresponsible to deprecate kexec_load on any
> >> platform.
> >>
> >> I also suspect that kexec_file_load could be taught to copy the dtb
> >> on arm32 if someone wants to deal with signatures.
> >>
> >> We definitely can not even think of deprecating kexec_load until
> >> architecture that supports it also supports kexec_file_load and everyone
> >> is happy with that interface. That is Linus's no regression rule.
> >
> > I should pick a milder word to express our tendency and tell our plan
> > than 'obsolete'. Even though I added 'gradually', seems it doesn't help
> > much. I didn't mean to say 'deprecate' at all when replied.
> >
> > The situation and trend I understand about kexec_load and kexec_file_load
> > are:
> >
> > 1) Supporting kexec_file_load is suggested to add in ARCHes which don't
> > have yet, just as x86_64, arm64 and s390 have done;
> >
> > 2) kexec_file_load is suggested to use, and take precedence over
> > kexec_load in the future, if both are supported in one ARCH.
>
> The deep problem is that kexec_file_load is distinctly less expressive
> than kexec_load.
>
> > 3) Kexec_load is kept being used by ARCHes w/o kexec_file_load support,
> > and by ARCHes for back compatibility w/ kexec_file_load support.
> >
> > For 1) and 2), I think the reason is obvious as Eric said,
> > kexec_file_load is simple enough. And currently, whenever we got a bug
> > report, we may need fix them twice, for kexec_load and kexec_file_load.
> > If kexec_file_load is made by default, e.g on x86_64, we will change it
> > in kernel space only, for kexec_file_load. This is what I meant about
> > 'obsolete gradually'. I think for arm64, s390, they will do these too.
> > Unless there's some critical/blocker bug in kexec_load, to corrupt the
> > old kexec_load interface in old product.
>
> Maybe. The code that kexec_file_load sucked into the kernel is quite
> stable and rarely needs changes except during a port of kexec to
> another architecture.
>
> Last I looked the real maintenance effort of kexec and kexec on panic was
> in the drivers. So I don't think we can use maintenance to do anything.
>
> > For 3), people can still use kexec_load and develop/fix for it, if no
> > kexec_file_load supported. But 32-bit arm should be a different one,
> > more like i386, we will leave it as is, and fix anything which could
> > break it. But people really expects to improve or add feature to it? E.g
> > in this patchset, the mem hotplug issue James raised, I assume James is
> > focusing on arm64, x86_64, but not 32-bit arm. As DavidH commented in
> > another reply, people even don't agree to continue supporting memory
> > hotplug on 32-bit system. We ever took effort to fix a memory hotplug
> > bug on i386 with a patch, but people would rather set it as BROKEN.
>
> For memory hotplug just reload. Userspace already gets good events.
>
> We should not expect anything except a panic kernel to be loaded over a
> memory hotplug event. The kexec on panic code should actually be loaded
> in a location that we don't relinquish if asked for it.
>
> Quite frankly at this point I would love to see the signature fad die,
> which would allow us to remove kexec_file_load. I still have not seen
> the signature code used anywhere except by people anticipating trouble.

Same for me: I also hate Secure Boot, and I do not like the trouble
added by signature verification either. But we still found that, beyond
the Secure Boot use cases, it is also useful in other common cases. And
since the kernel supports lockdown, we have to live with it.

Thanks
Dave
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 9:38 ` Dave Young
From: Dave Young @ 2020-04-14 9:38 UTC
To: Eric W. Biederman
Cc: Baoquan He, David Hildenbrand, Catalin Marinas, Bhupesh Sharma, Anshuman Khandual, kexec, Russell King - ARM Linux admin, linux-mm, James Morse, Andrew Morton, Will Deacon, linux-arm-kernel

> We are not removing kexec_load at all; it is indeed helpful in many
> cases, as everyone agrees. But if a bug is reported against both, we may
> fix kexec_file_load first because that is usually easier, and we do not
> need to worry as much about compatibility between old and new kernels.
>
> For example, with the recent breakage we found in the EFI path,
> kexec_file_load just works after the EFI cleanup, but kexec_load depends
> on the ABI we added, so we must fix it as below:
> https://lore.kernel.org/linux-efi/20200410135644.GB6772@dhcp-128-65.nay.redhat.com/

Also, we have some specific sysfs files exported for kexec-tools to use:
/sys/firmware/efi/runtime-map/*
and a few other table addresses under /sys/firmware/efi:
fw_vendor, runtime and config_table.

Those are only used by user-space kexec-tools for kexec_load. The
runtime field is now useless because of Ard's cleanup in the EFI code,
but we have to keep it there because older kexec-tools will need it.

In this case kexec_file_load does not need those hacks at all. So in
the future, if we have to invent some kernel/user-space ABI only for
kexec_load, we should be careful and maybe reject it unless there is a
strong reason.

Thanks
Dave
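For readers unfamiliar with the sysfs files mentioned above, the following is a minimal Python sketch of how a user-space tool might read the EFI runtime map. It is an illustration only, not the kexec-tools code. It assumes each numbered entry directory holds the attribute files `type`, `phys_addr`, `virt_addr`, `num_pages` and `attribute`, each containing a `0x`-prefixed hex value; the helper name `read_runtime_map` and the `root` parameter are hypothetical and exist so the function can be pointed at a test directory.

```python
import os

# Attribute files assumed to exist in each runtime-map/<N>/ entry directory.
FIELDS = ("type", "phys_addr", "virt_addr", "num_pages", "attribute")


def read_runtime_map(root="/sys/firmware/efi/runtime-map"):
    """Return one dict per runtime-map entry, ordered by entry number.

    Hypothetical helper: returns an empty list when the directory is absent
    (for example on a non-EFI boot).
    """
    entries = []
    if not os.path.isdir(root):
        return entries
    # Entry directories are assumed to be named 0, 1, 2, ...
    names = sorted((n for n in os.listdir(root) if n.isdigit()), key=int)
    for name in names:
        entry = {"index": int(name)}
        for field in FIELDS:
            with open(os.path.join(root, name, field)) as f:
                # Base 0 lets int() accept the "0x..." form.
                entry[field] = int(f.read().strip(), 0)
        entries.append(entry)
    return entries
```

On a real system these files are typically readable only by root, which is one reason the sketch takes the directory as a parameter.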
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 7:05 ` David Hildenbrand
From: David Hildenbrand @ 2020-04-14 7:05 UTC
To: Andrew Morton
Cc: James Morse, kexec, linux-mm, linux-arm-kernel, Eric Biederman, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

On 10.04.20 21:10, Andrew Morton wrote:
> It's unclear (to me) what is the status of this patchset. But it does appear that
> a new version can be expected?
>

I'd suggest unqueuing the patches until we have a consensus.

While there are a couple of ideas floating around here, my current
suggestion would be either

1. Indicate all hotplugged memory as "System RAM (hotplugged)" in
/proc/iomem and the firmware memmap (on all architectures). This will
require kexec changes, but I would have assumed that kexec has to be
updated in lock-step with the kernel, just like e.g. makedumpfile.
Modify kexec() to not place the kexec kernel on these areas (easy) but
still consider them as crash regions to dump. When loading a kexec
kernel, validate in the kernel that the memory is appropriate.

2. Make kexec() reload the kernel whenever we e.g. get a udev event
for removal of memory in /sys/devices/system/memory/. On every
remove_memory(), invalidate the loaded kernel in the kernel.

As I mentioned somewhere, 1. will be interesting for virtio-mem, where
we don't want any kexec kernel to be placed on virtio-mem-added memory.

--
Thanks,

David / dhildenb
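To make option 1 concrete, here is a small Python sketch of how a user-space loader might split /proc/iomem entries into ranges where a kexec image may be placed and ranges that should only be described for crash dumps. This is a sketch under stated assumptions, not kexec-tools code: the "System RAM (hotplugged)" name is only the proposal from this mail, and `parse_iomem`, `classify` and the sample text are hypothetical.

```python
import re

# One /proc/iomem line: optional indentation, then "start-end : name" (hex).
_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")

# Resource name proposed in this thread; hypothetical, not an existing ABI here.
HOTPLUG_NAME = "System RAM (hotplugged)"


def parse_iomem(text):
    """Yield (start, end, name) for top-level (unindented) entries only."""
    for line in text.splitlines():
        m = _LINE.match(line)
        if m and not m.group(1):
            yield int(m.group(2), 16), int(m.group(3), 16), m.group(4).strip()


def classify(text):
    """Return (placeable, dump_only) lists of (start, end) ranges.

    Boot-time RAM may hold the kexec image; hotplugged RAM is only
    recorded as a region to dump.
    """
    placeable, dump_only = [], []
    for start, end, name in parse_iomem(text):
        if name == "System RAM":
            placeable.append((start, end))
        elif name == HOTPLUG_NAME:
            dump_only.append((start, end))
    return placeable, dump_only


SAMPLE = """\
40000000-7fffffff : System RAM
  40080000-41ffffff : Kernel code
80000000-bfffffff : System RAM (hotplugged)
c0000000-c0000fff : reserved
"""

if __name__ == "__main__":
    print(classify(SAMPLE))
```

Matching on the exact name means a loader that does not know the new name ignores such regions, which is the behaviour the cover letter describes for unaware user-space.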
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 16:55 ` James Morse
From: James Morse @ 2020-04-14 16:55 UTC
To: David Hildenbrand, Andrew Morton
Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi guys,

On 14/04/2020 08:05, David Hildenbrand wrote:
> On 10.04.20 21:10, Andrew Morton wrote:
>> It's unclear (to me) what is the status of this patchset. But it does appear that
>> a new version can be expected?

> I'd suggest unqueuing the patches until we have a consensus.

Certainly!

> While there are a couple of ideas floating around here, my current
> suggestion would be either
>
> 1. Indicate all hotplugged memory as "System RAM (hotplugged)" in
> /proc/iomem and the firmware memmap (on all architectures). This will
> require kexec changes,

> but I would have assumed that kexec has to be
> updated in lock-step with the kernel

News to me: I was using the version I first built when arm64's support
was new. I've only had to update it once, when we had to change
user-space. I don't think Debian updates kexec-tools when it updates
the kernel.

Making changes to /proc/iomem means updating user-space again (for
kdump). I'd like to avoid that if it's at all possible.

> just like e.g. makedumpfile.
> Modify kexec() to not place the kexec kernel on these areas (easy) but
> still consider them as crash regions to dump. When loading a kexec
> kernel, validate in the kernel that the memory is appropriate.

> 2. Make kexec() reload the kernel whenever we e.g. get a udev event
> for removal of memory in /sys/devices/system/memory/.

I don't think we can rely on user-space to do something,

> On every remove_memory(), invalidate the loaded kernel in the kernel.

This is an option, ... but it's a change of behaviour. If user-space
asks for two impossible things, the second request should fail. Having
the first one disappear is a bit spooky...

Fortunately user-space checks the 'kexec -l' bit happened before it
calls reboot() behind 'kexec -e'. So this works, but is not intuitive.
("Did I load it? What changed and when? Oh, half a mile up in dmesg is
a message saying the kernel discarded the kexec kernel last Wednesday.")

> As I mentioned somewhere, 1. will be interesting for virtio-mem, where
> we don't want any kexec kernel to be placed on virtio-mem-added memory.

Do these virtio-mem-added regions need to be accessible by kdump?
(Do we already need a user-space change for that?)

A third option, along the lines of what I posted:
Split the 'offline' and 'removed' ideas, which David mentioned
somewhere. We'd end up with (yet) another notifier chain that prevents
the memory being removed, but you can still mark it as offline in
/sys/. (...I'm not quite sure why you would do that...)

This would need hooking up for ACPI (which covers x86 and arm64), and
other architectures' mechanisms for doing this... arm64 can then switch
its arch hook that prevents 'bootmem' being removed to this new
notifier chain, as the kernel can only boot from memory that was
present at boot.

My preference is 3, then 2.

I think 1 is slightly less desirable than a message at kexec time that
the memory layout has changed since load, and this might not work...

Thanks,

James
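The point about user-space checking that 'kexec -l' happened before 'kexec -e' can be illustrated with a minimal Python sketch. It assumes the kernel exposes /sys/kernel/kexec_loaded, which reads `1` while a (non-crash) kexec image is loaded. The helper name `kexec_image_loaded` and its `path` parameter are hypothetical, and this is not how kexec-tools itself is written.

```python
def kexec_image_loaded(path="/sys/kernel/kexec_loaded"):
    """Return True if the kernel reports a loaded kexec image.

    Hypothetical helper: a missing or unreadable file is treated as
    "nothing loaded" rather than as an error.
    """
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return False


if __name__ == "__main__":
    if kexec_image_loaded():
        print("kexec image is loaded")
    else:
        # Under option 2, this is where user-space would notice that the
        # kernel discarded the image after a memory hot-remove.
        print("no kexec image loaded; reload before 'kexec -e'")
```

A check like this only reports that the image is gone. It says nothing about why or when, which is the "spooky" part of option 2 described above.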
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image @ 2020-04-14 16:55 ` James Morse 0 siblings, 0 replies; 264+ messages in thread From: James Morse @ 2020-04-14 16:55 UTC (permalink / raw) To: David Hildenbrand, Andrew Morton Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman, Will Deacon, linux-arm-kernel Hi guys, On 14/04/2020 08:05, David Hildenbrand wrote: > On 10.04.20 21:10, Andrew Morton wrote: >> It's unclear (to me) what is the status of this patchset. But it does appear that >> an new version can be expected? > I'd suggest to unqueue the patches until we have a consensus. Certainly! > While there are a couple of ideas floating around here, my current > suggestion would be either > > 1. Indicate all hotplugged memory as "System RAM (hotplugged)" in > /proc/iomem and the firmware memmap (on all architectures). This will > require kexec changes, > but I would have assume that kexec has to be > updated in lock-step with the kernel News to me: I was using the version I first built when arm64's support was new. I've only had to update it once when we had to change user-space. I don't think debian updates kexec-tools when it updates the kernel. Making changes to /proc/iomem means updating user-space again, (for kdump). I'd like to avoid that if its at all possible. > just like e.g., makedumpfile. > Modify kexec() to not place the kexec kernel on these areas (easy) but > still consider them as crash regions to dump. When loading a kexec > kernel, validate in the kernel that the memory is appropriate. > 2. Make kexec() reload the the kernel whenever we e.g., get a udev event > for removal of memory in /sys/devices/system/memory/. I don't think we can rely on user-space to do something, > On every remove_memory(), invalidate the loaded kernel in the kernel. This is an option, ... but its a change of behaviour. If user-space asks for two impossible things, the second request should fail. 
Having the first-one disappear is a bit spooky... Fortunately user-space checks the 'kexec -l' bit happened before it calls reboot() behind 'kexec -e'. So this works, but is not intuitive. ("Did I load it? What changed and when? oh, half a mile up in dmesg is a message saying the kernel discarded the kexec kernel last wednesday.") > As I mentioned somewhere, 1. will be interesting for virtio-mem, where > we don't want any kexec kernel to be placed on virtio-mem-added memory. Do these virtio-mem-added regions need to be accessible by kdump? (do we already need a user-space change for that?) A third option, along the line of what I posted: Split the 'offline' and 'removed' ideas, which David mentioned somewhere. We'd end up with (yet) another notifier chain, that prevents the memory being removed, but you can still mark it as offline in /sys/. (...I'm not quite sure why you would do that...) This would need hooking up for ACPI (which covers x86 and arm64), and other architectures mechanisms for doing this... arm64 can then switch is arch hook that prevents 'bootmem' being removed to this new notifier chain, as the kernel can only boot from that was present at boot. My preference is 3, then 2. I think 1 is slightly less desirable than a message at kexec time that the memory layout has changed since load, and this might not work... Thanks, James _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-14 16:55 ` James Morse
  (?)
@ 2020-04-14 17:41 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-04-14 17:41 UTC (permalink / raw)
  To: James Morse, Andrew Morton
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

>> While there are a couple of ideas floating around here, my current
>> suggestion would be either
>>
>> 1. Indicate all hotplugged memory as "System RAM (hotplugged)" in
>> /proc/iomem and the firmware memmap (on all architectures). This will
>> require kexec changes,
>
>> but I would have assumed that kexec has to be
>> updated in lock-step with the kernel
>
> News to me: I was using the version I first built when arm64's support was new. I've only
> had to update it once when we had to change user-space.
>
> I don't think debian updates kexec-tools when it updates the kernel.

I would assume they are also not pushing the latest-greatest kernel in
their current release, after settling on a kexec version, no?

I think you can assume new kernels to require new kexec-tools versions
to provide all features.

> Making changes to /proc/iomem means updating user-space again (for kdump). I'd like to
> avoid that if it's at all possible.

Yes, it's not desirable, but if all that's not working is a "not all
memory will be dumped out of the box", at least I think this is
tolerable. It's not like we're completely breaking kexec. Your current
arm64 patches require the same change AFAIKS - and I think we already
have arm64 hotplug support in Linux distros.

As I said, similarly, makedumpfile has to be upgraded with every kernel
release to make kdump work as expected. And that is no big news I hope :)

>> just like e.g., makedumpfile.
>> Modify kexec() to not place the kexec kernel on these areas (easy) but
>> still consider them as crash regions to dump. When loading a kexec
>> kernel, validate in the kernel that the memory is appropriate.
>
>
>> 2. Make kexec() reload the kernel whenever we e.g., get a udev event
>> for removal of memory in /sys/devices/system/memory/.
>
> I don't think we can rely on user-space to do something,
>
>
>> On every remove_memory(), invalidate the loaded kernel in the kernel.
>
> This is an option, ... but it's a change of behaviour. If user-space asks for two
> impossible things, the second request should fail. Having the first-one disappear is a bit
> spooky...

We are talking about corner cases that are already broken, no?

>
> Fortunately user-space checks the 'kexec -l' bit happened before it calls reboot() behind
> 'kexec -e'. So this works, but is not intuitive.
>
> ("Did I load it? What changed and when? oh, half a mile up in dmesg is a message saying
> the kernel discarded the kexec kernel last wednesday.")
>
>
>> As I mentioned somewhere, 1. will be interesting for virtio-mem, where
>> we don't want any kexec kernel to be placed on virtio-mem-added memory.
>
> Do these virtio-mem-added regions need to be accessible by kdump?
> (do we already need a user-space change for that?)

Yes, they have to be accessible by kdump. Currently, they are also
exported as "System RAM" via /proc/iomem - which is why dumping works
e.g., on x86-64 (we'll have to increase the number of memory resources
that can be considered in the future, but that's a different story and
only applies when adding more than 100GB of memory via virtio-mem or so)

But as virtio-mem is fairly new (IOW, about to get queued for
integration soonish), I could still change the memory resources to show
up differently ("System RAM (hotplugged)", "System RAM (virtio-mem)",
etc.) and teach kexec about them.

But learning that we are having similar problems on arm64 (and
theoretically on Hyper-V), I think it makes sense to discuss a solution
that will solve the other issues as well.

>
>
> A third option, along the line of what I posted:
>
> Split the 'offline' and 'removed' ideas, which David mentioned somewhere. We'd end up with
> (yet) another notifier chain, that prevents the memory being removed, but you can still

I dislike limiting memory unplug - and especially making remove_memory()
fail - just because somebody once thought it would be a good place to
load - in the future - some kexec binary onto it.

> mark it as offline in /sys/. (...I'm not quite sure why you would do that...)
>
> This would need hooking up for ACPI (which covers x86 and arm64), and other architectures'
> mechanisms for doing this...
> arm64 can then switch its arch hook that prevents 'bootmem' being removed to this new
> notifier chain, as the kernel can only boot from memory that was present at boot.

We have two different problems here, right?

1. Don't place kexec binaries on specific memory areas (e.g., arm64,
virtio-mem, hyper-v, ...)
2. Figure out what to do when unplugging memory that was selected as a
target for kexec binaries.

For 1, I have a feeling that /proc/iomem could be the right solution,
eventually requiring kexec changes to handle kdump properly (IOW, dump
all memory). Indicating all hotplugged memory as "System RAM
(hotplugged)" would be the way to go here.

For 2, I think we should unload all kexec images in case they overlap
with memory to be removed (e.g., remove_memory() notifier, which cannot
stop removal, it's only an indication), and make userspace reload kexec
via udev events.

Also, we have to think about kexec_file_load() to deal with 1.

--
Thanks,

David / dhildenb

^ permalink raw reply [flat|nested] 264+ messages in thread
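David's proposal for problem 1 has user space tell memory apart by its /proc/iomem resource name. A sketch of that distinction follows; it is an illustration of the idea in this thread, not existing kexec-tools code. It assumes the usual "start-end : name" line layout of /proc/iomem and the proposed name "System RAM (hotplugged)". The struct iomem_entry type and the three helpers are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem line (hypothetical helper type). */
struct iomem_entry {
    unsigned long long start;
    unsigned long long end;   /* inclusive, as printed by the kernel */
    char name[64];
};

/* Parse "start-end : name"; returns 1 on success, 0 otherwise. */
static int parse_iomem_line(const char *line, struct iomem_entry *e)
{
    assert(line != NULL && e != NULL);

    return sscanf(line, " %llx-%llx : %63[^\n]",
                  &e->start, &e->end, e->name) == 3;
}

/* A kexec kernel may only be placed on plain, boot-time "System RAM". */
static int usable_for_kexec_image(const struct iomem_entry *e)
{
    return strcmp(e->name, "System RAM") == 0;
}

/*
 * kdump should still cover every flavour of RAM, including regions
 * named "System RAM (hotplugged)" or "System RAM (virtio-mem)".
 */
static int must_be_dumped(const struct iomem_entry *e)
{
    return strncmp(e->name, "System RAM", strlen("System RAM")) == 0;
}
```

The exact-match test is what makes an unaware kexec skip renamed regions when placing the kernel. The prefix test is the user-space change David refers to: without it, the renamed regions would be missing from the vmcore.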
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-03-26 18:07 ` James Morse
  (?)
@ 2020-04-15 20:33 ` Eric W. Biederman
  -1 siblings, 0 replies; 264+ messages in thread
From: Eric W. Biederman @ 2020-04-15 20:33 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

James Morse <james.morse@arm.com> writes:

> An image loaded for kexec is not stored in place, instead its segments
> are scattered through memory, and are re-assembled when needed. In the
> meantime, the target memory may have been removed.
>
> Because mm is not aware that this memory is still in use, it allows it
> to be removed.
>
> Add a memory notifier to prevent the removal of memory regions that
> overlap with a loaded kexec image segment. e.g., when triggered from the
> Qemu console:
> | kexec_core: memory region in use
> | memory memory32: Offline failed.
>
> Signed-off-by: James Morse <james.morse@arm.com>

Given that we are talking about the destination pages for kexec
not where the loaded kernel is currently stored the description is
confusing.

Beyond that I think it would be better to simply unload the loaded
kernel at memory hotunplug time. Usually somewhere in the loaded image
is a copy of the memory map at the time the kexec kernel was loaded.
That will invalidate the memory map as well.

All of this should be for a very brief window of a few seconds, as
the loaded kexec image is quite short.

So instead of failing in the notifier, if you could simply unload the
loaded image in the notifier I think that would be simpler and more
robust. While still preventing the loaded image from falling over
when it starts executing.

Eric

^ permalink raw reply [flat|nested] 264+ messages in thread
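The two behaviours under discussion, refusing the hot-remove (the posted patch) and unloading the image (Eric's suggestion), can be contrasted in a small user-space model. This is a sketch under stated assumptions rather than kernel code: struct seg, struct loaded_image, the enums and on_memory_going_offline() are hypothetical stand-ins, and address overflow is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one kexec destination segment. */
struct seg {
    unsigned long mem;    /* destination physical start address */
    unsigned long memsz;  /* destination size in bytes */
};

/* Hypothetical stand-in for the currently loaded kexec image. */
struct loaded_image {
    const struct seg *segs;
    size_t nr_segs;
    int loaded;
};

enum policy  { POLICY_REFUSE, POLICY_UNLOAD };
enum verdict { REMOVAL_OK, REMOVAL_REFUSED };

/* Do the half-open ranges [a, a+alen) and [b, b+blen) intersect? */
static int ranges_overlap(unsigned long a, unsigned long alen,
                          unsigned long b, unsigned long blen)
{
    return a < b + blen && b < a + alen;
}

/* Does any destination segment intersect [start, start+size)? */
static int image_overlaps(const struct loaded_image *img,
                          unsigned long start, unsigned long size)
{
    size_t i;

    for (i = 0; i < img->nr_segs; i++)
        if (ranges_overlap(img->segs[i].mem, img->segs[i].memsz,
                           start, size))
            return 1;
    return 0;
}

/* Model of the decision taken when [start, start+size) goes offline. */
static enum verdict on_memory_going_offline(struct loaded_image *img,
                                            unsigned long start,
                                            unsigned long size,
                                            enum policy policy)
{
    assert(img != NULL);

    if (!img->loaded)
        return REMOVAL_OK;

    if (policy == POLICY_UNLOAD) {
        /*
         * Drop the image whether or not it overlaps: it may embed a
         * copy of the memory map that is now stale.
         */
        img->loaded = 0;
        return REMOVAL_OK;
    }

    /* POLICY_REFUSE: veto only when a destination would disappear. */
    return image_overlaps(img, start, size) ? REMOVAL_REFUSED : REMOVAL_OK;
}
```

Under POLICY_REFUSE the second request (the hot-remove) fails and the image survives. Under POLICY_UNLOAD the hot-remove always succeeds and user space has to reload the image afterwards.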
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-15 20:33 ` Eric W. Biederman
  (?)
@ 2020-04-22 12:28 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-04-22 12:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

Hi Eric,

On 15/04/2020 21:33, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>
>> An image loaded for kexec is not stored in place, instead its segments
>> are scattered through memory, and are re-assembled when needed. In the
>> meantime, the target memory may have been removed.
>>
>> Because mm is not aware that this memory is still in use, it allows it
>> to be removed.
>>
>> Add a memory notifier to prevent the removal of memory regions that
>> overlap with a loaded kexec image segment. e.g., when triggered from the
>> Qemu console:
>> | kexec_core: memory region in use
>> | memory memory32: Offline failed.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>
> Given that we are talking about the destination pages for kexec
> not where the loaded kernel is currently stored the description is
> confusing.

I think David has some better wording to cover this. I thought I had it with 'scattered
and re-assembled'.

> Beyond that I think it would be better to simply unload the loaded
> kernel at memory hotunplug time.

Unconditionally, or if it aliases the removed region?

I don't particularly like it. User-space has asked for two impossible things, we are
changing the answer to the first when we see the second. It's a bit spooky. (maybe no one
will notice)

> Usually somewhere in the loaded image
> is a copy of the memory map at the time the kexec kernel was loaded.
> That will invalidate the memory map as well.

Ah, unconditionally. Sure, x86 needs this.
(arm64 re-discovers the memory map from firmware tables after kexec)

If that's an acceptable change in behaviour, sure, let's do that.

> All of this should be for a very brief window of a few seconds, as
> the loaded kexec image is quite short.

It seems I'm the outlier anticipating anything could happen between those syscalls.

Thanks,

James

^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22 12:28 ` James Morse
  (?)
@ 2020-04-22 15:25 ` Eric W. Biederman
  -1 siblings, 0 replies; 264+ messages in thread
From: Eric W. Biederman @ 2020-04-22 15:25 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

James Morse <james.morse@arm.com> writes:

> Hi Eric,
>
> On 15/04/2020 21:33, Eric W. Biederman wrote:
>> James Morse <james.morse@arm.com> writes:
>>
>>> An image loaded for kexec is not stored in place, instead its segments
>>> are scattered through memory, and are re-assembled when needed. In the
>>> meantime, the target memory may have been removed.
>>>
>>> Because mm is not aware that this memory is still in use, it allows it
>>> to be removed.
>>>
>>> Add a memory notifier to prevent the removal of memory regions that
>>> overlap with a loaded kexec image segment. e.g., when triggered from the
>>> Qemu console:
>>> | kexec_core: memory region in use
>>> | memory memory32: Offline failed.
>>>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>
>> Given that we are talking about the destination pages for kexec
>> not where the loaded kernel is currently stored the description is
>> confusing.
>
> I think David has some better wording to cover this. I thought I had it with 'scattered
> and re-assembled'.

The confusing part was talking about memory being still in use,
that is actually scheduled for use in the future.

>> Usually somewhere in the loaded image
>> is a copy of the memory map at the time the kexec kernel was loaded.
>> That will invalidate the memory map as well.
>
> Ah, unconditionally. Sure, x86 needs this.
> (arm64 re-discovers the memory map from firmware tables after kexec)
>
> If that's an acceptable change in behaviour, sure, let's do that.

Yes.

>> All of this should be for a very brief window of a few seconds, as
>> the loaded kexec image is quite short.
>
> It seems I'm the outlier anticipating anything could happen between
> those syscalls.

The design is:
    sys_kexec_load()
    shutdown scripts
    sys_reboot(LINUX_REBOOT_CMD_KEXEC);

There are two system calls simply so that the shutdown scripts can run.
Now maybe someone somewhere does something different but that is not
expected.

Only the kexec on panic kernel is expected to persist somewhat
indefinitely. But that should be in memory that is reserved from boot
time, and so the memory hotplug should have enough visibility to not
allow that memory to be given up.

Eric
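The notifier described in the quoted patch refuses an offline request when the memory block overlaps a destination range of the loaded image. A minimal user-space sketch of that overlap test follows. The names `struct phys_range`, `ranges_overlap()`, `may_offline()` and the verdict values are hypothetical; they only model the decision and are not the kernel's notifier API or the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). Hypothetical type. */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical verdicts; a real notifier would return notifier codes. */
enum offline_verdict { ALLOW_OFFLINE, REFUSE_OFFLINE };

/* Two half-open ranges overlap when each starts before the other ends. */
bool ranges_overlap(struct phys_range a, struct phys_range b)
{
	assert(a.start <= a.end && b.start <= b.end);
	return a.start < b.end && b.start < a.end;
}

/*
 * Model of the check: refuse to offline 'block' if it overlaps any
 * destination segment of the loaded image, otherwise allow it.
 */
enum offline_verdict may_offline(struct phys_range block,
				 const struct phys_range *segs,
				 size_t nr_segs)
{
	for (size_t i = 0; i < nr_segs; i++) {
		if (ranges_overlap(block, segs[i]))
			return REFUSE_OFFLINE;
	}
	return ALLOW_OFFLINE;
}
```

With half-open ranges, a block that merely touches the end of a segment does not count as overlapping, so only blocks that actually contain destination pages are refused.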
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22 15:25 ` Eric W. Biederman
  (?)
@ 2020-04-22 16:40 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-04-22 16:40 UTC (permalink / raw)
  To: Eric W. Biederman, James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

> The confusing part was talking about memory being still in use,
> that is actually scheduled for use in the future.

+1

>
>>> Usually somewhere in the loaded image
>>> is a copy of the memory map at the time the kexec kernel was loaded.
>>> That will invalidate the memory map as well.
>>
>> Ah, unconditionally. Sure, x86 needs this.
>> (arm64 re-discovers the memory map from firmware tables after kexec)

Does this include hotplugged DIMMs e.g., under KVM?

[...]

>>> All of this should be for a very brief window of a few seconds, as
>>> the loaded kexec image is quite short.
>>
>> It seems I'm the outlier anticipating anything could happen between
>> those syscalls.
>
> The design is:
>     sys_kexec_load()
>     shutdown scripts
>     sys_reboot(LINUX_REBOOT_CMD_KEXEC);
>
> There are two system calls simply so that the shutdown scripts can run.
> Now maybe someone somewhere does something different but that is not
> expected.
>
> Only the kexec on panic kernel is expected to persist somewhat
> indefinitely. But that should be in memory that is reserved from boot
> time, and so the memory hotplug should have enough visibility to not
> allow that memory to be given up.

Yes, and AFAIK, memory blocks which hold the reserved crashkernel area
can usually not get offlined and, therefore, the memory cannot get removed.

Interestingly, s390x even has a hotplug notifier for that

arch/s390/kernel/setup.c:kdump_mem_notifier()

(offlining of memory on s390x can result in memory getting depopulated
in the hypervisor, so after it would have been offlined, it would no
longer be accessible. I somewhat doubt that this notifier is really
needed - all pages in the crashkernel area should look like ordinary
allocated pages when the area is reserved early during boot via the
memblock allocator, and therefore offlining cannot succeed. But that's a
different story - and I suspect this is a leftover from pre-memblock times.)

--
Thanks,

David / dhildenb
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22 16:40 ` David Hildenbrand
  (?)
@ 2020-04-23 16:29 ` Eric W. Biederman
  -1 siblings, 0 replies; 264+ messages in thread
From: Eric W. Biederman @ 2020-04-23 16:29 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: James Morse, kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

David Hildenbrand <david@redhat.com> writes:

>> The confusing part was talking about memory being still in use,
>> that is actually scheduled for use in the future.
>
> +1
>
>>
>>>> Usually somewhere in the loaded image
>>>> is a copy of the memory map at the time the kexec kernel was loaded.
>>>> That will invalidate the memory map as well.
>>>
>>> Ah, unconditionally. Sure, x86 needs this.
>>> (arm64 re-discovers the memory map from firmware tables after kexec)
>
> Does this include hotplugged DIMMs e.g., under KVM?
> [...]

As far as I know. If the memory map changes we need to drop the loaded
image.


Having thought about it a little more I suspect it would be the
other way and just block all hotplug actions after a kexec_load.
As all we expect to happen is running shutdown scripts.

If blocking the hotplug action uses printk to print a nice message
saying something like: "Hotplug blocked because of a loaded kexec image",
then people will be able to figure out what is going on and
call kexec -u if they haven't started the shutdown scripts yet.


Either way it is something simple and unconditional that will make
things work.

>>>> All of this should be for a very brief window of a few seconds, as
>>>> the loaded kexec image is quite short.
>>>
>>> It seems I'm the outlier anticipating anything could happen between
>>> those syscalls.
>>
>> The design is:
>>     sys_kexec_load()
>>     shutdown scripts
>>     sys_reboot(LINUX_REBOOT_CMD_KEXEC);
>>
>> There are two system calls simply so that the shutdown scripts can run.
>> Now maybe someone somewhere does something different but that is not
>> expected.
>>
>> Only the kexec on panic kernel is expected to persist somewhat
>> indefinitely. But that should be in memory that is reserved from boot
>> time, and so the memory hotplug should have enough visibility to not
>> allow that memory to be given up.
>
> Yes, and AFAIK, memory blocks which hold the reserved crashkernel area
> can usually not get offlined and, therefore, the memory cannot get removed.
>
> Interestingly, s390x even has a hotplug notifier for that
>
> arch/s390/kernel/setup.c:kdump_mem_notifier()
>
> (offlining of memory on s390x can result in memory getting depopulated
> in the hypervisor, so after it would have been offlined, it would no
> longer be accessible. I somewhat doubt that this notifier is really
> needed - all pages in the crashkernel area should look like ordinary
> allocated pages when the area is reserved early during boot via the
> memblock allocator, and therefore offlining cannot succeed. But that's a
> different story - and I suspect this is a leftover from pre-memblock times.)

It might be worth seeing if that is true, or if we need to generalize the
s390x code.

Eric
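The "block all hotplug actions after a kexec_load" idea from the message above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `struct kexec_state`, `kexec_load_image()`, `kexec_unload_image()` and `hotplug_request()` are hypothetical names, and `fprintf()` stands in for the printk() the message suggests. Nothing here is taken from an actual kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical state: true between a kexec load and an unload ("kexec -u"). */
struct kexec_state {
	bool image_loaded;
};

/* Hypothetical verdicts for a hotplug request. */
enum hotplug_verdict { HOTPLUG_ALLOWED, HOTPLUG_BLOCKED };

void kexec_load_image(struct kexec_state *st)
{
	assert(st != NULL);
	st->image_loaded = true;
}

void kexec_unload_image(struct kexec_state *st)
{
	assert(st != NULL);
	st->image_loaded = false;
}

/* Unconditional policy: any hotplug action is refused while an image is loaded. */
enum hotplug_verdict hotplug_request(const struct kexec_state *st)
{
	assert(st != NULL);
	if (st->image_loaded) {
		/* Stand-in for the printk() suggested in the message. */
		fprintf(stderr, "Hotplug blocked because of a loaded kexec image\n");
		return HOTPLUG_BLOCKED;
	}
	return HOTPLUG_ALLOWED;
}
```

The policy needs no knowledge of where the image's destination pages are, which is what makes it simple and unconditional; the cost is that hotplug stays blocked until the image is unloaded or the kexec happens.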
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-23 16:29 ` Eric W. Biederman
  (?)
@ 2020-04-24 7:39 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-04-24 7:39 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Morse, kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

On 23.04.20 18:29, Eric W. Biederman wrote:
> David Hildenbrand <david@redhat.com> writes:
>
>>> The confusing part was talking about memory being still in use,
>>> that is actually scheduled for use in the future.
>>
>> +1
>>
>>>
>>>>> Usually somewhere in the loaded image
>>>>> is a copy of the memory map at the time the kexec kernel was loaded.
>>>>> That will invalidate the memory map as well.
>>>>
>>>> Ah, unconditionally. Sure, x86 needs this.
>>>> (arm64 re-discovers the memory map from firmware tables after kexec)
>>
>> Does this include hotplugged DIMMs e.g., under KVM?
>> [...]
>
> As far as I know. If the memory map changes we need to drop the loaded
> image.
>
>
> Having thought about it a little more I suspect it would be the
> other way and just block all hotplug actions after a kexec_load.
> As all we expect to happen is running shutdown scripts.
>
> If blocking the hotplug action uses printk to print a nice message
> saying something like: "Hotplug blocked because of a loaded kexec image",
> then people will be able to figure out what is going on and
> call kexec -u if they haven't started the shutdown scripts yet.
>
>
> Either way it is something simple and unconditional that will make
> things work.
>

Personally, I consider memory hotplug more important than keeping loaded
kexec data alive (just because somebody once decided to do a "kexec -l"
and never did a "kexec -e" we should not block any memory hot(un)plug -
especially in virtualized environments - for all eternity).

So IMHO we would invalidate loaded kexec data (not the crashkernel, of
course) on memory hot(un)plug and print a warning. In addition, we can
let kexec-tools try to reload whatever they loaded after getting
notified that something changed.

The "something changed" is visible to user space e.g., via udev events
for /sys/devices/system/memory/memoryX/

>>>>> All of this should be for a very brief window of a few seconds, as
>>>>> the loaded kexec image is quite short.
>>>>
>>>> It seems I'm the outlier anticipating anything could happen between
>>>> those syscalls.
>>>
>>> The design is:
>>>     sys_kexec_load()
>>>     shutdown scripts
>>>     sys_reboot(LINUX_REBOOT_CMD_KEXEC);
>>>
>>> There are two system calls simply so that the shutdown scripts can run.
>>> Now maybe someone somewhere does something different but that is not
>>> expected.
>>>
>>> Only the kexec on panic kernel is expected to persist somewhat
>>> indefinitely. But that should be in memory that is reserved from boot
>>> time, and so the memory hotplug should have enough visibility to not
>>> allow that memory to be given up.
>>
>> Yes, and AFAIK, memory blocks which hold the reserved crashkernel area
>> can usually not get offlined and, therefore, the memory cannot get removed.
>>
>> Interestingly, s390x even has a hotplug notifier for that
>>
>> arch/s390/kernel/setup.c:kdump_mem_notifier()
>>
>> (offlining of memory on s390x can result in memory getting depopulated
>> in the hypervisor, so after it would have been offlined, it would no
>> longer be accessible. I somewhat doubt that this notifier is really
>> needed - all pages in the crashkernel area should look like ordinary
>> allocated pages when the area is reserved early during boot via the
>> memblock allocator, and therefore offlining cannot succeed. But that's a
>> different story - and I suspect this is a leftover from pre-memblock times.)
>
> It might be worth seeing if that is true, or if we need to generalize the
> s390x code.

I'll try to find some time to test if the s390x handler is still relevant.

--
Thanks,

David / dhildenb
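The alternative proposed in this message, dropping the loaded kexec image on any memory hot(un)plug while leaving the crash kernel alone, can be sketched the same way. The names `struct loaded_images` and `on_memory_hotplug()` and the warning text are hypothetical, chosen only for illustration; this is a user-space model of the policy, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bookkeeping for the two kinds of loaded image. */
struct loaded_images {
	bool kexec_image;	/* loaded with "kexec -l", waiting for "kexec -e" */
	bool crash_image;	/* crash kernel, placed in memory reserved at boot */
};

/*
 * Model of the policy: a memory hot(un)plug event invalidates the
 * regular kexec image and warns, but never touches the crash kernel.
 * Returns true if an image was dropped, so a caller could notify
 * user space to reload it.
 */
bool on_memory_hotplug(struct loaded_images *img)
{
	assert(img != NULL);
	if (!img->kexec_image)
		return false;

	/* Hypothetical warning text, standing in for a kernel log message. */
	fprintf(stderr, "memory layout changed: dropping loaded kexec image\n");
	img->kexec_image = false;
	return true;
}
```

Compared with blocking hotplug, this keeps memory hot(un)plug working at all times and pushes the recovery step, reloading the image, to user space.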
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-24  7:39 ` David Hildenbrand
  (?)
@ 2020-04-24  7:41 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-04-24 7:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: James Morse, kexec, linux-mm, linux-arm-kernel, Anshuman Khandual,
	Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

On 24.04.20 09:39, David Hildenbrand wrote:
> On 23.04.20 18:29, Eric W. Biederman wrote:
>> David Hildenbrand <david@redhat.com> writes:
>>
>>>> The confusing part was talking about memory being still in use,
>>>> that is actually scheduled for use in the future.
>>>
>>> +1
>>>
>>>>
>>>>>> Usually somewhere in the loaded image
>>>>>> is a copy of the memory map at the time the kexec kernel was loaded.
>>>>>> That will invalidate the memory map as well.
>>>>>
>>>>> Ah, unconditionally. Sure, x86 needs this.
>>>>> (arm64 re-discovers the memory map from firmware tables after kexec)
>>>
>>> Does this include hotplugged DIMMs e.g., under KVM?
>>> [...]
>>
>> As far as I know. If the memory map changes we need to drop the loaded
>> image.
>>
>>
>> Having thought about it a little more I suspect it would be the
>> other way and just block all hotplug actions after a kexec_load.
>> As all we expect to happen is running shutdown scripts.
>>
>> If blocking the hotplug action uses printk to print a nice message
>> saying something like: "Hotplug blocked because of a loaded kexec image",
>> then people will be able to figure out what is going on and
>> call kexec -u if they haven't started the shutdown scripts yet.
>>
>>
>> Either way it is something simple and unconditional that will make
>> things work.
>>
>
> Personally, I consider memory hotplug more important than keeping loaded
> kexec data alive (just because somebody once decided to do a "kexec -l"
> and never did a "kexec -e" we should not block any memory hot(un)plug -
> especially in virtualized environments - for all eternity).
>
> So IMHO we would invalidate loaded kexec data (not the crashkernel, of
> course) on memory hot(un)plug and print a warning. In addition, we can
> let kexec-tools try to reload whatever they loaded after getting
> notified that something changed.
>
> The "something changed" is visible to user space e.g., via udev events
> for /sys/devices/memory/memoryX/

/sys/devices/system/memory/memoryX/ ...

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 264+ messages in thread
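A user-space tool that wants to notice the "something changed" event
mentioned above would end up looking at the per-block directories under
/sys/devices/system/memory/. What follows is only a rough sketch of the two
small helpers such a tool might need: one builds the path of a block's
`state` file, the other classifies the string read from it. The function and
enum names are made up for illustration and are not taken from kexec-tools
or udev.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the contents of a block's `state` file. */
enum mem_block_state {
	MEM_BLOCK_UNKNOWN,
	MEM_BLOCK_ONLINE,
	MEM_BLOCK_OFFLINE,
	MEM_BLOCK_GOING_OFFLINE,
};

/* Build "/sys/devices/system/memory/memoryX/state"; -1 if buf is too small. */
int mem_block_state_path(char *buf, size_t len, unsigned int block)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/sys/devices/system/memory/memory%u/state", block);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Compare the first n bytes of s against a complete word. */
int word_is(const char *s, size_t n, const char *word)
{
	return strlen(word) == n && strncmp(s, word, n) == 0;
}

/* Classify one line read from a `state` file (trailing newline allowed). */
enum mem_block_state parse_mem_block_state(const char *s)
{
	size_t n;

	assert(s != NULL);
	n = strcspn(s, "\n");
	if (word_is(s, n, "online"))
		return MEM_BLOCK_ONLINE;
	if (word_is(s, n, "offline"))
		return MEM_BLOCK_OFFLINE;
	if (word_is(s, n, "going-offline"))
		return MEM_BLOCK_GOING_OFFLINE;
	return MEM_BLOCK_UNKNOWN;
}
```

A tool could snapshot these states when it loads the kexec image and compare
them again when a memory udev event arrives, reloading the image if the two
snapshots differ.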
* Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image
  2020-04-22 16:40 ` David Hildenbrand
  (?)
@ 2020-05-01 16:55 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-05-01 16:55 UTC (permalink / raw)
  To: David Hildenbrand, Eric W. Biederman
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual,
	Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

Hi guys,

On 22/04/2020 17:40, David Hildenbrand wrote:
>>>> Usually somewhere in the loaded image
>>>> is a copy of the memory map at the time the kexec kernel was loaded.
>>>> That will invalidate the memory map as well.
>>>
>>> Ah, unconditionally. Sure, x86 needs this.
>>> (arm64 re-discovers the memory map from firmware tables after kexec)

> Does this include hotplugged DIMMs e.g., under KVM?

If you advertise hotplugged memory to the guest using ACPI, yes.

We don't have a practical mechanism to pass 'facts' about the platform
between kernels, instead we rely on those facts being discoverable, or
described by firmware.


>>>> All of this should be for a very brief window of a few seconds, as
>>>> the loaded kexec image is quite short.
>>>
>>> It seems I'm the outlier anticipating anything could happen between
>>> those syscalls.
>>
>> The design is:
>> sys_kexec_load()
>> shutdown scripts
>> sys_reboot(LINUX_REBOOT_CMD_KEXEC);
>>
>> There are two system call simply so that the shutdown scripts can run.
>> Now maybe someone somewhere does something different but that is not
>> expected.

[...]

> Yes, and AFAIK, memory blocks which hold the reserved crashkernel area
> can usually not get offlined and, therefore, the memory cannot get removed.

The crashkernel area on arm64 will always land in un-removable memory. We
set PG_reserved on it too.


Thanks,

James

^ permalink raw reply	[flat|nested] 264+ messages in thread
* [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` James Morse
@ 2020-03-26 18:07 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-03-26 18:07 UTC (permalink / raw)
  To: kexec, linux-mm, linux-arm-kernel
  Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon,
	Anshuman Khandual, Bhupesh Sharma, James Morse

Memory added to the system by hotplug has a 'System RAM' resource created
for it. This is exposed to user-space via /proc/iomem.

This poses problems for kexec on arm64. If kexec decides to place the
kernel in one of these newly onlined regions, the new kernel will find
itself booting from a region not described as memory in the firmware
tables.

Arm64 doesn't have a structure like the e820 memory map that can be
re-written when memory is brought online. Instead arm64 uses the UEFI
memory map, or the memory node from the DT, sometimes both. We never
rewrite these.

Allow an architecture to specify a different name for these hotplug
regions.

Signed-off-by: James Morse <james.morse@arm.com>
---
 mm/memory_hotplug.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0a54ffac8c68..69b03dd7fc74 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -42,6 +42,10 @@
 #include "internal.h"
 #include "shuffle.h"
 
+#ifndef MEMORY_HOTPLUG_RES_NAME
+#define MEMORY_HOTPLUG_RES_NAME "System RAM"
+#endif
+
 /*
  * online_page_callback contains pointer to current page onlining function.
  * Initially it is generic_online_page(). If it is required it could be
@@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
 {
 	struct resource *res;
 	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-	char *resource_name = "System RAM";
+	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
 
 	if (start + size > max_mem_size)
 		return ERR_PTR(-E2BIG);
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 264+ messages in thread
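The override in this patch is the usual #ifndef fallback: generic code
supplies a default only when an architecture header has not already defined
the macro. As a minimal user-space sketch of that pattern (not kernel code),
the block below compiles the same way with or without an override.
DEMO_ARCH_OVERRIDE, the overriding string and hotplug_resource_name() are
hypothetical and exist only for this illustration; the name an architecture
would really pick is not part of this patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Stand-in for an architecture header. Building with -DDEMO_ARCH_OVERRIDE
 * plays the role of an arch that defines its own name. Both the switch and
 * the string below are hypothetical.
 */
#ifdef DEMO_ARCH_OVERRIDE
#define MEMORY_HOTPLUG_RES_NAME "System RAM (hypothetical hotplug name)"
#endif

/* Generic fallback, as in the patch: only used if nothing was defined. */
#ifndef MEMORY_HOTPLUG_RES_NAME
#define MEMORY_HOTPLUG_RES_NAME "System RAM"
#endif

/* Hypothetical helper that returns whichever name won at compile time. */
const char *hotplug_resource_name(void)
{
	const char *name = MEMORY_HOTPLUG_RES_NAME;

	assert(strlen(name) > 0);
	return name;
}
```

Built without the switch, the helper returns "System RAM", which is why
architectures that define nothing see no change in behaviour.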
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` James Morse
@ 2020-03-27  9:59 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-03-27 9:59 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon,
	Anshuman Khandual, Bhupesh Sharma

On 26.03.20 19:07, James Morse wrote:
> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
>
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
>
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.
>
> Allow an architecture to specify a different name for these hotplug
> regions.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  mm/memory_hotplug.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a54ffac8c68..69b03dd7fc74 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif

So I assume changing this for all architectures would result in some
user space tool breaking? Are we aware of any?

I do wonder if we should simply change it for all architectures if possible.

> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);
>

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-27  9:59 ` David Hildenbrand
@ 2020-03-27 15:39 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-03-27 15:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
	Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi David,

On 3/27/20 9:59 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.

>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0a54ffac8c68..69b03dd7fc74 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
>
> So I assume changing this for all architectures would result in some
> user space tool breaking? Are we aware of any?

Last time we had to touch arm64's /proc/iomem strings I went through Debian's
codesearch for stuff that reads it; kexec-tools was the only thing that parsed
it in anger. (From memory, the other tools were looking for PCIe windows to do
firmware flashing.)

Looking again, having qualifiers on the end of 'System RAM' looks like it could
confuse s390-tools' detect_mem_chunks parser.

It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate set.


> I do wonder if we should simply change it for all architectures if possible.

If it's possible that would be great. But I suspect that ship has sailed:
changing it on other architectures could break some fragile parsing code.

I'm wary of changing it on arm64; the only thing that makes it tolerable is that
memory hot-add was relatively recently merged, and we don't anticipate it being
widely used until you can remove memory as well.

Changing it on arm64 is to prevent today's versions of kexec-tools from
accidentally placing the new kernel in memory that wasn't described at boot.
This leads to an unhandled exception during boot[0] because the kernel can't
access itself via the mapping of all memory. (Hotpluggable regions are only
discovered by suitably configured ACPI systems much later.)


Thanks,

James


[0]
| NUMA: NODE_DATA [mem 0x7fdf1780-0x7fdf3fff]
| Unable to handle kernel paging request at virtual address ffff00004230aff8
| Mem abort info:
|   ESR = 0x96000006
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
| Data abort info:
|   ISV = 0, ISS = 0x00000006
|   CM = 0, WnR = 0
| swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000008181d000
| [ffff00004230aff8] pgd=000000007fff9003, pud=000000007fdf7003, pmd=0000000000000000
| Internal error: Oops: 96000006 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 0 Comm: swapper Not tainted 5.6.0-rc3-00098-g3f6c690f5dfe #11618
| Hardware name: linux,dummy-virt (DT)
| pstate: 80400085 (Nzcv daIf +PAN -UAO BTYPE=--)
| pc : vmemmap_pud_populate+0x2c/0xa0
| lr : vmemmap_populate+0x78/0x154
| Call trace:
|  vmemmap_pud_populate+0x2c/0xa0
|  vmemmap_populate+0x78/0x154
|  __populate_section_memmap+0x3c/0x60
|  sparse_init_nid+0x29c/0x414
|  sparse_init+0x154/0x170
|  bootmem_init+0x78/0xdc
|  setup_arch+0x280/0x5d0
|  start_kernel+0x98/0x4f8
| Code: f9469a84 92748e73 8b010e61 cb040033 (f9400261)
| random: get_random_bytes called from print_oops_end_marker+0x34/0x60 with crng_init=0
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: Attempted to kill the idle task!
| ---[ end Kernel panic - not syncing: Attempted to kill the idle task!

^ permalink raw reply	[flat|nested] 264+ messages in thread
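As an illustration of the user-space side of this message, here is a minimal sketch of a parser that accepts only entries named exactly 'System RAM' when picking a kexec target, so that a region published under any other name is skipped. The struct and function names (iomem_entry, parse_iomem_line, usable_for_kexec) are hypothetical and are not taken from kexec-tools or any other tool; the line layout "start-end : name" is the usual /proc/iomem form, and "System RAM (hotplug)" is only the example name discussed in this thread.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem entry: inclusive physical range plus resource name. */
struct iomem_entry {
	unsigned long long start;
	unsigned long long end;
	char name[64];
};

/*
 * Parse one line such as "  40000000-7fffffff : System RAM".
 * Hypothetical helper for illustration only. Returns true on success.
 */
bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	int consumed = 0;

	/* %n records where the name starts; it stays 0 if " : " did not match. */
	if (sscanf(line, " %llx-%llx : %n", &e->start, &e->end, &consumed) < 2)
		return false;
	if (consumed == 0)
		return false;

	/* Copy the name (truncating if needed) and drop a trailing newline. */
	snprintf(e->name, sizeof(e->name), "%s", line + consumed);
	e->name[strcspn(e->name, "\n")] = '\0';
	return true;
}

/*
 * Only memory described at boot is a safe kexec target in this sketch,
 * so require an exact name match rather than a prefix or substring match.
 */
bool usable_for_kexec(const struct iomem_entry *e)
{
	return strcmp(e->name, "System RAM") == 0;
}
```

A tool built this way would simply never consider a renamed hotplug region, which is the behaviour the rename is relying on for unaware user-space.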
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-27 15:39 ` James Morse
@ 2020-03-30 13:23 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-03-30 13:23 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
	Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma,
	Vitaly Kuznetsov

>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 0a54ffac8c68..69b03dd7fc74 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -42,6 +42,10 @@
>>>  #include "internal.h"
>>>  #include "shuffle.h"
>>>
>>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>>> +#endif
>>
>> So I assume changing this for all architectures would result in some
>> user space tool breaking? Are we aware of any?
>
> Last time we had to touch arm64's /proc/iomem strings I went through Debian's
> codesearch for stuff that reads it; kexec-tools was the only thing that parsed
> it in anger. (From memory, the other tools were looking for PCIe windows to do
> firmware flashing.)
>
> Looking again, having qualifiers on the end of 'System RAM' looks like it could
> confuse s390-tools' detect_mem_chunks parser.

Good to know, we should find out if this could work.

>
> It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate set.
>
>
>> I do wonder if we should simply change it for all architectures if possible.
>
> If it's possible that would be great. But I suspect that ship has sailed:
> changing it on other architectures could break some fragile parsing code.

I assume any parser has to be prepared for new types showing up.
Otherwise these would not be future proof. The question is whether a common
prefix is problematic.

E.g., use "Hotplugged System RAM" instead of "System RAM (hotplug)".

>
> I'm wary of changing it on arm64; the only thing that makes it tolerable is that
> memory hot-add was relatively recently merged, and we don't anticipate it being
> widely used until you can remove memory as well.
>
> Changing it on arm64 is to prevent today's versions of kexec-tools from
> accidentally placing the new kernel in memory that wasn't described at boot.
> This leads to an unhandled exception during boot[0] because the kernel can't
> access itself via the mapping of all memory. (Hotpluggable regions are only
> discovered by suitably configured ACPI systems much later.)

I want the very same for virtio-mem (initially x86-only, but later open
for all archs). It could also be interesting for Hyper-V. kexec should not
try to use hotplugged memory as a kexec target, as memory blocks can be
partially inaccessible.

Of course, I can provide an interface to override the name via
add_memory(), but having it handled in a similar way on all architectures
right from the start would be nicer.

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 264+ messages in thread
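To make the prefix question concrete, the sketch below compares three ways a parser might test a resource name against 'System RAM' and how each would treat the two candidate names mentioned above. The function names are hypothetical, and it is an assumption that any existing tool uses one strategy or the other; the s390-tools parser has not been checked here.

```c
#include <stdbool.h>
#include <string.h>

#define BOOT_RAM_NAME "System RAM"

/* Strict parser: the whole name must be "System RAM". */
bool matches_exact(const char *name)
{
	return strcmp(name, BOOT_RAM_NAME) == 0;
}

/* Lenient parser: only the leading characters are compared. */
bool matches_prefix(const char *name)
{
	return strncmp(name, BOOT_RAM_NAME, strlen(BOOT_RAM_NAME)) == 0;
}

/* Very lenient parser: "System RAM" anywhere in the name counts. */
bool matches_substring(const char *name)
{
	return strstr(name, BOOT_RAM_NAME) != NULL;
}
```

With these definitions, "System RAM (hotplug)" is still treated as RAM by a prefix-matching parser, whereas "Hotplugged System RAM" is only picked up by a substring match. A name that does not begin with 'System RAM' therefore hides the region from more of the possible legacy parsers, at the cost of also hiding it from tools that would have wanted to see it.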
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-30 13:23 ` David Hildenbrand
@ 2020-03-30 17:17 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-03-30 17:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
	Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma,
	Vitaly Kuznetsov

Hi David,

On 3/30/20 2:23 PM, David Hildenbrand wrote:
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index 0a54ffac8c68..69b03dd7fc74 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -42,6 +42,10 @@
>>>>  #include "internal.h"
>>>>  #include "shuffle.h"
>>>>
>>>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>>>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>>>> +#endif
>>>
>>> So I assume changing this for all architectures would result in some
>>> user space tool breaking? Are we aware of any?
>>
>> Last time we had to touch arm64's /proc/iomem strings I went through Debian's
>> codesearch for stuff that reads it; kexec-tools was the only thing that parsed
>> it in anger. (From memory, the other tools were looking for PCIe windows to do
>> firmware flashing.)
>>
>> Looking again, having qualifiers on the end of 'System RAM' looks like it could
>> confuse s390-tools' detect_mem_chunks parser.
>
> Good to know, we should find out if this could work.
>
>>
>> It looks like the strings that come out of 'FIRMWARE_MEMMAP' are a duplicate set.
>>
>>
>>> I do wonder if we should simply change it for all architectures if possible.
>>
>> If it's possible that would be great. But I suspect that ship has sailed:
>> changing it on other architectures could break some fragile parsing code.
>
> I assume any parser has to be prepared for new types showing up.
> Otherwise these would not be future proof. The question is whether a common
> prefix is problematic.
>
> E.g., use "Hotplugged System RAM" instead of "System RAM (hotplug)".

Aha, I went for a (suffix) because that is what 32-bit Arm did for the boot
alias.


>> I'm wary of changing it on arm64; the only thing that makes it tolerable is that
>> memory hot-add was relatively recently merged, and we don't anticipate it being
>> widely used until you can remove memory as well.
>>
>> Changing it on arm64 is to prevent today's versions of kexec-tools from
>> accidentally placing the new kernel in memory that wasn't described at boot.
>> This leads to an unhandled exception during boot[0] because the kernel can't
>> access itself via the mapping of all memory. (Hotpluggable regions are only
>> discovered by suitably configured ACPI systems much later.)

> I want the very same for virtio-mem (initially x86-only, but later open
> for all archs). It could also be interesting for Hyper-V. kexec should not
> try to use hotplugged memory as a kexec target, as memory blocks can be
> partially inaccessible.

Great! I assumed these placement requirements would be arm64 specific.


> Of course, I can provide an interface to override the name via
> add_memory(), but having it handled in a similar way on all architectures
> right from the start would be nicer.

I agree having it the same on all architectures would be good.

It sounds like virtio-mem is a better argument for doing this than arm64's
firmware memory description. I'll have a read, and maybe post something to
linux-arch to do this at rc1.

(I assume we'll have a few weeks to make sure arm64 at least uses the same name
if it goes on longer.)


Thanks,

James

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` James Morse
  (?)
@ 2020-04-02  5:49 ` Dave Young
  -1 siblings, 0 replies; 264+ messages in thread
From: Dave Young @ 2020-04-02 5:49 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual,
	Catalin Marinas, Bhupesh Sharma, Eric Biederman, Andrew Morton,
	Will Deacon, piliu, Hari Bathini

On 03/26/20 at 06:07pm, James Morse wrote:
> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
>
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
>
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.

Could arm64 use a similar way to update the DT, or cooked UEFI maps?

Adding Pingfan in Cc: he said ppc64 updates the DT after a memremove, thus it
would be good to just redo a kexec load.

Added Pingfan and Hari for comments and corrections.

>
> Allow an architecture to specify a different name for these hotplug
> regions.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  mm/memory_hotplug.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 0a54ffac8c68..69b03dd7fc74 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif
> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);
> --
> 2.25.1
>
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
>

^ permalink raw reply	[flat|nested] 264+ messages in thread
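The override mechanism in the quoted patch is the usual "#ifndef ... #define ... #endif" fallback: if an architecture header has already defined MEMORY_HOTPLUG_RES_NAME by the time the generic code is compiled, the generic default is not used. The stand-alone sketch below imitates that in a single file. The value "System RAM (hotplug)" is an assumption taken from the name discussed in this thread, not from the arm64 patch itself, and hotplug_resource_name() is a hypothetical stand-in for the assignment in register_memory_resource().

```c
/*
 * Stand-in for an architecture header that is included first.
 * The string is an assumed example value, not the one from patch 3/3.
 */
#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"

/*
 * Stand-in for the generic code: the fallback only takes effect when
 * no architecture has provided its own definition.
 */
#ifndef MEMORY_HOTPLUG_RES_NAME
#define MEMORY_HOTPLUG_RES_NAME "System RAM"
#endif

/* Hypothetical helper mirroring "resource_name = MEMORY_HOTPLUG_RES_NAME". */
const char *hotplug_resource_name(void)
{
	return MEMORY_HOTPLUG_RES_NAME;
}
```

Deleting the first #define makes the function return the generic "System RAM" again, which is what every architecture that does not opt in would keep seeing.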
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-04-02  5:49 ` Dave Young
  (?)
@ 2020-04-02  6:12 ` piliu
  -1 siblings, 0 replies; 264+ messages in thread
From: piliu @ 2020-04-02 6:12 UTC (permalink / raw)
  To: Dave Young, James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual,
	Catalin Marinas, Bhupesh Sharma, Eric Biederman, Andrew Morton,
	Will Deacon, Hari Bathini

On 04/02/2020 01:49 PM, Dave Young wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>
> Could arm64 use a similar way to update the DT, or cooked UEFI maps?
>
> Adding Pingfan in Cc: he said ppc64 updates the DT after a memremove, thus it
> would be good to just redo a kexec load.
>
Yes, the memory changes will be observed through the device nodes under
/proc/device-tree/ (this is for powerpc).

Later, if kexec -l/-p is run, it can build a new dtb with the latest info
from /proc/device-tree.

> Added Pingfan and Hari for comments and corrections.
>
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>>  mm/memory_hotplug.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 0a54ffac8c68..69b03dd7fc74 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
>> +
>>  /*
>>   * online_page_callback contains pointer to current page onlining function.
>>   * Initially it is generic_online_page(). If it is required it could be
>> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>  {
>>  	struct resource *res;
>>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> -	char *resource_name = "System RAM";
>> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>>
>>  	if (start + size > max_mem_size)
>>  		return ERR_PTR(-E2BIG);
>> --
>> 2.25.1
>>
>>
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
>>

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names @ 2020-04-02 6:12 ` piliu 0 siblings, 0 replies; 264+ messages in thread From: piliu @ 2020-04-02 6:12 UTC (permalink / raw) To: Dave Young, James Morse Cc: Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman, Hari Bathini, Andrew Morton, Will Deacon, linux-arm-kernel On 04/02/2020 01:49 PM, Dave Young wrote: > On 03/26/20 at 06:07pm, James Morse wrote: >> Memory added to the system by hotplug has a 'System RAM' resource created >> for it. This is exposed to user-space via /proc/iomem. >> >> This poses problems for kexec on arm64. If kexec decides to place the >> kernel in one of these newly onlined regions, the new kernel will find >> itself booting from a region not described as memory in the firmware >> tables. >> >> Arm64 doesn't have a structure like the e820 memory map that can be >> re-written when memory is brought online. Instead arm64 uses the UEFI >> memory map, or the memory node from the DT, sometimes both. We never >> rewrite these. > > Could arm64 use similar way to update DT, or a cooked UEFI maps? > > Add pingfan in cc, he said ppc64 update the DT after a memremove thus it > would be good to just redo a kexec load. > Yes, the memory changes will be observed through device-node under /proc/device-tree/ (which is for powerpc). Later if running kexec -l/-p , it can build new dtb with the latest info from /proc/device-tree > Added Pingfan and Hari for comments and corrections. > >> >> Allow an architecture to specify a different name for these hotplug >> regions. 
>> >> Signed-off-by: James Morse <james.morse@arm.com> >> --- >> mm/memory_hotplug.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c >> index 0a54ffac8c68..69b03dd7fc74 100644 >> --- a/mm/memory_hotplug.c >> +++ b/mm/memory_hotplug.c >> @@ -42,6 +42,10 @@ >> #include "internal.h" >> #include "shuffle.h" >> >> +#ifndef MEMORY_HOTPLUG_RES_NAME >> +#define MEMORY_HOTPLUG_RES_NAME "System RAM" >> +#endif >> + >> /* >> * online_page_callback contains pointer to current page onlining function. >> * Initially it is generic_online_page(). If it is required it could be >> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size) >> { >> struct resource *res; >> unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; >> - char *resource_name = "System RAM"; >> + char *resource_name = MEMORY_HOTPLUG_RES_NAME; >> >> if (start + size > max_mem_size) >> return ERR_PTR(-E2BIG); >> -- >> 2.25.1 >> >> >> _______________________________________________ >> kexec mailing list >> kexec@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/kexec >> _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-04-02  6:12 ` piliu
  (?)
@ 2020-04-14 17:21 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-04-14 17:21 UTC (permalink / raw)
  To: piliu
  Cc: Dave Young, kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Eric Biederman, Andrew Morton, Will Deacon, Hari Bathini

Hi Dave, Pingfan,

On 02/04/2020 07:12, piliu wrote:
> On 04/02/2020 01:49 PM, Dave Young wrote:
>> On 03/26/20 at 06:07pm, James Morse wrote:
>>> Memory added to the system by hotplug has a 'System RAM' resource created
>>> for it. This is exposed to user-space via /proc/iomem.
>>>
>>> This poses problems for kexec on arm64. If kexec decides to place the
>>> kernel in one of these newly onlined regions, the new kernel will find
>>> itself booting from a region not described as memory in the firmware
>>> tables.
>>>
>>> Arm64 doesn't have a structure like the e820 memory map that can be
>>> re-written when memory is brought online. Instead arm64 uses the UEFI
>>> memory map, or the memory node from the DT, sometimes both. We never
>>> rewrite these.
>>
>> Could arm64 use similar way to update DT, or a cooked UEFI maps?
>> Add pingfan in cc, he said ppc64 update the DT after a memremove thus it
>> would be good to just redo a kexec load.

> Yes, the memory changes will be observed through device-node under
> /proc/device-tree/ (which is for powerpc).
>
> Later if running kexec -l/-p , it can build new dtb with the latest info
> from /proc/device-tree

For arm64, the device-tree is set in stone. We don't have the runtime parts of
open-firmware that powerpc does. (my knowledge in this area is extremely sparse)

arm64 platforms where stuff like this changes tend to use ACPI instead, and these
all have to boot with UEFI, which means it's the UEFI memory map that has authority.

We don't cook a fake UEFI memory map when things change because we treat it like
the set-in-stone DT. This means we only have discrepancies in firmware to work
around, instead of any we introduce ourselves.

One of the UEFI configuration tables describes addresses Linux programmed into
hardware that can't be reset. Newer versions of Linux know how to pick these up on
kexec... but older versions don't know how to parse/rewrite/move that table.
Cooking up new versions of these tables would prevent us doing stuff like this,
which we need to work around hardware that didn't get the 'kexec exists' memo.


Thanks,

James
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` James Morse
  (?)
@ 2020-04-15 20:36 ` Eric W. Biederman
  -1 siblings, 0 replies; 264+ messages in thread
From: Eric W. Biederman @ 2020-04-15 20:36 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

James Morse <james.morse@arm.com> writes:

> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
>
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
>
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.
>
> Allow an architecture to specify a different name for these hotplug
> regions.

Gah. No.

Please find a way to pass the current memory map to the loaded kexec'd
kernel.

Starting a kernel with no way for it to know what the current memory map
is just plain scary.

Eric
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-04-15 20:36 ` Eric W. Biederman
  (?)
@ 2020-04-22 12:14 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-04-22 12:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

Hi Eric,

On 15/04/2020 21:36, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.
>
> Gah. No.
>
> Please find a way to pass the current memory map to the loaded kexec'd
> kernel.

> Starting a kernel with no way for it to know what the current memory map
> is just plain scary.

We have one. Firmware tables are the source of all this information. We don't
tamper with them.

Firmware describes memory present at boot in the UEFI memory map or DT.
On systems with ACPI, regions that were added after booting are discovered by
running AML methods. (for which we need to allocate memory, so you can't describe
boot memory like this)

This doesn't work if you kexec from a hot-added region. You've booted from memory
that wasn't present at boot. I don't think this is fixable with the set of
constraints.


Thanks,

James
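To make the user-space side of this concrete: the quoted description says hotplugged memory shows up in /proc/iomem as 'System RAM'. A minimal sketch of how an unaware loader might pick candidate regions is below, assuming it accepts only top-level entries whose name is exactly "System RAM". The function name `iomem_line_is_system_ram()` is hypothetical, the matching rule is an assumption (real tools may match differently), and the "System RAM (hotplug)" name mentioned afterwards is an invented example, not necessarily the name chosen in [3/3].

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if `line` (one line of /proc/iomem, "start-end : name") is a
 * top-level entry whose name is exactly "System RAM", 0 otherwise.
 * Assumption: an unaware loader uses an exact name match.
 */
static int iomem_line_is_system_ram(const char *line)
{
	unsigned long long start, end;
	char name[64];

	assert(line != NULL);

	/* Nested entries (e.g. "Kernel code") are indented; skip them. */
	if (line[0] == ' ' || line[0] == '\t')
		return 0;

	/* Parse "start-end : name"; the name runs to the end of the line. */
	if (sscanf(line, "%llx-%llx : %63[^\n]", &start, &end, name) != 3)
		return 0;

	return strcmp(name, "System RAM") == 0;
}
```

Under that assumption, a region exposed with a different name such as "System RAM (hotplug)" would simply never be offered as a placement target, which is the effect the series relies on.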
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-03-26 18:07 ` James Morse
  (?)
@ 2020-05-09  0:45 ` Andrew Morton
  -1 siblings, 0 replies; 264+ messages in thread
From: Andrew Morton @ 2020-05-09  0:45 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

On Thu, 26 Mar 2020 18:07:29 +0000 James Morse <james.morse@arm.com> wrote:

> Memory added to the system by hotplug has a 'System RAM' resource created
> for it. This is exposed to user-space via /proc/iomem.
>
> This poses problems for kexec on arm64. If kexec decides to place the
> kernel in one of these newly onlined regions, the new kernel will find
> itself booting from a region not described as memory in the firmware
> tables.
>
> Arm64 doesn't have a structure like the e820 memory map that can be
> re-written when memory is brought online. Instead arm64 uses the UEFI
> memory map, or the memory node from the DT, sometimes both. We never
> rewrite these.
>
> Allow an architecture to specify a different name for these hotplug
> regions.
>
> ...
>
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -42,6 +42,10 @@
>  #include "internal.h"
>  #include "shuffle.h"
>
> +#ifndef MEMORY_HOTPLUG_RES_NAME
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
> +#endif
> +
>  /*
>   * online_page_callback contains pointer to current page onlining function.
>   * Initially it is generic_online_page(). If it is required it could be
> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  {
>  	struct resource *res;
>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>
>  	if (start + size > max_mem_size)
>  		return ERR_PTR(-E2BIG);

I suppose we should do this as well:

--- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
+++ a/mm/memory_hotplug.c
@@ -129,7 +129,8 @@ static struct resource *register_memory_
 				   resource_name, flags);

 	if (!res) {
-		pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n",
+		pr_debug("Unable to reserve " MEMORY_HOTPLUG_RES_NAME
+				" region: %016llx->%016llx\n",
 				start, start + size);
 		return ERR_PTR(-EEXIST);
 	}

It assumes that MEMORY_HOTPLUG_RES_NAME will be a literal string, which
is the case in [3/3].
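The literal-string assumption in the fixup comes from C's compile-time concatenation of adjacent string literals. A small user-space sketch of the same construct follows, with `snprintf()` standing in for `pr_debug()`; `format_reserve_error()` is a hypothetical helper invented for illustration and is not part of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifndef MEMORY_HOTPLUG_RES_NAME
#define MEMORY_HOTPLUG_RES_NAME "System RAM"
#endif

/*
 * Adjacent string literals are merged by the compiler, so the format
 * string below only builds when MEMORY_HOTPLUG_RES_NAME expands to a
 * string literal. If it expanded to a variable or a function call, the
 * concatenation would be a syntax error.
 */
static int format_reserve_error(char *buf, size_t len,
				unsigned long long start,
				unsigned long long size)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len,
			"Unable to reserve " MEMORY_HOTPLUG_RES_NAME
			" region: %016llx->%016llx\n",
			start, start + size);
}
```

Because the name is spliced into the format string itself, it should also not contain a `%` character; passing it as a `%s` argument would be the alternative if the macro ever stopped being a plain literal.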
* Re: [PATCH 2/3] mm/memory_hotplug: Allow arch override of non boot memory resource names
  2020-05-09  0:45 ` Andrew Morton
  (?)
@ 2020-05-11  8:35 ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-05-11  8:35 UTC (permalink / raw)
  To: Andrew Morton, James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

On 09.05.20 02:45, Andrew Morton wrote:
> On Thu, 26 Mar 2020 18:07:29 +0000 James Morse <james.morse@arm.com> wrote:
>
>> Memory added to the system by hotplug has a 'System RAM' resource created
>> for it. This is exposed to user-space via /proc/iomem.
>>
>> This poses problems for kexec on arm64. If kexec decides to place the
>> kernel in one of these newly onlined regions, the new kernel will find
>> itself booting from a region not described as memory in the firmware
>> tables.
>>
>> Arm64 doesn't have a structure like the e820 memory map that can be
>> re-written when memory is brought online. Instead arm64 uses the UEFI
>> memory map, or the memory node from the DT, sometimes both. We never
>> rewrite these.
>>
>> Allow an architecture to specify a different name for these hotplug
>> regions.
>>
>> ...
>>
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -42,6 +42,10 @@
>>  #include "internal.h"
>>  #include "shuffle.h"
>>
>> +#ifndef MEMORY_HOTPLUG_RES_NAME
>> +#define MEMORY_HOTPLUG_RES_NAME "System RAM"
>> +#endif
>> +
>>  /*
>>   * online_page_callback contains pointer to current page onlining function.
>>   * Initially it is generic_online_page(). If it is required it could be
>> @@ -103,7 +107,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>  {
>>  	struct resource *res;
>>  	unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>> -	char *resource_name = "System RAM";
>> +	char *resource_name = MEMORY_HOTPLUG_RES_NAME;
>>
>>  	if (start + size > max_mem_size)
>>  		return ERR_PTR(-E2BIG);
>
> I suppose we should do this as well:
>
> --- a/mm/memory_hotplug.c~mm-memory_hotplug-allow-arch-override-of-non-boot-memory-resource-names-fix
> +++ a/mm/memory_hotplug.c
> @@ -129,7 +129,8 @@ static struct resource *register_memory_
>  				   resource_name, flags);
>
>  	if (!res) {
> -		pr_debug("Unable to reserve System RAM region: %016llx->%016llx\n",
> +		pr_debug("Unable to reserve " MEMORY_HOTPLUG_RES_NAME
> +				" region: %016llx->%016llx\n",
>  				start, start + size);
>  		return ERR_PTR(-EEXIST);
>  	}
>
> It assumes that MEMORY_HOTPLUG_RES_NAME will be a literal string, which
> is the case in [3/3].

@Andrew, as already discussed in this thread [1], I suggest dropping this
series from the -mm tree for now.

[1] https://lkml.kernel.org/r/2e3419b2-d00c-51c3-9b45-9de114608cdf@arm.com

--
Thanks,

David / dhildenb
* [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
From: James Morse @ 2020-03-26 18:07 UTC
To: kexec, linux-mm, linux-arm-kernel
Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma, James Morse

If kexec chooses to place the kernel in a memory region that was
added after boot, we fail to boot as the kernel is running from a
location that is not described as memory by the UEFI memory map or
the original DT.

To prevent unaware user-space kexec from doing this accidentally,
give these regions a different name.

Signed-off-by: James Morse <james.morse@arm.com>
---
This is a change in behaviour as seen by user-space, because memory hot-add
has already been merged.

 arch/arm64/include/asm/memory.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 2be67b232499..ef1686518469 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -166,6 +166,17 @@
 #define IOREMAP_MAX_ORDER (PMD_SHIFT)
 #endif

+/*
+ * Memory hotplug allows new regions of 'System RAM' to be added to the system.
+ * These aren't described as memory by the UEFI memory map, or DT memory node.
+ * If we kexec from one of these regions, the new kernel boots from a location
+ * that isn't described as RAM.
+ *
+ * Give these resources a different name, so unaware kexec doesn't do this by
+ * accident.
+ */
+#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"
+
 #ifndef __ASSEMBLY__
 extern u64 vabits_actual;
 #define PAGE_END (_PAGE_END(vabits_actual))
--
2.25.1
* Re: [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
From: David Hildenbrand @ 2020-03-30 19:01 UTC
To: James Morse, kexec, linux-mm, linux-arm-kernel
Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

On 26.03.20 19:07, James Morse wrote:
> If kexec chooses to place the kernel in a memory region that was
> added after boot, we fail to boot as the kernel is running from a
> location that is not described as memory by the UEFI memory map or
> the original DT.
>
> To prevent unaware user-space kexec from doing this accidentally,
> give these regions a different name.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> This is a change in behaviour as seen by user-space, because memory hot-add
> has already been merged.
>
> arch/arm64/include/asm/memory.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 2be67b232499..ef1686518469 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -166,6 +166,17 @@
>  #define IOREMAP_MAX_ORDER (PMD_SHIFT)
>  #endif
>
> +/*
> + * Memory hotplug allows new regions of 'System RAM' to be added to the system.
> + * These aren't described as memory by the UEFI memory map, or DT memory node.
> + * If we kexec from one of these regions, the new kernel boots from a location
> + * that isn't described as RAM.
> + *
> + * Give these resources a different name, so unaware kexec doesn't do this by
> + * accident.
> + */
> +#define MEMORY_HOTPLUG_RES_NAME "System RAM (hotplug)"
> +
>  #ifndef __ASSEMBLY__
>  extern u64 vabits_actual;
>  #define PAGE_END (_PAGE_END(vabits_actual))
>

(While I am familiar with makedumpfile in the crash kernel, I am not yet
familiar with kexec, so bear with me.)

Looking at kexec:arch/arm64/crashdump-arm64.c

load_crashdump_segments()
-> crash_get_memory_ranges()
-> kexec_iomem_for_each_line()
-> iomem_range_callback()

#define SYSTEM_RAM "System RAM\n"
...
	} else if (strncmp(str, SYSTEM_RAM, strlen(SYSTEM_RAM)) == 0) {
		return mem_regions_add(&system_memory_rgns, ...);
	}

The hotplugged memory will no longer be detected as a crashdump segment,
consequently (AFAIU) not be described in the ELF header, and therefore
also no longer dumped (e.g., by makedumpfile).

I assume you'll have to adapt kexec-tools to still consider this memory
for dumping, correct? Or am I missing something?

--
Thanks,

David / dhildenb
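The effect David describes can be checked in isolation. The sketch below reuses the SYSTEM_RAM literal and the strncmp() comparison from the snippet quoted above; is_system_ram() is a hypothetical wrapper for illustration, not a function in kexec-tools. Because the literal includes the trailing newline, the renamed resource fails the comparison at the character after "RAM".

```c
#include <string.h>

/* Same literal the quoted kexec-tools snippet compares against. */
#define SYSTEM_RAM "System RAM\n"

/*
 * Hypothetical stand-in for the name check quoted above: returns 1 when
 * the resource name (with its trailing newline, as read from
 * /proc/iomem) would be collected as system memory, 0 otherwise.
 */
static int is_system_ram(const char *str)
{
	return strncmp(str, SYSTEM_RAM, strlen(SYSTEM_RAM)) == 0;
}
```

With this check, "System RAM\n" is accepted while "System RAM (hotplug)\n" is not, so the renamed regions would drop out of the list used to build the dump description unless user space learns the new name.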
* Re: [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
From: Eric W. Biederman @ 2020-04-15 20:37 UTC
To: James Morse
Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

James Morse <james.morse@arm.com> writes:

> If kexec chooses to place the kernel in a memory region that was
> added after boot, we fail to boot as the kernel is running from a
> location that is not described as memory by the UEFI memory map or
> the original DT.
>
> To prevent unaware user-space kexec from doing this accidentally,
> give these regions a different name.

Please fix the problem and don't hack around it.

Eric
* Re: [PATCH 3/3] arm64: memory: Give hotplug memory a different resource name
From: James Morse @ 2020-04-22 12:14 UTC
To: Eric W. Biederman
Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

Hi Eric,

On 15/04/2020 21:37, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>
>> If kexec chooses to place the kernel in a memory region that was
>> added after boot, we fail to boot as the kernel is running from a
>> location that is not described as memory by the UEFI memory map or
>> the original DT.
>>
>> To prevent unaware user-space kexec from doing this accidentally,
>> give these regions a different name.
>
> Please fix the problem and don't hack around it.

The problem is that firmware didn't describe memory that wasn't present at
boot. arm64 relies on the firmware description of memory well before it can
go poking around in ACPI to find out where extra memory was added to the
system.

We already need kexec to not overwrite in-memory structures left by
firmware (like the memory map). We do this by naming them reserved in
/proc/iomem. Doing the same for hotadded memory means existing kexec
user-space can't do this accidentally.

The shape of /proc/iomem is the only trick in the book for arm64's kexec
user-space, as it's the only thing it looks at.

Thanks,

James
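To illustrate why the resource name alone is enough to steer an unaware loader, here is a rough sketch of a user-space tool that reads /proc/iomem-style lines and only places the kernel in ranges named exactly "System RAM". Both parse_iomem_line() and usable_for_kexec() are hypothetical helpers written for this example; the exact-name policy is an assumption and is not taken from kexec-tools.

```c
#include <stdio.h>
#include <string.h>

/* One "start-end : name" entry as printed in /proc/iomem. */
struct iomem_range {
	unsigned long long start;
	unsigned long long end;
	char name[64];
};

/*
 * Hypothetical parser: returns 0 and fills *r on success, -1 if the
 * line does not look like "start-end : name". %llx skips the leading
 * spaces that /proc/iomem uses to indent nested resources.
 */
static int parse_iomem_line(const char *line, struct iomem_range *r)
{
	if (sscanf(line, "%llx-%llx : %63[^\n]",
		   &r->start, &r->end, r->name) != 3)
		return -1;
	return 0;
}

/*
 * Assumed placement policy: only ranges named exactly "System RAM" are
 * candidates, so "reserved" and "System RAM (hotplug)" are both skipped.
 */
static int usable_for_kexec(const struct iomem_range *r)
{
	return strcmp(r->name, "System RAM") == 0;
}
```

A loader following this policy never has to know why a range was renamed. It simply never sees the hot-added region as a candidate, which is the same mechanism that already protects the reserved firmware structures.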
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
From: Baoquan He @ 2020-03-27 2:11 UTC
To: James Morse
Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

On 03/26/20 at 06:07pm, James Morse wrote:
> Hello!
>
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
>
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).

Do you mean you use 'kexec -l /boot/vmlinuz-xxxx --initrd ...' to load a
kernel, next you hot remove some memory regions, then you execute
'kexec -e' to trigger kexec reboot?

I may not get the point clearly, but we usually do the loading and
triggering of kexec-ed kernel at the same time.

>
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.
>
>
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
>
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
>
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
>
>
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.

Currently, we monitor udev events for memory hot add/remove, then
reload the kdump kernel. That reloading only updates the elfcorehdr,
because the crashkernel region has to be reserved during the first
kernel's bootup. I don't think this will be a problem.

>
>
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.
>
> Thanks,
>
> James Morse (3):
> kexec: Prevent removal of memory in use by a loaded kexec image
> mm/memory_hotplug: Allow arch override of non boot memory resource
> names
> arm64: memory: Give hotplug memory a different resource name
>
> arch/arm64/include/asm/memory.h | 11 +++++++
> kernel/kexec_core.c | 56 +++++++++++++++++++++++++++++++++
> mm/memory_hotplug.c | 6 +++-
> 3 files changed, 72 insertions(+), 1 deletion(-)
>
> --
> 2.25.1
>
>
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
From: James Morse @ 2020-03-27 15:40 UTC
To: Baoquan He
Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi Baoquan,

On 3/27/20 2:11 AM, Baoquan He wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).

> Do you mean you use 'kexec -l /boot/vmlinuz-xxxx --initrd ...' to load a
> kernel, next you hot remove some memory regions, then you execute
> 'kexec -e' to trigger kexec reboot?

Yes. But to make it more fun, get someone else to trigger the hot-remove
behind your back!

> I may not get the point clearly, but we usually do the loading and
> triggering of kexec-ed kernel at the same time.

But it's two syscalls. Should the second one fail if the memory layout has
changed since the first?

(UEFI does this for exit-boot-services, there is a handshake to prove you
know what the current memory map is)

>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.
>>
>>
>> This doesn't solve the problem for arm64, where the new kernel must
>> initially rely on the data structures from the first boot to describe
>> memory. These don't describe hotpluggable memory.
>> If kexec places the kernel in one of these regions, it must also provide
>> a DT that describes the region in which the kernel was mapped as memory.
>> (and somehow ensure its always present in the future...)
>>
>> To prevent this from happening accidentally with unaware user-space,
>> patches two and three allow arm64 to give these regions a different
>> name.
>>
>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>> were added separately.
>>
>>
>> I haven't tried kdump.
>> Unaware kdump from user-space probably won't describe the hotplug
>> regions if the name is different, which saves us from problems if
>> the memory is no longer present at kdump time, but means the vmcore
>> is incomplete.

> Currently, we will monitor udev events of mem hot add/remove, then
> reload kdump kernel. That reloading is only update the elfcorehdr,
> because crashkernel has to be reserved during 1st kernel bootup. I don't
> think this will have problem.

Great. I don't think there is much the kernel can do for the kdump case, so
it's good to know the tools already exist for detecting and restarting the
kdump load when the memory layout changes.

For kdump via kexec-file-load, we would need to regenerate the elfcorehdr;
I'm hoping that can be done in core code.

Thanks,

James
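The decision the first patch is described as making (refuse to let in-use regions go offline) comes down to a range-overlap test between the loaded segments and the range being offlined. The sketch below models only that test in user-space C. It is not the code from kernel/kexec_core.c, and struct loaded_segment, ranges_overlap() and may_offline() are hypothetical names; the in-kernel version would presumably run this check from a memory hotplug notifier.

```c
#include <stddef.h>

/* Simplified stand-in for a loaded kexec segment: physical base and size. */
struct loaded_segment {
	unsigned long long mem;
	unsigned long long memsz;
};

/* Do the half-open ranges [a, a + a_len) and [b, b + b_len) overlap? */
static int ranges_overlap(unsigned long long a, unsigned long long a_len,
			  unsigned long long b, unsigned long long b_len)
{
	return a < b + b_len && b < a + a_len;
}

/*
 * Hypothetical policy check: return 0 if [start, start + size) may be
 * taken offline, or -1 if any loaded segment still lives in that range.
 */
static int may_offline(const struct loaded_segment *segs, size_t nr_segs,
		       unsigned long long start, unsigned long long size)
{
	size_t i;

	for (i = 0; i < nr_segs; i++) {
		if (ranges_overlap(segs[i].mem, segs[i].memsz, start, size))
			return -1;
	}
	return 0;
}
```

Refusing the offline request keeps the gap between the load and execute syscalls safe without making the second syscall revalidate the memory map, which is the alternative James raises above.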
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 ` James Morse
@ 2020-03-27  9:27   ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-03-27 9:27 UTC (permalink / raw)
  To: James Morse, kexec, linux-mm, linux-arm-kernel
  Cc: Eric Biederman, Andrew Morton, Catalin Marinas, Will Deacon,
	Anshuman Khandual, Bhupesh Sharma

On 26.03.20 19:07, James Morse wrote:
> Hello!
>
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
>
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).
>
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.

IIRC other architectures handle that by setting the affected pages
PageReserved. Any reason not to stick to the same?

>
>
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure it's always present in the future...)
>
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
>
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
>
>
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.

Whenever memory is added/removed, kdump.service is to be restarted from
user space, which will fix up the data structures such that kdump will
not try to dump unplugged memory. Also, makedumpfile will check if the
sections are still around IIRC.

Not sure what you mean by "Unaware kdump from user-space".

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-27  9:27   ` David Hildenbrand
@ 2020-03-27 15:42     ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-03-27 15:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
	Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi David,

On 3/27/20 9:27 AM, David Hildenbrand wrote:
> On 26.03.20 19:07, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).
>>
>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.

> IIRC other architectures handle that by setting the affected pages
> PageReserved. Any reason why to not stick to the same?

Hmm, I didn't spot this. How come core code doesn't do it if it's needed?

Doesn't PG_reserved prevent the page from being used for regular
allocations? (or is that only if it's done early)

I prefer the runtime check as the dmesg output gives the user some chance
of knowing why their memory-offline failed, and doing something about it!

>> This doesn't solve the problem for arm64, where the new kernel must
>> initially rely on the data structures from the first boot to describe
>> memory. These don't describe hotpluggable memory.
>> If kexec places the kernel in one of these regions, it must also provide
>> a DT that describes the region in which the kernel was mapped as memory.
>> (and somehow ensure it's always present in the future...)
>>
>> To prevent this from happening accidentally with unaware user-space,
>> patches two and three allow arm64 to give these regions a different
>> name.
>>
>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>> were added separately.
>>
>>
>> I haven't tried kdump.
>> Unaware kdump from user-space probably won't describe the hotplug
>> regions if the name is different, which saves us from problems if
>> the memory is no longer present at kdump time, but means the vmcore
>> is incomplete.

> Whenever memory is added/removed, kdump.service is to be restarted from
> user space, which will fixup the data structures such that kdump will
> not try to dump unplugged memory.

Cunning.

> Also, makedumpfile will check if the
> sections are still around IIRC.

Curious. I thought the vmcore was virtually addressed; how does it know
which linear-map portions correspond to sysfs memory nodes with KASLR?

> Not sure what you mean by "Unaware kdump from user-space".

The existing kexec-tools binaries, which (I assume) don't go probing to
find out if 'System RAM' is removable or not, loading a kdump kernel along
with the user-space generated blob that describes the first kernel's
memory usage to the second kernel.

'user-space' here is to distinguish all this from kexec_file_load().


Thanks,

James

^ permalink raw reply	[flat|nested] 264+ messages in thread
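To illustrate why a different resource name hides a region from "unaware"
tools, here is a minimal sketch of a loader that scans /proc/iomem-style
lines and accepts only entries named exactly 'System RAM'. It assumes the
tool compares the whole resource name. The function name and the
'System RAM (hotplug)' label in the usage note are placeholders, not
taken from the patches or from kexec-tools.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of the form "start-end : name" (hex addresses, optional
 * leading indentation) and report whether the resource is named exactly
 * "System RAM". On a successful parse, *start and *end hold the range.
 */
static bool iomem_line_is_system_ram(const char *line, uint64_t *start,
				     uint64_t *end)
{
	char name[64];

	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %63[^\n]",
		   start, end, name) != 3)
		return false;

	/* Whole-name comparison: any suffix makes the region invisible. */
	return strcmp(name, "System RAM") == 0;
}
```

Fed the line "80000000-bfffffff : System RAM", this accepts the range. A
line whose name is, say, "System RAM (hotplug)" is skipped, so a loader
written this way would never place a kexec segment there.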
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-27 15:42     ` James Morse
@ 2020-03-30 13:18       ` David Hildenbrand
  -1 siblings, 0 replies; 264+ messages in thread
From: David Hildenbrand @ 2020-03-30 13:18 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
	Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

On 27.03.20 16:42, James Morse wrote:
> Hi David,
>
> On 3/27/20 9:27 AM, David Hildenbrand wrote:
>> On 26.03.20 19:07, James Morse wrote:
>>> arm64 recently queued support for memory hotremove, which led to some
>>> new corner cases for kexec.
>>>
>>> If the kexec segments are loaded for a removable region, that region may
>>> be removed before kexec actually occurs. This causes the first kernel to
>>> lockup when applying the relocations. (I've triggered this on x86 too).
>>>
>>> The first patch adds a memory notifier for kexec so that it can refuse
>>> to allow in-use regions to be taken offline.
>
>> IIRC other architectures handle that by setting the affected pages
>> PageReserved. Any reason why to not stick to the same?
>
> Hmm, I didn't spot this. How come core code doesn't do it if its needed?
>
> Doesn't PG_Reserved prevent the page from being used for regular allocations?
> (or is that only if its done early)
>
> I prefer the runtime check as the dmesg output gives the user some chance of
> knowing why their memory-offline failed, and doing something about it!

I was confused about which memory we are trying to protect. I understand
now that you are dealing with the target physical memory described during
kexec_load.

[...]

>
>> Also, makedumpfile will check if the
>> sections are still around IIRC.
>
> Curious. I thought the vmcore was virtually addressed, how does it know which
> linear-map portions correspond to sysfs memory nodes with KASLR?

That's a very interesting question. I remember there was KASLR support
being implemented specifically for that - but I don't know any details.

>> Not sure what you mean by "Unaware kdump from user-space".
>
> The existing kexec-tools binaries, that (I assume) don't go probing to find out
> if 'System RAM' is removable or not, loading a kdump kernel, along with the
> user-space generated blob that describes the first kernel's memory usage to the
> second kernel.

Finally understood how kexec without kdump works, thanks.

-- 
Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 ` James Morse
@ 2020-03-30 13:55   ` Baoquan He
  -1 siblings, 0 replies; 264+ messages in thread
From: Baoquan He @ 2020-03-30 13:55 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
	Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi James,

On 03/26/20 at 06:07pm, James Morse wrote:
> Hello!
>
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
>
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).
>
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.

I talked about this with Dave Young. We intend to use kexec_file_load
more in the future: since most of its implementation is in the kernel, we
can get information about the kernel more easily. For a kexec kernel
loaded into a hotpluggable area, we can fix it on the kexec_file_load
side, since we know the MOVABLE zone's start and end. As for the old
kexec_load, we would like to keep it for backward compatibility. At least
in our distros, we have switched to kexec_file_load and will gradually
obsolete kexec_load. So for this one, I suggest avoiding those MOVABLE
memory regions when searching for a place for the kexec kernel. Not sure
if arm64 will still have difficulty.

>
>
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure it's always present in the future...)
>
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
>
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
>
>
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.
>
>
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.
>
> Thanks,
>
> James Morse (3):
>   kexec: Prevent removal of memory in use by a loaded kexec image
>   mm/memory_hotplug: Allow arch override of non boot memory resource
>     names
>   arm64: memory: Give hotplug memory a different resource name
>
>  arch/arm64/include/asm/memory.h | 11 +++++++
>  kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c             |  6 +++-
>  3 files changed, 72 insertions(+), 1 deletion(-)
>
> -- 
> 2.25.1
>
>

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-30 13:55   ` Baoquan He
@ 2020-03-30 17:17     ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-03-30 17:17 UTC (permalink / raw)
  To: Baoquan He
  Cc: kexec, linux-mm, linux-arm-kernel, Eric Biederman, Andrew Morton,
	Catalin Marinas, Will Deacon, Anshuman Khandual, Bhupesh Sharma

Hi Baoquan,

On 3/30/20 2:55 PM, Baoquan He wrote:
> On 03/26/20 at 06:07pm, James Morse wrote:
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).
>>
>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.
>
> I talked about this with Dave Young. Currently, we tend to use
> kexec_file_load more in the future since most of its implementation is
> in kernel, we can get information about kernel more easily. For the
> kexec kernel loaded into hotpluggable area, we can fix it in
> kexec_file_load side, we know the MOVABLE zone's start and end. As for
> the old kexec_load, we would like to keep it for back compatibility. At
> least in our distros, we have switched to kexec_file_load, will
> gradually obsolete kexec_load.

> So for this one, I suggest avoiding those
> MOVABLE memory region when searching place for kexec kernel.

How does today's user-space know?


> Not sure if arm64 will still have difficulty.

arm64 added support for kexec_load first, then kexec_file_load (evidently
a mistake).
kexec_file_load support was only added in the last year or so; I'd hazard
that most people using this are using the regular load kind (and probably
don't know or care).


Thanks,

James

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-30 17:17     ` James Morse
@ 2020-03-31  3:46       ` Dave Young
  -1 siblings, 0 replies; 264+ messages in thread
From: Dave Young @ 2020-03-31 3:46 UTC (permalink / raw)
  To: James Morse
  Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma,
	kexec, linux-mm, Eric Biederman, Andrew Morton, Will Deacon,
	linux-arm-kernel

Hi James,

On 03/30/20 at 06:17pm, James Morse wrote:
> Hi Baoquan,
>
> On 3/30/20 2:55 PM, Baoquan He wrote:
> > On 03/26/20 at 06:07pm, James Morse wrote:
> >> arm64 recently queued support for memory hotremove, which led to some
> >> new corner cases for kexec.
> >>
> >> If the kexec segments are loaded for a removable region, that region may
> >> be removed before kexec actually occurs. This causes the first kernel to
> >> lockup when applying the relocations. (I've triggered this on x86 too).
> >>
> >> The first patch adds a memory notifier for kexec so that it can refuse
> >> to allow in-use regions to be taken offline.
> >
> > I talked about this with Dave Young. Currently, we tend to use
> > kexec_file_load more in the future since most of its implementation is
> > in kernel, we can get information about kernel more easily. For the
> > kexec kernel loaded into hotpluggable area, we can fix it in
> > kexec_file_load side, we know the MOVABLE zone's start and end. As for
> > the old kexec_load, we would like to keep it for back compatibility. At
> > least in our distros, we have switched to kexec_file_load, will
> > gradually obsolete kexec_load.
>
> > So for this one, I suggest avoiding those
> > MOVABLE memory region when searching place for kexec kernel.
>
> How does today's user-space know?
>
>
> > Not sure if arm64 will still have difficulty.
>
> arm64 added support for kexec_load first, then kexec_file_load. (evidently a
> mistake).
> kexec_file_load support was only added in the last year or so, I'd hazard most
> people using this, are using the regular load kind. (and probably don't know or
> care).

I agree that file load is still not widely used, but in the long run we
should not maintain both of them forever. Especially when some
kernel-userspace interfaces need to be introduced, file load will have a
natural advantage. We may keep kexec_load for other misc use cases, but
we can use file load for the major modern linux-to-linux loading. I'm not
saying we can do it immediately, I just think we should reduce the
duplicated effort and try to avoid hacks if possible.

Anyway, about this particular issue, I wonder if we can just reload with
a udev rule, as replied in another mail.

Thanks
Dave

^ permalink raw reply	[flat|nested] 264+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-31 3:46 ` Dave Young
  (?)
@ 2020-04-14 17:31 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-04-14 17:31 UTC (permalink / raw)
  To: Dave Young
  Cc: Baoquan He, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, kexec, linux-mm, Eric Biederman, Andrew Morton, Will Deacon, linux-arm-kernel

Hi Dave,

On 31/03/2020 04:46, Dave Young wrote:
> I agree that file load is still not widely used, but in the long run
> we should not maintain both of them forever. Especially
> when some kernel-userspace interfaces need to be introduced, file load
> will have the natural advantage. We may keep kexec_load for other
> misc use cases, but we can use file load for the major modern
> linux-to-linux loading. I'm not saying we can do it immediately, I just
> think we should reduce the duplicated effort and try to avoid hacks if
> possible.

Sure. My aim here is to never debug this problem again.

> Anyway, about this particular issue, I wonder if we can just reload with
> a udev rule, as I replied in another mail.

What if it doesn't? I can't find such a rule on my Debian machine. I don't think
user-space can be relied on for something like this.

The best we could hope for here is a dying gasp from the old kernel:
| kexec: memory layout changed since kexec load, this may not work.
| Bye!

... assuming anyone sees such a message.

Thanks,

James

^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 ` James Morse
@ 2020-03-31 3:38 ` Dave Young
  -1 siblings, 0 replies; 264+ messages in thread
From: Dave Young @ 2020-03-31 3:38 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Eric Biederman, Andrew Morton, Will Deacon

Hi James,

On 03/26/20 at 06:07pm, James Morse wrote:
> Hello!
>
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
>
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).

Does a kexec reload work for your case? If yes, then I would suggest doing
it in userspace, for example with a udev rule to reload kexec if needed.

Actually we have a rule to restart kdump loading, but not for kexec. It
sounds like we also need a service to load kexec, and a udev rule to reload
it on memory hotplug.

>
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.
>
>
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
>
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
>
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
>
>
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.
>
>
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.
>
> Thanks,
>
> James Morse (3):
>   kexec: Prevent removal of memory in use by a loaded kexec image
>   mm/memory_hotplug: Allow arch override of non boot memory resource
>     names
>   arm64: memory: Give hotplug memory a different resource name
>
>  arch/arm64/include/asm/memory.h | 11 +++++++
>  kernel/kexec_core.c             | 56 +++++++++++++++++++++++++++++++++
>  mm/memory_hotplug.c             |  6 +++-
>  3 files changed, 72 insertions(+), 1 deletion(-)
>
> --
> 2.25.1
>
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
>

Thanks
Dave

^ permalink raw reply [flat|nested] 264+ messages in thread
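A note on the check that the cover letter describes for patch 1 ("refuse to allow in-use regions to be taken offline"): at its core this is an interval-overlap test between the range going offline and the destination ranges of the loaded image. The following is only a rough user-space model in Python, not the kernel code from the patch; the function names, the half-open range convention, and the example addresses are all assumptions made for illustration.

```python
# Simplified user-space model of the decision; NOT the kernel code from patch 1.
# All names and addresses here are hypothetical.
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) physical addresses, end exclusive


def ranges_overlap(a: Range, b: Range) -> bool:
    """Return True if two half-open ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def may_offline(offline: Range, kexec_segments: List[Range]) -> bool:
    """Model of the notifier decision.

    Returns False (refuse) if the range being taken offline overlaps any
    destination segment of a loaded kexec image, True (allow) otherwise.
    """
    return not any(ranges_overlap(offline, seg) for seg in kexec_segments)


# Hypothetical layout: one image segment at 0x8000_0000..0x8200_0000.
segments = [(0x8000_0000, 0x8200_0000)]

# This range covers part of the segment, so the model refuses it.
print(may_offline((0x8100_0000, 0x9000_0000), segments))

# This range is disjoint from the segment, so the model allows it.
print(may_offline((0x9000_0000, 0xA000_0000), segments))
```

With no image loaded the segment list is empty and every offline request is allowed, which matches the intent that the restriction only applies while an image is loaded.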
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-03-26 18:07 ` James Morse
  (?)
@ 2020-04-15 20:29 ` Eric W. Biederman
  -1 siblings, 0 replies; 264+ messages in thread
From: Eric W. Biederman @ 2020-04-15 20:29 UTC (permalink / raw)
  To: James Morse
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

James Morse <james.morse@arm.com> writes:

> Hello!
>
> arm64 recently queued support for memory hotremove, which led to some
> new corner cases for kexec.
>
> If the kexec segments are loaded for a removable region, that region may
> be removed before kexec actually occurs. This causes the first kernel to
> lockup when applying the relocations. (I've triggered this on x86 too).
>
> The first patch adds a memory notifier for kexec so that it can refuse
> to allow in-use regions to be taken offline.
>
>
> This doesn't solve the problem for arm64, where the new kernel must
> initially rely on the data structures from the first boot to describe
> memory. These don't describe hotpluggable memory.
> If kexec places the kernel in one of these regions, it must also provide
> a DT that describes the region in which the kernel was mapped as memory.
> (and somehow ensure its always present in the future...)
>
> To prevent this from happening accidentally with unaware user-space,
> patches two and three allow arm64 to give these regions a different
> name.
>
> This is a change in behaviour for arm64 as memory hotadd and hotremove
> were added separately.
>
>
> I haven't tried kdump.
> Unaware kdump from user-space probably won't describe the hotplug
> regions if the name is different, which saves us from problems if
> the memory is no longer present at kdump time, but means the vmcore
> is incomplete.
>
>
> These patches are based on arm64's for-next/core branch, but can all
> be merged independently.

So I just looked through these quickly and I think there are real
problems here we can fix, and that are worth fixing.

However I am not thrilled with the fixes you propose.

Eric

^ permalink raw reply [flat|nested] 264+ messages in thread
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
  2020-04-15 20:29 ` Eric W. Biederman
  (?)
@ 2020-04-22 12:14 ` James Morse
  -1 siblings, 0 replies; 264+ messages in thread
From: James Morse @ 2020-04-22 12:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

Hi Eric,

On 15/04/2020 21:29, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>
>> Hello!
>>
>> arm64 recently queued support for memory hotremove, which led to some
>> new corner cases for kexec.
>>
>> If the kexec segments are loaded for a removable region, that region may
>> be removed before kexec actually occurs. This causes the first kernel to
>> lockup when applying the relocations. (I've triggered this on x86 too).
>>
>> The first patch adds a memory notifier for kexec so that it can refuse
>> to allow in-use regions to be taken offline.
>>
>>
>> This doesn't solve the problem for arm64, where the new kernel must
>> initially rely on the data structures from the first boot to describe
>> memory. These don't describe hotpluggable memory.
>> If kexec places the kernel in one of these regions, it must also provide
>> a DT that describes the region in which the kernel was mapped as memory.
>> (and somehow ensure its always present in the future...)
>>
>> To prevent this from happening accidentally with unaware user-space,
>> patches two and three allow arm64 to give these regions a different
>> name.
>>
>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>> were added separately.
>>
>>
>> I haven't tried kdump.
>> Unaware kdump from user-space probably won't describe the hotplug
>> regions if the name is different, which saves us from problems if
>> the memory is no longer present at kdump time, but means the vmcore
>> is incomplete.
>>
>>
>> These patches are based on arm64's for-next/core branch, but can all
>> be merged independently.
>
> So I just looked through these quickly and I think there are real
> problems here we can fix, and that are worth fixing.
>
> However I am not thrilled with the fixes you propose.

Sure. Unfortunately /proc/iomem is the only trick arm64 has to keep the existing
kexec-tools working.
(We've had 'unthrilling' patches like this before to prevent user-space from loading the
kernel over the top of the in-memory firmware tables.)

arm64 expects the description of memory to come from firmware, be that UEFI for memory
present at boot, or the ACPI AML methods for memory that was added later.

On arm64 there is no standard location for memory. The kernel has to be handed a pointer
to the firmware tables that describe it. The kernel expects to boot from memory that was
present at boot.

Modifying the firmware tables at runtime doesn't solve the problem as we may need to move
the firmware-reserved memory region that describes memory. User-space may still load and
kexec either side of that update.

Even if we could modify the structures at runtime, we can't update a loaded kexec image.
We have no idea which blob from userspace is the DT. It may not even be linux that has
been loaded.

We can't emulate parts of UEFI's handover because kexec's purgatory isn't an EFI program.

I can't see a path through all this. If we have to modify existing user-space, I'd rather
leave it broken. We can detect the problem in the arch code and print a warning at load time.

James

^ permalink raw reply [flat|nested] 264+ messages in thread
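A note on why the resource name in /proc/iomem matters here: a user-space loader that picks candidate memory by matching the "System RAM" label never sees a region that carries a different label. The Python sketch below is only an illustration of that effect and is not taken from kexec-tools. The sample text, the function name, and in particular the label "Hotplug RAM (hypothetical name)" are made up for the example; the actual name chosen by patch three is not stated in this thread.

```python
# Illustration only: how name-based filtering of /proc/iomem-style text
# hides regions that carry a different resource name.
# The sample layout and the hotplug label below are hypothetical.
from typing import List, Tuple

SAMPLE_IOMEM = """\
40000000-7fffffff : System RAM
  40080000-41ffffff : Kernel code
80000000-bfffffff : Hotplug RAM (hypothetical name)
c0000000-c0000fff : serial
"""


def top_level_regions(iomem_text: str, name: str) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end inclusive, for top-level resources
    whose label is exactly `name`. Indented (nested) entries are skipped."""
    regions = []
    for line in iomem_text.splitlines():
        if not line or line[0].isspace():
            continue  # blank line or nested child resource
        span, _, label = line.partition(" : ")
        if label.strip() != name:
            continue
        start, _, end = span.partition("-")
        regions.append((int(start, 16), int(end, 16)))
    return regions


# A loader that only asks for "System RAM" gets the boot-time region and
# never considers the differently named one.
for start, end in top_level_regions(SAMPLE_IOMEM, "System RAM"):
    print(f"{start:#x}-{end:#x}")
```

To try this against a live system, the text of /proc/iomem can be passed in place of the sample string; note that reading it without sufficient privilege typically shows zeroed addresses.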
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
From: Eric W. Biederman @ 2020-04-22 13:04 UTC
To: James Morse
Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

James Morse <james.morse@arm.com> writes:

> Hi Eric,
>
> On 15/04/2020 21:29, Eric W. Biederman wrote:
>> James Morse <james.morse@arm.com> writes:
>>
>>> Hello!
>>>
>>> arm64 recently queued support for memory hotremove, which led to some
>>> new corner cases for kexec.
>>>
>>> If the kexec segments are loaded for a removable region, that region may
>>> be removed before kexec actually occurs. This causes the first kernel to
>>> lockup when applying the relocations. (I've triggered this on x86 too).
>>>
>>> The first patch adds a memory notifier for kexec so that it can refuse
>>> to allow in-use regions to be taken offline.
>>>
>>>
>>> This doesn't solve the problem for arm64, where the new kernel must
>>> initially rely on the data structures from the first boot to describe
>>> memory. These don't describe hotpluggable memory.
>>> If kexec places the kernel in one of these regions, it must also provide
>>> a DT that describes the region in which the kernel was mapped as memory.
>>> (and somehow ensure it's always present in the future...)
>>>
>>> To prevent this from happening accidentally with unaware user-space,
>>> patches two and three allow arm64 to give these regions a different
>>> name.
>>>
>>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>>> were added separately.
>>>
>>>
>>> I haven't tried kdump.
>>> Unaware kdump from user-space probably won't describe the hotplug
>>> regions if the name is different, which saves us from problems if
>>> the memory is no longer present at kdump time, but means the vmcore
>>> is incomplete.
>>>
>>>
>>> These patches are based on arm64's for-next/core branch, but can all
>>> be merged independently.
>>
>> So I just looked through these quickly and I think there are real
>> problems here we can fix, and that are worth fixing.
>>
>> However I am not thrilled with the fixes you propose.
>
> Sure. Unfortunately /proc/iomem is the only trick arm64 has to keep the existing
> kexec-tools working.
> (We've had 'unthrilling' patches like this before to prevent user-space from loading the
> kernel over the top of the in-memory firmware tables.)
>
> arm64 expects the description of memory to come from firmware, be that UEFI for memory
> present at boot, or the ACPI AML methods for memory that was added
> later.
>
> On arm64 there is no standard location for memory. The kernel has to be handed a pointer
> to the firmware tables that describe it. The kernel expects to boot from memory that was
> present at boot.

What do you do when the firmware is wrong? Does arm64 support the
mem=xxx@yyy kernel command line options?

If you want to handle the general case of memory hotplug, having a
limitation that you have to boot from memory that was present at boot is
a bug, because the memory might not be there.

> Modifying the firmware tables at runtime doesn't solve the problem as we may need to move
> the firmware-reserved memory region that describes memory. User-space may still load and
> kexec either side of that update.
>
> Even if we could modify the structures at runtime, we can't update a loaded kexec image.
> We have no idea which blob from userspace is the DT. It may not even be linux that has
> been loaded.

What can be done, and very reasonably so, on memory hotplug is:
- Unload any loaded kexec image.
- Block loading any new image until the hotplug operation completes.

That is simple and generic, and can be done for all architectures.

This doesn't apply to the kexec-on-panic kernel because it fundamentally
needs to figure out how to limp along (or reliably stop) when it has the
wrong memory map.

> We can't emulate parts of UEFI's handover because kexec's purgatory
> isn't an EFI program.

Plus much of EFI is unusable after ExitBootServices is called.

> I can't see a path through all this. If we have to modify existing user-space, I'd rather
> leave it broken. We can detect the problem in the arch code and print a warning at load time.

The weirdest thing to me in all of this is that you have been wanting to
handle memory hotplug. But you don't want to change or deal with the
memory map changing when hotplug occurs. The memory map changing is
fundamentally what memory hotplug does.

So I think it is fundamental to figure out how to pass the updated
memory map. Either through mem=xxx@yyy command line options or through
another option.

If you really want to keep the limitation that you have to have the
kernel in the initial memory map, you can compare that map to the
efi tables when selecting the load address.

Expecting userspace to reload the loaded kernel after memory hotplug is
completely reasonable.

Unless I am mistaken, memory hotplug is expected to be a rare event, not
something that happens every day, and certainly not something that happens
every minute.

Eric
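The two policies discussed so far (patch 1 refusing to offline memory that a loaded image uses, and the suggestion above to unload the image and block new loads while a hotplug operation runs) can be compared with a small user-space model. This is only a sketch under stated assumptions: `KexecModel`, `hotplug_begin`, `hotplug_end` and `may_offline` are hypothetical names invented for illustration, and none of this is kernel code or a kernel API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Half-open physical address range: (start, end), end exclusive.
Segment = Tuple[int, int]


def overlaps(a: Segment, b: Segment) -> bool:
    """True when two half-open address ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


@dataclass
class KexecModel:
    """Toy stand-in for kexec state; hypothetical, not the kernel's data structures."""
    image: Optional[List[Segment]] = None   # destination segments of the loaded image
    hotplug_in_progress: bool = False

    def load(self, segments: List[Segment]) -> bool:
        # Suggested policy: refuse to load while a hotplug operation is running.
        if self.hotplug_in_progress:
            return False
        self.image = list(segments)
        return True

    def hotplug_begin(self) -> None:
        # Suggested policy: drop any loaded image, then block further loads.
        self.image = None
        self.hotplug_in_progress = True

    def hotplug_end(self) -> None:
        # Loads are allowed again; user-space is expected to reload.
        self.hotplug_in_progress = False

    def may_offline(self, region: Segment) -> bool:
        # Patch 1 policy: refuse to offline a region that a loaded image will use.
        if self.image is None:
            return True
        return not any(overlaps(region, seg) for seg in self.image)
```

In this model the unload-and-block policy makes the overlap check unnecessary for a normal kexec image, because no image survives a hotplug operation. It says nothing about the kexec-on-panic image, which the message above explicitly excludes.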
* Re: [PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
From: James Morse @ 2020-04-22 15:40 UTC
To: Eric W. Biederman
Cc: kexec, linux-mm, linux-arm-kernel, Anshuman Khandual, Catalin Marinas, Bhupesh Sharma, Andrew Morton, Will Deacon

Hi Eric,

On 22/04/2020 14:04, Eric W. Biederman wrote:
> James Morse <james.morse@arm.com> writes:
>> On 15/04/2020 21:29, Eric W. Biederman wrote:
>>> James Morse <james.morse@arm.com> writes:
>>>> arm64 recently queued support for memory hotremove, which led to some
>>>> new corner cases for kexec.
>>>>
>>>> If the kexec segments are loaded for a removable region, that region may
>>>> be removed before kexec actually occurs. This causes the first kernel to
>>>> lockup when applying the relocations. (I've triggered this on x86 too).
>>>>
>>>> The first patch adds a memory notifier for kexec so that it can refuse
>>>> to allow in-use regions to be taken offline.
>>>>
>>>>
>>>> This doesn't solve the problem for arm64, where the new kernel must
>>>> initially rely on the data structures from the first boot to describe
>>>> memory. These don't describe hotpluggable memory.
>>>> If kexec places the kernel in one of these regions, it must also provide
>>>> a DT that describes the region in which the kernel was mapped as memory.
>>>> (and somehow ensure it's always present in the future...)
>>>>
>>>> To prevent this from happening accidentally with unaware user-space,
>>>> patches two and three allow arm64 to give these regions a different
>>>> name.
>>>>
>>>> This is a change in behaviour for arm64 as memory hotadd and hotremove
>>>> were added separately.
>>>>
>>>>
>>>> I haven't tried kdump.
>>>> Unaware kdump from user-space probably won't describe the hotplug
>>>> regions if the name is different, which saves us from problems if
>>>> the memory is no longer present at kdump time, but means the vmcore
>>>> is incomplete.
>>>>
>>>>
>>>> These patches are based on arm64's for-next/core branch, but can all
>>>> be merged independently.
>>>
>>> So I just looked through these quickly and I think there are real
>>> problems here we can fix, and that are worth fixing.
>>>
>>> However I am not thrilled with the fixes you propose.
>>
>> Sure. Unfortunately /proc/iomem is the only trick arm64 has to keep the existing
>> kexec-tools working.
>> (We've had 'unthrilling' patches like this before to prevent user-space from loading the
>> kernel over the top of the in-memory firmware tables.)
>>
>> arm64 expects the description of memory to come from firmware, be that UEFI for memory
>> present at boot, or the ACPI AML methods for memory that was added
>> later.
>>
>> On arm64 there is no standard location for memory. The kernel has to be handed a pointer
>> to the firmware tables that describe it. The kernel expects to boot from memory that was
>> present at boot.

> What do you do when the firmware is wrong?

The firmware gets fixed. It's the only source of facts about the platform.


> Does arm64 support the
> mem=xxx@yyy kernel command line options?

Only the debug option to reduce the available memory.


> If you want to handle the general case of memory hotplug, having a
> limitation that you have to boot from memory that was present at boot is
> a bug, because the memory might not be there.

arm64's arch code prevents the memory described by the UEFI memory map from being taken
offline/removed.

Memory present at boot may have firmware reservations that are being used by some other
agent in the system. Firmware-first RAS errors are one example; the interrupt controllers'
property and pending tables are another.

The UEFI memory map's description of memory may have been incomplete, as there may have
been regions carved out, not described at all instead of described as reserved.

The UEFI runtime services will live in memory described by the UEFI memory map.


>> Modifying the firmware tables at runtime doesn't solve the problem as we may need to move
>> the firmware-reserved memory region that describes memory. User-space may still load and
>> kexec either side of that update.
>>
>> Even if we could modify the structures at runtime, we can't update a loaded kexec image.
>> We have no idea which blob from userspace is the DT. It may not even be linux that has
>> been loaded.
>
> What can be done, and very reasonably so, on memory hotplug is:
> - Unload any loaded kexec image.
> - Block loading any new image until the hotplug operation completes.
>
> That is simple and generic, and can be done for all architectures.

Yes, certainly.


> This doesn't apply to the kexec-on-panic kernel because it fundamentally
> needs to figure out how to limp along (or reliably stop) when it has the
> wrong memory map.
>
>> We can't emulate parts of UEFI's handover because kexec's purgatory
>> isn't an EFI program.
>
> Plus much of EFI is unusable after ExitBootServices is called.

Of course, we even overwrite its code when allocating memory for the kernel. I bring it up
because it is our only way of handing over the memory map of the system.


>> I can't see a path through all this. If we have to modify existing user-space, I'd rather
>> leave it broken. We can detect the problem in the arch code and print a warning at load time.

> The weirdest thing to me in all of this is that you have been wanting to
> handle memory hotplug. But you don't want to change or deal with the
> memory map changing when hotplug occurs. The memory map changing is
> fundamentally what memory hotplug does.

arm64 doesn't have a single 'the memory map', just what came from firmware. The memory map
linux uses is built from these firmware descriptions.

Memory is discovered from:
 early: The DT memory node.
 early: The UEFI memory map.
 later: ACPI hotplug memory.

Later kexec()d or kdump'd kernels rebuild the memory map from the firmware description.
This means kexec is totally invisible.

Not changing these descriptions is important to ensure we don't accidentally corrupt them,
or make up some property that isn't true.

Your request to 'change' the memory map involves creating a new UEFI memory map that
describes the memory we found via ACPI hotplug. arm64 doesn't do this because we expect
the next kernel to re-discover this memory via ACPI hotplug.

Generally, arm64 expects a kexec'd kernel to learn and discover things in exactly the same
way that it would have done if it were the first kernel to have been booted.


> So I think it is fundamental to figure out how to pass the updated
> memory map. Either through mem=xxx@yyy command line options or through
> another option.

We re-discover it from firmware.

Booting from memory that is not described as memory early enough is the second problem
addressed by this series.


> If you really want to keep the limitation that you have to have the
> kernel in the initial memory map, you can compare that map to the
> efi tables when selecting the load address.

Great. How can user-space know the contents of that map?

It only reads /proc/iomem today. On a system that doesn't support ACPI memory hotplug,
/proc/iomem describes the memory present at boot. These things have never been different
before.


> Expecting userspace to reload the loaded kernel after memory hotplug is
> completely reasonable.

I'm sold on this; it implicitly solves the 'kexec image wants to be copied into removed
memory' problem.


> Unless I am mistaken, memory hotplug is expected to be a rare event, not
> something that happens every day, and certainly not something that happens
> every minute.

One of the motivations for supporting memory hotplug is for VMs. Container projects like
to create VMs in advance, then reconfigure them just before they are used. This saves the
time taken by the hypervisor to do its work.

Hitting the 'not booted from boot memory' problem now only takes using kexec in a VM
deployed like this.


Thanks,

James
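The point about user-space only reading /proc/iomem can be made concrete with a short sketch of how a name-based filter behaves. It assumes the usual `start-end : name` line layout with child resources indented beneath their parent, and it assumes hotplugged memory carries some other name than "System RAM". The string `"System RAM (hotplug)"` is a placeholder chosen for illustration; the thread only says "a different name". `boot_ram_regions` is a hypothetical helper, not a kexec-tools function.

```python
import re
from typing import List, Tuple

BOOT_NAME = "System RAM"
# Placeholder: the exact name given to hotplugged memory is an assumption here.
HOTPLUG_NAME = "System RAM (hotplug)"

# One /proc/iomem entry: hex start, hex end (inclusive), resource name.
_ENTRY = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+)$")


def boot_ram_regions(iomem_text: str) -> List[Tuple[int, int]]:
    """Return (start, end) of top-level entries named exactly BOOT_NAME."""
    regions = []
    for line in iomem_text.splitlines():
        if line.startswith(" "):
            continue  # indented lines are child resources of the previous entry
        match = _ENTRY.match(line.rstrip())
        if match and match.group(3) == BOOT_NAME:
            regions.append((int(match.group(1), 16), int(match.group(2), 16)))
    return regions


# Made-up sample contents, not output captured from a real machine.
SAMPLE = (
    "40000000-7fffffff : " + BOOT_NAME + "\n"
    "  40080000-40ffffff : Kernel code\n"
    "80000000-bfffffff : " + HOTPLUG_NAME + "\n"
)
```

A tool that matches the name exactly, as this sketch does, never considers the second region as a load address, which is the effect patches two and three rely on. A tool that matched on a prefix instead would still pick it up, so the outcome depends on how existing user-space does the comparison.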