* [PATCH 0/2] kexec_file: Load kernel at top of system RAM if required
From: Baoquan He @ 2023-11-14 9:16 UTC
To: linux-kernel
Cc: kexec, x86, linux-parisc, linuxppc-dev, linux-riscv, linux-s390, akpm, ebiederm, takahiro.akashi, Baoquan He

Justification:
==============
The kexec_load interface has always searched top down when loading the
kernel, initrd, purgatory etc. to prepare for kexec reboot. The benefits
are that it avoids consuming and fragmenting the limited low memory that
is needed for DMA buffer allocation and for big chunks of contiguous
memory during system init, and that it avoids interfering with areas
reserved or occupied by BIOS/FW, or areas occupied by corner case
handling, workarounds or quirks during system init. Note that for
kexec_load, the top-down searching and loading of the kexec-ed kernel is
done in the user space utility code.

For kexec_file loading, kexec_buf.top_down is simply ignored even when it
is 'true'. The code calls walk_system_ram_res() directly to go through all
System RAM resources bottom up to find an available memory region, then
calls locate_mem_hole_callback() to allocate memory within that region
from top to down. This is unexpected and inconsistent with kexec_load.

Implementation
===============
In patch 1, introduce a new function walk_system_ram_res_rev(), a variant
of walk_system_ram_res() that walks through the list of all System RAM
resources in reversed order, i.e., from higher to lower.

In patch 2, check whether kexec_buf.top_down is 'true' in
kexec_walk_resources(); if so, call walk_system_ram_res_rev() to search
System RAM from top to down for a memory region to load kernel/initrd etc.

Background information:
=======================
I tried this in the past in a different way, please see the link below.
In that post, I tried to adjust the struct resource sibling linking code,
replacing the singly linked list with list_head so that
walk_system_ram_res_rev() could be implemented in a much easier way.
In the end I failed.

https://lore.kernel.org/all/20180718024944.577-4-bhe@redhat.com/

This time, I picked up the patch from AKASHI Takahiro's old post and made
some changes to use it as the current patch 1:

https://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/531456.html

Testing:
========
Only tried on x86_64

Baoquan He (2):
  resource: add walk_system_ram_res_rev()
  kexec_file: Load kernel at top of system RAM if required

 include/linux/ioport.h |  3 +++
 kernel/kexec_file.c    |  2 ++
 kernel/resource.c      | 61 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

-- 
2.41.0
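To illustrate the inconsistency described in the cover letter, here is a minimal user space sketch. It is not the kernel code: `struct ram_range`, `place_at_top()` and `find_hole()` are hypothetical names, and the hole search is reduced to "place the buffer at the top of the first region that is large enough". It only shows that walking the regions bottom up and walking them top down can pick very different addresses even when placement inside a region is top down in both cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a System RAM resource: inclusive [start, end]. */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Place a buffer of 'size' bytes at the top of one region.
 * Returns 0 and stores the address in *addr, or -1 if it does not fit.
 * Assumes a region never covers the entire 64-bit address space.
 */
static int place_at_top(const struct ram_range *r, uint64_t size,
			uint64_t *addr)
{
	uint64_t len = r->end - r->start + 1;

	if (size == 0 || size > len)
		return -1;
	*addr = r->end - size + 1;
	return 0;
}

/*
 * 'ranges' is sorted by ascending address. With top_down == 0 the regions
 * are visited lowest first (the behaviour the cover letter complains
 * about); with top_down != 0 they are visited highest first.
 */
int find_hole(const struct ram_range *ranges, size_t n, uint64_t size,
	      int top_down, uint64_t *addr)
{
	for (size_t k = 0; k < n; k++) {
		size_t i = top_down ? n - 1 - k : k;

		if (place_at_top(&ranges[i], size, addr) == 0)
			return 0;
	}
	return -1;
}
```

With two regions, 0x1000-0x1fff and 0x100000-0x1fffff, a 0x800 byte buffer lands at 0x1800 when the regions are walked bottom up and at 0x1ff800 when they are walked top down.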
* [PATCH 1/2] resource: add walk_system_ram_res_rev()
From: Baoquan He @ 2023-11-14 9:16 UTC
To: linux-kernel
Cc: kexec, x86, linux-parisc, linuxppc-dev, linux-riscv, linux-s390, akpm, ebiederm, takahiro.akashi, Baoquan He

This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file code to load kernel, initrd etc when
preparing kexec reboot.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 include/linux/ioport.h |  3 +++
 kernel/resource.c      | 61 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 14f5cfabbbc8..db7fe25f3370 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -331,6 +331,9 @@ extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
 		    int (*func)(struct resource *, void *));
 extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+			int (*func)(struct resource *, void *));
+extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
 		    void *arg, int (*func)(struct resource *, void *));
 
diff --git a/kernel/resource.c b/kernel/resource.c
index 866ef3663a0b..12bce44a2c08 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -27,6 +27,8 @@
 #include <linux/mount.h>
 #include <linux/resource_ext.h>
 #include <uapi/linux/magic.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
 #include <asm/io.h>
 
 
@@ -429,6 +431,65 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 				     func);
 }
 
+/*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+				int (*func)(struct resource *, void *))
+{
+	struct resource res, *rams;
+	int rams_size = 16, i;
+	unsigned long flags;
+	int ret = -1;
+
+	/* create a list */
+	rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
+	if (!rams)
+		return ret;
+
+	flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+	i = 0;
+	while ((start < end) &&
+		(!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res))) {
+		if (i >= rams_size) {
+			/* re-alloc */
+			struct resource *rams_new;
+			int rams_new_size;
+
+			rams_new_size = rams_size + 16;
+			rams_new = kvcalloc(rams_new_size, sizeof(struct resource),
+					    GFP_KERNEL);
+			if (!rams_new)
+				goto out;
+
+			memcpy(rams_new, rams,
+					sizeof(struct resource) * rams_size);
+			kvfree(rams);
+			rams = rams_new;
+			rams_size = rams_new_size;
+		}
+
+		rams[i].start = res.start;
+		rams[i++].end = res.end;
+
+		start = res.end + 1;
+	}
+
+	/* go reverse */
+	for (i--; i >= 0; i--) {
+		ret = (*func)(&rams[i], arg);
+		if (ret)
+			break;
+	}
+
+out:
+	kvfree(rams);
+	return ret;
+}
+
 /*
  * This function calls the @func callback against all memory ranges, which
  * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
-- 
2.41.0
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
From: Andrew Morton @ 2023-11-14 23:17 UTC
To: Baoquan He
Cc: linux-kernel, kexec, x86, linux-parisc, linuxppc-dev, linux-riscv, linux-s390, ebiederm, takahiro.akashi

On Tue, 14 Nov 2023 17:16:57 +0800 Baoquan He <bhe@redhat.com> wrote:

> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through a list of all the resources of System RAM
> in reversed order, i.e., from higher to lower.
>
> It will be used in kexec_file code to load kernel, initrd etc when
> preparing kexec reboot.
>
> ...
>
> +/*
> + * This function, being a variant of walk_system_ram_res(), calls the @func
> + * callback against all memory ranges of type System RAM which are marked as
> + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> + * higher to lower.
> + */
> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> +				int (*func)(struct resource *, void *))
> +{
> +	struct resource res, *rams;
> +	int rams_size = 16, i;
> +	unsigned long flags;
> +	int ret = -1;
> +
> +	/* create a list */
> +	rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
> +	if (!rams)
> +		return ret;
> +
> +	flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +	i = 0;
> +	while ((start < end) &&
> +		(!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res))) {
> +		if (i >= rams_size) {
> +			/* re-alloc */
> +			struct resource *rams_new;
> +			int rams_new_size;
> +
> +			rams_new_size = rams_size + 16;
> +			rams_new = kvcalloc(rams_new_size, sizeof(struct resource),
> +					    GFP_KERNEL);

kvrealloc()?

> +			if (!rams_new)
> +				goto out;
> +
> +			memcpy(rams_new, rams,
> +					sizeof(struct resource) * rams_size);
> +			kvfree(rams);
> +			rams = rams_new;
> +			rams_size = rams_new_size;
> +		}
> +
> +		rams[i].start = res.start;
> +		rams[i++].end = res.end;
> +
> +		start = res.end + 1;
> +	}
> +
> +	/* go reverse */
> +	for (i--; i >= 0; i--) {
> +		ret = (*func)(&rams[i], arg);
> +		if (ret)
> +			break;
> +	}
> +
> +out:
> +	kvfree(rams);
> +	return ret;
> +}
> +
>  /*
>   * This function calls the @func callback against all memory ranges, which
>   * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
> --
> 2.41.0
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
From: Baoquan He @ 2023-11-15 0:40 UTC
To: Andrew Morton
Cc: linux-kernel, kexec, x86, linux-parisc, linuxppc-dev, linux-riscv, linux-s390, ebiederm, takahiro.akashi

On 11/14/23 at 03:17pm, Andrew Morton wrote:
> On Tue, 14 Nov 2023 17:16:57 +0800 Baoquan He <bhe@redhat.com> wrote:
>
> > This function, being a variant of walk_system_ram_res() introduced in
> > commit 8c86e70acead ("resource: provide new functions to walk through
> > resources"), walks through a list of all the resources of System RAM
> > in reversed order, i.e., from higher to lower.
> >
> > It will be used in kexec_file code to load kernel, initrd etc when
> > preparing kexec reboot.
> >
> > ...
> >
> > +/*
> > + * This function, being a variant of walk_system_ram_res(), calls the @func
> > + * callback against all memory ranges of type System RAM which are marked as
> > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> > + * higher to lower.
> > + */
> > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> > +				int (*func)(struct resource *, void *))
> > +{
> > +	struct resource res, *rams;
> > +	int rams_size = 16, i;
> > +	unsigned long flags;
> > +	int ret = -1;
> > +
> > +	/* create a list */
> > +	rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
> > +	if (!rams)
> > +		return ret;
> > +
> > +	flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > +	i = 0;
> > +	while ((start < end) &&
> > +		(!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res))) {
> > +		if (i >= rams_size) {
> > +			/* re-alloc */
> > +			struct resource *rams_new;
> > +			int rams_new_size;
> > +
> > +			rams_new_size = rams_size + 16;
> > +			rams_new = kvcalloc(rams_new_size, sizeof(struct resource),
> > +					    GFP_KERNEL);
>
> kvrealloc()?

Exactly. Will update. Thanks for the great suggestion.

>
> > +			if (!rams_new)
> > +				goto out;
> > +
> > +			memcpy(rams_new, rams,
> > +					sizeof(struct resource) * rams_size);
> > +			kvfree(rams);
> > +			rams = rams_new;
> > +			rams_size = rams_new_size;
> > +		}
> > +
> > +		rams[i].start = res.start;
> > +		rams[i++].end = res.end;
> > +
> > +		start = res.end + 1;
> > +	}
> > +
> > +	/* go reverse */
> > +	for (i--; i >= 0; i--) {
> > +		ret = (*func)(&rams[i], arg);
> > +		if (ret)
> > +			break;
> > +	}
> > +
> > +out:
> > +	kvfree(rams);
> > +	return ret;
> > +}
> > +
> >  /*
> >   * This function calls the @func callback against all memory ranges, which
> >   * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
> > --
> > 2.41.0
>
* [PATCH v2 1/2] resource: add walk_system_ram_res_rev()
From: Baoquan He @ 2023-11-15 13:00 UTC
To: linux-kernel
Cc: kexec, x86, linux-parisc, linuxppc-dev, linux-riscv, linux-s390, akpm, ebiederm, takahiro.akashi

This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file code to load kernel, initrd etc when
preparing kexec reboot.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v1->v2:
- Use kvrealloc() to reallocate memory instead of kvcalloc(), this
  simplifies code. Suggested by Andrew.

 include/linux/ioport.h |  3 +++
 kernel/resource.c      | 57 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 14f5cfabbbc8..db7fe25f3370 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -331,6 +331,9 @@ extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
 		    int (*func)(struct resource *, void *));
 extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+			int (*func)(struct resource *, void *));
+extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
 		    void *arg, int (*func)(struct resource *, void *));
 
diff --git a/kernel/resource.c b/kernel/resource.c
index 866ef3663a0b..e8a244300e5b 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -27,6 +27,8 @@
 #include <linux/mount.h>
 #include <linux/resource_ext.h>
 #include <uapi/linux/magic.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
 #include <asm/io.h>
 
 
@@ -429,6 +431,61 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 				     func);
 }
 
+/*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+				int (*func)(struct resource *, void *))
+{
+	struct resource res, *rams;
+	int rams_size = 16, i;
+	unsigned long flags;
+	int ret = -1;
+
+	/* create a list */
+	rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
+	if (!rams)
+		return ret;
+
+	flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+	i = 0;
+	while ((start < end) &&
+		(!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res))) {
+		if (i >= rams_size) {
+			/* re-alloc */
+			struct resource *rams_new;
+
+			rams_new = kvrealloc(rams, rams_size * sizeof(struct resource),
+					     (rams_size + 16) * sizeof(struct resource),
+					     GFP_KERNEL);
+			if (!rams_new)
+				goto out;
+
+			rams = rams_new;
+			rams_size += 16;
+		}
+
+		rams[i].start = res.start;
+		rams[i++].end = res.end;
+
+		start = res.end + 1;
+	}
+
+	/* go reverse */
+	for (i--; i >= 0; i--) {
+		ret = (*func)(&rams[i], arg);
+		if (ret)
+			break;
+	}
+
+out:
+	kvfree(rams);
+	return ret;
+}
+
 /*
  * This function calls the @func callback against all memory ranges, which
  * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
-- 
2.41.0
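The collect-then-reverse pattern of this patch can be tried outside the kernel. The following is only a user space sketch under stated assumptions: `realloc()` stands in for kvrealloc(), a plain sorted array plus the hypothetical helper `next_range()` stands in for find_next_iomem_res(), and `struct ram_range`, `struct visit_log`, `record_start()` and `walk_ranges_rev()` are illustrative names that do not exist in the kernel. Like the patch, it returns -1 when no range is found or when growing the buffer fails, and otherwise returns whatever the last callback returned.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a System RAM resource: inclusive [start, end]. */
struct ram_range {
	uint64_t start;
	uint64_t end;
};

typedef int (*range_fn)(const struct ram_range *r, void *arg);

/* Forward lookup: first range in the sorted array overlapping [from, end]. */
static int next_range(const struct ram_range *src, size_t n,
		      uint64_t from, uint64_t end, struct ram_range *out)
{
	for (size_t i = 0; i < n; i++) {
		if (src[i].end >= from && src[i].start <= end) {
			*out = src[i];
			return 0;
		}
	}
	return -1;
}

/* Example callback: log the start address of every range visited. */
struct visit_log {
	uint64_t starts[8];
	size_t n;
};

int record_start(const struct ram_range *r, void *arg)
{
	struct visit_log *log = arg;

	if (log->n < 8)
		log->starts[log->n++] = r->start;
	return 0;
}

int walk_ranges_rev(const struct ram_range *src, size_t n,
		    uint64_t start, uint64_t end, void *arg, range_fn func)
{
	struct ram_range res, *rams, *tmp;
	size_t cap = 16, count = 0;
	int ret = -1;

	rams = calloc(cap, sizeof(*rams));
	if (!rams)
		return ret;

	/* Pass 1: collect ranges in ascending order, growing 16 at a time. */
	while (start < end && !next_range(src, n, start, end, &res)) {
		if (count >= cap) {
			tmp = realloc(rams, (cap + 16) * sizeof(*rams));
			if (!tmp)
				goto out;
			rams = tmp;
			cap += 16;
		}
		rams[count++] = res;
		if (res.end == UINT64_MAX)
			break;		/* avoid wrapping to 0 */
		start = res.end + 1;
	}

	/* Pass 2: call back from the highest range down to the lowest. */
	while (count-- > 0) {
		ret = func(&rams[count], arg);
		if (ret)
			break;
	}
out:
	free(rams);
	return ret;
}
```

Because `realloc()` preserves the existing contents, the explicit memcpy() and free of the old buffer from v1 are no longer needed, which is the simplification the v1->v2 note refers to.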
* Re: [PATCH v2 1/2] resource: add walk_system_ram_res_rev()
From: Baoquan He @ 2023-11-23 13:27 UTC
To: akpm
Cc: kexec, x86, linux-parisc, linuxppc-dev, linux-riscv, linux-s390, linux-kernel, ebiederm, takahiro.akashi

Hi Andrew,

On 11/15/23 at 09:00pm, Baoquan He wrote:
> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through a list of all the resources of System RAM
> in reversed order, i.e., from higher to lower.
>
> It will be used in kexec_file code to load kernel, initrd etc when
> preparing kexec reboot.
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
> v1->v2:
> - Use kvrealloc() to reallocate memory instead of kvcalloc(), this
>   simplifies code. Suggested by Andrew.
>
>  include/linux/ioport.h |  3 +++
>  kernel/resource.c      | 57 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 60 insertions(+)

Gentle ping.

Could you pick this patchset into the next tree so that it can be run on
the testing robots?

Thanks
Baoquan

>
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 14f5cfabbbc8..db7fe25f3370 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -331,6 +331,9 @@ extern int
>  walk_system_ram_res(u64 start, u64 end, void *arg,
>  		    int (*func)(struct resource *, void *));
>  extern int
> +walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> +			int (*func)(struct resource *, void *));
> +extern int
>  walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
>  		    void *arg, int (*func)(struct resource *, void *));
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 866ef3663a0b..e8a244300e5b 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -27,6 +27,8 @@
>  #include <linux/mount.h>
>  #include <linux/resource_ext.h>
>  #include <uapi/linux/magic.h>
> +#include <linux/string.h>
> +#include <linux/vmalloc.h>
>  #include <asm/io.h>
>
>
> @@ -429,6 +431,61 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  				     func);
>  }
>
> +/*
> + * This function, being a variant of walk_system_ram_res(), calls the @func
> + * callback against all memory ranges of type System RAM which are marked as
> + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> + * higher to lower.
> + */
> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> +				int (*func)(struct resource *, void *))
> +{
> +	struct resource res, *rams;
> +	int rams_size = 16, i;
> +	unsigned long flags;
> +	int ret = -1;
> +
> +	/* create a list */
> +	rams = kvcalloc(rams_size, sizeof(struct resource), GFP_KERNEL);
> +	if (!rams)
> +		return ret;
> +
> +	flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +	i = 0;
> +	while ((start < end) &&
> +		(!find_next_iomem_res(start, end, flags, IORES_DESC_NONE, &res))) {
> +		if (i >= rams_size) {
> +			/* re-alloc */
> +			struct resource *rams_new;
> +
> +			rams_new = kvrealloc(rams, rams_size * sizeof(struct resource),
> +					     (rams_size + 16) * sizeof(struct resource),
> +					     GFP_KERNEL);
> +			if (!rams_new)
> +				goto out;
> +
> +			rams = rams_new;
> +			rams_size += 16;
> +		}
> +
> +		rams[i].start = res.start;
> +		rams[i++].end = res.end;
> +
> +		start = res.end + 1;
> +	}
> +
> +	/* go reverse */
> +	for (i--; i >= 0; i--) {
> +		ret = (*func)(&rams[i], arg);
> +		if (ret)
> +			break;
> +	}
> +
> +out:
> +	kvfree(rams);
> +	return ret;
> +}
> +
>  /*
>   * This function calls the @func callback against all memory ranges, which
>   * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
> --
> 2.41.0
>
* Re: [PATCH v2 1/2] resource: add walk_system_ram_res_rev()
From: patchwork-bot+linux-riscv @ 2024-01-20 21:09 UTC
To: Baoquan He
Cc: linux-riscv, linux-kernel, kexec, x86, linux-parisc, linuxppc-dev, linux-s390, akpm, ebiederm, takahiro.akashi

Hello:

This patch was applied to riscv/linux.git (fixes)
by Andrew Morton <akpm@linux-foundation.org>:

On Wed, 15 Nov 2023 21:00:27 +0800 you wrote:
> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through a list of all the resources of System RAM
> in reversed order, i.e., from higher to lower.
>
> It will be used in kexec_file code to load kernel, initrd etc when
> preparing kexec reboot.
>
> [...]

Here is the summary with links:
  - [v2,1/2] resource: add walk_system_ram_res_rev()
    https://git.kernel.org/riscv/c/7acf164b259d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* [PATCH 2/2] kexec_file: Load kernel at top of system RAM if required
From: Baoquan He @ 2023-11-14 9:16 UTC
To: linux-kernel
Cc: kexec, x86, linux-parisc, linuxppc-dev, linux-riscv, linux-s390, akpm, ebiederm, takahiro.akashi, Baoquan He

The kexec_load interface has always searched top down when loading the
kernel, initrd, purgatory etc. to prepare for kexec reboot. The benefits
are that it avoids consuming and fragmenting the limited low memory that
is needed for DMA buffer allocation and for big chunks of contiguous
memory during system init, and that it avoids interfering with areas
reserved or occupied by BIOS/FW, or areas occupied by corner case
handling, workarounds or quirks during system init. Note that for
kexec_load, the top-down searching and loading of the kexec-ed kernel is
done in the user space utility code.

For kexec_file loading, kexec_buf.top_down is simply ignored even when it
is 'true'. The code calls walk_system_ram_res() directly to go through all
System RAM resources bottom up to find an available memory region, then
calls locate_mem_hole_callback() to allocate memory within that region
from top to down. This is unexpected and inconsistent with kexec_load.

Here, check whether kexec_buf.top_down is 'true' in kexec_walk_resources();
if so, call the newly added walk_system_ram_res_rev() to search System RAM
from top to down for a memory region to load kernel/initrd etc.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 kernel/kexec_file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f9a419cd22d4..ba3ef30921b8 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -592,6 +592,8 @@ static int kexec_walk_resources(struct kexec_buf *kbuf,
 					   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
 					   crashk_res.start, crashk_res.end,
 					   kbuf, func);
+	else if (kbuf->top_down)
+		return walk_system_ram_res_rev(0, ULONG_MAX, kbuf, func);
 	else
 		return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
 }
-- 
2.41.0
* Re: [PATCH 0/2] kexec_file: Load kernel at top of system RAM if required
From: patchwork-bot+linux-riscv @ 2024-01-20 21:09 UTC
To: Baoquan He
Cc: linux-riscv, linux-kernel, kexec, x86, linux-parisc, linuxppc-dev, linux-s390, akpm, ebiederm, takahiro.akashi

Hello:

This series was applied to riscv/linux.git (fixes)
by Andrew Morton <akpm@linux-foundation.org>:

On Tue, 14 Nov 2023 17:16:56 +0800 you wrote:
> Justification:
> ==============
> Kexec_load interface has been doing top down searching and loading
> kernel/initrd/purgatory etc to prepare for kexec reboot. In that way,
> the benefits are that it avoids to consume and fragment limited low
> memory which satisfy DMA buffer allocation and big chunk of continuous
> memory during system init; and avoids to stir with BIOS/FW reserved
> or occupied areas, or corner case handling/work around/quirk occupied
> areas when doing system init. By the way, the top-down searching and
> loading of kexec-ed kernel is done in user space utility code.
>
> [...]

Here is the summary with links:
  - [1/2] resource: add walk_system_ram_res_rev()
    (no matching commit)
  - [2/2] kexec_file: Load kernel at top of system RAM if required
    https://git.kernel.org/riscv/c/b3ba234171cd

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* [PATCH 0/2] Kexec_file: Load kernel at top of system ram
From: Baoquan He @ 2018-03-22 3:37 UTC
To: linux-kernel
Cc: kexec, akpm, takahiro.akashi, ebiederm, vgoyal, dyoung, prudo, Baoquan He

The current kexec_file code ignores the kexec_buf.top_down value when
calling arch_kexec_walk_mem() to allocate memory for loading kernel/initrd
etc. This is not how kexec_buf.top_down is supposed to be used.

In patch 0001, introduce a new function walk_system_ram_res_rev(), a
variant of walk_system_ram_res() that walks through the System RAM
resources from top to down. Patch 0001 is picked from AKASHI's patchset
which adds arm64 kexec_file support. His next round of posting won't need
walk_system_ram_res_rev() any more, so I take it into this patchset and
use it in patch 0002.

In patch 0002, check kexec_buf.top_down in arch_kexec_walk_mem(); if its
value is 'true', call walk_system_ram_res_rev(). Otherwise call
walk_system_ram_res().

AKASHI Takahiro (1):
  resource: add walk_system_ram_res_rev()

Baoquan He (1):
  kexec_file: Load kernel at top of system RAM if required

 include/linux/ioport.h |  3 +++
 kernel/kexec_file.c    |  2 ++
 kernel/resource.c      | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

-- 
2.13.6
* [PATCH 1/2] resource: add walk_system_ram_res_rev()
From: Baoquan He @ 2018-03-22 3:37 UTC
To: linux-kernel
Cc: kexec, akpm, takahiro.akashi, ebiederm, vgoyal, dyoung, prudo, Baoquan He

From: AKASHI Takahiro <takahiro.akashi@linaro.org>

This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file code.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 include/linux/ioport.h |  3 +++
 kernel/resource.c      | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..f12d95fe038b 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -277,6 +277,9 @@ extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
 		    int (*func)(struct resource *, void *));
 extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+			int (*func)(struct resource *, void *));
+extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
 		    void *arg, int (*func)(struct resource *, void *));
 
diff --git a/kernel/resource.c b/kernel/resource.c
index e270b5048988..f456fc95f1b2 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -23,6 +23,8 @@
 #include <linux/pfn.h>
 #include <linux/mm.h>
 #include <linux/resource_ext.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
 #include <asm/io.h>
 
 
@@ -470,6 +472,67 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 }
 
 /*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+				int (*func)(struct resource *, void *))
+{
+	struct resource res, *rams;
+	int rams_size = 16, i;
+	int ret = -1;
+
+	/* create a list */
+	rams = vmalloc(sizeof(struct resource) * rams_size);
+	if (!rams)
+		return ret;
+
+	res.start = start;
+	res.end = end;
+	res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+	i = 0;
+	while ((res.start < res.end) &&
+		(!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
+		if (i >= rams_size) {
+			/* re-alloc */
+			struct resource *rams_new;
+			int rams_new_size;
+
+			rams_new_size = rams_size + 16;
+			rams_new = vmalloc(sizeof(struct resource)
+							* rams_new_size);
+			if (!rams_new)
+				goto out;
+
+			memcpy(rams_new, rams,
+					sizeof(struct resource) * rams_size);
+			vfree(rams);
+			rams = rams_new;
+			rams_size = rams_new_size;
+		}
+
+		rams[i].start = res.start;
+		rams[i++].end = res.end;
+
+		res.start = res.end + 1;
+		res.end = end;
+	}
+
+	/* go reverse */
+	for (i--; i >= 0; i--) {
+		ret = (*func)(&rams[i], arg);
+		if (ret)
+			break;
+	}
+
+out:
+	vfree(rams);
+	return ret;
+}
+
+/*
  * This function calls the @func callback against all memory ranges, which
  * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
  */
-- 
2.13.6
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
From: Andrew Morton @ 2018-03-22 22:29 UTC
To: Baoquan He
Cc: linux-kernel, kexec, takahiro.akashi, ebiederm, vgoyal, dyoung, prudo

On Thu, 22 Mar 2018 11:37:21 +0800 Baoquan He <bhe@redhat.com> wrote:

> From: AKASHI Takahiro <takahiro.akashi@linaro.org>
>
> This function, being a variant of walk_system_ram_res() introduced in
> commit 8c86e70acead ("resource: provide new functions to walk through
> resources"), walks through a list of all the resources of System RAM
> in reversed order, i.e., from higher to lower.
>
> It will be used in kexec_file code.
>
> ...
>
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -23,6 +23,8 @@
>  #include <linux/pfn.h>
>  #include <linux/mm.h>
>  #include <linux/resource_ext.h>
> +#include <linux/string.h>
> +#include <linux/vmalloc.h>
>  #include <asm/io.h>
>
>
> @@ -470,6 +472,67 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  }
>
>  /*
> + * This function, being a variant of walk_system_ram_res(), calls the @func
> + * callback against all memory ranges of type System RAM which are marked as
> + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> + * higher to lower.
> + */

This should document the return value, as should walk_system_ram_res().
Why does it return -1 on error rather than an errno (ENOMEM)?

> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> +				int (*func)(struct resource *, void *))
> +{
> +	struct resource res, *rams;
> +	int rams_size = 16, i;
> +	int ret = -1;
> +
> +	/* create a list */
> +	rams = vmalloc(sizeof(struct resource) * rams_size);
> +	if (!rams)
> +		return ret;
> +
> +	res.start = start;
> +	res.end = end;
> +	res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +	i = 0;
> +	while ((res.start < res.end) &&
> +		(!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> +		if (i >= rams_size) {
> +			/* re-alloc */
> +			struct resource *rams_new;
> +			int rams_new_size;
> +
> +			rams_new_size = rams_size + 16;
> +			rams_new = vmalloc(sizeof(struct resource)
> +							* rams_new_size);
> +			if (!rams_new)
> +				goto out;
> +
> +			memcpy(rams_new, rams,
> +					sizeof(struct resource) * rams_size);
> +			vfree(rams);
> +			rams = rams_new;
> +			rams_size = rams_new_size;
> +		}
> +
> +		rams[i].start = res.start;
> +		rams[i++].end = res.end;
> +
> +		res.start = res.end + 1;
> +		res.end = end;
> +	}
> +
> +	/* go reverse */
> +	for (i--; i >= 0; i--) {
> +		ret = (*func)(&rams[i], arg);
> +		if (ret)
> +			break;
> +	}

erk, this is pretty nasty. Isn't there a better way :(

> +out:
> +	vfree(rams);
> +	return ret;
> +}
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-22 22:29 ` Andrew Morton
@ 2018-03-23 0:58 ` Baoquan He
  2018-03-23 2:06 ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-03-23 0:58 UTC
To: Andrew Morton
Cc: linux-kernel, kexec, takahiro.akashi, ebiederm, vgoyal, dyoung, prudo

Hi Andrew,

Thanks a lot for your review!

On 03/22/18 at 03:29pm, Andrew Morton wrote:
> >  /*
> > + * This function, being a variant of walk_system_ram_res(), calls the @func
> > + * callback against all memory ranges of type System RAM which are marked as
> > + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> > + * higher to lower.
> > + */
>
> This should document the return value, as should walk_system_ram_res().
> Why does it return -1 on error rather than an errno (ENOMEM)?

OK, I will add sentences to document this. For walk_system_ram_res(),
'-1' indicates that no matching region was found and '0' indicates
success. In walk_system_ram_res_rev(), I will add '-ENOMEM' to indicate
failure of the vmalloc allocation.

>
> > +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> > +				int (*func)(struct resource *, void *))
> > +{
> > +	struct resource res, *rams;
> > +	int rams_size = 16, i;
> > +	int ret = -1;
> > +
> > +	/* create a list */
> > +	rams = vmalloc(sizeof(struct resource) * rams_size);
> > +	if (!rams)
> > +		return ret;
> > +
> > +	res.start = start;
> > +	res.end = end;
> > +	res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > +	i = 0;
> > +	while ((res.start < res.end) &&
> > +		(!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
> > +		if (i >= rams_size) {
> > +			/* re-alloc */
> > +			struct resource *rams_new;
> > +			int rams_new_size;
> > +
> > +			rams_new_size = rams_size + 16;
> > +			rams_new = vmalloc(sizeof(struct resource)
> > +							* rams_new_size);
> > +			if (!rams_new)
> > +				goto out;
> > +
> > +			memcpy(rams_new, rams,
> > +					sizeof(struct resource) * rams_size);
> > +			vfree(rams);
> > +			rams = rams_new;
> > +			rams_size = rams_new_size;
> > +		}
> > +
> > +		rams[i].start = res.start;
> > +		rams[i++].end = res.end;
> > +
> > +		res.start = res.end + 1;
> > +		res.end = end;
> > +	}
> > +
> > +	/* go reverse */
> > +	for (i--; i >= 0; i--) {
> > +		ret = (*func)(&rams[i], arg);
> > +		if (ret)
> > +			break;
> > +	}
>
> erk, this is pretty nasty.  Isn't there a better way :(

Yes, this is not efficient.

In struct resource{}, the ->sibling list is a singly linked list. I once
thought about changing it to a doubly linked list, but I am not sure
whether it would have side effects, since struct resource is a core data
structure.

AKASHI's method is more acceptable, and currently only kexec has this
requirement.

>
> > +out:
> > +	vfree(rams);
> > +	return ret;
> > +}
>
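The approach discussed above (copy the forward-only list into an array, then walk the array backward) can be sketched in userspace C. This is only an illustration under assumptions, not the kernel code: struct range, struct visit_log, walk_ranges_rev() and log_visit() are hypothetical names, and malloc()/realloc() stand in for vmalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct resource: a range on a singly linked list. */
struct range {
	unsigned long start, end;
	struct range *sibling;	/* next range at a higher address, or NULL */
};

/* Example callback state: a small log of visited start addresses. */
struct visit_log {
	unsigned long starts[8];
	size_t count;
};

/* Example callback: record the start address, stop the walk when the log is full. */
int log_visit(const struct range *r, void *arg)
{
	struct visit_log *log = arg;

	if (log->count == 8)
		return 1;
	log->starts[log->count++] = r->start;
	return 0;
}

/*
 * Call func on every range from highest to lowest. The list only links
 * forward, so the ranges are first copied into a growable array and the
 * array is then walked backward. Returns -1 if nothing was visited or an
 * allocation failed, otherwise the last callback return value. A nonzero
 * callback return stops the walk early.
 */
int walk_ranges_rev(const struct range *head, void *arg,
		    int (*func)(const struct range *, void *))
{
	size_t cap = 16, n = 0;
	struct range *snap = malloc(cap * sizeof(*snap));
	const struct range *p;
	int ret = -1;

	if (!snap)
		return ret;

	for (p = head; p; p = p->sibling) {
		if (n == cap) {
			/* grow by 16 entries, as the quoted patch does */
			struct range *bigger;

			bigger = realloc(snap, (cap + 16) * sizeof(*snap));
			if (!bigger)
				goto out;
			snap = bigger;
			cap += 16;
		}
		snap[n++] = *p;
	}

	/* go reverse */
	while (n-- > 0) {
		ret = func(&snap[n], arg);
		if (ret)
			break;
	}
out:
	free(snap);
	return ret;
}
```

The cost is one temporary copy of every range per walk, which is the inefficiency raised in the review.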
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-23 0:58 ` Baoquan He
@ 2018-03-23 2:06 ` Andrew Morton
  2018-03-23 3:10 ` Baoquan He
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2018-03-23 2:06 UTC
To: Baoquan He
Cc: linux-kernel, kexec, takahiro.akashi, ebiederm, vgoyal, dyoung, prudo

On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <bhe@redhat.com> wrote:

> > erk, this is pretty nasty.  Isn't there a better way :(
>
> Yes, this is not efficient.
>
> In struct resource{}, the ->sibling list is a singly linked list. I once
> thought about changing it to a doubly linked list, but I am not sure
> whether it would have side effects, since struct resource is a core data
> structure.

Switching to a list_head sounds OK.  The only issue really is memory
consumption and surely we don't have tens of thousands of struct
resources floating about(?).  Or if we do have a lot, the machine is
presumably huge (hope?).

> AKASHI's method is more acceptable, and currently only kexec has this
> requirement.

What method is that?
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-23 2:06 ` Andrew Morton
@ 2018-03-23 3:10 ` Baoquan He
  2018-03-23 20:06 ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-03-23 3:10 UTC
To: Andrew Morton
Cc: linux-kernel, kexec, takahiro.akashi, ebiederm, vgoyal, dyoung, prudo

On 03/22/18 at 07:06pm, Andrew Morton wrote:
> On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <bhe@redhat.com> wrote:
>
> > > erk, this is pretty nasty.  Isn't there a better way :(
> >
> > Yes, this is not efficient.
> >
> > In struct resource{}, the ->sibling list is a singly linked list. I once
> > thought about changing it to a doubly linked list, but I am not sure
> > whether it would have side effects, since struct resource is a core data
> > structure.
>
> Switching to a list_head sounds OK.  The only issue really is memory
> consumption and surely we don't have tens of thousands of struct
> resources floating about(?).  Or if we do have a lot, the machine is
> presumably huge (hope?).

Yes. It doubles the memory consumption.

AFAIK, the biggest number of resources I have heard of is mentioned
in this user space kexec-tools commit. In this commit, Xunlei reported
that on an SGI system with 64TB RAM, the array which we have been using
to store "System RAM"|"Reserved"|"ACPI **" regions is not big enough. In
that case, we need an extra 8Byte*2048=16KB at most. In my understanding,
this increase is system wide, since each resource instance only needs
its own list_head member, right?

commit 4a6d67d9e938a7accf128aff23f8ad4bda67f729
Author: Xunlei Pang <xlpang@redhat.com>
Date:   Thu Mar 23 19:16:59 2017 +0800

    x86: Support large number of memory ranges

    We got a problem on one SGI 64TB machine, the current kexec-tools
    failed to work due to the insufficient ranges(MAX_MEMORY_RANGES)
    allowed which is defined as 1024(less than the ranges on the machine).
    The kcore header is insufficient due to the same reason as well.

    To solve this, this patch simply doubles "MAX_MEMORY_RANGES" and
    "KCORE_ELF_HEADERS_SIZE".

    Signed-off-by: Xunlei Pang <xlpang@redhat.com>
    Tested-by: Frank Ramsay <frank.ramsay@hpe.com>
    Signed-off-by: Simon Horman <horms@verge.net.au>

diff --git a/kexec/arch/i386/kexec-x86.h b/kexec/arch/i386/kexec-x86.h
index 33df352..51855f8 100644
--- a/kexec/arch/i386/kexec-x86.h
+++ b/kexec/arch/i386/kexec-x86.h
@@ -1,7 +1,7 @@
 #ifndef KEXEC_X86_H
 #define KEXEC_X86_H

-#define MAX_MEMORY_RANGES 1024
+#define MAX_MEMORY_RANGES 2048

>
> > AKASHI's method is more acceptable, and currently only kexec has this
> > requirement.
>
> What method is that?

I meant that this patch was written by AKASHI. He posted a patchset to
add kexec_file support for arm64, and among those patches this one was
used to load the arm64 kernel at the top of system RAM. Later they
switched to a different way of searching for a memory region to load the
kernel, so he dropped this patch.
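The 8Byte*2048=16KB estimate above can be reproduced with a few lines of C. This is a sketch under stated assumptions: only one forward link per node is converted to a two-pointer link, and pointers are 8 bytes on the 64-bit case. struct single_link, struct double_link and both helper functions are names made up for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked node link: one forward pointer. */
struct single_link {
	struct single_link *next;
};

/* Same layout idea as the kernel's struct list_head: two pointers. */
struct double_link {
	struct double_link *next, *prev;
};

/* Extra bytes on this machine when nr_nodes nodes each gain a back pointer. */
size_t extra_link_bytes(size_t nr_nodes)
{
	return nr_nodes *
	       (sizeof(struct double_link) - sizeof(struct single_link));
}

/* Same estimate for an explicitly given pointer size, e.g. 8 on 64-bit. */
size_t extra_link_bytes_for(size_t nr_nodes, size_t pointer_size)
{
	return nr_nodes * pointer_size;
}
```

With 2048 nodes and 8-byte pointers, extra_link_bytes_for(2048, 8) gives 16384 bytes, i.e. 16KB. If more than one link per node were converted, the figure would scale accordingly.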
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-23 3:10 ` Baoquan He
@ 2018-03-23 20:06 ` Andrew Morton
  2018-03-24 13:33 ` Baoquan He
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2018-03-23 20:06 UTC
To: Baoquan He
Cc: linux-kernel, kexec, takahiro.akashi, ebiederm, vgoyal, dyoung, prudo

On Fri, 23 Mar 2018 11:10:13 +0800 Baoquan He <bhe@redhat.com> wrote:

> On 03/22/18 at 07:06pm, Andrew Morton wrote:
> > On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <bhe@redhat.com> wrote:
> >
> > > > erk, this is pretty nasty.  Isn't there a better way :(
> > >
> > > Yes, this is not efficient.
> > >
> > > In struct resource{}, the ->sibling list is a singly linked list. I once
> > > thought about changing it to a doubly linked list, but I am not sure
> > > whether it would have side effects, since struct resource is a core data
> > > structure.
> >
> > Switching to a list_head sounds OK.  The only issue really is memory
> > consumption and surely we don't have tens of thousands of struct
> > resources floating about(?).  Or if we do have a lot, the machine is
> > presumably huge (hope?).
>
> Yes. It doubles the memory consumption.
>
> AFAIK, the biggest number of resources I have heard of is mentioned
> in this user space kexec-tools commit. In this commit, Xunlei reported
> that on an SGI system with 64TB RAM, the array which we have been using
> to store "System RAM"|"Reserved"|"ACPI **" regions is not big enough. In
> that case, we need an extra 8Byte*2048=16KB at most. In my understanding,
> this increase is system wide, since each resource instance only needs
> its own list_head member, right?

Yes.  That sounds perfectly acceptable.

It would be interesting to see what this approach looks like, if you
have time to toss something together?
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-23 20:06 ` Andrew Morton
@ 2018-03-24 13:33 ` Baoquan He
  2018-03-24 16:13 ` Wei Yang
  0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-03-24 13:33 UTC
To: Andrew Morton
Cc: prudo, kexec, linux-kernel, takahiro.akashi, ebiederm, dyoung, vgoyal

On 03/23/18 at 01:06pm, Andrew Morton wrote:
> On Fri, 23 Mar 2018 11:10:13 +0800 Baoquan He <bhe@redhat.com> wrote:
>
> > On 03/22/18 at 07:06pm, Andrew Morton wrote:
> > > On Fri, 23 Mar 2018 08:58:45 +0800 Baoquan He <bhe@redhat.com> wrote:
> > >
> > > > > erk, this is pretty nasty.  Isn't there a better way :(
> > > >
> > > > Yes, this is not efficient.
> > > >
> > > > In struct resource{}, the ->sibling list is a singly linked list. I once
> > > > thought about changing it to a doubly linked list, but I am not sure
> > > > whether it would have side effects, since struct resource is a core data
> > > > structure.
> > >
> > > Switching to a list_head sounds OK.  The only issue really is memory
> > > consumption and surely we don't have tens of thousands of struct
> > > resources floating about(?).  Or if we do have a lot, the machine is
> > > presumably huge (hope?).
> >
> > Yes. It doubles the memory consumption.
> >
> > AFAIK, the biggest number of resources I have heard of is mentioned
> > in this user space kexec-tools commit. In this commit, Xunlei reported
> > that on an SGI system with 64TB RAM, the array which we have been using
> > to store "System RAM"|"Reserved"|"ACPI **" regions is not big enough. In
> > that case, we need an extra 8Byte*2048=16KB at most. In my understanding,
> > this increase is system wide, since each resource instance only needs
> > its own list_head member, right?
>
> Yes.  That sounds perfectly acceptable.
>
> It would be interesting to see what this approach looks like, if you
> have time to toss something together?

OK, I will make patches for review. Thanks!
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-24 13:33 ` Baoquan He
@ 2018-03-24 16:13 ` Wei Yang
  2018-03-26 14:30 ` Baoquan He
  0 siblings, 1 reply; 19+ messages in thread
From: Wei Yang @ 2018-03-24 16:13 UTC
To: Baoquan He
Cc: Andrew Morton, prudo, kexec, linux-kernel, takahiro.akashi, ebiederm, dyoung, vgoyal

On Sat, Mar 24, 2018 at 09:33:30PM +0800, Baoquan He wrote:
>>
>> Yes.  That sounds perfectly acceptable.
>>
>> It would be interesting to see what this approach looks like, if you
>> have time to toss something together?
>
>OK, I will make patches for review. Thanks!

Hi, Baoquan, Andrew

I have come up with an implementation for a top-down search of the RAM
resources. I hope this meets your needs.

From b36d50487f1d4e4d6a5103965a27101b3121e0ea Mon Sep 17 00:00:00 2001
From: Wei Yang <richard.weiyang@gmail.com>
Date: Sat, 24 Mar 2018 23:25:46 +0800
Subject: [PATCH] kernel/resource: add walk_system_ram_res_rev()

As discussed on https://patchwork.kernel.org/patch/10300819/, this patch
comes up with a variant implementation of walk_system_ram_res_rev(), which
uses iteration instead of allocating array to store those resources.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 include/linux/ioport.h |   3 ++
 kernel/resource.c      | 113 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 116 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..473f1d9cb97e 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -277,6 +277,9 @@ extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
 		    int (*func)(struct resource *, void *));
 extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+			int (*func)(struct resource *, void *));
+extern int
 walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
 		    void *arg, int (*func)(struct resource *, void *));

diff --git a/kernel/resource.c b/kernel/resource.c
index 769109f20fb7..ddf6b4c41498 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -73,6 +73,38 @@ static struct resource *next_resource(struct resource *p, bool sibling_only)
 	return p->sibling;
 }

+static struct resource *prev_resource(struct resource *p, bool sibling_only)
+{
+	struct resource *prev;
+	if (NULL == iomem_resource.child)
+		return NULL;
+
+	if (p == NULL) {
+		prev = iomem_resource.child;
+		while (prev->sibling)
+			prev = prev->sibling;
+	} else {
+		if (p->parent->child == p) {
+			return p->parent;
+		}
+
+		for (prev = p->parent->child; prev->sibling != p;
+			prev = prev->sibling) {}
+	}
+
+	/* Caller wants to traverse through siblings only */
+	if (sibling_only)
+		return prev;
+
+	for (;prev->child;) {
+		prev = prev->child;
+
+		while (prev->sibling)
+			prev = prev->sibling;
+	}
+	return prev;
+}
+
 static void *r_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct resource *p = v;
@@ -401,6 +433,47 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
 	return 0;
 }

+/*
+ * Finds the highest iomem resource existing within [res->start.res->end).
+ * The caller must specify res->start, res->end, res->flags, and optionally
+ * desc. If found, returns 0, res is overwritten, if not found, returns -1.
+ * This function walks the whole tree and not just first level children until
+ * and unless first_level_children_only is true.
+ */
+static int find_prev_iomem_res(struct resource *res, unsigned long desc,
+			       bool first_level_children_only)
+{
+	struct resource *p;
+
+	BUG_ON(!res);
+	BUG_ON(res->start >= res->end);
+
+	read_lock(&resource_lock);
+
+	for (p = prev_resource(NULL, first_level_children_only); p;
+		p = prev_resource(p, first_level_children_only)) {
+		if ((p->flags & res->flags) != res->flags)
+			continue;
+		if ((desc != IORES_DESC_NONE) && (desc != p->desc))
+			continue;
+		if (p->end < res->start) {
+			p = NULL;
+			break;
+		}
+		if ((p->end >= res->start) && (p->start < res->end))
+			break;
+	}
+
+	read_unlock(&resource_lock);
+	if (!p)
+		return -1;
+	/* copy data */
+	resource_clip(res, p->start, p->end);
+	res->flags = p->flags;
+	res->desc = p->desc;
+	return 0;
+}
+
 static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
 				 bool first_level_children_only,
 				 void *arg,
@@ -422,6 +495,27 @@ static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
 	return ret;
 }

+static int __walk_iomem_res_rev_desc(struct resource *res, unsigned long desc,
+				     bool first_level_children_only,
+				     void *arg,
+				     int (*func)(struct resource *, void *))
+{
+	u64 orig_start = res->start;
+	int ret = -1;
+
+	while ((res->start < res->end) &&
+		!find_prev_iomem_res(res, desc, first_level_children_only)) {
+		ret = (*func)(res, arg);
+		if (ret)
+			break;
+
+		res->end = res->start - 1;
+		res->start = orig_start;
+	}
+
+	return ret;
+}
+
 /*
  * Walks through iomem resources and calls func() with matching resource
  * ranges. This walks through whole tree and not just first level children.
@@ -468,6 +562,25 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 				     arg, func);
 }

+/*
+ * This function, being a variant of walk_system_ram_res(), calls the @func
+ * callback against all memory ranges of type System RAM which are marked as
+ * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
+ * higher to lower.
+ */
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+				int (*func)(struct resource *, void *))
+{
+	struct resource res;
+
+	res.start = start;
+	res.end = end;
+	res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+	return __walk_iomem_res_rev_desc(&res, IORES_DESC_NONE, true,
+					 arg, func);
+}
+
 /*
  * This function calls the @func callback against all memory ranges, which
  * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
--
2.15.1

--
Wei Yang
Help you, Help me
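The allocation-free idea in the patch above (find a predecessor by rescanning the forward links from the head) can be shown on a flat list. This is a simplified userspace sketch, not the patch itself: struct node, prev_node() and collect_rev() are hypothetical names, there is no parent/child tree or locking, and the head simply has no predecessor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat list node: only a forward link is stored. */
struct node {
	unsigned long start;
	struct node *sibling;	/* next node at a higher address, or NULL */
};

/*
 * Return the node just before p. When p is NULL, return the last node,
 * which is where a reverse walk begins. Returns NULL for an empty list or
 * when p is the head. p is assumed to be on the list.
 */
struct node *prev_node(struct node *head, struct node *p)
{
	struct node *prev;

	if (!head || p == head)
		return NULL;

	/* rescan from the head until the node in front of p is found */
	for (prev = head; prev->sibling && prev->sibling != p;
	     prev = prev->sibling)
		;
	return prev;
}

/* Store up to max start addresses in reverse order, return how many were stored. */
size_t collect_rev(struct node *head, unsigned long *out, size_t max)
{
	struct node *p;
	size_t n = 0;

	for (p = prev_node(head, NULL); p && n < max; p = prev_node(head, p))
		out[n++] = p->start;
	return n;
}
```

Each step rescans the list, so a full reverse walk costs O(n^2) pointer hops in exchange for needing no temporary array.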
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-24 16:13 ` Wei Yang
@ 2018-03-26 14:30 ` Baoquan He
  2018-03-26 15:04 ` Wei Yang
  0 siblings, 1 reply; 19+ messages in thread
From: Baoquan He @ 2018-03-26 14:30 UTC
To: Wei Yang
Cc: Andrew Morton, prudo, kexec, linux-kernel, takahiro.akashi, ebiederm, dyoung, vgoyal

Hi Wei Yang,

On 03/25/18 at 12:13am, Wei Yang wrote:
> On Sat, Mar 24, 2018 at 09:33:30PM +0800, Baoquan He wrote:
> >>
> >> Yes.  That sounds perfectly acceptable.
> >>
> >> It would be interesting to see what this approach looks like, if you
> >> have time to toss something together?
> >
> >OK, I will make patches for review. Thanks!
>
> Hi, Baoquan, Andrew
>
> I have come up with an implementation for a top-down search of the RAM
> resources. I hope this meets your needs.

Thanks for letting me know, and for your effort. Glad to know I am not
the only one who needs walk_system_ram_res_rev(). I am fine with other
ways of implementing it; people can compare them and see which one is
better. I am working on using list_head instead, the doubly linked list
approach Andrew suggested. Andrew and others can help make a choice. It
won't be long.

Thanks
Baoquan

>
> From b36d50487f1d4e4d6a5103965a27101b3121e0ea Mon Sep 17 00:00:00 2001
> From: Wei Yang <richard.weiyang@gmail.com>
> Date: Sat, 24 Mar 2018 23:25:46 +0800
> Subject: [PATCH] kernel/resource: add walk_system_ram_res_rev()
>
> As discussed on https://patchwork.kernel.org/patch/10300819/, this patch
> comes up with a variant implementation of walk_system_ram_res_rev(), which
> uses iteration instead of allocating array to store those resources.
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  include/linux/ioport.h |   3 ++
>  kernel/resource.c      | 113 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 116 insertions(+)
>
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..473f1d9cb97e 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -277,6 +277,9 @@ extern int
>  walk_system_ram_res(u64 start, u64 end, void *arg,
>  		    int (*func)(struct resource *, void *));
>  extern int
> +walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> +			int (*func)(struct resource *, void *));
> +extern int
>  walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end,
>  		    void *arg, int (*func)(struct resource *, void *));
>
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 769109f20fb7..ddf6b4c41498 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -73,6 +73,38 @@ static struct resource *next_resource(struct resource *p, bool sibling_only)
>  	return p->sibling;
>  }
>
> +static struct resource *prev_resource(struct resource *p, bool sibling_only)
> +{
> +	struct resource *prev;
> +	if (NULL == iomem_resource.child)
> +		return NULL;
> +
> +	if (p == NULL) {
> +		prev = iomem_resource.child;
> +		while (prev->sibling)
> +			prev = prev->sibling;
> +	} else {
> +		if (p->parent->child == p) {
> +			return p->parent;
> +		}
> +
> +		for (prev = p->parent->child; prev->sibling != p;
> +			prev = prev->sibling) {}
> +	}
> +
> +	/* Caller wants to traverse through siblings only */
> +	if (sibling_only)
> +		return prev;
> +
> +	for (;prev->child;) {
> +		prev = prev->child;
> +
> +		while (prev->sibling)
> +			prev = prev->sibling;
> +	}
> +	return prev;
> +}
> +
>  static void *r_next(struct seq_file *m, void *v, loff_t *pos)
>  {
>  	struct resource *p = v;
> @@ -401,6 +433,47 @@ static int find_next_iomem_res(struct resource *res, unsigned long desc,
>  	return 0;
>  }
>
> +/*
> + * Finds the highest iomem resource existing within [res->start.res->end).
> + * The caller must specify res->start, res->end, res->flags, and optionally
> + * desc. If found, returns 0, res is overwritten, if not found, returns -1.
> + * This function walks the whole tree and not just first level children until
> + * and unless first_level_children_only is true.
> + */
> +static int find_prev_iomem_res(struct resource *res, unsigned long desc,
> +			       bool first_level_children_only)
> +{
> +	struct resource *p;
> +
> +	BUG_ON(!res);
> +	BUG_ON(res->start >= res->end);
> +
> +	read_lock(&resource_lock);
> +
> +	for (p = prev_resource(NULL, first_level_children_only); p;
> +		p = prev_resource(p, first_level_children_only)) {
> +		if ((p->flags & res->flags) != res->flags)
> +			continue;
> +		if ((desc != IORES_DESC_NONE) && (desc != p->desc))
> +			continue;
> +		if (p->end < res->start) {
> +			p = NULL;
> +			break;
> +		}
> +		if ((p->end >= res->start) && (p->start < res->end))
> +			break;
> +	}
> +
> +	read_unlock(&resource_lock);
> +	if (!p)
> +		return -1;
> +	/* copy data */
> +	resource_clip(res, p->start, p->end);
> +	res->flags = p->flags;
> +	res->desc = p->desc;
> +	return 0;
> +}
> +
>  static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
>  				 bool first_level_children_only,
>  				 void *arg,
> @@ -422,6 +495,27 @@ static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
>  	return ret;
>  }
>
> +static int __walk_iomem_res_rev_desc(struct resource *res, unsigned long desc,
> +				     bool first_level_children_only,
> +				     void *arg,
> +				     int (*func)(struct resource *, void *))
> +{
> +	u64 orig_start = res->start;
> +	int ret = -1;
> +
> +	while ((res->start < res->end) &&
> +		!find_prev_iomem_res(res, desc, first_level_children_only)) {
> +		ret = (*func)(res, arg);
> +		if (ret)
> +			break;
> +
> +		res->end = res->start - 1;
> +		res->start = orig_start;
> +	}
> +
> +	return ret;
> +}
> +
>  /*
>   * Walks through iomem resources and calls func() with matching resource
>   * ranges. This walks through whole tree and not just first level children.
> @@ -468,6 +562,25 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
>  				     arg, func);
>  }
>
> +/*
> + * This function, being a variant of walk_system_ram_res(), calls the @func
> + * callback against all memory ranges of type System RAM which are marked as
> + * IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY in reversed order, i.e., from
> + * higher to lower.
> + */
> +int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
> +				int (*func)(struct resource *, void *))
> +{
> +	struct resource res;
> +
> +	res.start = start;
> +	res.end = end;
> +	res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +
> +	return __walk_iomem_res_rev_desc(&res, IORES_DESC_NONE, true,
> +					 arg, func);
> +}
> +
>  /*
>   * This function calls the @func callback against all memory ranges, which
>   * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
> --
> 2.15.1
>
>
> --
> Wei Yang
> Help you, Help me
* Re: [PATCH 1/2] resource: add walk_system_ram_res_rev()
  2018-03-26 14:30 ` Baoquan He
@ 2018-03-26 15:04 ` Wei Yang
  0 siblings, 0 replies; 19+ messages in thread
From: Wei Yang @ 2018-03-26 15:04 UTC
To: Baoquan He
Cc: Wei Yang, Andrew Morton, prudo, kexec, linux-kernel, takahiro.akashi, ebiederm, dyoung, vgoyal

On Mon, Mar 26, 2018 at 10:30:16PM +0800, Baoquan He wrote:
>Hi Wei Yang,
>
>On 03/25/18 at 12:13am, Wei Yang wrote:
>> On Sat, Mar 24, 2018 at 09:33:30PM +0800, Baoquan He wrote:
>> >>
>> >> Yes.  That sounds perfectly acceptable.
>> >>
>> >> It would be interesting to see what this approach looks like, if you
>> >> have time to toss something together?
>> >
>> >OK, I will make patches for review. Thanks!
>>
>> Hi, Baoquan, Andrew
>>
>> I have come up with an implementation for a top-down search of the RAM
>> resources. I hope this meets your needs.
>
>Thanks for letting me know, and for your effort. Glad to know I am not
>the only one who needs walk_system_ram_res_rev(). I am fine with other
>ways of implementing it; people can compare them and see which one is
>better. I am working on using list_head instead, the doubly linked list
>approach Andrew suggested. Andrew and others can help make a choice. It
>won't be long.
>

Sure, I look forward to your approach.

>Thanks
>Baoquan
>

--
Wei Yang
Help you, Help me