* [PATCH v2 0/3] RISC-V: Add kexec/kdump support
@ 2020-06-23 15:05 Nick Kossifidis
2020-06-23 15:05 ` [PATCH v2 1/3] RISC-V: Add kexec support Nick Kossifidis
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Nick Kossifidis @ 2020-06-23 15:05 UTC (permalink / raw)
To: linux-riscv, palmer
Cc: david.abdurachmanov, anup, atish.patra, yibin_liu, zong.li,
paul.walmsley, Nick Kossifidis
This patch series adds kexec/kdump and crash kernel
support on RISC-V. For testing the patches a patched
version of kexec-tools is needed. The patch is still
a work in progress but a draft version can be found at:
http://riscv.ics.forth.gr/kexec-tools.patch
v2:
* Re-base on newer kernel tree and minor cleanups
* Properly populate the ioresources tree, so that it can be
used later on for implementing strict /dev/mem.
* Use linux,usable-memory on /memory instead of a new binding
* Use a reserved-memory node for ELF core header
Nick Kossifidis (3):
RISC-V: Add kexec support
RISC-V: Add kdump support
RISC-V: Add crash kernel support
arch/riscv/Kconfig | 24 +++
arch/riscv/include/asm/kexec.h | 54 +++++++
arch/riscv/include/uapi/asm/elf.h | 6 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/crash_dump.c | 46 ++++++
arch/riscv/kernel/crash_save_regs.S | 56 +++++++
arch/riscv/kernel/kexec_relocate.S | 217 ++++++++++++++++++++++++++++
arch/riscv/kernel/machine_kexec.c | 193 +++++++++++++++++++++++++
arch/riscv/kernel/setup.c | 209 +++++++++++++++++++++++++++
arch/riscv/mm/init.c | 110 ++++++++++++++
include/uapi/linux/kexec.h | 1 +
11 files changed, 918 insertions(+)
create mode 100644 arch/riscv/include/asm/kexec.h
create mode 100644 arch/riscv/kernel/crash_dump.c
create mode 100644 arch/riscv/kernel/crash_save_regs.S
create mode 100644 arch/riscv/kernel/kexec_relocate.S
create mode 100644 arch/riscv/kernel/machine_kexec.c
--
2.26.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 1/3] RISC-V: Add kexec support
2020-06-23 15:05 [PATCH v2 0/3] RISC-V: Add kexec/kdump support Nick Kossifidis
@ 2020-06-23 15:05 ` Nick Kossifidis
2020-07-11 3:59 ` Palmer Dabbelt
2020-06-23 15:05 ` [PATCH v2 2/3] RISC-V: Add kdump support Nick Kossifidis
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Nick Kossifidis @ 2020-06-23 15:05 UTC (permalink / raw)
To: linux-riscv, palmer
Cc: david.abdurachmanov, anup, atish.patra, yibin_liu, zong.li,
paul.walmsley, Nick Kossifidis
This patch adds support for kexec on RISC-V. On SMP systems it depends
on HOTPLUG_CPU in order to be able to bring up all harts after kexec.
It also needs a recent OpenSBI version that supports the HSM extension.
I tested it on riscv64 QEMU on both an smp and a non-smp system.
The older version of this patch can be found at:
https://patchwork.kernel.org/patch/11508671/
v4:
* No functional changes, just re-based
v3:
* Use the new smp_shutdown_nonboot_cpus() call.
* Move riscv_kexec_relocate to .rodata
v2:
* Pass needed parameters as arguments to riscv_kexec_relocate
instead of using global variables.
* Use kimage_arch to hold the fdt address of the included fdt.
* Use SYM_* macros on kexec_relocate.S.
* Compatibility with STRICT_KERNEL_RWX.
* Compatibility with HOTPLUG_CPU for SMP
* Small cleanups
Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
Tested-by: Nick Kossifidis <mick@ics.forth.gr>
---
arch/riscv/Kconfig | 14 +++
arch/riscv/include/asm/kexec.h | 47 ++++++++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/kexec_relocate.S | 158 ++++++++++++++++++++++++
arch/riscv/kernel/machine_kexec.c | 188 +++++++++++++++++++++++++++++
include/uapi/linux/kexec.h | 1 +
6 files changed, 409 insertions(+)
create mode 100644 arch/riscv/include/asm/kexec.h
create mode 100644 arch/riscv/kernel/kexec_relocate.S
create mode 100644 arch/riscv/kernel/machine_kexec.c
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 128192e14..e7dd2d1c5 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -340,6 +340,20 @@ config RISCV_SBI_V01
help
This config allows kernel to use SBI v0.1 APIs. This will be
deprecated in future once legacy M-mode software are no longer in use.
+
+config KEXEC
+ bool "Kexec system call"
+ select KEXEC_CORE
+ select HOTPLUG_CPU if SMP
+ help
+ kexec is a system call that implements the ability to shutdown your
+ current kernel, and to start another kernel. It is like a reboot
+ but it is independent of the system firmware. And like a reboot
+ you can start any kernel with it, not just Linux.
+
+ The name comes from the similarity to the exec system call.
+
+
endmenu
menu "Boot options"
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
new file mode 100644
index 000000000..efc69feb4
--- /dev/null
+++ b/arch/riscv/include/asm/kexec.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ * Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#ifndef _RISCV_KEXEC_H
+#define _RISCV_KEXEC_H
+
+/* Maximum physical address we can use pages from */
+#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can reach in physical address mode */
+#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
+
+/* Maximum address we can use for the control code buffer */
+#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
+
+/* Reserve a page for the control code buffer */
+#define KEXEC_CONTROL_PAGE_SIZE 4096
+
+#define KEXEC_ARCH KEXEC_ARCH_RISCV
+
+static inline void
+crash_setup_regs(struct pt_regs *newregs,
+ struct pt_regs *oldregs)
+{
+ /* Dummy implementation for now */
+}
+
+
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+ unsigned long fdt_addr;
+};
+
+const extern unsigned char riscv_kexec_relocate[];
+const extern unsigned int riscv_kexec_relocate_size;
+
+typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
+ unsigned long jump_addr,
+ unsigned long fdt_addr,
+ unsigned long hartid,
+ unsigned long va_pa_off);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index b355cf485..63fc530a4 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -52,5 +52,6 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
endif
obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
obj-$(CONFIG_KGDB) += kgdb.o
+obj-${CONFIG_KEXEC} += kexec_relocate.o machine_kexec.o
clean:
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
new file mode 100644
index 000000000..cd17d585f
--- /dev/null
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ * Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#include <asm/asm.h> /* For RISCV_* and REG_* macros */
+#include <asm/page.h> /* For PAGE_SHIFT */
+#include <linux/linkage.h> /* For SYM_* macros */
+
+.section ".rodata"
+SYM_CODE_START(riscv_kexec_relocate)
+
+ /*
+ * s0: Pointer to the current entry
+ * s1: (const) Phys address to jump to after relocation
+ * s2: (const) Phys address of the FDT image
+ * s3: (const) The hartid of the current hart
+ * s4: Pointer to the destination address for the relocation
+ * s5: (const) Number of words per page
+ * s6: (const) 1, used for subtraction
+ * s7: (const) va_pa_offset, used when switching MMU off
+ * s8: (const) Physical address of the main loop
+ * s9: (debug) indirection page counter
+ * s10: (debug) entry counter
+ * s11: (debug) copied words counter
+ */
+ mv s0, a0
+ mv s1, a1
+ mv s2, a2
+ mv s3, a3
+ mv s4, zero
+ li s5, ((1 << PAGE_SHIFT) / RISCV_SZPTR)
+ li s6, 1
+ mv s7, a4
+ mv s8, zero
+ mv s9, zero
+ mv s10, zero
+ mv s11, zero
+
+ /* Disable / cleanup interrupts */
+ csrw sie, zero
+ csrw sip, zero
+
+ /*
+ * When we switch SATP.MODE to "Bare" we'll only
+ * play with physical addresses. However the first time
+ * we try to jump somewhere, the offset on the jump
+ * will be relative to pc which will still be on VA. To
+ * deal with this we set stvec to the physical address at
+ * the start of the loop below so that we jump there in
+ * any case.
+ */
+ la s8, 1f
+ sub s8, s8, s7
+ csrw stvec, s8
+
+ /* Process entries in a loop */
+1:
+ addi s10, s10, 1
+ REG_L t0, 0(s0) /* t0 = *image->entry */
+ addi s0, s0, RISCV_SZPTR /* image->entry++ */
+
+ /* IND_DESTINATION entry ? -> save destination address */
+ andi t1, t0, 0x1
+ beqz t1, 2f
+ andi s4, t0, ~0x1
+ j 1b
+
+2:
+ /* IND_INDIRECTION entry ? -> update next entry ptr (PA) */
+ andi t1, t0, 0x2
+ beqz t1, 2f
+ andi s0, t0, ~0x2
+ addi s9, s9, 1
+ csrw sptbr, zero
+ jalr zero, s8, 0
+
+2:
+ /* IND_DONE entry ? -> jump to done label */
+ andi t1, t0, 0x4
+ beqz t1, 2f
+ j 4f
+
+2:
+ /*
+ * IND_SOURCE entry ? -> copy page word by word to the
+ * destination address we got from IND_DESTINATION
+ */
+ andi t1, t0, 0x8
+ beqz t1, 1b /* Unknown entry type, ignore it */
+ andi t0, t0, ~0x8
+ mv t3, s5 /* i = num words per page */
+3: /* copy loop */
+ REG_L t1, (t0) /* t1 = *src_ptr */
+ REG_S t1, (s4) /* *dst_ptr = *src_ptr */
+ addi t0, t0, RISCV_SZPTR /* stc_ptr++ */
+ addi s4, s4, RISCV_SZPTR /* dst_ptr++ */
+ sub t3, t3, s6 /* i-- */
+ addi s11, s11, 1 /* c++ */
+ beqz t3, 1b /* copy done ? */
+ j 3b
+
+4:
+ /* Wait for the relocation to be visible by other harts */
+ fence w,w
+
+ /* Pass the arguments to the next kernel / Cleanup*/
+ mv a0, s3
+ mv a1, s2
+ mv a2, s1
+
+ /* Cleanup */
+ mv a3, zero
+ mv a4, zero
+ mv a5, zero
+ mv a6, zero
+ mv a7, zero
+
+ mv s0, zero
+ mv s1, zero
+ mv s2, zero
+ mv s3, zero
+ mv s4, zero
+ mv s5, zero
+ mv s6, zero
+ mv s7, zero
+ mv s8, zero
+ mv s9, zero
+ mv s10, zero
+ mv s11, zero
+
+ mv t0, zero
+ mv t1, zero
+ mv t2, zero
+ mv t3, zero
+ mv t4, zero
+ mv t5, zero
+ mv t6, zero
+ csrw sepc, zero
+ csrw scause, zero
+ csrw sscratch, zero
+
+ /*
+ * Make sure the relocated code is visible
+ * and jump to the new kernel
+ */
+ fence.i
+
+ jalr zero, a2, 0
+
+SYM_CODE_END(riscv_kexec_relocate)
+riscv_kexec_relocate_end:
+
+ .section ".rodata"
+SYM_DATA(riscv_kexec_relocate_size,
+ .long riscv_kexec_relocate_end - riscv_kexec_relocate)
+
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
new file mode 100644
index 000000000..88734a056
--- /dev/null
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 FORTH-ICS/CARV
+ * Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#include <linux/kexec.h>
+#include <asm/kexec.h> /* For riscv_kexec_* symbol defines */
+#include <linux/smp.h> /* For smp_send_stop () */
+#include <asm/cacheflush.h> /* For local_flush_icache_all() */
+#include <asm/barrier.h> /* For smp_wmb() */
+#include <asm/page.h> /* For PAGE_MASK */
+#include <linux/libfdt.h> /* For fdt_check_header() */
+#include <asm/set_memory.h> /* For set_memory_x() */
+#include <linux/compiler.h> /* For unreachable() */
+#include <linux/cpu.h> /* For cpu_down() */
+
+/**
+ * kexec_image_info - Print received image details
+ */
+static void
+kexec_image_info(const struct kimage *image)
+{
+ unsigned long i;
+
+ pr_debug("Kexec image info:\n");
+ pr_debug("\ttype: %d\n", image->type);
+ pr_debug("\tstart: %lx\n", image->start);
+ pr_debug("\thead: %lx\n", image->head);
+ pr_debug("\tnr_segments: %lu\n", image->nr_segments);
+
+ for (i = 0; i < image->nr_segments; i++) {
+ pr_debug("\t segment[%lu]: %016lx - %016lx", i,
+ image->segment[i].mem,
+ image->segment[i].mem + image->segment[i].memsz);
+ pr_debug("\t\t0x%lx bytes, %lu pages\n",
+ (unsigned long) image->segment[i].memsz,
+ (unsigned long) image->segment[i].memsz / PAGE_SIZE);
+ }
+}
+
+/**
+ * machine_kexec_prepare - Initialize kexec
+ *
+ * This function is called from do_kexec_load, when the user has
+ * provided us with an image to be loaded. Its goal is to validate
+ * the image and prepare the control code buffer as needed.
+ * Note that kimage_alloc_init has already been called and the
+ * control buffer has already been allocated.
+ */
+int
+machine_kexec_prepare(struct kimage *image)
+{
+ struct kimage_arch *internal = &image->arch;
+ struct fdt_header fdt = {0};
+ void *control_code_buffer = NULL;
+ int i = 0;
+
+ kexec_image_info(image);
+
+ if (image->type == KEXEC_TYPE_CRASH) {
+ pr_warn("Loading a crash kernel is unsupported for now.\n");
+ return -EINVAL;
+ }
+
+ /* Find the Flattened Device Tree and save its physical address */
+ for (i = 0; i < image->nr_segments; i++) {
+ if (image->segment[i].memsz <= sizeof(fdt))
+ continue;
+
+ if (copy_from_user(&fdt, image->segment[i].buf, sizeof(fdt)))
+ continue;
+
+ if (fdt_check_header(&fdt))
+ continue;
+
+ internal->fdt_addr = (unsigned long) image->segment[i].mem;
+ break;
+ }
+
+ if (!internal->fdt_addr) {
+ pr_err("Device tree not included in the provided image\n");
+ return -EINVAL;
+ }
+
+ /* Copy the assembler code for relocation to the control page */
+ control_code_buffer = page_address(image->control_code_page);
+ memcpy(control_code_buffer, riscv_kexec_relocate,
+ riscv_kexec_relocate_size);
+
+ /* Mark the control page executable */
+ set_memory_x((unsigned long) control_code_buffer, 1);
+
+#ifdef CONFIG_SMP
+ /*
+ * Make sure other harts see the copied data
+ * if they try to read the control buffer
+ */
+ smp_wmb();
+#endif
+
+ return 0;
+}
+
+
+/**
+ * machine_kexec_cleanup - Cleanup any leftovers from
+ * machine_kexec_prepare
+ *
+ * This function is called by kimage_free to handle any arch-specific
+ * allocations done on machine_kexec_prepare. Since we didn't do any
+ * allocations there, this is just an empty function. Note that the
+ * control buffer is freed by kimage_free.
+ */
+void
+machine_kexec_cleanup(struct kimage *image)
+{
+}
+
+
+/*
+ * machine_shutdown - Prepare for a kexec reboot
+ *
+ * This function is called by kernel_kexec just before machine_kexec
+ * below. Its goal is to prepare the rest of the system (the other
+ * harts and possibly devices etc) for a kexec reboot.
+ */
+void machine_shutdown(void)
+{
+ /*
+ * No more interrupts on this hart
+ * until we are back up.
+ */
+ local_irq_disable();
+
+#if defined(CONFIG_HOTPLUG_CPU) && (CONFIG_SMP)
+ smp_shutdown_nonboot_cpus(smp_processor_id());
+#endif
+}
+
+/**
+ * machine_crash_shutdown - Prepare to kexec after a kernel crash
+ *
+ * This function is called by crash_kexec just before machine_kexec
+ * below and its goal is similar to machine_shutdown, but in case of
+ * a kernel crash. Since we don't handle such cases yet, this function
+ * is empty.
+ */
+void
+machine_crash_shutdown(struct pt_regs *regs)
+{
+}
+
+/**
+ * machine_kexec - Jump to the loaded kimage
+ *
+ * This function is called by kernel_kexec which is called by the
+ * reboot system call when the reboot cmd is LINUX_REBOOT_CMD_KEXEC,
+ * or by crash_kernel which is called by the kernel's arch-specific
+ * trap handler in case of a kernel panic. It's the final stage of
+ * the kexec process where the pre-loaded kimage is ready to be
+ * executed. We assume at this point that all other harts are
+ * suspended and this hart will be the new boot hart.
+ */
+void __noreturn
+machine_kexec(struct kimage *image)
+{
+ struct kimage_arch *internal = &image->arch;
+ unsigned long jump_addr = (unsigned long) image->start;
+ unsigned long first_ind_entry = (unsigned long) &image->head;
+ unsigned long this_hart_id = raw_smp_processor_id();
+ unsigned long fdt_addr = internal->fdt_addr;
+ void *control_code_buffer = page_address(image->control_code_page);
+ riscv_kexec_do_relocate do_relocate = control_code_buffer;
+
+ pr_notice("Will call new kernel at %08lx from hart id %lx\n",
+ jump_addr, this_hart_id);
+ pr_notice("FDT image at %08lx\n", fdt_addr);
+
+ /* Make sure the relocation code is visible to the hart */
+ local_flush_icache_all();
+
+ /* Jump to the relocation code */
+ pr_notice("Bye...\n");
+ do_relocate(first_ind_entry, jump_addr, fdt_addr,
+ this_hart_id, va_pa_offset);
+ unreachable();
+}
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 05669c87a..778dc191c 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -42,6 +42,7 @@
#define KEXEC_ARCH_MIPS_LE (10 << 16)
#define KEXEC_ARCH_MIPS ( 8 << 16)
#define KEXEC_ARCH_AARCH64 (183 << 16)
+#define KEXEC_ARCH_RISCV (243 << 16)
/* The artificial cap on the number of segments passed to kexec_load. */
#define KEXEC_SEGMENT_MAX 16
--
2.26.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 2/3] RISC-V: Add kdump support
2020-06-23 15:05 [PATCH v2 0/3] RISC-V: Add kexec/kdump support Nick Kossifidis
2020-06-23 15:05 ` [PATCH v2 1/3] RISC-V: Add kexec support Nick Kossifidis
@ 2020-06-23 15:05 ` Nick Kossifidis
2020-07-11 3:59 ` Palmer Dabbelt
2020-06-23 15:05 ` [PATCH v2 3/3] RISC-V: Add crash kernel support Nick Kossifidis
2020-07-11 3:59 ` BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?= Palmer Dabbelt
3 siblings, 1 reply; 12+ messages in thread
From: Nick Kossifidis @ 2020-06-23 15:05 UTC (permalink / raw)
To: linux-riscv, palmer
Cc: david.abdurachmanov, anup, atish.patra, yibin_liu, zong.li,
paul.walmsley, Nick Kossifidis
This patch adds support for kdump, the kernel will reserve a
region for the crash kernel and jump there on panic. In order
for userspace tools (kexec-tools) to prepare the crash kernel
kexec image, we also need to expose some information on
/proc/iomem for the memory regions used by the kernel and for
the region reserved for crash kernel. Note that on userspace
the device tree is used to determine the system's memory
layout so the "System RAM" on /proc/iomem is ignored. I just
put it there for compatibility with other archs.
I tested this on riscv64 qemu and works as expected, you may
test it by triggering a crash through /proc/sysrq_trigger:
echo c > /proc/sysrq_trigger
v2:
* Properly populate the ioresources tree, so that it can be
used later on for implementing strict /dev/mem.
* Minor cleanups and re-base
Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
---
arch/riscv/include/asm/kexec.h | 19 ++-
arch/riscv/include/uapi/asm/elf.h | 6 +
arch/riscv/kernel/Makefile | 2 +-
arch/riscv/kernel/crash_save_regs.S | 56 +++++++++
arch/riscv/kernel/kexec_relocate.S | 61 +++++++++-
arch/riscv/kernel/machine_kexec.c | 31 +++--
arch/riscv/kernel/setup.c | 180 ++++++++++++++++++++++++++++
arch/riscv/mm/init.c | 77 ++++++++++++
8 files changed, 411 insertions(+), 21 deletions(-)
create mode 100644 arch/riscv/kernel/crash_save_regs.S
diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
index efc69feb4..4fd583acc 100644
--- a/arch/riscv/include/asm/kexec.h
+++ b/arch/riscv/include/asm/kexec.h
@@ -21,11 +21,16 @@
#define KEXEC_ARCH KEXEC_ARCH_RISCV
+extern void riscv_crash_save_regs(struct pt_regs *newregs);
+
static inline void
crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Dummy implementation for now */
+ if (oldregs)
+ memcpy(newregs, oldregs, sizeof(struct pt_regs));
+ else
+ riscv_crash_save_regs(newregs);
}
@@ -38,10 +43,12 @@ struct kimage_arch {
const extern unsigned char riscv_kexec_relocate[];
const extern unsigned int riscv_kexec_relocate_size;
-typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
- unsigned long jump_addr,
- unsigned long fdt_addr,
- unsigned long hartid,
- unsigned long va_pa_off);
+typedef void (*riscv_kexec_method)(unsigned long first_ind_entry,
+ unsigned long jump_addr,
+ unsigned long fdt_addr,
+ unsigned long hartid,
+ unsigned long va_pa_off);
+
+extern riscv_kexec_method riscv_kexec_norelocate;
#endif
diff --git a/arch/riscv/include/uapi/asm/elf.h b/arch/riscv/include/uapi/asm/elf.h
index d696d6610..5b19f5547 100644
--- a/arch/riscv/include/uapi/asm/elf.h
+++ b/arch/riscv/include/uapi/asm/elf.h
@@ -19,6 +19,12 @@ typedef unsigned long elf_greg_t;
typedef struct user_regs_struct elf_gregset_t;
#define ELF_NGREG (sizeof(elf_gregset_t) / sizeof(elf_greg_t))
+#define ELF_CORE_COPY_REGS(dest, regs) \
+do { \
+ *(struct user_regs_struct *)&(dest) = \
+ *(struct user_regs_struct *)regs; \
+} while (0);
+
/* We don't support f without d, or q. */
typedef __u64 elf_fpreg_t;
typedef union __riscv_fp_state elf_fpregset_t;
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 63fc530a4..b9a3842ad 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -52,6 +52,6 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
endif
obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
obj-$(CONFIG_KGDB) += kgdb.o
-obj-${CONFIG_KEXEC} += kexec_relocate.o machine_kexec.o
+obj-${CONFIG_KEXEC} += kexec_relocate.o crash_save_regs.o machine_kexec.o
clean:
diff --git a/arch/riscv/kernel/crash_save_regs.S b/arch/riscv/kernel/crash_save_regs.S
new file mode 100644
index 000000000..7832fb763
--- /dev/null
+++ b/arch/riscv/kernel/crash_save_regs.S
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2020 FORTH-ICS/CARV
+ * Nick Kossifidis <mick@ics.forth.gr>
+ */
+
+#include <asm/asm.h> /* For RISCV_* and REG_* macros */
+#include <asm/csr.h> /* For CSR_* macros */
+#include <asm/asm-offsets.h> /* For offsets on pt_regs */
+#include <linux/linkage.h> /* For SYM_* macros */
+
+.section ".text"
+SYM_CODE_START(riscv_crash_save_regs)
+ REG_S ra, PT_RA(a0) /* x1 */
+ REG_S sp, PT_SP(a0) /* x2 */
+ REG_S gp, PT_GP(a0) /* x3 */
+ REG_S tp, PT_TP(a0) /* x4 */
+ REG_S t0, PT_T0(a0) /* x5 */
+ REG_S t1, PT_T1(a0) /* x6 */
+ REG_S t2, PT_T2(a0) /* x7 */
+ REG_S s0, PT_S0(a0) /* x8/fp */
+ REG_S s1, PT_S1(a0) /* x9 */
+ REG_S a0, PT_A0(a0) /* x10 */
+ REG_S a1, PT_A1(a0) /* x11 */
+ REG_S a2, PT_A2(a0) /* x12 */
+ REG_S a3, PT_A3(a0) /* x13 */
+ REG_S a4, PT_A4(a0) /* x14 */
+ REG_S a5, PT_A5(a0) /* x15 */
+ REG_S a6, PT_A6(a0) /* x16 */
+ REG_S a7, PT_A7(a0) /* x17 */
+ REG_S s2, PT_S2(a0) /* x18 */
+ REG_S s3, PT_S3(a0) /* x19 */
+ REG_S s4, PT_S4(a0) /* x20 */
+ REG_S s5, PT_S5(a0) /* x21 */
+ REG_S s6, PT_S6(a0) /* x22 */
+ REG_S s7, PT_S7(a0) /* x23 */
+ REG_S s8, PT_S8(a0) /* x24 */
+ REG_S s9, PT_S9(a0) /* x25 */
+ REG_S s10, PT_S10(a0) /* x26 */
+ REG_S s11, PT_S11(a0) /* x27 */
+ REG_S t3, PT_T3(a0) /* x28 */
+ REG_S t4, PT_T4(a0) /* x29 */
+ REG_S t5, PT_T5(a0) /* x30 */
+ REG_S t6, PT_T6(a0) /* x31 */
+
+ csrr t1, CSR_STATUS
+ csrr t2, CSR_EPC
+ csrr t3, CSR_TVAL
+ csrr t4, CSR_CAUSE
+
+ REG_S t1, PT_STATUS(a0)
+ REG_S t2, PT_EPC(a0)
+ REG_S t3, PT_BADADDR(a0)
+ REG_S t4, PT_CAUSE(a0)
+ ret
+SYM_CODE_END(riscv_crash_save_regs)
diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
index cd17d585f..c417c44ea 100644
--- a/arch/riscv/kernel/kexec_relocate.S
+++ b/arch/riscv/kernel/kexec_relocate.S
@@ -152,7 +152,66 @@ SYM_CODE_START(riscv_kexec_relocate)
SYM_CODE_END(riscv_kexec_relocate)
riscv_kexec_relocate_end:
- .section ".rodata"
+
+/* Used for jumping to crashkernel */
+.section ".text"
+SYM_CODE_START(riscv_kexec_norelocate)
+ /*
+ * s0: (const) Phys address to jump to
+ * s1: (const) Phys address of the FDT image
+ * s2: (const) The hartid of the current hart
+ */
+ mv s0, a1
+ mv s1, a2
+ mv s2, a3
+
+ /* Disable / cleanup interrupts */
+ csrw sie, zero
+ csrw sip, zero
+
+ /* Switch to physical addressing */
+ csrw sptbr, zero
+
+ /* Pass the arguments to the next kernel / Cleanup*/
+ mv a0, s2
+ mv a1, s1
+ mv a2, s0
+
+ /* Cleanup */
+ mv a3, zero
+ mv a4, zero
+ mv a5, zero
+ mv a6, zero
+ mv a7, zero
+
+ mv s0, zero
+ mv s1, zero
+ mv s2, zero
+ mv s3, zero
+ mv s4, zero
+ mv s5, zero
+ mv s6, zero
+ mv s7, zero
+ mv s8, zero
+ mv s9, zero
+ mv s10, zero
+ mv s11, zero
+
+ mv t0, zero
+ mv t1, zero
+ mv t2, zero
+ mv t3, zero
+ mv t4, zero
+ mv t5, zero
+ mv t6, zero
+ csrw sepc, zero
+ csrw scause, zero
+ csrw sscratch, zero
+
+ jalr zero, a2, 0
+SYM_CODE_END(riscv_kexec_norelocate)
+
+.section ".rodata"
SYM_DATA(riscv_kexec_relocate_size,
.long riscv_kexec_relocate_end - riscv_kexec_relocate)
diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
index 88734a056..dfe6e9066 100644
--- a/arch/riscv/kernel/machine_kexec.c
+++ b/arch/riscv/kernel/machine_kexec.c
@@ -58,11 +58,6 @@ machine_kexec_prepare(struct kimage *image)
kexec_image_info(image);
- if (image->type == KEXEC_TYPE_CRASH) {
- pr_warn("Loading a crash kernel is unsupported for now.\n");
- return -EINVAL;
- }
-
/* Find the Flattened Device Tree and save its physical address */
for (i = 0; i < image->nr_segments; i++) {
if (image->segment[i].memsz <= sizeof(fdt))
@@ -84,12 +79,14 @@ machine_kexec_prepare(struct kimage *image)
}
/* Copy the assembler code for relocation to the control page */
- control_code_buffer = page_address(image->control_code_page);
- memcpy(control_code_buffer, riscv_kexec_relocate,
- riscv_kexec_relocate_size);
+ if (image->type != KEXEC_TYPE_CRASH) {
+ control_code_buffer = page_address(image->control_code_page);
+ memcpy(control_code_buffer, riscv_kexec_relocate,
+ riscv_kexec_relocate_size);
- /* Mark the control page executable */
- set_memory_x((unsigned long) control_code_buffer, 1);
+ /* Mark the control page executable */
+ set_memory_x((unsigned long) control_code_buffer, 1);
+ }
#ifdef CONFIG_SMP
/*
@@ -149,6 +146,9 @@ void machine_shutdown(void)
void
machine_crash_shutdown(struct pt_regs *regs)
{
+ crash_save_cpu(regs, smp_processor_id());
+ machine_shutdown();
+ pr_info("Starting crashdump kernel...\n");
}
/**
@@ -171,7 +171,12 @@ machine_kexec(struct kimage *image)
unsigned long this_hart_id = raw_smp_processor_id();
unsigned long fdt_addr = internal->fdt_addr;
void *control_code_buffer = page_address(image->control_code_page);
- riscv_kexec_do_relocate do_relocate = control_code_buffer;
+ riscv_kexec_method kexec_method = NULL;
+
+ if (image->type != KEXEC_TYPE_CRASH)
+ kexec_method = control_code_buffer;
+ else
+ kexec_method = (riscv_kexec_method) &riscv_kexec_norelocate;
pr_notice("Will call new kernel at %08lx from hart id %lx\n",
jump_addr, this_hart_id);
@@ -182,7 +187,7 @@ machine_kexec(struct kimage *image)
/* Jump to the relocation code */
pr_notice("Bye...\n");
- do_relocate(first_ind_entry, jump_addr, fdt_addr,
- this_hart_id, va_pa_offset);
+ kexec_method(first_ind_entry, jump_addr, fdt_addr,
+ this_hart_id, va_pa_offset);
unreachable();
}
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f04373be5..4dc940389 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -4,6 +4,8 @@
* Chen Liqin <liqin.chen@sunplusct.com>
* Lennox Wu <lennox.wu@sunplusct.com>
* Copyright (C) 2012 Regents of the University of California
+ * Copyright (C) 2020 FORTH-ICS/CARV
+ * Nick Kossifidis <mick@ics.forth.gr>
*/
#include <linux/init.h>
@@ -17,6 +19,8 @@
#include <linux/sched/task.h>
#include <linux/swiotlb.h>
#include <linux/smp.h>
+#include <linux/crash_dump.h>
+#include <linux/initrd.h>
#include <asm/clint.h>
#include <asm/cpu_ops.h>
@@ -49,6 +53,181 @@ atomic_t hart_lottery __section(.sdata);
unsigned long boot_cpu_hartid;
static DEFINE_PER_CPU(struct cpu, cpu_devices);
+/*
+ * Place kernel memory regions on the resource tree so that
+ * kexec-tools can retrieve them from /proc/iomem. While there
+ * also add "System RAM" regions for compatibility with other
+ * archs, and the rest of the known regions for completeness.
+ */
+static struct resource code_res = { .name = "Kernel code", };
+static struct resource data_res = { .name = "Kernel data", };
+static struct resource rodata_res = { .name = "Kernel rodata", };
+static struct resource bss_res = { .name = "Kernel bss", };
+
+static int __init add_resource(struct resource *parent,
+ struct resource *res)
+{
+ int ret = 0;
+
+ ret = insert_resource(parent, res);
+ if (ret < 0) {
+ pr_err("Failed to add a %s resource at %llx\n",
+ res->name, res->start);
+ return ret;
+ }
+
+ return 1;
+}
+
+static int __init add_kernel_resources(struct resource *res)
+{
+ int ret = 0;
+
+ /*
+ * The memory region of the kernel image is continuous and
+ * was reserved on setup_bootmem, find it here and register
+ * it as a resource, then register the various segments of
+ * the image as child nodes
+ */
+ if (!(res->start <= code_res.start && res->end >= data_res.end))
+ return 0;
+
+ res->name = "Kernel image";
+ res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ /*
+ * We removed a part of this region on setup_bootmem so
+ * we need to expand the resource for the bss to fit in.
+ */
+ res->end = bss_res.end;
+
+ ret = add_resource(&iomem_resource, res);
+ if (ret < 0)
+ return ret;
+
+ ret = add_resource(res, &code_res);
+ if (ret < 0)
+ return ret;
+
+ ret = add_resource(res, &rodata_res);
+ if (ret < 0)
+ return ret;
+
+ ret = add_resource(res, &data_res);
+ if (ret < 0)
+ return ret;
+
+ ret = add_resource(res, &bss_res);
+
+ return ret;
+}
+
+#ifdef CONFIG_KEXEC_CORE
+static int __init add_crashk_resource(struct resource *res)
+{
+ if (res->start <= crashk_res.start && res->end >= crashk_res.end)
+ return add_resource(&iomem_resource, &crashk_res);
+
+ return 0;
+}
+#endif
+
+static void __init init_resources(void)
+{
+ struct memblock_region *region = NULL;
+ struct resource *res = NULL;
+ int ret = 0;
+
+ code_res.start = __pa_symbol(_text);
+ code_res.end = __pa_symbol(_etext) - 1;
+ code_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ rodata_res.start = __pa_symbol(__start_rodata);
+ rodata_res.end = __pa_symbol(__end_rodata) - 1;
+ rodata_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ data_res.start = __pa_symbol(_data);
+ data_res.end = __pa_symbol(_edata) - 1;
+ data_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ bss_res.start = __pa_symbol(__bss_start);
+ bss_res.end = __pa_symbol(__bss_stop) - 1;
+ bss_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+
+ /*
+ * Start by adding the reserved regions, if they overlap
+ * with /memory regions, insert_resource later on will take
+ * care of it.
+ */
+ for_each_memblock(reserved, region) {
+ res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
+ if (!res)
+ panic("%s: Failed to allocate %zu bytes\n", __func__,
+ sizeof(struct resource));
+
+ res->name = "Reserved";
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ res->start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
+ res->end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
+
+ ret = add_kernel_resources(res);
+ if (ret < 0)
+ goto error;
+ else if (ret)
+ continue;
+
+#ifdef CONFIG_KEXEC_CORE
+ ret = add_crashk_resource(res);
+ if (ret < 0)
+ goto error;
+ else if (ret)
+ continue;
+#endif
+
+ /*
+ * Ignore any other reserved regions within
+ * system memory.
+ */
+ if (memblock_is_memory(res->start))
+ continue;
+
+ ret = add_resource(&iomem_resource, res);
+ if (ret < 0)
+ goto error;
+ }
+
+ /* Add /memory regions to the resource tree */
+ for_each_memblock(memory, region) {
+ res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
+ if (!res)
+ panic("%s: Failed to allocate %zu bytes\n", __func__,
+ sizeof(struct resource));
+
+ if (unlikely(memblock_is_nomap(region))) {
+ res->name = "Reserved";
+ res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+ } else {
+ res->name = "System RAM";
+ res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+ }
+
+ res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
+ res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+
+ ret = add_resource(&iomem_resource, res);
+ if (ret < 0)
+ goto error;
+ }
+
+ return;
+
+ error:
+ memblock_free((phys_addr_t) res, sizeof(struct resource));
+ /* Better an empty resource tree than an inconsistent one */
+ release_child_resources(&iomem_resource);
+}
+
+
void __init parse_dtb(void)
{
if (early_init_dt_scan(dtb_early_va))
@@ -74,6 +253,7 @@ void __init setup_arch(char **cmdline_p)
setup_bootmem();
paging_init();
+ init_resources();
#if IS_ENABLED(CONFIG_BUILTIN_DTB)
unflatten_and_copy_device_tree();
#else
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index f4adb3684..e4ce3a2af 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -2,6 +2,8 @@
/*
* Copyright (C) 2012 Regents of the University of California
* Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ * Copyright (C) 2020 FORTH-ICS/CARV
+ * Nick Kossifidis <mick@ics.forth.gr>
*/
#include <linux/init.h>
@@ -13,6 +15,7 @@
#include <linux/of_fdt.h>
#include <linux/libfdt.h>
#include <linux/set_memory.h>
+#include <linux/crash_dump.h>
#include <asm/fixmap.h>
#include <asm/tlbflush.h>
@@ -517,6 +520,77 @@ void mark_rodata_ro(void)
}
#endif
+#ifdef CONFIG_KEXEC_CORE
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(void)
+{
+ unsigned long long crash_base = 0;
+ unsigned long long crash_size = 0;
+ unsigned long search_start = memblock_start_of_DRAM();
+ unsigned long search_end = memblock_end_of_DRAM();
+
+ int ret = 0;
+
+ ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+ &crash_size, &crash_base);
+ if (ret || !crash_size)
+ return;
+
+ crash_size = PAGE_ALIGN(crash_size);
+
+ if (crash_base == 0) {
+ /*
+ * Current riscv boot protocol requires 2MB alignment for
+ * RV64 and 4MB alignment for RV32 (hugepage size)
+ */
+ crash_base = memblock_find_in_range(search_start, search_end,
+#ifdef CONFIG_64BIT
+ crash_size, SZ_2M);
+#else
+ crash_size, SZ_4M);
+#endif
+ if (crash_base == 0) {
+ pr_warn("crashkernel: couldn't allocate %lldKB\n",
+ crash_size >> 10);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. */
+ if (!memblock_is_region_memory(crash_base, crash_size)) {
+ pr_warn("crashkernel: requested region is not memory\n");
+ return;
+ }
+
+ if (memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("crashkernel: requested region is reserved\n");
+ return;
+ }
+
+#ifdef CONFIG_64BIT
+ if (!IS_ALIGNED(crash_base, SZ_2M)) {
+#else
+ if (!IS_ALIGNED(crash_base, SZ_4M)) {
+#endif
+ pr_warn("crashkernel: requested region is misaligned\n");
+ return;
+ }
+ }
+ memblock_reserve(crash_base, crash_size);
+
+ pr_info("crashkernel: reserved 0x%016llx - 0x%016llx (%lld MB)\n",
+ crash_base, crash_base + crash_size, crash_size >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;
+}
+#endif
+
void __init paging_init(void)
{
setup_vm_final();
@@ -524,6 +598,9 @@ void __init paging_init(void)
sparse_init();
setup_zero_page();
zone_sizes_init();
+#ifdef CONFIG_KEXEC_CORE
+ reserve_crashkernel();
+#endif
}
#ifdef CONFIG_SPARSEMEM_VMEMMAP
--
2.26.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 3/3] RISC-V: Add crash kernel support
2020-06-23 15:05 [PATCH v2 0/3] RISC-V: Add kexec/kdump support Nick Kossifidis
2020-06-23 15:05 ` [PATCH v2 1/3] RISC-V: Add kexec support Nick Kossifidis
2020-06-23 15:05 ` [PATCH v2 2/3] RISC-V: Add kdump support Nick Kossifidis
@ 2020-06-23 15:05 ` Nick Kossifidis
2020-07-11 3:59 ` Palmer Dabbelt
2020-07-11 3:59 ` BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?= Palmer Dabbelt
3 siblings, 1 reply; 12+ messages in thread
From: Nick Kossifidis @ 2020-06-23 15:05 UTC (permalink / raw)
To: linux-riscv, palmer
Cc: david.abdurachmanov, anup, atish.patra, yibin_liu, zong.li,
paul.walmsley, Nick Kossifidis
This patch allows Linux to act as a crash kernel for use with
kdump. Userspace will let the crash kernel know about the
memory region it can use through linux,usable-memory property
on the /memory node (overriding its reg property), and about the
memory region where the elf core header of the previous kernel
is saved, through a reserved-memory node with a compatible string
of "linux,elfcorehdr". This approach is the least invasive and
re-uses functionality already present.
I tested this on riscv64 qemu and it works as expected, you
may test it by retrieving the dmesg of the previous kernel
through /proc/vmcore, using the vmcore-dmesg utility from
kexec-tools.
v2:
* Use linux,usable-memory on /memory instead of a new binding
* Use a reserved-memory node for ELF core header
Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
---
arch/riscv/Kconfig | 10 ++++++++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/crash_dump.c | 46 ++++++++++++++++++++++++++++++++++
arch/riscv/kernel/setup.c | 35 +++++++++++++++++++++++---
arch/riscv/mm/init.c | 33 ++++++++++++++++++++++++
5 files changed, 122 insertions(+), 3 deletions(-)
create mode 100644 arch/riscv/kernel/crash_dump.c
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e7dd2d1c5..6c59987f0 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -353,6 +353,16 @@ config KEXEC
The name comes from the similarity to the exec system call.
+config CRASH_DUMP
+ bool "Build kdump crash kernel"
+ help
+ Generate crash dump after being started by kexec. This should
+ be normally only set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec.
+
+ For more details see Documentation/admin-guide/kdump/kdump.rst
endmenu
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index b9a3842ad..f067f55f1 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -53,5 +53,6 @@ endif
obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
obj-$(CONFIG_KGDB) += kgdb.o
obj-${CONFIG_KEXEC} += kexec_relocate.o crash_save_regs.o machine_kexec.o
+obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
clean:
diff --git a/arch/riscv/kernel/crash_dump.c b/arch/riscv/kernel/crash_dump.c
new file mode 100644
index 000000000..81b9d2a71
--- /dev/null
+++ b/arch/riscv/kernel/crash_dump.c
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This code comes from arch/arm64/kernel/crash_dump.c
+ * Created by: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ * Copyright (C) 2017 Linaro Limited
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/io.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
+ memunmap(vaddr);
+ return -EFAULT;
+ }
+ } else
+ memcpy(buf, vaddr + offset, csize);
+
+ memunmap(vaddr);
+ return csize;
+}
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 4dc940389..469470b0b 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -63,6 +63,9 @@ static struct resource code_res = { .name = "Kernel code", };
static struct resource data_res = { .name = "Kernel data", };
static struct resource rodata_res = { .name = "Kernel rodata", };
static struct resource bss_res = { .name = "Kernel bss", };
+#ifdef CONFIG_CRASH_DUMP
+static struct resource elfcorehdr_res = { .name = "ELF Core hdr", };
+#endif
static int __init add_resource(struct resource *parent,
struct resource *res)
@@ -133,6 +135,16 @@ static int __init add_crashk_resource(struct resource *res)
}
#endif
+#ifdef CONFIG_CRASH_DUMP
+static int __init add_elfcorehdr_resource(struct resource *res)
+{
+ if (res->start <= elfcorehdr_res.start && res->end >= elfcorehdr_res.end)
+ return add_resource(&iomem_resource, &elfcorehdr_res);
+
+ return 0;
+}
+#endif
+
static void __init init_resources(void)
{
struct memblock_region *region = NULL;
@@ -155,6 +167,14 @@ static void __init init_resources(void)
bss_res.end = __pa_symbol(__bss_stop) - 1;
bss_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+#ifdef CONFIG_CRASH_DUMP
+ if (elfcorehdr_size) {
+ elfcorehdr_res.start = elfcorehdr_addr;
+ elfcorehdr_res.end = elfcorehdr_addr + elfcorehdr_size - 1;
+ elfcorehdr_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+ }
+#endif
+
/*
* Start by adding the reserved regions, if they overlap
* with /memory regions, insert_resource later on will take
@@ -185,6 +206,14 @@ static void __init init_resources(void)
continue;
#endif
+#ifdef CONFIG_CRASH_DUMP
+ ret = add_elfcorehdr_resource(res);
+ if (ret < 0)
+ goto error;
+ else if (ret)
+ continue;
+#endif
+
/*
* Ignore any other reserved regions within
* system memory.
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index e4ce3a2af..c85ddfe44 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -13,6 +13,7 @@
#include <linux/swap.h>
#include <linux/sizes.h>
#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
#include <linux/libfdt.h>
#include <linux/set_memory.h>
#include <linux/crash_dump.h>
@@ -123,6 +124,26 @@ static void __init setup_initrd(void)
}
#endif /* CONFIG_BLK_DEV_INITRD */
+#ifdef CONFIG_CRASH_DUMP
+/*
+ * We keep track of the ELF core header of the crashed
+ * kernel with a reserved-memory region with compatible
+ * string "linux,elfcorehdr". Here we register a callback
+ * to populate elfcorehdr_addr/size when this region is
+ * present. Note that this region will be marked as
+ * reserved once we call early_init_fdt_scan_reserved_mem()
+ * later on.
+ */
+static int elfcore_hdr_setup(struct reserved_mem *rmem)
+{
+ elfcorehdr_addr = rmem->base;
+ elfcorehdr_size = rmem->size;
+ return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(elfcorehdr, "linux,elfcorehdr", elfcore_hdr_setup);
+#endif
+
static phys_addr_t dtb_early_pa __initdata;
void __init setup_bootmem(void)
@@ -537,6 +558,18 @@ static void __init reserve_crashkernel(void)
int ret = 0;
+ /*
+ * Don't reserve a region for a crash kernel on a crash kernel
+ * since it doesn't make much sense and we have limited memory
+ * resources.
+ */
+#ifdef CONFIG_CRASH_DUMP
+ if (is_kdump_kernel()) {
+ pr_info("crashkernel: ignoring reservation request\n");
+ return;
+ }
+#endif
+
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base);
if (ret || !crash_size)
--
2.26.2
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] RISC-V: Add kexec support
2020-06-23 15:05 ` [PATCH v2 1/3] RISC-V: Add kexec support Nick Kossifidis
@ 2020-07-11 3:59 ` Palmer Dabbelt
2020-07-16 14:57 ` Nick Kossifidis
0 siblings, 1 reply; 12+ messages in thread
From: Palmer Dabbelt @ 2020-07-11 3:59 UTC (permalink / raw)
To: mick
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
On Tue, 23 Jun 2020 08:05:10 PDT (-0700), mick@ics.forth.gr wrote:
> This patch adds support for kexec on RISC-V. On SMP systems it depends
> on HOTPLUG_CPU in order to be able to bring up all harts after kexec.
> It also needs a recent OpenSBI version that supports the HSM extension.
> I tested it on riscv64 QEMU on both an smp and a non-smp system.
>
> The older version of this patch can be found at:
> https://patchwork.kernel.org/patch/11508671/
>
> v4:
> * No functional changes, just re-based
>
> v3:
> * Use the new smp_shutdown_nonboot_cpus() call.
> * Move riscv_kexec_relocate to .rodata
>
> v2:
> * Pass needed parameters as arguments to riscv_kexec_relocate
> instead of using global variables.
> * Use kimage_arch to hold the fdt address of the included fdt.
> * Use SYM_* macros on kexec_relocate.S.
> * Compatibility with STRICT_KERNEL_RWX.
> * Compatibility with HOTPLUG_CPU for SMP
> * Small cleanups
>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> Tested-by: Nick Kossifidis <mick@ics.forth.gr>
> ---
> arch/riscv/Kconfig | 14 +++
> arch/riscv/include/asm/kexec.h | 47 ++++++++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/kexec_relocate.S | 158 ++++++++++++++++++++++++
> arch/riscv/kernel/machine_kexec.c | 188 +++++++++++++++++++++++++++++
> include/uapi/linux/kexec.h | 1 +
> 6 files changed, 409 insertions(+)
> create mode 100644 arch/riscv/include/asm/kexec.h
> create mode 100644 arch/riscv/kernel/kexec_relocate.S
> create mode 100644 arch/riscv/kernel/machine_kexec.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 128192e14..e7dd2d1c5 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -340,6 +340,20 @@ config RISCV_SBI_V01
> help
> This config allows kernel to use SBI v0.1 APIs. This will be
> deprecated in future once legacy M-mode software are no longer in use.
> +
> +config KEXEC
> + bool "Kexec system call"
> + select KEXEC_CORE
> + select HOTPLUG_CPU if SMP
This needs an S-mode dependency, as this definately won't run as-is in M mode.
While we might be fix up the CSRs pretty quickly, but as we don't really have
any standard-smelling M-mode platforms
> + help
> + kexec is a system call that implements the ability to shutdown your
> + current kernel, and to start another kernel. It is like a reboot
> + but it is independent of the system firmware. And like a reboot
> + you can start any kernel with it, not just Linux.
> +
> + The name comes from the similarity to the exec system call.
> +
> +
> endmenu
>
> menu "Boot options"
> diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
> new file mode 100644
> index 000000000..efc69feb4
> --- /dev/null
> +++ b/arch/riscv/include/asm/kexec.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 FORTH-ICS/CARV
> + * Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#ifndef _RISCV_KEXEC_H
> +#define _RISCV_KEXEC_H
> +
> +/* Maximum physical address we can use pages from */
> +#define KEXEC_SOURCE_MEMORY_LIMIT (-1UL)
> +
> +/* Maximum address we can reach in physical address mode */
> +#define KEXEC_DESTINATION_MEMORY_LIMIT (-1UL)
> +
> +/* Maximum address we can use for the control code buffer */
> +#define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
> +
> +/* Reserve a page for the control code buffer */
> +#define KEXEC_CONTROL_PAGE_SIZE 4096
> +
> +#define KEXEC_ARCH KEXEC_ARCH_RISCV
> +
> +static inline void
> +crash_setup_regs(struct pt_regs *newregs,
> + struct pt_regs *oldregs)
> +{
> + /* Dummy implementation for now */
> +}
> +
> +
> +#define ARCH_HAS_KIMAGE_ARCH
checkpatch complains about this, but everyone else does it this way so I'm
going to take it for now and fix it up later.
> +
> +struct kimage_arch {
> + unsigned long fdt_addr;
> +};
> +
> +const extern unsigned char riscv_kexec_relocate[];
> +const extern unsigned int riscv_kexec_relocate_size;
> +
> +typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
> + unsigned long jump_addr,
> + unsigned long fdt_addr,
> + unsigned long hartid,
> + unsigned long va_pa_off);
> +
> +#endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index b355cf485..63fc530a4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -52,5 +52,6 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
> endif
> obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
> obj-$(CONFIG_KGDB) += kgdb.o
> +obj-${CONFIG_KEXEC} += kexec_relocate.o machine_kexec.o
>
> clean:
> diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
> new file mode 100644
> index 000000000..cd17d585f
> --- /dev/null
> +++ b/arch/riscv/kernel/kexec_relocate.S
> @@ -0,0 +1,158 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2019 FORTH-ICS/CARV
> + * Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <asm/asm.h> /* For RISCV_* and REG_* macros */
> +#include <asm/page.h> /* For PAGE_SHIFT */
> +#include <linux/linkage.h> /* For SYM_* macros */
> +
> +.section ".rodata"
> +SYM_CODE_START(riscv_kexec_relocate)
> +
> + /*
> + * s0: Pointer to the current entry
> + * s1: (const) Phys address to jump to after relocation
> + * s2: (const) Phys address of the FDT image
> + * s3: (const) The hartid of the current hart
> + * s4: Pointer to the destination address for the relocation
> + * s5: (const) Number of words per page
> + * s6: (const) 1, used for subtraction
> + * s7: (const) va_pa_offset, used when switching MMU off
> + * s8: (const) Physical address of the main loop
> + * s9: (debug) indirection page counter
> + * s10: (debug) entry counter
> + * s11: (debug) copied words counter
> + */
> + mv s0, a0
> + mv s1, a1
> + mv s2, a2
> + mv s3, a3
> + mv s4, zero
> + li s5, ((1 << PAGE_SHIFT) / RISCV_SZPTR)
> + li s6, 1
> + mv s7, a4
> + mv s8, zero
> + mv s9, zero
> + mv s10, zero
> + mv s11, zero
> +
> + /* Disable / cleanup interrupts */
> + csrw sie, zero
> + csrw sip, zero
> +
> + /*
> + * When we switch SATP.MODE to "Bare" we'll only
> + * play with physical addresses. However the first time
> + * we try to jump somewhere, the offset on the jump
> + * will be relative to pc which will still be on VA. To
> + * deal with this we set stvec to the physical address at
> + * the start of the loop below so that we jump there in
> + * any case.
> + */
> + la s8, 1f
> + sub s8, s8, s7
> + csrw stvec, s8
stvec needs to be aligned.
> +
> + /* Process entries in a loop */
> +1:
> + addi s10, s10, 1
> + REG_L t0, 0(s0) /* t0 = *image->entry */
> + addi s0, s0, RISCV_SZPTR /* image->entry++ */
> +
> + /* IND_DESTINATION entry ? -> save destination address */
> + andi t1, t0, 0x1
> + beqz t1, 2f
> + andi s4, t0, ~0x1
> + j 1b
> +
> +2:
> + /* IND_INDIRECTION entry ? -> update next entry ptr (PA) */
> + andi t1, t0, 0x2
> + beqz t1, 2f
> + andi s0, t0, ~0x2
> + addi s9, s9, 1
> + csrw sptbr, zero
If I understand correctly (it's ambiguous in the ISA right now), we don't need
a fence here. I've opened https://github.com/riscv/riscv-isa-manual/issues/538
for clarification.
> + jalr zero, s8, 0
> +
> +2:
> + /* IND_DONE entry ? -> jump to done label */
> + andi t1, t0, 0x4
> + beqz t1, 2f
> + j 4f
> +
> +2:
> + /*
> + * IND_SOURCE entry ? -> copy page word by word to the
> + * destination address we got from IND_DESTINATION
> + */
> + andi t1, t0, 0x8
> + beqz t1, 1b /* Unknown entry type, ignore it */
> + andi t0, t0, ~0x8
> + mv t3, s5 /* i = num words per page */
> +3: /* copy loop */
> + REG_L t1, (t0) /* t1 = *src_ptr */
> + REG_S t1, (s4) /* *dst_ptr = *src_ptr */
> + addi t0, t0, RISCV_SZPTR /* stc_ptr++ */
> + addi s4, s4, RISCV_SZPTR /* dst_ptr++ */
> + sub t3, t3, s6 /* i-- */
> + addi s11, s11, 1 /* c++ */
> + beqz t3, 1b /* copy done ? */
> + j 3b
> +
> +4:
> + /* Wait for the relocation to be visible by other harts */
> + fence w,w
That's not how fences work. A "fence w,w" just ensures that prior writes are
visible before subsequent writes. Usually that would be some sort of control
write being ordered after the data writes, which on the other harts would be
paired with a "fence r,r" to make sure the control read is seen before any
other reads. I don't see that, though.
I think the right answer here is to ignore ordering here, as we're operating in
a single processor mode, and instead push the fences into the CPU hotplug up
codepath. We have to handle that anyway, so we might as well just do it once.
> + /* Pass the arguments to the next kernel / Cleanup*/
> + mv a0, s3
> + mv a1, s2
> + mv a2, s1
> +
> + /* Cleanup */
> + mv a3, zero
> + mv a4, zero
> + mv a5, zero
> + mv a6, zero
> + mv a7, zero
> +
> + mv s0, zero
> + mv s1, zero
> + mv s2, zero
> + mv s3, zero
> + mv s4, zero
> + mv s5, zero
> + mv s6, zero
> + mv s7, zero
> + mv s8, zero
> + mv s9, zero
> + mv s10, zero
> + mv s11, zero
> +
> + mv t0, zero
> + mv t1, zero
> + mv t2, zero
> + mv t3, zero
> + mv t4, zero
> + mv t5, zero
> + mv t6, zero
> + csrw sepc, zero
> + csrw scause, zero
> + csrw sscratch, zero
> +
> + /*
> + * Make sure the relocated code is visible
> + * and jump to the new kernel
> + */
> + fence.i
> +
> + jalr zero, a2, 0
> +
> +SYM_CODE_END(riscv_kexec_relocate)
> +riscv_kexec_relocate_end:
> +
> + .section ".rodata"
> +SYM_DATA(riscv_kexec_relocate_size,
> + .long riscv_kexec_relocate_end - riscv_kexec_relocate)
> +
> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> new file mode 100644
> index 000000000..88734a056
> --- /dev/null
> +++ b/arch/riscv/kernel/machine_kexec.c
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2019 FORTH-ICS/CARV
> + * Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <linux/kexec.h>
> +#include <asm/kexec.h> /* For riscv_kexec_* symbol defines */
> +#include <linux/smp.h> /* For smp_send_stop () */
> +#include <asm/cacheflush.h> /* For local_flush_icache_all() */
> +#include <asm/barrier.h> /* For smp_wmb() */
> +#include <asm/page.h> /* For PAGE_MASK */
> +#include <linux/libfdt.h> /* For fdt_check_header() */
> +#include <asm/set_memory.h> /* For set_memory_x() */
> +#include <linux/compiler.h> /* For unreachable() */
> +#include <linux/cpu.h> /* For cpu_down() */
> +
> +/**
> + * kexec_image_info - Print received image details
> + */
> +static void
> +kexec_image_info(const struct kimage *image)
> +{
> + unsigned long i;
> +
> + pr_debug("Kexec image info:\n");
> + pr_debug("\ttype: %d\n", image->type);
> + pr_debug("\tstart: %lx\n", image->start);
> + pr_debug("\thead: %lx\n", image->head);
> + pr_debug("\tnr_segments: %lu\n", image->nr_segments);
> +
> + for (i = 0; i < image->nr_segments; i++) {
> + pr_debug("\t segment[%lu]: %016lx - %016lx", i,
> + image->segment[i].mem,
> + image->segment[i].mem + image->segment[i].memsz);
> + pr_debug("\t\t0x%lx bytes, %lu pages\n",
> + (unsigned long) image->segment[i].memsz,
> + (unsigned long) image->segment[i].memsz / PAGE_SIZE);
> + }
> +}
> +
> +/**
> + * machine_kexec_prepare - Initialize kexec
> + *
> + * This function is called from do_kexec_load, when the user has
> + * provided us with an image to be loaded. Its goal is to validate
> + * the image and prepare the control code buffer as needed.
> + * Note that kimage_alloc_init has already been called and the
> + * control buffer has already been allocated.
> + */
> +int
> +machine_kexec_prepare(struct kimage *image)
> +{
> + struct kimage_arch *internal = &image->arch;
> + struct fdt_header fdt = {0};
> + void *control_code_buffer = NULL;
> + int i = 0;
> +
> + kexec_image_info(image);
> +
> + if (image->type == KEXEC_TYPE_CRASH) {
> + pr_warn("Loading a crash kernel is unsupported for now.\n");
> + return -EINVAL;
> + }
> +
> + /* Find the Flattened Device Tree and save its physical address */
> + for (i = 0; i < image->nr_segments; i++) {
> + if (image->segment[i].memsz <= sizeof(fdt))
> + continue;
> +
> + if (copy_from_user(&fdt, image->segment[i].buf, sizeof(fdt)))
> + continue;
> +
> + if (fdt_check_header(&fdt))
> + continue;
> +
> + internal->fdt_addr = (unsigned long) image->segment[i].mem;
> + break;
> + }
> +
> + if (!internal->fdt_addr) {
> + pr_err("Device tree not included in the provided image\n");
> + return -EINVAL;
> + }
> +
> + /* Copy the assembler code for relocation to the control page */
> + control_code_buffer = page_address(image->control_code_page);
> + memcpy(control_code_buffer, riscv_kexec_relocate,
> + riscv_kexec_relocate_size);
The toolchain doesn't make any guarantees that you can move code around without
-fPIC, but we're already taking advantage of the fact that simply medany code
can be moved around so this should be OK. This is all assembly and it looks
fine, but since it's placed in the rodata segment there's at least a
reasonably possibility that GP could end up close enough that things get
relaxed. We'd be safer adding .norelax to that assembly file, as there's
nothing in there that's performance critical.
Also, as far as I can tell nothing is checking that riscv_kexec_relocate_size
is smaller than a page. That could cause overflows in this memcpy, but a
static check should be sufficient.
> +
> + /* Mark the control page executable */
> + set_memory_x((unsigned long) control_code_buffer, 1);
> +
> +#ifdef CONFIG_SMP
> + /*
> + * Make sure other harts see the copied data
> + * if they try to read the control buffer
> + */
> + smp_wmb();
> +#endif
This appears to have similar issues as the fence.
Also, smp_wmb() already has the relevant CONFIG_SMP checks.
> +
> + return 0;
> +}
> +
> +
> +/**
> + * machine_kexec_cleanup - Cleanup any leftovers from
> + * machine_kexec_prepare
> + *
> + * This function is called by kimage_free to handle any arch-specific
> + * allocations done on machine_kexec_prepare. Since we didn't do any
> + * allocations there, this is just an empty function. Note that the
> + * control buffer is freed by kimage_free.
> + */
> +void
> +machine_kexec_cleanup(struct kimage *image)
> +{
> +}
> +
> +
> +/*
> + * machine_shutdown - Prepare for a kexec reboot
> + *
> + * This function is called by kernel_kexec just before machine_kexec
> + * below. Its goal is to prepare the rest of the system (the other
> + * harts and possibly devices etc) for a kexec reboot.
> + */
> +void machine_shutdown(void)
> +{
> + /*
> + * No more interrupts on this hart
> + * until we are back up.
> + */
> + local_irq_disable();
> +
> +#if defined(CONFIG_HOTPLUG_CPU) && (CONFIG_SMP)
> + smp_shutdown_nonboot_cpus(smp_processor_id());
> +#endif
> +}
> +
> +/**
> + * machine_crash_shutdown - Prepare to kexec after a kernel crash
> + *
> + * This function is called by crash_kexec just before machine_kexec
> + * below and its goal is similar to machine_shutdown, but in case of
> + * a kernel crash. Since we don't handle such cases yet, this function
> + * is empty.
> + */
> +void
> +machine_crash_shutdown(struct pt_regs *regs)
> +{
> +}
> +
> +/**
> + * machine_kexec - Jump to the loaded kimage
> + *
> + * This function is called by kernel_kexec which is called by the
> + * reboot system call when the reboot cmd is LINUX_REBOOT_CMD_KEXEC,
> + * or by crash_kernel which is called by the kernel's arch-specific
> + * trap handler in case of a kernel panic. It's the final stage of
> + * the kexec process where the pre-loaded kimage is ready to be
> + * executed. We assume at this point that all other harts are
> + * suspended and this hart will be the new boot hart.
> + */
> +void __noreturn
> +machine_kexec(struct kimage *image)
> +{
> + struct kimage_arch *internal = &image->arch;
> + unsigned long jump_addr = (unsigned long) image->start;
> + unsigned long first_ind_entry = (unsigned long) &image->head;
> + unsigned long this_hart_id = raw_smp_processor_id();
> + unsigned long fdt_addr = internal->fdt_addr;
> + void *control_code_buffer = page_address(image->control_code_page);
> + riscv_kexec_do_relocate do_relocate = control_code_buffer;
> +
> + pr_notice("Will call new kernel at %08lx from hart id %lx\n",
> + jump_addr, this_hart_id);
> + pr_notice("FDT image at %08lx\n", fdt_addr);
> +
> + /* Make sure the relocation code is visible to the hart */
> + local_flush_icache_all();
> +
> + /* Jump to the relocation code */
> + pr_notice("Bye...\n");
> + do_relocate(first_ind_entry, jump_addr, fdt_addr,
> + this_hart_id, va_pa_offset);
> + unreachable();
> +}
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index 05669c87a..778dc191c 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -42,6 +42,7 @@
> #define KEXEC_ARCH_MIPS_LE (10 << 16)
> #define KEXEC_ARCH_MIPS ( 8 << 16)
> #define KEXEC_ARCH_AARCH64 (183 << 16)
> +#define KEXEC_ARCH_RISCV (243 << 16)
I usually split the UAPI changes out as their own patch. This one is pretty
trivial, but it's always nice to make sure everyone knows we're making a UAPI
change just to be on the safe side.
>
> /* The artificial cap on the number of segments passed to kexec_load. */
> #define KEXEC_SEGMENT_MAX 16
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] RISC-V: Add kdump support
2020-06-23 15:05 ` [PATCH v2 2/3] RISC-V: Add kdump support Nick Kossifidis
@ 2020-07-11 3:59 ` Palmer Dabbelt
2020-07-16 15:31 ` Nick Kossifidis
0 siblings, 1 reply; 12+ messages in thread
From: Palmer Dabbelt @ 2020-07-11 3:59 UTC (permalink / raw)
To: mick
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
On Tue, 23 Jun 2020 08:05:11 PDT (-0700), mick@ics.forth.gr wrote:
> This patch adds support for kdump, the kernel will reserve a
> region for the crash kernel and jump there on panic. In order
> for userspace tools (kexec-tools) to prepare the crash kernel
> kexec image, we also need to expose some information on
> /proc/iomem for the memory regions used by the kernel and for
> the region reserved for crash kernel. Note that on userspace
> the device tree is used to determine the system's memory
> layout so the "System RAM" on /proc/iomem is ignored. I just
> put it there for compatibility with other archs.
>
> I tested this on riscv64 qemu and works as expected, you may
> test it by triggering a crash through /proc/sysrq_trigger:
>
> echo c > /proc/sysrq_trigger
>
> v2:
> * Properly populate the ioresources tree, so that it can be
> used later on for implementing strict /dev/mem.
> * Minor cleanups and re-base
>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> ---
> arch/riscv/include/asm/kexec.h | 19 ++-
> arch/riscv/include/uapi/asm/elf.h | 6 +
> arch/riscv/kernel/Makefile | 2 +-
> arch/riscv/kernel/crash_save_regs.S | 56 +++++++++
> arch/riscv/kernel/kexec_relocate.S | 61 +++++++++-
> arch/riscv/kernel/machine_kexec.c | 31 +++--
> arch/riscv/kernel/setup.c | 180 ++++++++++++++++++++++++++++
> arch/riscv/mm/init.c | 77 ++++++++++++
> 8 files changed, 411 insertions(+), 21 deletions(-)
> create mode 100644 arch/riscv/kernel/crash_save_regs.S
>
> diff --git a/arch/riscv/include/asm/kexec.h b/arch/riscv/include/asm/kexec.h
> index efc69feb4..4fd583acc 100644
> --- a/arch/riscv/include/asm/kexec.h
> +++ b/arch/riscv/include/asm/kexec.h
> @@ -21,11 +21,16 @@
>
> #define KEXEC_ARCH KEXEC_ARCH_RISCV
>
> +extern void riscv_crash_save_regs(struct pt_regs *newregs);
> +
> static inline void
> crash_setup_regs(struct pt_regs *newregs,
> struct pt_regs *oldregs)
> {
> - /* Dummy implementation for now */
> + if (oldregs)
> + memcpy(newregs, oldregs, sizeof(struct pt_regs));
> + else
> + riscv_crash_save_regs(newregs);
> }
>
>
> @@ -38,10 +43,12 @@ struct kimage_arch {
> const extern unsigned char riscv_kexec_relocate[];
> const extern unsigned int riscv_kexec_relocate_size;
>
> -typedef void (*riscv_kexec_do_relocate)(unsigned long first_ind_entry,
> - unsigned long jump_addr,
> - unsigned long fdt_addr,
> - unsigned long hartid,
> - unsigned long va_pa_off);
> +typedef void (*riscv_kexec_method)(unsigned long first_ind_entry,
> + unsigned long jump_addr,
> + unsigned long fdt_addr,
> + unsigned long hartid,
> + unsigned long va_pa_off);
> +
> +extern riscv_kexec_method riscv_kexec_norelocate;
>
> #endif
> diff --git a/arch/riscv/include/uapi/asm/elf.h b/arch/riscv/include/uapi/asm/elf.h
> index d696d6610..5b19f5547 100644
> --- a/arch/riscv/include/uapi/asm/elf.h
> +++ b/arch/riscv/include/uapi/asm/elf.h
> @@ -19,6 +19,12 @@ typedef unsigned long elf_greg_t;
> typedef struct user_regs_struct elf_gregset_t;
> #define ELF_NGREG (sizeof(elf_gregset_t) / sizeof(elf_greg_t))
>
> +#define ELF_CORE_COPY_REGS(dest, regs) \
> +do { \
> + *(struct user_regs_struct *)&(dest) = \
> + *(struct user_regs_struct *)regs; \
> +} while (0);
> +
> /* We don't support f without d, or q. */
> typedef __u64 elf_fpreg_t;
> typedef union __riscv_fp_state elf_fpregset_t;
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 63fc530a4..b9a3842ad 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -52,6 +52,6 @@ obj-$(CONFIG_SMP) += cpu_ops_sbi.o
> endif
> obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
> obj-$(CONFIG_KGDB) += kgdb.o
> -obj-${CONFIG_KEXEC} += kexec_relocate.o machine_kexec.o
> +obj-${CONFIG_KEXEC} += kexec_relocate.o crash_save_regs.o machine_kexec.o
>
> clean:
> diff --git a/arch/riscv/kernel/crash_save_regs.S b/arch/riscv/kernel/crash_save_regs.S
> new file mode 100644
> index 000000000..7832fb763
> --- /dev/null
> +++ b/arch/riscv/kernel/crash_save_regs.S
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2020 FORTH-ICS/CARV
> + * Nick Kossifidis <mick@ics.forth.gr>
> + */
> +
> +#include <asm/asm.h> /* For RISCV_* and REG_* macros */
> +#include <asm/csr.h> /* For CSR_* macros */
> +#include <asm/asm-offsets.h> /* For offsets on pt_regs */
> +#include <linux/linkage.h> /* For SYM_* macros */
> +
> +.section ".text"
> +SYM_CODE_START(riscv_crash_save_regs)
> + REG_S ra, PT_RA(a0) /* x1 */
> + REG_S sp, PT_SP(a0) /* x2 */
> + REG_S gp, PT_GP(a0) /* x3 */
> + REG_S tp, PT_TP(a0) /* x4 */
> + REG_S t0, PT_T0(a0) /* x5 */
> + REG_S t1, PT_T1(a0) /* x6 */
> + REG_S t2, PT_T2(a0) /* x7 */
> + REG_S s0, PT_S0(a0) /* x8/fp */
> + REG_S s1, PT_S1(a0) /* x9 */
> + REG_S a0, PT_A0(a0) /* x10 */
> + REG_S a1, PT_A1(a0) /* x11 */
> + REG_S a2, PT_A2(a0) /* x12 */
> + REG_S a3, PT_A3(a0) /* x13 */
> + REG_S a4, PT_A4(a0) /* x14 */
> + REG_S a5, PT_A5(a0) /* x15 */
> + REG_S a6, PT_A6(a0) /* x16 */
> + REG_S a7, PT_A7(a0) /* x17 */
> + REG_S s2, PT_S2(a0) /* x18 */
> + REG_S s3, PT_S3(a0) /* x19 */
> + REG_S s4, PT_S4(a0) /* x20 */
> + REG_S s5, PT_S5(a0) /* x21 */
> + REG_S s6, PT_S6(a0) /* x22 */
> + REG_S s7, PT_S7(a0) /* x23 */
> + REG_S s8, PT_S8(a0) /* x24 */
> + REG_S s9, PT_S9(a0) /* x25 */
> + REG_S s10, PT_S10(a0) /* x26 */
> + REG_S s11, PT_S11(a0) /* x27 */
> + REG_S t3, PT_T3(a0) /* x28 */
> + REG_S t4, PT_T4(a0) /* x29 */
> + REG_S t5, PT_T5(a0) /* x30 */
> + REG_S t6, PT_T6(a0) /* x31 */
> +
> + csrr t1, CSR_STATUS
> + csrr t2, CSR_EPC
> + csrr t3, CSR_TVAL
> + csrr t4, CSR_CAUSE
> +
> + REG_S t1, PT_STATUS(a0)
> + REG_S t2, PT_EPC(a0)
> + REG_S t3, PT_BADADDR(a0)
> + REG_S t4, PT_CAUSE(a0)
> + ret
> +SYM_CODE_END(riscv_crash_save_regs)
> diff --git a/arch/riscv/kernel/kexec_relocate.S b/arch/riscv/kernel/kexec_relocate.S
> index cd17d585f..c417c44ea 100644
> --- a/arch/riscv/kernel/kexec_relocate.S
> +++ b/arch/riscv/kernel/kexec_relocate.S
> @@ -152,7 +152,66 @@ SYM_CODE_START(riscv_kexec_relocate)
> SYM_CODE_END(riscv_kexec_relocate)
> riscv_kexec_relocate_end:
>
> - .section ".rodata"
> +
> +/* Used for jumping to crashkernel */
> +.section ".text"
> +SYM_CODE_START(riscv_kexec_norelocate)
> + /*
> + * s0: (const) Phys address to jump to
> + * s1: (const) Phys address of the FDT image
> + * s2: (const) The hartid of the current hart
> + */
> + mv s0, a1
> + mv s1, a2
> + mv s2, a3
> +
> + /* Disable / cleanup interrupts */
> + csrw sie, zero
> + csrw sip, zero
> +
> + /* Switch to physical addressing */
> + csrw sptbr, zero
I guess I'm missing something, but I'd assume that we would need some sort of
tvec juggling here like we usually do when jumping between virtual and physical
spaces. I get that we're not relocating the kernel that we're jumping to, but
I don't understand how PAGE_OFFSET works.
> +
> + /* Pass the arguments to the next kernel / Cleanup*/
Missing space in that comment.
> + mv a0, s2
> + mv a1, s1
> + mv a2, s0
> +
> + /* Cleanup */
> + mv a3, zero
> + mv a4, zero
> + mv a5, zero
> + mv a6, zero
> + mv a7, zero
> +
> + mv s0, zero
> + mv s1, zero
> + mv s2, zero
> + mv s3, zero
> + mv s4, zero
> + mv s5, zero
> + mv s6, zero
> + mv s7, zero
> + mv s8, zero
> + mv s9, zero
> + mv s10, zero
> + mv s11, zero
> +
> + mv t0, zero
> + mv t1, zero
> + mv t2, zero
> + mv t3, zero
> + mv t4, zero
> + mv t5, zero
> + mv t6, zero
> + csrw sepc, zero
> + csrw scause, zero
> + csrw sscratch, zero
> +
> + jalr zero, a2, 0
> +SYM_CODE_END(riscv_kexec_norelocate)
> +
> +.section ".rodata"
> SYM_DATA(riscv_kexec_relocate_size,
> .long riscv_kexec_relocate_end - riscv_kexec_relocate)
>
> diff --git a/arch/riscv/kernel/machine_kexec.c b/arch/riscv/kernel/machine_kexec.c
> index 88734a056..dfe6e9066 100644
> --- a/arch/riscv/kernel/machine_kexec.c
> +++ b/arch/riscv/kernel/machine_kexec.c
> @@ -58,11 +58,6 @@ machine_kexec_prepare(struct kimage *image)
>
> kexec_image_info(image);
>
> - if (image->type == KEXEC_TYPE_CRASH) {
> - pr_warn("Loading a crash kernel is unsupported for now.\n");
> - return -EINVAL;
> - }
> -
> /* Find the Flattened Device Tree and save its physical address */
> for (i = 0; i < image->nr_segments; i++) {
> if (image->segment[i].memsz <= sizeof(fdt))
> @@ -84,12 +79,14 @@ machine_kexec_prepare(struct kimage *image)
> }
>
> /* Copy the assembler code for relocation to the control page */
> - control_code_buffer = page_address(image->control_code_page);
> - memcpy(control_code_buffer, riscv_kexec_relocate,
> - riscv_kexec_relocate_size);
> + if (image->type != KEXEC_TYPE_CRASH) {
> + control_code_buffer = page_address(image->control_code_page);
> + memcpy(control_code_buffer, riscv_kexec_relocate,
> + riscv_kexec_relocate_size);
>
> - /* Mark the control page executable */
> - set_memory_x((unsigned long) control_code_buffer, 1);
> + /* Mark the control page executable */
> + set_memory_x((unsigned long) control_code_buffer, 1);
> + }
>
> #ifdef CONFIG_SMP
> /*
> @@ -149,6 +146,9 @@ void machine_shutdown(void)
> void
> machine_crash_shutdown(struct pt_regs *regs)
> {
> + crash_save_cpu(regs, smp_processor_id());
> + machine_shutdown();
> + pr_info("Starting crashdump kernel...\n");
> }
>
> /**
> @@ -171,7 +171,12 @@ machine_kexec(struct kimage *image)
> unsigned long this_hart_id = raw_smp_processor_id();
> unsigned long fdt_addr = internal->fdt_addr;
> void *control_code_buffer = page_address(image->control_code_page);
> - riscv_kexec_do_relocate do_relocate = control_code_buffer;
> + riscv_kexec_method kexec_method = NULL;
> +
> + if (image->type != KEXEC_TYPE_CRASH)
> + kexec_method = control_code_buffer;
> + else
> + kexec_method = (riscv_kexec_method) &riscv_kexec_norelocate;
>
> pr_notice("Will call new kernel at %08lx from hart id %lx\n",
> jump_addr, this_hart_id);
> @@ -182,7 +187,7 @@ machine_kexec(struct kimage *image)
>
> /* Jump to the relocation code */
> pr_notice("Bye...\n");
> - do_relocate(first_ind_entry, jump_addr, fdt_addr,
> - this_hart_id, va_pa_offset);
> + kexec_method(first_ind_entry, jump_addr, fdt_addr,
> + this_hart_id, va_pa_offset);
> unreachable();
> }
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index f04373be5..4dc940389 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -4,6 +4,8 @@
> * Chen Liqin <liqin.chen@sunplusct.com>
> * Lennox Wu <lennox.wu@sunplusct.com>
> * Copyright (C) 2012 Regents of the University of California
> + * Copyright (C) 2020 FORTH-ICS/CARV
> + * Nick Kossifidis <mick@ics.forth.gr>
> */
>
> #include <linux/init.h>
> @@ -17,6 +19,8 @@
> #include <linux/sched/task.h>
> #include <linux/swiotlb.h>
> #include <linux/smp.h>
> +#include <linux/crash_dump.h>
> +#include <linux/initrd.h>
>
> #include <asm/clint.h>
> #include <asm/cpu_ops.h>
> @@ -49,6 +53,181 @@ atomic_t hart_lottery __section(.sdata);
> unsigned long boot_cpu_hartid;
> static DEFINE_PER_CPU(struct cpu, cpu_devices);
>
> +/*
> + * Place kernel memory regions on the resource tree so that
> + * kexec-tools can retrieve them from /proc/iomem. While there
> + * also add "System RAM" regions for compatibility with other
> + * archs, and the rest of the known regions for completeness.
> + */
> +static struct resource code_res = { .name = "Kernel code", };
> +static struct resource data_res = { .name = "Kernel data", };
> +static struct resource rodata_res = { .name = "Kernel rodata", };
> +static struct resource bss_res = { .name = "Kernel bss", };
> +
> +static int __init add_resource(struct resource *parent,
> + struct resource *res)
> +{
> + int ret = 0;
> +
> + ret = insert_resource(parent, res);
> + if (ret < 0) {
> + pr_err("Failed to add a %s resource at %llx\n",
> + res->name, res->start);
> + return ret;
> + }
> +
> + return 1;
> +}
> +
> +static int __init add_kernel_resources(struct resource *res)
> +{
> + int ret = 0;
> +
> + /*
> + * The memory region of the kernel image is continuous and
> + * was reserved on setup_bootmem, find it here and register
> + * it as a resource, then register the various segments of
> + * the image as child nodes
> + */
> + if (!(res->start <= code_res.start && res->end >= data_res.end))
> + return 0;
> +
> + res->name = "Kernel image";
> + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +
> + /*
> + * We removed a part of this region on setup_bootmem so
> + * we need to expand the resource for the bss to fit in.
> + */
> + res->end = bss_res.end;
> +
> + ret = add_resource(&iomem_resource, res);
> + if (ret < 0)
> + return ret;
> +
> + ret = add_resource(res, &code_res);
> + if (ret < 0)
> + return ret;
> +
> + ret = add_resource(res, &rodata_res);
> + if (ret < 0)
> + return ret;
> +
> + ret = add_resource(res, &data_res);
> + if (ret < 0)
> + return ret;
> +
> + ret = add_resource(res, &bss_res);
> +
> + return ret;
> +}
> +
> +#ifdef CONFIG_KEXEC_CORE
> +static int __init add_crashk_resource(struct resource *res)
> +{
> + if (res->start <= crashk_res.start && res->end >= crashk_res.end)
> + return add_resource(&iomem_resource, &crashk_res);
> +
> + return 0;
> +}
> +#endif
> +
> +static void __init init_resources(void)
> +{
> + struct memblock_region *region = NULL;
> + struct resource *res = NULL;
> + int ret = 0;
> +
> + code_res.start = __pa_symbol(_text);
> + code_res.end = __pa_symbol(_etext) - 1;
> + code_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +
> + rodata_res.start = __pa_symbol(__start_rodata);
> + rodata_res.end = __pa_symbol(__end_rodata) - 1;
> + rodata_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +
> + data_res.start = __pa_symbol(_data);
> + data_res.end = __pa_symbol(_edata) - 1;
> + data_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +
> + bss_res.start = __pa_symbol(__bss_start);
> + bss_res.end = __pa_symbol(__bss_stop) - 1;
> + bss_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> +
> + /*
> + * Start by adding the reserved regions, if they overlap
> + * with /memory regions, insert_resource later on will take
> + * care of it.
> + */
> + for_each_memblock(reserved, region) {
> + res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
> + if (!res)
> + panic("%s: Failed to allocate %zu bytes\n", __func__,
> + sizeof(struct resource));
> +
> + res->name = "Reserved";
> + res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> + res->start = __pfn_to_phys(memblock_region_reserved_base_pfn(region));
> + res->end = __pfn_to_phys(memblock_region_reserved_end_pfn(region)) - 1;
> +
> + ret = add_kernel_resources(res);
> + if (ret < 0)
> + goto error;
> + else if (ret)
> + continue;
> +
> +#ifdef CONFIG_KEXEC_CORE
> + ret = add_crashk_resource(res);
> + if (ret < 0)
> + goto error;
> + else if (ret)
> + continue;
> +#endif
> +
> + /*
> + * Ignore any other reserved regions within
> + * system memory.
> + */
> + if (memblock_is_memory(res->start))
> + continue;
> +
> + ret = add_resource(&iomem_resource, res);
> + if (ret < 0)
> + goto error;
> + }
> +
> + /* Add /memory regions to the resource tree */
> + for_each_memblock(memory, region) {
> + res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
> + if (!res)
> + panic("%s: Failed to allocate %zu bytes\n", __func__,
> + sizeof(struct resource));
> +
> + if (unlikely(memblock_is_nomap(region))) {
> + res->name = "Reserved";
> + res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
This differs from what I ended up with in that I don't have IORESOURCE_BUSY. I
just copied this blindly from arm64 so I don't actually know what the
difference is.
> + } else {
> + res->name = "System RAM";
> + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> + }
> +
> + res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
> + res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
> +
> + ret = add_resource(&iomem_resource, res);
> + if (ret < 0)
> + goto error;
> + }
> +
> + return;
> +
> + error:
> + memblock_free((phys_addr_t) res, sizeof(struct resource));
> + /* Better an empty resource tree than an inconsistent one */
> + release_child_resources(&iomem_resource);
> +}
> +
> +
> void __init parse_dtb(void)
> {
> if (early_init_dt_scan(dtb_early_va))
> @@ -74,6 +253,7 @@ void __init setup_arch(char **cmdline_p)
>
> setup_bootmem();
> paging_init();
> + init_resources();
> #if IS_ENABLED(CONFIG_BUILTIN_DTB)
> unflatten_and_copy_device_tree();
> #else
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index f4adb3684..e4ce3a2af 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -2,6 +2,8 @@
> /*
> * Copyright (C) 2012 Regents of the University of California
> * Copyright (C) 2019 Western Digital Corporation or its affiliates.
> + * Copyright (C) 2020 FORTH-ICS/CARV
> + * Nick Kossifidis <mick@ics.forth.gr>
> */
>
> #include <linux/init.h>
> @@ -13,6 +15,7 @@
> #include <linux/of_fdt.h>
> #include <linux/libfdt.h>
> #include <linux/set_memory.h>
> +#include <linux/crash_dump.h>
>
> #include <asm/fixmap.h>
> #include <asm/tlbflush.h>
> @@ -517,6 +520,77 @@ void mark_rodata_ro(void)
> }
> #endif
>
> +#ifdef CONFIG_KEXEC_CORE
> +/*
> + * reserve_crashkernel() - reserves memory for crash kernel
> + *
> + * This function reserves memory area given in "crashkernel=" kernel command
> + * line parameter. The memory reserved is used by dump capture kernel when
> + * primary kernel is crashing.
> + */
> +static void __init reserve_crashkernel(void)
> +{
> + unsigned long long crash_base = 0;
> + unsigned long long crash_size = 0;
> + unsigned long search_start = memblock_start_of_DRAM();
> + unsigned long search_end = memblock_end_of_DRAM();
> +
> + int ret = 0;
> +
> + ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> + &crash_size, &crash_base);
> + if (ret || !crash_size)
> + return;
> +
> + crash_size = PAGE_ALIGN(crash_size);
> +
> + if (crash_base == 0) {
> + /*
> + * Current riscv boot protocol requires 2MB alignment for
> + * RV64 and 4MB alignment for RV32 (hugepage size)
> + */
> + crash_base = memblock_find_in_range(search_start, search_end,
> +#ifdef CONFIG_64BIT
> + crash_size, SZ_2M);
> +#else
> + crash_size, SZ_4M);
> +#endif
> + if (crash_base == 0) {
> + pr_warn("crashkernel: couldn't allocate %lldKB\n",
> + crash_size >> 10);
> + return;
> + }
> + } else {
> + /* User specifies base address explicitly. */
> + if (!memblock_is_region_memory(crash_base, crash_size)) {
> + pr_warn("crashkernel: requested region is not memory\n");
> + return;
> + }
> +
> + if (memblock_is_region_reserved(crash_base, crash_size)) {
> + pr_warn("crashkernel: requested region is reserved\n");
> + return;
> + }
> +
> +#ifdef CONFIG_64BIT
> + if (!IS_ALIGNED(crash_base, SZ_2M)) {
> +#else
> + if (!IS_ALIGNED(crash_base, SZ_4M)) {
> +#endif
> + pr_warn("crashkernel: requested region is misaligned\n");
> + return;
> + }
> + }
> + memblock_reserve(crash_base, crash_size);
> +
> + pr_info("crashkernel: reserved 0x%016llx - 0x%016llx (%lld MB)\n",
> + crash_base, crash_base + crash_size, crash_size >> 20);
> +
> + crashk_res.start = crash_base;
> + crashk_res.end = crash_base + crash_size - 1;
> +}
> +#endif
> +
> void __init paging_init(void)
> {
> setup_vm_final();
> @@ -524,6 +598,9 @@ void __init paging_init(void)
> sparse_init();
> setup_zero_page();
> zone_sizes_init();
> +#ifdef CONFIG_KEXEC_CORE
> + reserve_crashkernel();
> +#endif
> }
>
> #ifdef CONFIG_SPARSEMEM_VMEMMAP
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
If you split out init_resource() then I can take it before the rest of the
kexec stuff.
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] RISC-V: Add crash kernel support
2020-06-23 15:05 ` [PATCH v2 3/3] RISC-V: Add crash kernel support Nick Kossifidis
@ 2020-07-11 3:59 ` Palmer Dabbelt
2020-07-16 15:52 ` Nick Kossifidis
0 siblings, 1 reply; 12+ messages in thread
From: Palmer Dabbelt @ 2020-07-11 3:59 UTC (permalink / raw)
To: mick
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
On Tue, 23 Jun 2020 08:05:12 PDT (-0700), mick@ics.forth.gr wrote:
> This patch allows Linux to act as a crash kernel for use with
> kdump. Userspace will let the crash kernel know about the
> memory region it can use through linux,usable-memory property
> on the /memory node (overriding its reg property), and about the
> memory region where the elf core header of the previous kernel
> is saved, through a reserved-memory node with a compatible string
> of "linux,elfcorehdr". This approach is the least invasive and
> re-uses functionality already present.
>
> I tested this on riscv64 qemu and it works as expected, you
> may test it by retrieving the dmesg of the previous kernel
> through /proc/vmcore, using the vmcore-dmesg utility from
> kexec-tools.
>
> v2:
> * Use linux,usable-memory on /memory instead of a new binding
> * Use a reserved-memory node for ELF core header
>
> Signed-off-by: Nick Kossifidis <mick@ics.forth.gr>
> ---
> arch/riscv/Kconfig | 10 ++++++++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/crash_dump.c | 46 ++++++++++++++++++++++++++++++++++
> arch/riscv/kernel/setup.c | 35 +++++++++++++++++++++++---
> arch/riscv/mm/init.c | 33 ++++++++++++++++++++++++
> 5 files changed, 122 insertions(+), 3 deletions(-)
> create mode 100644 arch/riscv/kernel/crash_dump.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e7dd2d1c5..6c59987f0 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -353,6 +353,16 @@ config KEXEC
>
> The name comes from the similarity to the exec system call.
>
> +config CRASH_DUMP
> + bool "Build kdump crash kernel"
> + help
> + Generate crash dump after being started by kexec. This should
> + be normally only set in special crash dump kernels which are
> + loaded in the main kernel with kexec-tools into a specially
> + reserved region and then later executed after a crash by
> + kdump/kexec.
> +
> + For more details see Documentation/admin-guide/kdump/kdump.rst
>
> endmenu
>
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index b9a3842ad..f067f55f1 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -53,5 +53,6 @@ endif
> obj-$(CONFIG_HOTPLUG_CPU) += cpu-hotplug.o
> obj-$(CONFIG_KGDB) += kgdb.o
> obj-${CONFIG_KEXEC} += kexec_relocate.o crash_save_regs.o machine_kexec.o
> +obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
>
> clean:
> diff --git a/arch/riscv/kernel/crash_dump.c b/arch/riscv/kernel/crash_dump.c
> new file mode 100644
> index 000000000..81b9d2a71
> --- /dev/null
> +++ b/arch/riscv/kernel/crash_dump.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This code comes from arch/arm64/kernel/crash_dump.c
Which is "GPL-2.0-only", not "GPL-2.0". I just sent out a patch set to make
this generic, as a handful of architectures use it exactly (and some others may
be able to).
> + * Created by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> + * Copyright (C) 2017 Linaro Limited
> + */
> +
> +#include <linux/crash_dump.h>
> +#include <linux/io.h>
> +
> +/**
> + * copy_oldmem_page() - copy one page from old kernel memory
> + * @pfn: page frame number to be copied
> + * @buf: buffer where the copied page is placed
> + * @csize: number of bytes to copy
> + * @offset: offset in bytes into the page
> + * @userbuf: if set, @buf is in a user address space
> + *
> + * This function copies one page from old kernel memory into buffer pointed by
> + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> + * copied or negative error in case of failure.
> + */
> +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> + size_t csize, unsigned long offset,
> + int userbuf)
> +{
> + void *vaddr;
> +
> + if (!csize)
> + return 0;
> +
> + vaddr = memremap(__pfn_to_phys(pfn), PAGE_SIZE, MEMREMAP_WB);
> + if (!vaddr)
> + return -ENOMEM;
> +
> + if (userbuf) {
> + if (copy_to_user((char __user *)buf, vaddr + offset, csize)) {
> + memunmap(vaddr);
> + return -EFAULT;
> + }
> + } else
> + memcpy(buf, vaddr + offset, csize);
> +
> + memunmap(vaddr);
> + return csize;
> +}
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 4dc940389..469470b0b 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -63,6 +63,9 @@ static struct resource code_res = { .name = "Kernel code", };
> static struct resource data_res = { .name = "Kernel data", };
> static struct resource rodata_res = { .name = "Kernel rodata", };
> static struct resource bss_res = { .name = "Kernel bss", };
> +#ifdef CONFIG_CRASH_DUMP
> +static struct resource elfcorehdr_res = { .name = "ELF Core hdr", };
> +#endif
>
> static int __init add_resource(struct resource *parent,
> struct resource *res)
> @@ -133,6 +135,16 @@ static int __init add_crashk_resource(struct resource *res)
> }
> #endif
>
> +#ifdef CONFIG_CRASH_DUMP
> +static int __init add_elfcorehdr_resource(struct resource *res)
> +{
> + if (res->start <= elfcorehdr_res.start && res->end >= elfcorehdr_res.end)
> + return add_resource(&iomem_resource, &elfcorehdr_res);
> +
> + return 0;
> +}
> +#endif
> +
> static void __init init_resources(void)
> {
> struct memblock_region *region = NULL;
> @@ -155,6 +167,14 @@ static void __init init_resources(void)
> bss_res.end = __pa_symbol(__bss_stop) - 1;
> bss_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>
> +#ifdef CONFIG_CRASH_DUMP
> + if (elfcorehdr_size) {
> + elfcorehdr_res.start = elfcorehdr_addr;
> + elfcorehdr_res.end = elfcorehdr_addr + elfcorehdr_size - 1;
> + elfcorehdr_res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> + }
> +#endif
> +
> /*
> * Start by adding the reserved regions, if they overlap
> * with /memory regions, insert_resource later on will take
> @@ -185,6 +206,14 @@ static void __init init_resources(void)
> continue;
> #endif
>
> +#ifdef CONFIG_CRASH_DUMP
> + ret = add_elfcorehdr_resource(res);
> + if (ret < 0)
> + goto error;
> + else if (ret)
> + continue;
> +#endif
> +
> /*
> * Ignore any other reserved regions within
> * system memory.
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index e4ce3a2af..c85ddfe44 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -13,6 +13,7 @@
> #include <linux/swap.h>
> #include <linux/sizes.h>
> #include <linux/of_fdt.h>
> +#include <linux/of_reserved_mem.h>
> #include <linux/libfdt.h>
> #include <linux/set_memory.h>
> #include <linux/crash_dump.h>
> @@ -123,6 +124,26 @@ static void __init setup_initrd(void)
> }
> #endif /* CONFIG_BLK_DEV_INITRD */
>
> +#ifdef CONFIG_CRASH_DUMP
> +/*
> + * We keep track of the ELF core header of the crashed
> + * kernel with a reserved-memory region with compatible
> + * string "linux,elfcorehdr". Here we register a callback
> + * to populate elfcorehdr_addr/size when this region is
> + * present. Note that this region will be marked as
> + * reserved once we call early_init_fdt_scan_reserved_mem()
> + * later on.
> + */
> +static int elfcore_hdr_setup(struct reserved_mem *rmem)
> +{
> + elfcorehdr_addr = rmem->base;
> + elfcorehdr_size = rmem->size;
> + return 0;
> +}
> +
> +RESERVEDMEM_OF_DECLARE(elfcorehdr, "linux,elfcorehdr", elfcore_hdr_setup);
> +#endif
> +
> static phys_addr_t dtb_early_pa __initdata;
>
> void __init setup_bootmem(void)
> @@ -537,6 +558,18 @@ static void __init reserve_crashkernel(void)
>
> int ret = 0;
>
> + /*
> + * Don't reserve a region for a crash kernel on a crash kernel
> + * since it doesn't make much sense and we have limited memory
> + * resources.
> + */
> +#ifdef CONFIG_CRASH_DUMP
> + if (is_kdump_kernel()) {
> + pr_info("crashkernel: ignoring reservation request\n");
> + return;
> + }
> +#endif
> +
> ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
> &crash_size, &crash_base);
> if (ret || !crash_size)
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?=
2020-06-23 15:05 [PATCH v2 0/3] RISC-V: Add kexec/kdump support Nick Kossifidis
` (2 preceding siblings ...)
2020-06-23 15:05 ` [PATCH v2 3/3] RISC-V: Add crash kernel support Nick Kossifidis
@ 2020-07-11 3:59 ` Palmer Dabbelt
2020-07-16 14:04 ` BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?= Nick Kossifidis
3 siblings, 1 reply; 12+ messages in thread
From: Palmer Dabbelt @ 2020-07-11 3:59 UTC (permalink / raw)
To: mick
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
On Tue, 23 Jun 2020 08:05:09 PDT (-0700), mick@ics.forth.gr wrote:
> This patch series adds kexec/kdump and crash kernel
> support on RISC-V. For testing the patches a patched
> version of kexec-tools is needed. The patch is still
> a work in progress but a draft version can be found at:
>
> http://riscv.ics.forth.gr/kexec-tools.patch
>
> v2:
> * Re-base on newer kernel tree and minor cleanups
> * Properly populate the ioresources tree, so that it can be
> used later on for implementing strict /dev/mem.
> * Use linux,usable-memory on /memory instead of a new binding
> * Use a reserved-memory node for ELF core header
>
> Nick Kossifidis (3):
> RISC-V: Add kexec support
> RISC-V: Add kdump support
> RISC-V: Add crash kernel support
>
> arch/riscv/Kconfig | 24 +++
> arch/riscv/include/asm/kexec.h | 54 +++++++
> arch/riscv/include/uapi/asm/elf.h | 6 +
> arch/riscv/kernel/Makefile | 2 +
> arch/riscv/kernel/crash_dump.c | 46 ++++++
> arch/riscv/kernel/crash_save_regs.S | 56 +++++++
> arch/riscv/kernel/kexec_relocate.S | 217 ++++++++++++++++++++++++++++
> arch/riscv/kernel/machine_kexec.c | 193 +++++++++++++++++++++++++
> arch/riscv/kernel/setup.c | 209 +++++++++++++++++++++++++++
> arch/riscv/mm/init.c | 110 ++++++++++++++
> include/uapi/linux/kexec.h | 1 +
> 11 files changed, 918 insertions(+)
> create mode 100644 arch/riscv/include/asm/kexec.h
> create mode 100644 arch/riscv/kernel/crash_dump.c
> create mode 100644 arch/riscv/kernel/crash_save_regs.S
> create mode 100644 arch/riscv/kernel/kexec_relocate.S
> create mode 100644 arch/riscv/kernel/machine_kexec.c
There's some fairly straight-forward to fix issues here, LMK if you want to fix
them or if you want me to take it over.
Thanks!
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?=
2020-07-11 3:59 ` BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?= Palmer Dabbelt
@ 2020-07-16 14:04 ` Nick Kossifidis
0 siblings, 0 replies; 12+ messages in thread
From: Nick Kossifidis @ 2020-07-16 14:04 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
Στις 2020-07-11 06:59, Palmer Dabbelt έγραψε:
> There's some fairly straight-forward to fix issues here, LMK if you
> want to fix
> them or if you want me to take it over.
>
> Thanks!
Thanks a lot for the feedback ! I'll send a patch to fix the issues with
kexec (also the one caught by the build bot), another patch for the
ioresources and re-send kdump/crashkernel.
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/3] RISC-V: Add kexec support
2020-07-11 3:59 ` Palmer Dabbelt
@ 2020-07-16 14:57 ` Nick Kossifidis
0 siblings, 0 replies; 12+ messages in thread
From: Nick Kossifidis @ 2020-07-16 14:57 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
Στις 2020-07-11 06:59, Palmer Dabbelt έγραψε:
> On Tue, 23 Jun 2020 08:05:10 PDT (-0700), mick@ics.forth.gr wrote:
>> +
>> +config KEXEC
>> + bool "Kexec system call"
>> + select KEXEC_CORE
>> + select HOTPLUG_CPU if SMP
>
> This needs an S-mode dependency, as this definately won't run as-is in
> M mode.
> While we might be fix up the CSRs pretty quickly, but as we don't
> really have
> any standard-smelling M-mode platforms
>
Good point, I'll add a dependency on MMU for now and work on supporting
NOMMU platforms at a later stage.
>> + /*
>> + * When we switch SATP.MODE to "Bare" we'll only
>> + * play with physical addresses. However the first time
>> + * we try to jump somewhere, the offset on the jump
>> + * will be relative to pc which will still be on VA. To
>> + * deal with this we set stvec to the physical address at
>> + * the start of the loop below so that we jump there in
>> + * any case.
>> + */
>> + la s8, 1f
>> + sub s8, s8, s7
>> + csrw stvec, s8
>
> stvec needs to be aligned.
>
I thought the compiler would align the address of 1: since it's a label.
Would an ".align 8" after the label do the trick ?
>> +
>> + /* Process entries in a loop */
>> +1:
>> + addi s10, s10, 1
>> + REG_L t0, 0(s0) /* t0 = *image->entry */
>> + addi s0, s0, RISCV_SZPTR /* image->entry++ */
>> +
>> + /* IND_DESTINATION entry ? -> save destination address */
>> + andi t1, t0, 0x1
>> + beqz t1, 2f
>> + andi s4, t0, ~0x1
>> + j 1b
>> +
>> +2:
>> + /* IND_INDIRECTION entry ? -> update next entry ptr (PA) */
>> + andi t1, t0, 0x2
>> + beqz t1, 2f
>> + andi s0, t0, ~0x2
>> + addi s9, s9, 1
>> + csrw sptbr, zero
>
> If I understand correctly (it's ambiguous in the ISA right now), we
> don't need
> a fence here. I've opened
> https://github.com/riscv/riscv-isa-manual/issues/538
> for clarification.
>
Thanks, that was also my understanding since we don't modify the page
table but we just skip it.
>> +4:
>> + /* Wait for the relocation to be visible by other harts */
>> + fence w,w
>
> That's not how fences work. A "fence w,w" just ensures that prior
> writes are
> visible before subsequent writes. Usually that would be some sort of
> control
> write being ordered after the data writes, which on the other harts
> would be
> paired with a "fence r,r" to make sure the control read is seen before
> any
> other reads. I don't see that, though.
>
> I think the right answer here is to ignore ordering here, as we're
> operating in
> a single processor mode, and instead push the fences into the CPU
> hotplug up
> codepath. We have to handle that anyway, so we might as well just do
> it once.
>
Good catch, that line was a leftover from debuging CPU_HOTPLUG, I'll
clean it up. The fence.i later on is the important one.
>> + /* Copy the assembler code for relocation to the control page */
>> + control_code_buffer = page_address(image->control_code_page);
>> + memcpy(control_code_buffer, riscv_kexec_relocate,
>> + riscv_kexec_relocate_size);
>
> The toolchain doesn't make any guarantees that you can move code around
> without
> -fPIC, but we're already taking advantage of the fact that simply
> medany code
> can be moved around so this should be OK. This is all assembly and it
> looks
> fine, but since it's placed in the rodata segment there's at least a
> reasonably possibility that GP could end up close enough that things
> get
> relaxed. We'd be safer adding .norelax to that assembly file, as
> there's
> nothing in there that's performance critical.
>
Good point, I assumed PC-relative addressing. Should I also add medany
to the CFLAGS of machine_kexec.c or add a dependency on CMODEL_MEDANY ?
> Also, as far as I can tell nothing is checking that
> riscv_kexec_relocate_size
> is smaller than a page. That could cause overflows in this memcpy, but
> a
> static check should be sufficient.
>
A page is huge compared to the relocation code plus we already know its
size at compile time so I thought it was an overkill to check it at
runtime every time. I'll see if I can do something with the preprocessor
there or I'll add a runtime check just in case.
>> +
>> + /* Mark the control page executable */
>> + set_memory_x((unsigned long) control_code_buffer, 1);
>> +
>> +#ifdef CONFIG_SMP
>> + /*
>> + * Make sure other harts see the copied data
>> + * if they try to read the control buffer
>> + */
>> + smp_wmb();
>> +#endif
>
> This appears to have similar issues as the fence.
>
> Also, smp_wmb() already has the relevant CONFIG_SMP checks.
>
I thought it would be safer to add a write memory barrier here, on the
other hand this code is executed upon loading the image so it doesn't
make much sense, I'll clean it up.
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 05669c87a..778dc191c 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -42,6 +42,7 @@
>> #define KEXEC_ARCH_MIPS_LE (10 << 16)
>> #define KEXEC_ARCH_MIPS ( 8 << 16)
>> #define KEXEC_ARCH_AARCH64 (183 << 16)
>> +#define KEXEC_ARCH_RISCV (243 << 16)
>
> I usually split the UAPI changes out as their own patch. This one is
> pretty
> trivial, but it's always nice to make sure everyone knows we're making
> a UAPI
> change just to be on the safe side.
>
Good point, the reason I put this here is so that the patch is complete
and compiles fine (we include this on include/asm/kexec.h).
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/3] RISC-V: Add kdump support
2020-07-11 3:59 ` Palmer Dabbelt
@ 2020-07-16 15:31 ` Nick Kossifidis
0 siblings, 0 replies; 12+ messages in thread
From: Nick Kossifidis @ 2020-07-16 15:31 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
Στις 2020-07-11 06:59, Palmer Dabbelt έγραψε:
> On Tue, 23 Jun 2020 08:05:11 PDT (-0700), mick@ics.forth.gr wrote:
>>
>> +
>> + /* Switch to physical addressing */
>> + csrw sptbr, zero
>
> I guess I'm missing something, but I'd assume that we would need some
> sort of
> tvec juggling here like we usually do when jumping between virtual and
> physical
> spaces. I get that we're not relocating the kernel that we're jumping
> to, but
> I don't understand how PAGE_OFFSET works.
>
We only clean up registers after this, interrupts are disabled, and jalr
will be called with an absolute physical address after satp is zero. I
don't see how we can end up with a trap but I can add a1 to stvec just
in case if you want.
>> +
>> + /* Pass the arguments to the next kernel / Cleanup*/
>
> Missing space in that comment.
>
ACK
>> + /* Add /memory regions to the resource tree */
>> + for_each_memblock(memory, region) {
>> + res = memblock_alloc(sizeof(struct resource), SMP_CACHE_BYTES);
>> + if (!res)
>> + panic("%s: Failed to allocate %zu bytes\n", __func__,
>> + sizeof(struct resource));
>> +
>> + if (unlikely(memblock_is_nomap(region))) {
>> + res->name = "Reserved";
>> + res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>
> This differs from what I ended up with in that I don't have
> IORESOURCE_BUSY. I
> just copied this blindly from arm64 so I don't actually know what the
> difference is.
>
For now since we don't mark any /memory regions with the NOMAP flag (ARM
for example does this for the kernel image region) this code won't be
executed, I put it there for completeness in case we do so in the
future. In any case the idea is to prevent a region marked with NOMAP to
be accessible through /dev/mem when STRICT_DEVMEM & IO_STRICT_DEVMEM is
enabled. On devmem_is_allowed(), the default page_is_ram() will check
for IORESOURCE_SYSTEM_RAM flag so it will not prevent this region from
being mapped through /dev/mem since it's flaged with IORESOURCE_MEM
instead. The idea is to rely on iomem_is_exclusive() to do the work, and
for that we need the region to be marked as busy
(https://elixir.bootlin.com/linux/v5.8-rc4/source/kernel/resource.c#L1614).
I wanted to do something to also honor the no_map flag on the
/reserved-memory regions that according to the device tree spec,
shouldn't be available to any other than the driver (so we should also
prevent them from being accessible through /dev/mem), and I noticed that
other archs don't deal with this. However this was a bit off-topic since
it had nothing to do with kexec/kdump.
>
> If you split out init_resource() then I can take it before the rest of
> the
> kexec stuff.
I'll send a patch for populating ioresources and also add the extra
sauce for handling no_map for /reserved-memory regions. Then I'll send
kdump / crashkernel on top. Is it OK if I remove the existing code from
mm/init.c and add the new one on kernel/setup.c ? After all it's not
related to mm, ioresources is a different thing and other archs also
handle this on setup.c.
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 3/3] RISC-V: Add crash kernel support
2020-07-11 3:59 ` Palmer Dabbelt
@ 2020-07-16 15:52 ` Nick Kossifidis
0 siblings, 0 replies; 12+ messages in thread
From: Nick Kossifidis @ 2020-07-16 15:52 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: david.abdurachmanov, anup, Atish Patra, yibin_liu, zong.li,
Paul Walmsley, mick, linux-riscv
Στις 2020-07-11 06:59, Palmer Dabbelt έγραψε:
> On Tue, 23 Jun 2020 08:05:12 PDT (-0700), mick@ics.forth.gr wrote:
>> new file mode 100644
>> index 000000000..81b9d2a71
>> --- /dev/null
>> +++ b/arch/riscv/kernel/crash_dump.c
>> @@ -0,0 +1,46 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * This code comes from arch/arm64/kernel/crash_dump.c
>
> Which is "GPL-2.0-only", not "GPL-2.0". I just sent out a patch set to
> make
> this generic, as a handful of architectures use it exactly (and some
> others may
> be able to).
>
Good call, it'll be much cleaner this way, thank you ! I'll re-send this
once your patch gets merged.
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2020-07-16 15:53 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-23 15:05 [PATCH v2 0/3] RISC-V: Add kexec/kdump support Nick Kossifidis
2020-06-23 15:05 ` [PATCH v2 1/3] RISC-V: Add kexec support Nick Kossifidis
2020-07-11 3:59 ` Palmer Dabbelt
2020-07-16 14:57 ` Nick Kossifidis
2020-06-23 15:05 ` [PATCH v2 2/3] RISC-V: Add kdump support Nick Kossifidis
2020-07-11 3:59 ` Palmer Dabbelt
2020-07-16 15:31 ` Nick Kossifidis
2020-06-23 15:05 ` [PATCH v2 3/3] RISC-V: Add crash kernel support Nick Kossifidis
2020-07-11 3:59 ` Palmer Dabbelt
2020-07-16 15:52 ` Nick Kossifidis
2020-07-11 3:59 ` BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?= Palmer Dabbelt
2020-07-16 14:04 ` BPATCH=20v2=200/3=5D=20RISC-V=3A=20Add=20kexec/kdump=20support=E2=80=8B?= Nick Kossifidis
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).