linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2 v6] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active
@ 2019-10-28  2:45 Lianbo Jiang
  2019-10-28  2:45 ` [PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified Lianbo Jiang
  2019-10-28  2:45 ` [PATCH 2/2 v6] x86/kdump: clean up all the code related to the backup region Lianbo Jiang
  0 siblings, 2 replies; 5+ messages in thread
From: Lianbo Jiang @ 2019-10-28  2:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, x86, bhe, dyoung, jgross, dhowells,
	Thomas.Lendacky, ebiederm, vgoyal, d.hatayama, horms, kexec

In purgatory(), the main things are as below:

[1] verify sha256 hashes for various segments.
    Lets keep these codes, and do not touch the logic.

[2] copy the first 640k content to a backup region.
    Lets safely remove it and clean all code related to backup region.

This patch series will remove the backup region, because the current
handling of copying the first 640k runs into problems when SME is
active(https://bugzilla.kernel.org/show_bug.cgi?id=204793).

The low 1MiB region will always be reserved when the crashkernel kernel
command line option is specified. And this way makes it unnecessary to
do anything with the low 1MiB region, because the memory allocated later
won't fall into the low 1MiB area.

This series includes two patches:
[1] x86/kdump: always reserve the low 1MiB when the crashkernel option
    is specified
    The low 1MiB region will always be reserved when the crashkernel
    kernel command line option is specified, which ensures that the
    memory allocated later won't fall into the low 1MiB area.

[2] x86/kdump: clean up all the code related to the backup region
    Remove the backup region and clean up.

Changes since v1:
[1] Add extra checking condition: when the crashkernel option is
    specified, reserve the low 640k area.

Changes since v2:
[1] Reserve the low 1MiB region when the crashkernel option is only
    specified.(Suggested by Eric)

[2] Remove the unused crash_copy_backup_region()

[3] Remove the backup region and clean up

[4] Split them into three patches

Changes since v3:
[1] Improve the first patch's log

[2] Improve the third patch based on Eric's suggestions

Changes since v4:
[1] Correct some typos, and also improve the first patch's log

[2] Add a new function kexec_reserve_low_1MiB() in kernel/kexec_core.c
    and which is called by reserve_real_mode(). (Suggested by Boris)

Changes since v5:
[1] Call the cmdline_find_option() instead of strstr() to check the
    crashkernel option. (Suggested by Hatayama)

[2] Add a weak function kexec_reserve_low_1MiB() in kernel/kexec_core.c,
    and implement the kexec_reserve_low_1MiB() in arch/x86/kernel/
    machine_kexec_64.c so that it does not cause the compile error
    on non-x86 kernel, and also ensures that it can work well on x86
    kernel.

Lianbo Jiang (2):
  x86/kdump: always reserve the low 1MiB when the crashkernel option is
    specified
  x86/kdump: clean up all the code related to the backup region

 arch/x86/include/asm/kexec.h       | 10 ----
 arch/x86/include/asm/purgatory.h   | 10 ----
 arch/x86/kernel/crash.c            | 87 ++++--------------------------
 arch/x86/kernel/machine_kexec_64.c | 62 ++++++---------------
 arch/x86/purgatory/purgatory.c     | 19 -------
 arch/x86/realmode/init.c           |  2 +
 include/linux/kexec.h              |  2 +
 kernel/kexec_core.c                |  3 ++
 8 files changed, 33 insertions(+), 162 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified
  2019-10-28  2:45 [PATCH 0/2 v6] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active Lianbo Jiang
@ 2019-10-28  2:45 ` Lianbo Jiang
  2019-10-28  3:19   ` Dave Young
  2019-10-28  2:45 ` [PATCH 2/2 v6] x86/kdump: clean up all the code related to the backup region Lianbo Jiang
  1 sibling, 1 reply; 5+ messages in thread
From: Lianbo Jiang @ 2019-10-28  2:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, x86, bhe, dyoung, jgross, dhowells,
	Thomas.Lendacky, ebiederm, vgoyal, d.hatayama, horms, kexec

Kdump kernel will reuse the first 640k region because the real mode
trampoline has to work in this area. When the vmcore is dumped, the
old memory in this area may be accessed, therefore, kernel has to
copy the contents of the first 640k area to a backup region so that
kdump kernel can read the old memory from the backup area of the
first 640k area, which is done in the purgatory().

But, the current handling of copying the first 640k area runs into
problems when SME is enabled, kernel does not properly copy these
old memory to the backup area in the purgatory(), thereby, kdump
kernel reads out the encrypted contents, because the kdump kernel
must access the first kernel's memory with the encryption bit set
when SME is enabled in the first kernel. Please refer to this link:

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793

Finally, it causes the following errors, and the crash tool gets
invalid pointers when parsing the vmcore.

crash> kmem -s|grep -i invalid
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
crash>

To avoid the above errors, when the crashkernel option is specified,
lets reserve the remaining low 1MiB memory(after reserving real mode
memory) so that the allocated memory does not fall into the low 1MiB
area, which makes us not to copy the first 640k content to a backup
region in purgatory(). This indicates that it does not need to be
included in crash dumps or used for anything except the processor
trampolines that must live in the low 1MiB.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
BTW:I also tried to fix the above problem in purgatory(), but there
are too many restricts in purgatory() context, for example: i can't
allocate new memory to create the identity mapping page table for
SME situation.

Currently, there are two places where the first 640k area is needed,
the first one is in the find_trampoline_placement(), another one is
in the reserve_real_mode(), and their content doesn't matter.

In addition, also need to clean all the code related to the backup
region later.

 arch/x86/kernel/machine_kexec_64.c | 15 +++++++++++++++
 arch/x86/realmode/init.c           |  2 ++
 include/linux/kexec.h              |  2 ++
 kernel/kexec_core.c                |  3 +++
 4 files changed, 22 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 5dcd438ad8f2..42d7c15c45f1 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -17,6 +17,7 @@
 #include <linux/suspend.h>
 #include <linux/vmalloc.h>
 #include <linux/efi.h>
+#include <linux/memblock.h>
 
 #include <asm/init.h>
 #include <asm/pgtable.h>
@@ -27,6 +28,7 @@
 #include <asm/kexec-bzimage64.h>
 #include <asm/setup.h>
 #include <asm/set_memory.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_ACPI
 /*
@@ -687,3 +689,16 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 	 */
 	set_memory_encrypted((unsigned long)vaddr, pages);
 }
+
+/*
+ * When the crashkernel option is specified, only use the low
+ * 1MiB for the real mode trampoline.
+ */
+void __init kexec_reserve_low_1MiB(void)
+{
+	if (cmdline_find_option(boot_command_line, "crashkernel",
+				NULL, 0) > 0) {
+		memblock_reserve(0, 1<<20);
+		pr_info("Reserving the low 1MiB of memory for crashkernel\n");
+	}
+}
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 7dce39c8c034..064cc79a015d 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -3,6 +3,7 @@
 #include <linux/slab.h>
 #include <linux/memblock.h>
 #include <linux/mem_encrypt.h>
+#include <linux/kexec.h>
 
 #include <asm/set_memory.h>
 #include <asm/pgtable.h>
@@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
 
 	memblock_reserve(mem, size);
 	set_real_mode_mem(mem);
+	kexec_reserve_low_1MiB();
 }
 
 static void __init setup_real_mode(void)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 1776eb2e43a4..988bf2de51a7 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
 int kexec_crash_loaded(void);
+void __init kexec_reserve_low_1MiB(void);
 void crash_save_cpu(struct pt_regs *regs, int cpu);
 extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
 
@@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }
 static inline int kexec_crash_loaded(void) { return 0; }
+static inline void __init kexec_reserve_low_1MiB(void) { }
 #define kexec_in_progress false
 #endif /* CONFIG_KEXEC_CORE */
 
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 15d70a90b50d..8856047bcdc8 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1213,3 +1213,6 @@ void __weak arch_kexec_protect_crashkres(void)
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+void __init __weak kexec_reserve_low_1MiB(void)
+{}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2 v6] x86/kdump: clean up all the code related to the backup region
  2019-10-28  2:45 [PATCH 0/2 v6] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active Lianbo Jiang
  2019-10-28  2:45 ` [PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified Lianbo Jiang
@ 2019-10-28  2:45 ` Lianbo Jiang
  1 sibling, 0 replies; 5+ messages in thread
From: Lianbo Jiang @ 2019-10-28  2:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, x86, bhe, dyoung, jgross, dhowells,
	Thomas.Lendacky, ebiederm, vgoyal, d.hatayama, horms, kexec

When the crashkernel kernel command line option is specified, the
low 1MiB memory will always be reserved, which makes that the memory
allocated later won't fall into the low 1MiB area, thereby, it's not
necessary to create a backup region and also no need to copy the first
640k content to a backup region.

Currently, the code related to the backup region can be safely removed,
so lets clean up.

Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
---
 arch/x86/include/asm/kexec.h       | 10 ----
 arch/x86/include/asm/purgatory.h   | 10 ----
 arch/x86/kernel/crash.c            | 87 ++++--------------------------
 arch/x86/kernel/machine_kexec_64.c | 47 ----------------
 arch/x86/purgatory/purgatory.c     | 19 -------
 5 files changed, 11 insertions(+), 162 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 5e7d6b46de97..6802c59e8252 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -66,10 +66,6 @@ struct kimage;
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
 #endif
 
-/* Memory to backup during crash kdump */
-#define KEXEC_BACKUP_SRC_START	(0UL)
-#define KEXEC_BACKUP_SRC_END	(640 * 1024UL - 1)	/* 640K */
-
 /*
  * This function is responsible for capturing register states if coming
  * via panic otherwise just fix up the ss and sp if coming via kernel
@@ -154,12 +150,6 @@ struct kimage_arch {
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
-	/* Details of backup region */
-	unsigned long backup_src_start;
-	unsigned long backup_src_sz;
-
-	/* Physical address of backup segment */
-	unsigned long backup_load_addr;
 
 	/* Core ELF header buffer */
 	void *elf_headers;
diff --git a/arch/x86/include/asm/purgatory.h b/arch/x86/include/asm/purgatory.h
index 92c34e517da1..5528e9325049 100644
--- a/arch/x86/include/asm/purgatory.h
+++ b/arch/x86/include/asm/purgatory.h
@@ -6,16 +6,6 @@
 #include <linux/purgatory.h>
 
 extern void purgatory(void);
-/*
- * These forward declarations serve two purposes:
- *
- * 1) Make sparse happy when checking arch/purgatory
- * 2) Document that these are required to be global so the symbol
- *    lookup in kexec works
- */
-extern unsigned long purgatory_backup_dest;
-extern unsigned long purgatory_backup_src;
-extern unsigned long purgatory_backup_sz;
 #endif	/* __ASSEMBLY__ */
 
 #endif /* _ASM_PURGATORY_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index eb651fbde92a..ef54b3ffb0f6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -173,8 +173,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 #ifdef CONFIG_KEXEC_FILE
 
-static unsigned long crash_zero_bytes;
-
 static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
 	unsigned int *nr_ranges = arg;
@@ -217,6 +215,11 @@ static int elf_header_exclude_ranges(struct crash_mem *cmem)
 {
 	int ret = 0;
 
+	/* Exclude the low 1MiB because it is always reserved */
+	ret = crash_exclude_mem_range(cmem, 0, 1<<20);
+	if (ret)
+		return ret;
+
 	/* Exclude crashkernel region */
 	ret = crash_exclude_mem_range(cmem, crashk_res.start, crashk_res.end);
 	if (ret)
@@ -246,9 +249,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
 					unsigned long *sz)
 {
 	struct crash_mem *cmem;
-	Elf64_Ehdr *ehdr;
-	Elf64_Phdr *phdr;
-	int ret, i;
+	int ret;
 
 	cmem = fill_up_crash_elf_data();
 	if (!cmem)
@@ -267,22 +268,7 @@ static int prepare_elf_headers(struct kimage *image, void **addr,
 	/* By default prepare 64bit headers */
 	ret =  crash_prepare_elf64_headers(cmem,
 				IS_ENABLED(CONFIG_X86_64), addr, sz);
-	if (ret)
-		goto out;
 
-	/*
-	 * If a range matches backup region, adjust offset to backup
-	 * segment.
-	 */
-	ehdr = (Elf64_Ehdr *)*addr;
-	phdr = (Elf64_Phdr *)(ehdr + 1);
-	for (i = 0; i < ehdr->e_phnum; phdr++, i++)
-		if (phdr->p_type == PT_LOAD &&
-				phdr->p_paddr == image->arch.backup_src_start &&
-				phdr->p_memsz == image->arch.backup_src_sz) {
-			phdr->p_offset = image->arch.backup_load_addr;
-			break;
-		}
 out:
 	vfree(cmem);
 	return ret;
@@ -321,19 +307,11 @@ static int memmap_exclude_ranges(struct kimage *image, struct crash_mem *cmem,
 				 unsigned long long mend)
 {
 	unsigned long start, end;
-	int ret = 0;
 
 	cmem->ranges[0].start = mstart;
 	cmem->ranges[0].end = mend;
 	cmem->nr_ranges = 1;
 
-	/* Exclude Backup region */
-	start = image->arch.backup_load_addr;
-	end = start + image->arch.backup_src_sz - 1;
-	ret = crash_exclude_mem_range(cmem, start, end);
-	if (ret)
-		return ret;
-
 	/* Exclude elf header region */
 	start = image->arch.elf_load_addr;
 	end = start + image->arch.elf_headers_sz - 1;
@@ -356,11 +334,11 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	memset(&cmd, 0, sizeof(struct crash_memmap_data));
 	cmd.params = params;
 
-	/* Add first 640K segment */
-	ei.addr = image->arch.backup_src_start;
-	ei.size = image->arch.backup_src_sz;
-	ei.type = E820_TYPE_RAM;
-	add_e820_entry(params, &ei);
+	/* Add the low 1MiB */
+	cmd.type = E820_TYPE_RAM;
+	flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+	walk_iomem_res_desc(IORES_DESC_NONE, flags, 0, (1<<20)-1, &cmd,
+			memmap_entry_callback);
 
 	/* Add ACPI tables */
 	cmd.type = E820_TYPE_ACPI;
@@ -409,55 +387,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
 	return ret;
 }
 
-static int determine_backup_region(struct resource *res, void *arg)
-{
-	struct kimage *image = arg;
-
-	image->arch.backup_src_start = res->start;
-	image->arch.backup_src_sz = resource_size(res);
-
-	/* Expecting only one range for backup region */
-	return 1;
-}
-
 int crash_load_segments(struct kimage *image)
 {
 	int ret;
 	struct kexec_buf kbuf = { .image = image, .buf_min = 0,
 				  .buf_max = ULONG_MAX, .top_down = false };
 
-	/*
-	 * Determine and load a segment for backup area. First 640K RAM
-	 * region is backup source
-	 */
-
-	ret = walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
-				image, determine_backup_region);
-
-	/* Zero or postive return values are ok */
-	if (ret < 0)
-		return ret;
-
-	/* Add backup segment. */
-	if (image->arch.backup_src_sz) {
-		kbuf.buffer = &crash_zero_bytes;
-		kbuf.bufsz = sizeof(crash_zero_bytes);
-		kbuf.memsz = image->arch.backup_src_sz;
-		kbuf.buf_align = PAGE_SIZE;
-		/*
-		 * Ideally there is no source for backup segment. This is
-		 * copied in purgatory after crash. Just add a zero filled
-		 * segment for now to make sure checksum logic works fine.
-		 */
-		ret = kexec_add_buffer(&kbuf);
-		if (ret)
-			return ret;
-		image->arch.backup_load_addr = kbuf.mem;
-		pr_debug("Loaded backup region at 0x%lx backup_start=0x%lx memsz=0x%lx\n",
-			 image->arch.backup_load_addr,
-			 image->arch.backup_src_start, kbuf.memsz);
-	}
-
 	/* Prepare elf headers and add a segment */
 	ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
 	if (ret)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 42d7c15c45f1..8afe2a8e41f8 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -300,48 +300,6 @@ static void load_segments(void)
 		);
 }
 
-#ifdef CONFIG_KEXEC_FILE
-/* Update purgatory as needed after various image segments have been prepared */
-static int arch_update_purgatory(struct kimage *image)
-{
-	int ret = 0;
-
-	if (!image->file_mode)
-		return 0;
-
-	/* Setup copying of backup region */
-	if (image->type == KEXEC_TYPE_CRASH) {
-		ret = kexec_purgatory_get_set_symbol(image,
-				"purgatory_backup_dest",
-				&image->arch.backup_load_addr,
-				sizeof(image->arch.backup_load_addr), 0);
-		if (ret)
-			return ret;
-
-		ret = kexec_purgatory_get_set_symbol(image,
-				"purgatory_backup_src",
-				&image->arch.backup_src_start,
-				sizeof(image->arch.backup_src_start), 0);
-		if (ret)
-			return ret;
-
-		ret = kexec_purgatory_get_set_symbol(image,
-				"purgatory_backup_sz",
-				&image->arch.backup_src_sz,
-				sizeof(image->arch.backup_src_sz), 0);
-		if (ret)
-			return ret;
-	}
-
-	return ret;
-}
-#else /* !CONFIG_KEXEC_FILE */
-static inline int arch_update_purgatory(struct kimage *image)
-{
-	return 0;
-}
-#endif /* CONFIG_KEXEC_FILE */
-
 int machine_kexec_prepare(struct kimage *image)
 {
 	unsigned long start_pgtable;
@@ -355,11 +313,6 @@ int machine_kexec_prepare(struct kimage *image)
 	if (result)
 		return result;
 
-	/* update purgatory as needed */
-	result = arch_update_purgatory(image);
-	if (result)
-		return result;
-
 	return 0;
 }
 
diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
index 3b95410ff0f8..2961234d0795 100644
--- a/arch/x86/purgatory/purgatory.c
+++ b/arch/x86/purgatory/purgatory.c
@@ -14,28 +14,10 @@
 
 #include "../boot/string.h"
 
-unsigned long purgatory_backup_dest __section(.kexec-purgatory);
-unsigned long purgatory_backup_src __section(.kexec-purgatory);
-unsigned long purgatory_backup_sz __section(.kexec-purgatory);
-
 u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(.kexec-purgatory);
 
 struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX] __section(.kexec-purgatory);
 
-/*
- * On x86, second kernel requries first 640K of memory to boot. Copy
- * first 640K to a backup region in reserved memory range so that second
- * kernel can use first 640K.
- */
-static int copy_backup_region(void)
-{
-	if (purgatory_backup_dest) {
-		memcpy((void *)purgatory_backup_dest,
-		       (void *)purgatory_backup_src, purgatory_backup_sz);
-	}
-	return 0;
-}
-
 static int verify_sha256_digest(void)
 {
 	struct kexec_sha_region *ptr, *end;
@@ -66,7 +48,6 @@ void purgatory(void)
 		for (;;)
 			;
 	}
-	copy_backup_region();
 }
 
 /*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified
  2019-10-28  2:45 ` [PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified Lianbo Jiang
@ 2019-10-28  3:19   ` Dave Young
  2019-10-28  3:37     ` lijiang
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Young @ 2019-10-28  3:19 UTC (permalink / raw)
  To: Lianbo Jiang
  Cc: linux-kernel, tglx, mingo, bp, hpa, x86, bhe, jgross, dhowells,
	Thomas.Lendacky, ebiederm, vgoyal, d.hatayama, horms, kexec

On 10/28/19 at 10:45am, Lianbo Jiang wrote:
> Kdump kernel will reuse the first 640k region because the real mode
> trampoline has to work in this area. When the vmcore is dumped, the
> old memory in this area may be accessed, therefore, kernel has to
> copy the contents of the first 640k area to a backup region so that
> kdump kernel can read the old memory from the backup area of the
> first 640k area, which is done in the purgatory().
> 
> But, the current handling of copying the first 640k area runs into
> problems when SME is enabled, kernel does not properly copy these
> old memory to the backup area in the purgatory(), thereby, kdump
> kernel reads out the encrypted contents, because the kdump kernel
> must access the first kernel's memory with the encryption bit set
> when SME is enabled in the first kernel. Please refer to this link:
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
> 
> Finally, it causes the following errors, and the crash tool gets
> invalid pointers when parsing the vmcore.
> 
> crash> kmem -s|grep -i invalid
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
> crash>
> 
> To avoid the above errors, when the crashkernel option is specified,
> lets reserve the remaining low 1MiB memory(after reserving real mode
> memory) so that the allocated memory does not fall into the low 1MiB
> area, which makes us not to copy the first 640k content to a backup
> region in purgatory(). This indicates that it does not need to be
> included in crash dumps or used for anything except the processor
> trampolines that must live in the low 1MiB.
> 
> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
> ---
> BTW:I also tried to fix the above problem in purgatory(), but there
> are too many restricts in purgatory() context, for example: i can't
> allocate new memory to create the identity mapping page table for
> SME situation.
> 
> Currently, there are two places where the first 640k area is needed,
> the first one is in the find_trampoline_placement(), another one is
> in the reserve_real_mode(), and their content doesn't matter.
> 
> In addition, also need to clean all the code related to the backup
> region later.
> 
>  arch/x86/kernel/machine_kexec_64.c | 15 +++++++++++++++
>  arch/x86/realmode/init.c           |  2 ++
>  include/linux/kexec.h              |  2 ++
>  kernel/kexec_core.c                |  3 +++
>  4 files changed, 22 insertions(+)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 5dcd438ad8f2..42d7c15c45f1 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -17,6 +17,7 @@
>  #include <linux/suspend.h>
>  #include <linux/vmalloc.h>
>  #include <linux/efi.h>
> +#include <linux/memblock.h>
>  
>  #include <asm/init.h>
>  #include <asm/pgtable.h>
> @@ -27,6 +28,7 @@
>  #include <asm/kexec-bzimage64.h>
>  #include <asm/setup.h>
>  #include <asm/set_memory.h>
> +#include <asm/cmdline.h>
>  
>  #ifdef CONFIG_ACPI
>  /*
> @@ -687,3 +689,16 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>  	 */
>  	set_memory_encrypted((unsigned long)vaddr, pages);
>  }
> +
> +/*
> + * When the crashkernel option is specified, only use the low
> + * 1MiB for the real mode trampoline.
> + */
> +void __init kexec_reserve_low_1MiB(void)
> +{
> +	if (cmdline_find_option(boot_command_line, "crashkernel",
> +				NULL, 0) > 0) {
> +		memblock_reserve(0, 1<<20);
> +		pr_info("Reserving the low 1MiB of memory for crashkernel\n");
> +	}
> +}
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 7dce39c8c034..064cc79a015d 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -3,6 +3,7 @@
>  #include <linux/slab.h>
>  #include <linux/memblock.h>
>  #include <linux/mem_encrypt.h>
> +#include <linux/kexec.h>
>  
>  #include <asm/set_memory.h>
>  #include <asm/pgtable.h>
> @@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
>  
>  	memblock_reserve(mem, size);
>  	set_real_mode_mem(mem);
> +	kexec_reserve_low_1MiB();
>  }
>  
>  static void __init setup_real_mode(void)
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 1776eb2e43a4..988bf2de51a7 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
>  extern void crash_kexec(struct pt_regs *);
>  int kexec_should_crash(struct task_struct *);
>  int kexec_crash_loaded(void);
> +void __init kexec_reserve_low_1MiB(void);
>  void crash_save_cpu(struct pt_regs *regs, int cpu);
>  extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
>  
> @@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
>  static inline void crash_kexec(struct pt_regs *regs) { }
>  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
>  static inline int kexec_crash_loaded(void) { return 0; }
> +static inline void __init kexec_reserve_low_1MiB(void) { }
>  #define kexec_in_progress false
>  #endif /* CONFIG_KEXEC_CORE */
>  
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 15d70a90b50d..8856047bcdc8 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1213,3 +1213,6 @@ void __weak arch_kexec_protect_crashkres(void)
>  
>  void __weak arch_kexec_unprotect_crashkres(void)
>  {}
> +
> +void __init __weak kexec_reserve_low_1MiB(void)
> +{}

Hi Lianbo,

Since this is x86 only, add a weak function sounds not necessary.
For example can just move it to  arch/x86/include/asm/kexec.h
Or maybe even better to limit it to kdump only, move it to 
arch/x86/include/asm/crash.h
and arch/x86/kernel/crash.c

Thanks
Dave


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified
  2019-10-28  3:19   ` Dave Young
@ 2019-10-28  3:37     ` lijiang
  0 siblings, 0 replies; 5+ messages in thread
From: lijiang @ 2019-10-28  3:37 UTC (permalink / raw)
  To: Dave Young
  Cc: linux-kernel, tglx, mingo, bp, hpa, x86, bhe, jgross, dhowells,
	Thomas.Lendacky, ebiederm, vgoyal, d.hatayama, horms, kexec

在 2019年10月28日 11:19, Dave Young 写道:
> On 10/28/19 at 10:45am, Lianbo Jiang wrote:
>> Kdump kernel will reuse the first 640k region because the real mode
>> trampoline has to work in this area. When the vmcore is dumped, the
>> old memory in this area may be accessed, therefore, kernel has to
>> copy the contents of the first 640k area to a backup region so that
>> kdump kernel can read the old memory from the backup area of the
>> first 640k area, which is done in the purgatory().
>>
>> But, the current handling of copying the first 640k area runs into
>> problems when SME is enabled, kernel does not properly copy these
>> old memory to the backup area in the purgatory(), thereby, kdump
>> kernel reads out the encrypted contents, because the kdump kernel
>> must access the first kernel's memory with the encryption bit set
>> when SME is enabled in the first kernel. Please refer to this link:
>>
>> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
>>
>> Finally, it causes the following errors, and the crash tool gets
>> invalid pointers when parsing the vmcore.
>>
>> crash> kmem -s|grep -i invalid
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> kmem: dma-kmalloc-512: slab:ffffd77680001c00 invalid freepointer:a6086ac099f0c5a4
>> crash>
>>
>> To avoid the above errors, when the crashkernel option is specified,
>> lets reserve the remaining low 1MiB memory(after reserving real mode
>> memory) so that the allocated memory does not fall into the low 1MiB
>> area, which makes us not to copy the first 640k content to a backup
>> region in purgatory(). This indicates that it does not need to be
>> included in crash dumps or used for anything except the processor
>> trampolines that must live in the low 1MiB.
>>
>> Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
>> ---
>> BTW:I also tried to fix the above problem in purgatory(), but there
>> are too many restricts in purgatory() context, for example: i can't
>> allocate new memory to create the identity mapping page table for
>> SME situation.
>>
>> Currently, there are two places where the first 640k area is needed,
>> the first one is in the find_trampoline_placement(), another one is
>> in the reserve_real_mode(), and their content doesn't matter.
>>
>> In addition, also need to clean all the code related to the backup
>> region later.
>>
>>  arch/x86/kernel/machine_kexec_64.c | 15 +++++++++++++++
>>  arch/x86/realmode/init.c           |  2 ++
>>  include/linux/kexec.h              |  2 ++
>>  kernel/kexec_core.c                |  3 +++
>>  4 files changed, 22 insertions(+)
>>
>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>> index 5dcd438ad8f2..42d7c15c45f1 100644
>> --- a/arch/x86/kernel/machine_kexec_64.c
>> +++ b/arch/x86/kernel/machine_kexec_64.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/suspend.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/efi.h>
>> +#include <linux/memblock.h>
>>  
>>  #include <asm/init.h>
>>  #include <asm/pgtable.h>
>> @@ -27,6 +28,7 @@
>>  #include <asm/kexec-bzimage64.h>
>>  #include <asm/setup.h>
>>  #include <asm/set_memory.h>
>> +#include <asm/cmdline.h>
>>  
>>  #ifdef CONFIG_ACPI
>>  /*
>> @@ -687,3 +689,16 @@ void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
>>  	 */
>>  	set_memory_encrypted((unsigned long)vaddr, pages);
>>  }
>> +
>> +/*
>> + * When the crashkernel option is specified, only use the low
>> + * 1MiB for the real mode trampoline.
>> + */
>> +void __init kexec_reserve_low_1MiB(void)
>> +{
>> +	if (cmdline_find_option(boot_command_line, "crashkernel",
>> +				NULL, 0) > 0) {
>> +		memblock_reserve(0, 1<<20);
>> +		pr_info("Reserving the low 1MiB of memory for crashkernel\n");
>> +	}
>> +}
>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>> index 7dce39c8c034..064cc79a015d 100644
>> --- a/arch/x86/realmode/init.c
>> +++ b/arch/x86/realmode/init.c
>> @@ -3,6 +3,7 @@
>>  #include <linux/slab.h>
>>  #include <linux/memblock.h>
>>  #include <linux/mem_encrypt.h>
>> +#include <linux/kexec.h>
>>  
>>  #include <asm/set_memory.h>
>>  #include <asm/pgtable.h>
>> @@ -34,6 +35,7 @@ void __init reserve_real_mode(void)
>>  
>>  	memblock_reserve(mem, size);
>>  	set_real_mode_mem(mem);
>> +	kexec_reserve_low_1MiB();
>>  }
>>  
>>  static void __init setup_real_mode(void)
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 1776eb2e43a4..988bf2de51a7 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -306,6 +306,7 @@ extern void __crash_kexec(struct pt_regs *);
>>  extern void crash_kexec(struct pt_regs *);
>>  int kexec_should_crash(struct task_struct *);
>>  int kexec_crash_loaded(void);
>> +void __init kexec_reserve_low_1MiB(void);
>>  void crash_save_cpu(struct pt_regs *regs, int cpu);
>>  extern int kimage_crash_copy_vmcoreinfo(struct kimage *image);
>>  
>> @@ -397,6 +398,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { }
>>  static inline void crash_kexec(struct pt_regs *regs) { }
>>  static inline int kexec_should_crash(struct task_struct *p) { return 0; }
>>  static inline int kexec_crash_loaded(void) { return 0; }
>> +static inline void __init kexec_reserve_low_1MiB(void) { }
>>  #define kexec_in_progress false
>>  #endif /* CONFIG_KEXEC_CORE */
>>  
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 15d70a90b50d..8856047bcdc8 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -1213,3 +1213,6 @@ void __weak arch_kexec_protect_crashkres(void)
>>  
>>  void __weak arch_kexec_unprotect_crashkres(void)
>>  {}
>> +
>> +void __init __weak kexec_reserve_low_1MiB(void)
>> +{}
> 
> Hi Lianbo,
> 
> Since this is x86 only, add a weak function sounds not necessary.
> For example can just move it to  arch/x86/include/asm/kexec.h
> Or maybe even better to limit it to kdump only, move it to 
> arch/x86/include/asm/crash.h
> and arch/x86/kernel/crash.c
> 
Good point. I will improve them and then post again.

Thanks.
Lianbo

> Thanks
> Dave
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-10-28  3:37 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-28  2:45 [PATCH 0/2 v6] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active Lianbo Jiang
2019-10-28  2:45 ` [PATCH 1/2 v6] x86/kdump: always reserve the low 1MiB when the crashkernel option is specified Lianbo Jiang
2019-10-28  3:19   ` Dave Young
2019-10-28  3:37     ` lijiang
2019-10-28  2:45 ` [PATCH 2/2 v6] x86/kdump: clean up all the code related to the backup region Lianbo Jiang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).