* [PATCH 0/4] faster kexec reboot
From: Albert Huang @ 2022-07-25  8:38 UTC (permalink / raw)
  Cc: huangjie.albert, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Eric Biederman,
	Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Michael Roth, Kuppuswamy Sathyanarayanan,
	Nathan Chancellor, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

From: "huangjie.albert" <huangjie.albert@bytedance.com>

In many time-sensitive scenarios, the kernel needs to restart as quickly
as possible. However, the current kexec fast-restart path performs several
memory copy, verification and decompression operations that together take
more than 500 ms. With the following patch series, the path from
machine_kexec to start_kernel takes only 15 ms.

How to measure time:

C code:
#include <stdint.h>

/* Read the x86 time-stamp counter (EDX:EAX) as one 64-bit value. */
uint64_t current_cycles(void)
{
    uint32_t low, high;
    asm volatile("rdtsc" : "=a"(low), "=d"(high));
    return ((uint64_t)low) | ((uint64_t)high << 32);
}
assembly code:
       pushq %rax
       pushq %rdx
       rdtsc
       mov   %eax,%eax
       shl   $0x20,%rdx
       or    %rax,%rdx
       movq  %rdx,0x840(%r14)
       popq  %rdx
       popq  %rax
The timestamps can be stored in boot_params or in the kexec control page,
so that all of them can be read back after the new kernel has booted.
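
The recorded values are raw TSC cycle counts, so they still have to be
converted to time. A minimal sketch of that conversion, assuming the TSC
frequency is known in kHz (for example from the kernel boot log); the
function name cycles_to_us and the tsc_khz parameter are illustrative and
not part of this series:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the distance between two TSC readings to microseconds.
 * tsc_khz is the TSC frequency in kHz, i.e. cycles per millisecond
 * (assumed to be supplied by the caller).
 */
static uint64_t cycles_to_us(uint64_t start, uint64_t end, uint64_t tsc_khz)
{
    /* Unsigned subtraction stays correct if the counter wrapped once. */
    uint64_t delta = end - start;

    assert(tsc_khz != 0);
    /* cycles / (cycles per ms) = ms; scale by 1000 first to keep us precision. */
    return delta * 1000 / tsc_khz;
}
```

With a 2.5 GHz TSC (tsc_khz = 2500000), a delta of 37500000 cycles
corresponds to 15000 us, i.e. 15 ms.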

huangjie.albert (4):
  kexec: reuse crash kernel reserved memory for normal kexec
  kexec: add CONFIG_KEXEC_PURGATORY_SKIP_SIG
  x86: Support the uncompressed kernel to speed up booting
  x86: boot: avoid memory copy if kernel is uncompressed

 arch/x86/Kconfig                   | 10 +++++++++
 arch/x86/boot/compressed/Makefile  |  5 ++++-
 arch/x86/boot/compressed/head_64.S |  8 +++++--
 arch/x86/boot/compressed/misc.c    | 35 +++++++++++++++++++++++++-----
 arch/x86/purgatory/purgatory.c     |  7 ++++++
 include/linux/kexec.h              |  9 ++++----
 include/uapi/linux/kexec.h         |  2 ++
 kernel/kexec.c                     | 19 +++++++++++++++-
 kernel/kexec_core.c                | 16 ++++++++------
 kernel/kexec_file.c                | 20 +++++++++++++++--
 scripts/Makefile.lib               |  5 +++++
 11 files changed, 114 insertions(+), 22 deletions(-)

-- 
2.31.1


* [PATCH 1/4] kexec: reuse crash kernel reserved memory for normal kexec
From: Albert Huang @ 2022-07-25  8:38 UTC (permalink / raw)
  Cc: huangjie.albert, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Eric Biederman,
	Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, Michael Roth,
	Nathan Chancellor, Ard Biesheuvel, Joerg Roedel, Mark Rutland,
	Peter Zijlstra, Sean Christopherson, Kees Cook, linux-kernel,
	kexec, linux-kbuild

From: "huangjie.albert" <huangjie.albert@bytedance.com>

Normally, for a kexec reboot, each segment of the second OS (kernel,
initrd, etc.) is copied to discontiguous physical memory during kexec
load. When kexec -e is executed, a second memory copy moves each segment
to contiguous physical memory, which lengthens the switch to the new OS.
If we instead reuse the crash kernel reserved memory for a normal kexec,
each segment of the second OS is copied directly to contiguous physical
memory at load time, so no second copy is needed when kexec -e is
executed later.

The kexec userspace tool also needs a new option (-r) to request the use
of reserved memory (see the separate patch for kexec).

Example:
bzImage: 53M, initramfs: 28M
This saves about 40 ms. The larger the image, the greater the time
savings.

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 include/linux/kexec.h      |  9 +++++----
 include/uapi/linux/kexec.h |  2 ++
 kernel/kexec.c             | 19 ++++++++++++++++++-
 kernel/kexec_core.c        | 16 +++++++++-------
 kernel/kexec_file.c        | 20 ++++++++++++++++++--
 5 files changed, 52 insertions(+), 14 deletions(-)
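
The flag handling this patch adds to kimage_alloc_init() can be read as
two checks: the crash and reserved-memory modes are mutually exclusive,
and in either mode the entry point must lie inside the reserved range.
A userspace sketch of that logic follows; struct mem_range and
check_load_request() are hypothetical stand-ins (for crashk_res and the
kernel function), and the phys_to_boot_phys() translation is left out:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined in include/uapi/linux/kexec.h by this patch. */
#define KEXEC_ON_CRASH      0x00000001UL
#define KEXEC_RESERVED_MEM  0x00000004UL

/* Hypothetical stand-in for crashk_res: an inclusive physical range. */
struct mem_range {
    unsigned long start;
    unsigned long end;
};

/* Returns 0 if the request is acceptable, a negative errno otherwise. */
static int check_load_request(unsigned long flags, unsigned long entry,
                              const struct mem_range *reserved)
{
    bool on_crash = flags & KEXEC_ON_CRASH;
    bool on_reserved = flags & KEXEC_RESERVED_MEM;

    assert(reserved != NULL);

    /* The two modes both claim the reserved region, so reject the mix. */
    if (on_crash && on_reserved)
        return -EINVAL;

    /* In either mode the entry point must be inside the reserved range. */
    if (on_crash || on_reserved) {
        if (entry < reserved->start || entry > reserved->end)
            return -EADDRNOTAVAIL;
    }

    return 0;
}
```

A plain kexec load (neither flag set) is not restricted to the reserved
range in this sketch, which matches the existing default path.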

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 475683cd67f1..9a8b9932b42a 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -305,9 +305,10 @@ struct kimage {
 	unsigned long control_page;
 
 	/* Flags to indicate special processing */
-	unsigned int type : 1;
+	unsigned int type : 2;
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH   1
+#define KEXEC_TYPE_RESERVED_MEM 2
 	unsigned int preserve_context : 1;
 	/* If set, we are using file mode kexec syscall */
 	unsigned int file_mode:1;
@@ -377,14 +378,14 @@ extern int kexec_load_disabled;
 
 /* List of defined/legal kexec flags */
 #ifndef CONFIG_KEXEC_JUMP
-#define KEXEC_FLAGS    KEXEC_ON_CRASH
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_RESERVED_MEM)
 #else
-#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
+#define KEXEC_FLAGS    (KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT | KEXEC_RESERVED_MEM)
 #endif
 
 /* List of defined/legal kexec file flags */
 #define KEXEC_FILE_FLAGS	(KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
-				 KEXEC_FILE_NO_INITRAMFS)
+				 KEXEC_FILE_NO_INITRAMFS | KEXEC_FILE_RESERVED_MEM)
 
 /* flag to track if kexec reboot is in progress */
 extern bool kexec_in_progress;
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 981016e05cfa..c29011eb7fc2 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -12,6 +12,7 @@
 /* kexec flags for different usage scenarios */
 #define KEXEC_ON_CRASH		0x00000001
 #define KEXEC_PRESERVE_CONTEXT	0x00000002
+#define KEXEC_RESERVED_MEM	0x00000004
 #define KEXEC_ARCH_MASK		0xffff0000
 
 /*
@@ -24,6 +25,7 @@
 #define KEXEC_FILE_UNLOAD	0x00000001
 #define KEXEC_FILE_ON_CRASH	0x00000002
 #define KEXEC_FILE_NO_INITRAMFS	0x00000004
+#define KEXEC_FILE_RESERVED_MEM 0x00000008
 
 /* These values match the ELF architecture values.
  * Unless there is a good reason that should continue to be the case.
diff --git a/kernel/kexec.c b/kernel/kexec.c
index b5e40f069768..0d9ea52c81c1 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -27,8 +27,14 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 	int ret;
 	struct kimage *image;
 	bool kexec_on_panic = flags & KEXEC_ON_CRASH;
+	bool kexec_on_reserved = flags & KEXEC_RESERVED_MEM;
 
-	if (kexec_on_panic) {
+	if (kexec_on_panic && kexec_on_reserved) {
+		pr_err("KEXEC_ON_CRASH and KEXEC_RESERVED_MEM cannot be used together\n");
+		return -EINVAL;
+	}
+
+	if (kexec_on_panic || kexec_on_reserved) {
 		/* Verify we have a valid entry point */
 		if ((entry < phys_to_boot_phys(crashk_res.start)) ||
 		    (entry > phys_to_boot_phys(crashk_res.end)))
@@ -50,6 +56,12 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned long entry,
 		image->type = KEXEC_TYPE_CRASH;
 	}
 
+	if (kexec_on_reserved) {
+		/* Enable special reserved kernel control page alloc policy. */
+		image->control_page = crashk_res.start;
+		image->type = KEXEC_TYPE_RESERVED_MEM;
+	}
+
 	ret = sanity_check_segment_list(image);
 	if (ret)
 		goto out_free_image;
@@ -110,6 +122,11 @@ static int do_kexec_load(unsigned long entry, unsigned long nr_segments,
 		dest_image = &kexec_image;
 	}
 
+	if (flags & KEXEC_RESERVED_MEM) {
+		if (kexec_crash_image)
+			arch_kexec_unprotect_crashkres();
+	}
+
 	if (nr_segments == 0) {
 		/* Uninstall image */
 		kimage_free(xchg(dest_image, NULL));
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 4d34c78334ce..6220c2e0d6f7 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -230,13 +230,13 @@ int sanity_check_segment_list(struct kimage *image)
 	 * Verify we have good destination addresses.  Normally
 	 * the caller is responsible for making certain we don't
 	 * attempt to load the new image into invalid or reserved
-	 * areas of RAM.  But crash kernels are preloaded into a
+	 * areas of RAM.  But crash kernels (or we specify to load
+	 * the new image into reserved areas) are preloaded into a
 	 * reserved area of ram.  We must ensure the addresses
 	 * are in the reserved area otherwise preloading the
 	 * kernel could corrupt things.
 	 */
-
-	if (image->type == KEXEC_TYPE_CRASH) {
+	if (image->type == KEXEC_TYPE_CRASH || image->type == KEXEC_TYPE_RESERVED_MEM) {
 		for (i = 0; i < nr_segments; i++) {
 			unsigned long mstart, mend;
 
@@ -414,7 +414,7 @@ static struct page *kimage_alloc_normal_control_pages(struct kimage *image,
 	return pages;
 }
 
-static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
+static struct page *kimage_alloc_reserverd_control_pages(struct kimage *image,
 						      unsigned int order)
 {
 	/* Control pages are special, they are the intermediaries
@@ -491,7 +491,8 @@ struct page *kimage_alloc_control_pages(struct kimage *image,
 		pages = kimage_alloc_normal_control_pages(image, order);
 		break;
 	case KEXEC_TYPE_CRASH:
-		pages = kimage_alloc_crash_control_pages(image, order);
+	case KEXEC_TYPE_RESERVED_MEM:
+		pages = kimage_alloc_reserverd_control_pages(image, order);
 		break;
 	}
 
@@ -846,7 +847,7 @@ static int kimage_load_normal_segment(struct kimage *image,
 	return result;
 }
 
-static int kimage_load_crash_segment(struct kimage *image,
+static int kimage_load_reserved_segment(struct kimage *image,
 					struct kexec_segment *segment)
 {
 	/* For crash dumps kernels we simply copy the data from
@@ -924,7 +925,8 @@ int kimage_load_segment(struct kimage *image,
 		result = kimage_load_normal_segment(image, segment);
 		break;
 	case KEXEC_TYPE_CRASH:
-		result = kimage_load_crash_segment(image, segment);
+	case KEXEC_TYPE_RESERVED_MEM:
+		result = kimage_load_reserved_segment(image, segment);
 		break;
 	}
 
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f9261c07b048..5242ad7e5302 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -277,7 +277,7 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd,
 	int ret;
 	struct kimage *image;
 	bool kexec_on_panic = flags & KEXEC_FILE_ON_CRASH;
-
+	bool kexec_on_reserved = flags & KEXEC_FILE_RESERVED_MEM;
 	image = do_kimage_alloc_init();
 	if (!image)
 		return -ENOMEM;
@@ -290,6 +290,12 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd,
 		image->type = KEXEC_TYPE_CRASH;
 	}
 
+	if (kexec_on_reserved) {
+		/* Enable special crash kernel control page alloc policy. */
+		image->control_page = crashk_res.start;
+		image->type = KEXEC_TYPE_RESERVED_MEM;
+	}
+
 	ret = kimage_file_prepare_segments(image, kernel_fd, initrd_fd,
 					   cmdline_ptr, cmdline_len, flags);
 	if (ret)
@@ -346,6 +352,11 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	if (!mutex_trylock(&kexec_mutex))
 		return -EBUSY;
 
+	if ((flags & KEXEC_FILE_ON_CRASH) && (flags & KEXEC_FILE_RESERVED_MEM)) {
+		mutex_unlock(&kexec_mutex);	/* taken by mutex_trylock() above */
+		return -EINVAL;
+	}
+
 	dest_image = &kexec_image;
 	if (flags & KEXEC_FILE_ON_CRASH) {
 		dest_image = &kexec_crash_image;
@@ -353,6 +364,11 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 			arch_kexec_unprotect_crashkres();
 	}
 
+	if (flags & KEXEC_FILE_RESERVED_MEM) {
+		if (kexec_crash_image)
+			arch_kexec_unprotect_crashkres();
+	}
+
 	if (flags & KEXEC_FILE_UNLOAD)
 		goto exchange;
 
@@ -588,7 +604,7 @@ static int kexec_walk_memblock(struct kexec_buf *kbuf,
 static int kexec_walk_resources(struct kexec_buf *kbuf,
 				int (*func)(struct resource *, void *))
 {
-	if (kbuf->image->type == KEXEC_TYPE_CRASH)
+	if (kbuf->image->type == KEXEC_TYPE_CRASH || kbuf->image->type == KEXEC_TYPE_RESERVED_MEM)
 		return walk_iomem_res_desc(crashk_res.desc,
 					   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
 					   crashk_res.start, crashk_res.end,
-- 
2.31.1


* [PATCH 2/4] kexec: add CONFIG_KEXEC_PURGATORY_SKIP_SIG
From: Albert Huang @ 2022-07-25  8:38 UTC (permalink / raw)
  Cc: huangjie.albert, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Eric Biederman,
	Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Brijesh Singh, Michael Roth,
	Nathan Chancellor, Kuppuswamy Sathyanarayanan, Ard Biesheuvel,
	Peter Zijlstra, Sean Christopherson, Joerg Roedel, Mark Rutland,
	Kees Cook, linux-kernel, kexec, linux-kbuild

From: "huangjie.albert" <huangjie.albert@bytedance.com>

verify_sha256_digest() may cost 300+ ms in my test environment:
bzImage: 53M, initramfs: 28M

Add a config option to control whether this check is enabled. If we can
confirm that the data of each segment loaded by kexec will not change,
we can turn off the check and get a faster startup.

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 arch/x86/Kconfig               | 9 +++++++++
 arch/x86/purgatory/purgatory.c | 7 +++++++
 2 files changed, 16 insertions(+)
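
For readers unfamiliar with the purgatory check, the trade-off can be
sketched in userspace: the check walks every loaded region, so its cost
grows with image size, and compiling it out also hides corruption. In the
sketch below FNV-1a is only a cheap stand-in for SHA-256, and struct
region, regions_checksum(), verify_regions() and the SKIP_VERIFY macro
are hypothetical names, not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor, loosely modelled on struct kexec_sha_region. */
struct region {
    const uint8_t *start;
    size_t len;
};

/* FNV-1a over all regions; only a stand-in for SHA-256 in this sketch. */
static uint64_t regions_checksum(const struct region *r, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;        /* FNV-1a 64-bit offset basis */

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < r[i].len; j++) {
            h ^= r[i].start[j];
            h *= 0x100000001b3ULL;             /* FNV-1a 64-bit prime */
        }
    }
    return h;
}

/* Returns 0 if the regions match the expected checksum, 1 otherwise. */
static int verify_regions(const struct region *r, size_t n, uint64_t expected)
{
#ifdef SKIP_VERIFY      /* hypothetical macro standing in for the Kconfig option */
    (void)r;
    (void)n;
    (void)expected;
    return 0;           /* any corruption goes unnoticed on this path */
#else
    return regions_checksum(r, n) == expected ? 0 : 1;
#endif
}
```

Built without SKIP_VERIFY, a single flipped byte in any region makes
verify_regions() return 1; built with it, the function always returns 0.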

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 52a7f91527fe..adbd3a2bd60f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2052,6 +2052,15 @@ config KEXEC_BZIMAGE_VERIFY_SIG
 	help
 	  Enable bzImage signature verification support.
 
+config KEXEC_PURGATORY_SKIP_SIG
+	bool "skip kexec purgatory signature verification"
+	depends on ARCH_HAS_KEXEC_PURGATORY
+	help
+	  This option makes the kexec purgatory skip signature verification,
+	  which saves hundreds of milliseconds during kexec boot. Enable it only
+	  if you can confirm that the data of each segment loaded by kexec will
+	  not change.
+
 config CRASH_DUMP
 	bool "kernel crash dumps"
 	depends on X86_64 || (X86_32 && HIGHMEM)
diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
index 7558139920f8..b3f15774d86d 100644
--- a/arch/x86/purgatory/purgatory.c
+++ b/arch/x86/purgatory/purgatory.c
@@ -20,6 +20,12 @@ u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(".kexec-purgatory");
 
 struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX] __section(".kexec-purgatory");
 
+#ifdef CONFIG_KEXEC_PURGATORY_SKIP_SIG
+static int verify_sha256_digest(void)
+{
+	return 0;
+}
+#else
 static int verify_sha256_digest(void)
 {
 	struct kexec_sha_region *ptr, *end;
@@ -39,6 +45,7 @@ static int verify_sha256_digest(void)
 
 	return 0;
 }
+#endif
 
 void purgatory(void)
 {
-- 
2.31.1


* [PATCH 3/4] x86: Support the uncompressed kernel to speed up booting
From: Albert Huang @ 2022-07-25  8:38 UTC (permalink / raw)
  Cc: huangjie.albert, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Eric Biederman,
	Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, Tony Luck,
	Michael Roth, Nathan Chancellor, Ard Biesheuvel, Mark Rutland,
	Joerg Roedel, Sean Christopherson, Peter Zijlstra, Kees Cook,
	linux-kernel, kexec, linux-kbuild

From: "huangjie.albert" <huangjie.albert@bytedance.com>

A compressed kernel reduces the time needed to load the kernel into
memory and the disk space needed to store it. In some time-sensitive
scenarios, however, the time spent decompressing the kernel is
intolerable. Therefore, it is necessary to support uncompressed kernel
images, so that the decompression time is saved when the kernel starts.

On my machine, this stage takes approximately:
image type          image size    time
compressed (gzip)   8.5M          159ms
uncompressed        53M           8.5ms

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 arch/x86/Kconfig                  |  1 +
 arch/x86/boot/compressed/Makefile |  5 ++++-
 arch/x86/boot/compressed/misc.c   | 13 +++++++++++++
 scripts/Makefile.lib              |  5 +++++
 4 files changed, 23 insertions(+), 1 deletion(-)
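
With an uncompressed payload the "decompression" step reduces to a copy.
The userspace sketch below shows that idea with an explicit bounds check.
Note that this is a variant, not the patch's code: the patch's
__decompress() copies olen bytes unconditionally, while this hypothetical
copy_uncompressed() copies len bytes and refuses input that does not fit:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pass-through "decompressor": the payload is already a raw image, so
 * the work is a single bounded copy.
 * Returns 0 on success, -1 if the input does not fit in the output buffer.
 */
static int copy_uncompressed(const unsigned char *buf, long len,
                             unsigned char *outbuf, long olen)
{
    assert(buf != NULL && outbuf != NULL);

    if (len < 0 || olen < 0 || len > olen)
        return -1;

    memcpy(outbuf, buf, (size_t)len);
    return 0;
}
```

Going by the numbers in the commit message, this copy of a 53M image
(8.5 ms) is roughly 18 to 19 times faster than gzip decompression of the
8.5M image (159 ms).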

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index adbd3a2bd60f..231187624c68 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -221,6 +221,7 @@ config X86
 	select HAVE_KERNEL_LZO
 	select HAVE_KERNEL_XZ
 	select HAVE_KERNEL_ZSTD
+	select HAVE_KERNEL_UNCOMPRESSED
 	select HAVE_KPROBES
 	select HAVE_KPROBES_ON_FTRACE
 	select HAVE_FUNCTION_ERROR_INJECTION
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 19e1905dcbf6..0c8417a2f792 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -26,7 +26,7 @@ OBJECT_FILES_NON_STANDARD	:= y
 KCOV_INSTRUMENT		:= n
 
 targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
-	vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst
+	vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst vmlinux.bin.none
 
 # CLANG_FLAGS must come before any cc-disable-warning or cc-option calls in
 # case of cross compiling, as it has the '--target=' flag, which is needed to
@@ -139,6 +139,8 @@ $(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,lz4_with_size)
 $(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,zstd22_with_size)
+$(obj)/vmlinux.bin.none: $(vmlinux.bin.all-y) FORCE
+	$(call if_changed,none)
 
 suffix-$(CONFIG_KERNEL_GZIP)	:= gz
 suffix-$(CONFIG_KERNEL_BZIP2)	:= bz2
@@ -147,6 +149,7 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
 suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
 suffix-$(CONFIG_KERNEL_ZSTD)	:= zst
+suffix-$(CONFIG_KERNEL_UNCOMPRESSED)	:= none
 
 quiet_cmd_mkpiggy = MKPIGGY $@
       cmd_mkpiggy = $(obj)/mkpiggy $< > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cf690d8712f4..c23c0f525d93 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -181,6 +181,19 @@ void __puthex(unsigned long value)
 	}
 }
 
+#ifdef CONFIG_KERNEL_UNCOMPRESSED
+#include <linux/decompress/mm.h>
+static int __decompress(unsigned char *buf, long len,
+				long (*fill)(void*, unsigned long),
+				long (*flush)(void*, unsigned long),
+				unsigned char *outbuf, long olen,
+				long *pos, void (*error)(char *x))
+{
+	memcpy(outbuf, buf, olen);
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_X86_NEED_RELOCS
 static void handle_relocations(void *output, unsigned long output_len,
 			       unsigned long virt_addr)
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3fb6a99e78c4..c89d5466c617 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -438,6 +438,11 @@ quiet_cmd_lz4 = LZ4     $@
 quiet_cmd_lz4_with_size = LZ4     $@
       cmd_lz4_with_size = { cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout; \
                   $(size_append); } > $@
+# none
+quiet_cmd_none = NONE     $@
+      cmd_none = (cat $(filter-out FORCE,$^) && \
+      $(call size_append, $(filter-out FORCE,$^))) > $@ || \
+      (rm -f $@ ; false)
 
 # U-Boot mkimage
 # ---------------------------------------------------------------------------
-- 
2.31.1
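
The cmd_none rule in this patch concatenates the raw vmlinux.bin and
then appends its size, so the "none" payload has the same layout as the
compressed ones. A minimal user-space sketch of that layout follows. It
assumes the appended size is a 4-byte little-endian value, as with the
existing *_with_size rules; wrap_none() and trailing_size() are
hypothetical names used only for illustration and are not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store a 32-bit value as four little-endian bytes. */
static void put_le32(unsigned char *dst, uint32_t v)
{
	dst[0] = v & 0xff;
	dst[1] = (v >> 8) & 0xff;
	dst[2] = (v >> 16) & 0xff;
	dst[3] = (v >> 24) & 0xff;
}

/* Read four little-endian bytes back into a 32-bit value. */
static uint32_t get_le32(const unsigned char *src)
{
	return (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
	       ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
}

/*
 * Hypothetical helper: build "payload || le32(len)" in a freshly
 * allocated buffer. Returns NULL on allocation failure; the caller
 * frees the result.
 */
static unsigned char *wrap_none(const unsigned char *payload, uint32_t len,
				size_t *out_len)
{
	unsigned char *buf;

	assert(payload != NULL && out_len != NULL);

	buf = malloc((size_t)len + 4);
	if (!buf)
		return NULL;

	memcpy(buf, payload, len);	/* the "cat" step: no compression */
	put_le32(buf + len, len);	/* the appended size trailer */
	*out_len = (size_t)len + 4;
	return buf;
}

/* Hypothetical helper: recover the payload size from the trailer. */
static uint32_t trailing_size(const unsigned char *buf, size_t len)
{
	assert(buf != NULL && len >= 4);
	return get_le32(buf + len - 4);
}
```

Calling wrap_none() on a 7-byte payload yields an 11-byte buffer whose
last four bytes decode back to 7 with trailing_size(). Because the
payload is stored verbatim, the only "decompression" work left at boot
is the memcpy() shown in the misc.c hunk.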


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] x86: boot: avoid memory copy if kernel is uncompressed
  2022-07-25  8:38 ` Albert Huang
@ 2022-07-25  8:38   ` Albert Huang
  -1 siblings, 0 replies; 32+ messages in thread
From: Albert Huang @ 2022-07-25  8:38 UTC (permalink / raw)
  Cc: huangjie.albert, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Eric Biederman,
	Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, Michael Roth,
	Nathan Chancellor, Ard Biesheuvel, Mark Rutland,
	Sean Christopherson, Peter Zijlstra, Kees Cook, Tony Luck,
	linux-kernel, kexec, linux-kbuild

From: "huangjie.albert" <huangjie.albert@bytedance.com>

1. If the kernel is uncompressed, we do not need to relocate the
kernel image for decompression.

2. If KASLR is disabled, we do not need to do a memory copy
before parse_elf().

This patch skips these two memory copies, which saves about 20ms
during boot.

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 arch/x86/boot/compressed/head_64.S |  8 ++++++--
 arch/x86/boot/compressed/misc.c    | 22 +++++++++++++++++-----
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d33f060900d2..9e7770c7047b 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -398,10 +398,13 @@ SYM_CODE_START(startup_64)
 1:
 
 	/* Target address to relocate to for decompression */
+#ifdef CONFIG_KERNEL_UNCOMPRESSED
+	movq %rbp, %rbx
+#else
 	movl	BP_init_size(%rsi), %ebx
 	subl	$ rva(_end), %ebx
 	addq	%rbp, %rbx
-
+#endif
 	/* Set up the stack */
 	leaq	rva(boot_stack_end)(%rbx), %rsp
 
@@ -522,6 +525,7 @@ trampoline_return:
  * Copy the compressed kernel to the end of our buffer
  * where decompression in place becomes safe.
  */
+#ifndef CONFIG_KERNEL_UNCOMPRESSED
 	pushq	%rsi
 	leaq	(_bss-8)(%rip), %rsi
 	leaq	rva(_bss-8)(%rbx), %rdi
@@ -531,7 +535,7 @@ trampoline_return:
 	rep	movsq
 	cld
 	popq	%rsi
-
+#endif
 	/*
 	 * The GDT may get overwritten either during the copy we just did or
 	 * during extract_kernel below. To avoid any issues, repoint the GDTR
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c23c0f525d93..d8445562d4e9 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -290,7 +290,7 @@ static inline void handle_relocations(void *output, unsigned long output_len,
 { }
 #endif
 
-static void parse_elf(void *output)
+static void parse_elf(void *output, void *input)
 {
 #ifdef CONFIG_X86_64
 	Elf64_Ehdr ehdr;
@@ -302,7 +302,7 @@ static void parse_elf(void *output)
 	void *dest;
 	int i;
 
-	memcpy(&ehdr, output, sizeof(ehdr));
+	memcpy(&ehdr, input, sizeof(ehdr));
 	if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
 	   ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
 	   ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
@@ -317,7 +317,7 @@ static void parse_elf(void *output)
 	if (!phdrs)
 		error("Failed to allocate space for phdrs");
 
-	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+	memcpy(phdrs, input + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
 
 	for (i = 0; i < ehdr.e_phnum; i++) {
 		phdr = &phdrs[i];
@@ -334,7 +334,7 @@ static void parse_elf(void *output)
 #else
 			dest = (void *)(phdr->p_paddr);
 #endif
-			memmove(dest, output + phdr->p_offset, phdr->p_filesz);
+			memmove(dest, input + phdr->p_offset, phdr->p_filesz);
 			break;
 		default: /* Ignore other PT_* */ break;
 		}
@@ -467,9 +467,21 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 #endif
 
 	debug_putstr("\nDecompressing Linux... ");
+
+#ifdef CONFIG_KERNEL_UNCOMPRESSED
+	if (cmdline_find_option_bool("nokaslr")) {
+		parse_elf(output, input_data);
+	} else {
+		__decompress(input_data, input_len, NULL, NULL, output, output_len,
+				NULL, error);
+		parse_elf(output, output);
+	}
+#else
 	__decompress(input_data, input_len, NULL, NULL, output, output_len,
 			NULL, error);
-	parse_elf(output);
+	parse_elf(output, output);
+#endif
+
 	handle_relocations(output, output_len, virt_addr);
 	debug_putstr("done.\nBooting the kernel.\n");
 
-- 
2.31.1
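
The parse_elf() change in this patch separates where the ELF image is
read from and where its segments are written to. A rough user-space
sketch of that idea follows, assuming an ELF64 image and a Linux
<elf.h>. load_segments() and base_paddr are hypothetical names;
base_paddr stands in for the load address that the kernel code
subtracts from p_paddr. The bounds checks are additions for the sketch
and are not part of the patch.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy every PT_LOAD segment of the ELF64 image at `input` to
 * `output + (p_paddr - base_paddr)`. `input` and `output` may be the
 * same buffer (the in-place case) or different ones (the case that
 * avoids the extra copy). Returns the number of segments copied,
 * or -1 on a malformed image.
 */
static int load_segments(unsigned char *output, size_t output_len,
			 const unsigned char *input, size_t input_len,
			 Elf64_Addr base_paddr)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;
	uint64_t off, dst;
	int i, loaded = 0;

	assert(output != NULL && input != NULL);

	if (input_len < sizeof(ehdr))
		return -1;
	memcpy(&ehdr, input, sizeof(ehdr));
	if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
		return -1;
	if (ehdr.e_phentsize != sizeof(phdr))
		return -1;

	for (i = 0; i < ehdr.e_phnum; i++) {
		off = ehdr.e_phoff + (uint64_t)i * sizeof(phdr);
		if (off > input_len || input_len - off < sizeof(phdr))
			return -1;
		memcpy(&phdr, input + off, sizeof(phdr));

		if (phdr.p_type != PT_LOAD)
			continue;	/* ignore other PT_* types */

		if (phdr.p_offset > input_len ||
		    phdr.p_filesz > input_len - phdr.p_offset)
			return -1;
		if (phdr.p_paddr < base_paddr)
			return -1;
		dst = phdr.p_paddr - base_paddr;
		if (dst > output_len || phdr.p_filesz > output_len - dst)
			return -1;

		/* memmove() stays correct when input and output overlap. */
		memmove(output + dst, input + phdr.p_offset, phdr.p_filesz);
		loaded++;
	}
	return loaded;
}
```

Passing the same pointer for both buffers mirrors the
parse_elf(output, output) calls in the patch, and passing the embedded
image as `input` mirrors parse_elf(output, input_data).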


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] kexec: reuse crash kernel reserved memory for normal kexec
  2022-07-25  8:38   ` Albert Huang
@ 2022-07-25 12:02     ` Jason A. Donenfeld
  -1 siblings, 0 replies; 32+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 12:02 UTC (permalink / raw)
  To: Albert Huang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Eric Biederman, Masahiro Yamada, Michal Marek,
	Nick Desaulniers, Kirill A. Shutemov, Kuppuswamy Sathyanarayanan,
	Michael Roth, Nathan Chancellor, Ard Biesheuvel, Joerg Roedel,
	Mark Rutland, Peter Zijlstra, Sean Christopherson, Kees Cook,
	linux-kernel, kexec, linux-kbuild

Hi Albert,

On Mon, Jul 25, 2022 at 04:38:53PM +0800, Albert Huang wrote:
> The kexec userspace tool also needs to add parameter options(-r) that
> support the use of reserved memory (see another patch for kexec)
>
> [...]
>
> -	if (kexec_on_panic) {
> +	if (kexec_on_panic && kexec_on_reserved) {

Two small questions related to this:

- Why does kexec-tools need an option, or more specifically, why does
  userspace need to communicate about this at all? Can't the kernel just
  automatically use the available reserved memory in the case that's not
  already being used by the panic handler kernel? I'm curious about
  whether there are caveats that would make this occasionally
  undesirable, hence suggesting an option.

- I don't totally understand how this works, so I might be a bit off
  here, but is there any chance that this could be made to co-exist with
  kexec_on_panic? Can a larger region just be reserved, specifically for
  this, rather than piggy backing on the panic handler region?

> +static struct page *kimage_alloc_reserverd_control_pages(struct kimage *image,
> +	case KEXEC_TYPE_RESERVED_MEM:
> +		pages = kimage_alloc_reserverd_control_pages(image, order);

Nit:
  reserverd -> reserved

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
  2022-07-25  8:38   ` Albert Huang
  (?)
@ 2022-07-25 12:15   ` Jason A. Donenfeld
  2022-07-25 13:32       ` 黄杰
  -1 siblings, 1 reply; 32+ messages in thread
From: Jason A. Donenfeld @ 2022-07-25 12:15 UTC (permalink / raw)
  To: Albert Huang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Eric Biederman, Masahiro Yamada, Michal Marek,
	Nick Desaulniers, Kirill A. Shutemov, Brijesh Singh,
	Michael Roth, Nathan Chancellor, Kuppuswamy Sathyanarayanan,
	Ard Biesheuvel, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

Hi Albert,

On Mon, Jul 25, 2022 at 04:38:54PM +0800, Albert Huang wrote:
> +config KEXEC_PURGATORY_SKIP_SIG
> +	bool "skip kexec purgatory signature verification"
> +	depends on ARCH_HAS_KEXEC_PURGATORY
> +	help
> +	  this options makes the kexec purgatory do  not signature verification
> +	  which would get hundreds of milliseconds saved during kexec boot. If we can
> +	  confirm that the data of each segment loaded by kexec will not change we may
> +	  enable this option
> +

Some grammar nits here, but actually, wouldn't it be better to make this
depend on some other signature things instead? Like if the parent kernel
actually did a big signature computation, then maybe the purgatory step
is needed, but if it didn't bother, then maybe you can skip it. This
way, you don't need a compile-time option that might change some aspect
of signature verification people might otherwise be relying on.

Jason

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Fwd: [PATCH 0/4] faster kexec reboot
  2022-07-25  8:38 ` Albert Huang
                   ` (4 preceding siblings ...)
  (?)
@ 2022-07-25 12:54 ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-25 12:54 UTC (permalink / raw)
  To: linux-kernel

---------- Forwarded message ---------
From: Albert Huang <huangjie.albert@bytedance.com>
Date: Mon, Jul 25, 2022 at 16:39
Subject: [PATCH 0/4] faster kexec reboot
To:
Cc: huangjie.albert <huangjie.albert@bytedance.com>, Thomas Gleixner
<tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov
<bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>,
<x86@kernel.org>, H. Peter Anvin <hpa@zytor.com>, Eric Biederman
<ebiederm@xmission.com>, Masahiro Yamada <masahiroy@kernel.org>,
Michal Marek <michal.lkml@markovi.net>, Nick Desaulniers
<ndesaulniers@google.com>, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com>, Michael Roth
<michael.roth@amd.com>, Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>, Nathan Chancellor
<nathan@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Sean
Christopherson <seanjc@google.com>, Joerg Roedel <jroedel@suse.de>,
Mark Rutland <mark.rutland@arm.com>, Kees Cook
<keescook@chromium.org>, <linux-kernel@vger.kernel.org>,
<kexec@lists.infradead.org>, <linux-kbuild@vger.kernel.org>


From: "huangjie.albert" <huangjie.albert@bytedance.com>

In many time-sensitive scenarios, we need to restart the kernel in
less time. However, the current kexec fast-restart path performs
memory copy, verification and decompression operations in many places,
which together take more than 500ms. With the following patch series,
machine_kexec --> start_kernel takes only 15ms.

How to measure time:

C code:
#include <stdint.h>
uint64_t current_cycles(void)
{
    uint32_t low, high;
    asm volatile("rdtsc" : "=a"(low), "=d"(high));
    return ((uint64_t)low) | ((uint64_t)high << 32);
}
assembly code:
       pushq %rax
       pushq %rdx
       rdtsc
       mov   %eax,%eax
       shl   $0x20,%rdx
       or    %rax,%rdx
       movq  %rdx,0x840(%r14)
       popq  %rdx
       popq  %rax
The timestamps can be stored in boot_params or in the kexec control
page, so that all of them can be read back after the kernel has booted.
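
The rdtsc readings above are raw cycle counts, so they need a TSC
frequency before they can be compared with figures such as 500ms or
15ms. A minimal sketch of that conversion follows. cycles_to_us() is a
hypothetical helper, and tsc_khz is an assumed input (the TSC frequency
in kHz, i.e. cycles per millisecond); neither comes from the patch
series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert the difference between two TSC readings to microseconds.
 * tsc_khz is the TSC frequency in kHz (cycles per millisecond) and is
 * assumed to be known and constant between the two readings.
 */
static uint64_t cycles_to_us(uint64_t start, uint64_t end, uint64_t tsc_khz)
{
	assert(tsc_khz != 0);
	/*
	 * delta * 1000 / kHz gives microseconds. The multiplication
	 * only overflows for deltas far beyond any boot-time interval.
	 */
	return (end - start) * 1000 / tsc_khz;
}
```

As a worked example under these assumptions, on a hypothetical 3 GHz
TSC (tsc_khz = 3000000) a delta of 45,000,000 cycles converts to
15,000 microseconds, i.e. 15ms.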

huangjie.albert (4):
  kexec: reuse crash kernel reserved memory for normal kexec
  kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
  x86: Support the uncompressed kernel to speed up booting
  x86: boot: avoid memory copy if kernel is uncompressed

 arch/x86/Kconfig                   | 10 +++++++++
 arch/x86/boot/compressed/Makefile  |  5 ++++-
 arch/x86/boot/compressed/head_64.S |  8 +++++--
 arch/x86/boot/compressed/misc.c    | 35 +++++++++++++++++++++++++-----
 arch/x86/purgatory/purgatory.c     |  7 ++++++
 include/linux/kexec.h              |  9 ++++----
 include/uapi/linux/kexec.h         |  2 ++
 kernel/kexec.c                     | 19 +++++++++++++++-
 kernel/kexec_core.c                | 16 ++++++++------
 kernel/kexec_file.c                | 20 +++++++++++++++--
 scripts/Makefile.lib               |  5 +++++
 11 files changed, 114 insertions(+), 22 deletions(-)

--
2.31.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Fwd: [PATCH 3/4] x86: Support the uncompressed kernel to speed up booting
  2022-07-25  8:38   ` Albert Huang
  (?)
@ 2022-07-25 12:55   ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-25 12:55 UTC (permalink / raw)
  To: linux-kernel

---------- Forwarded message ---------
From: Albert Huang <huangjie.albert@bytedance.com>
Date: Mon, Jul 25, 2022 at 16:40
Subject: [PATCH 3/4] x86: Support the uncompressed kernel to speed up booting
To:
Cc: huangjie.albert <huangjie.albert@bytedance.com>, Thomas Gleixner
<tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov
<bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>,
<x86@kernel.org>, H. Peter Anvin <hpa@zytor.com>, Eric Biederman
<ebiederm@xmission.com>, Masahiro Yamada <masahiroy@kernel.org>,
Michal Marek <michal.lkml@markovi.net>, Nick Desaulniers
<ndesaulniers@google.com>, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com>, Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>, Tony Luck
<tony.luck@intel.com>, Michael Roth <michael.roth@amd.com>, Nathan
Chancellor <nathan@kernel.org>, Ard Biesheuvel <ardb@kernel.org>, Mark
Rutland <mark.rutland@arm.com>, Joerg Roedel <jroedel@suse.de>, Sean
Christopherson <seanjc@google.com>, Peter Zijlstra
<peterz@infradead.org>, Kees Cook <keescook@chromium.org>,
<linux-kernel@vger.kernel.org>, <kexec@lists.infradead.org>,
<linux-kbuild@vger.kernel.org>


From: "huangjie.albert" <huangjie.albert@bytedance.com>

A compressed kernel takes less time to load into memory and less disk
space to store, but in some time-sensitive scenarios the time spent
decompressing it is intolerable. Support uncompressed kernel images so
that the decompression time is saved when the kernel starts.

On my machine, this step takes approximately:

image type         image size    time
compressed (gzip)  8.5M          159ms
uncompressed       53M           8.5ms

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 arch/x86/Kconfig                  |  1 +
 arch/x86/boot/compressed/Makefile |  5 ++++-
 arch/x86/boot/compressed/misc.c   | 13 +++++++++++++
 scripts/Makefile.lib              |  5 +++++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index adbd3a2bd60f..231187624c68 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -221,6 +221,7 @@ config X86
        select HAVE_KERNEL_LZO
        select HAVE_KERNEL_XZ
        select HAVE_KERNEL_ZSTD
+       select HAVE_KERNEL_UNCOMPRESSED
        select HAVE_KPROBES
        select HAVE_KPROBES_ON_FTRACE
        select HAVE_FUNCTION_ERROR_INJECTION
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 19e1905dcbf6..0c8417a2f792 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -26,7 +26,7 @@ OBJECT_FILES_NON_STANDARD     := y
 KCOV_INSTRUMENT                := n

 targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
-       vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst
+       vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst vmlinux.bin.none

 # CLANG_FLAGS must come before any cc-disable-warning or cc-option calls in
 # case of cross compiling, as it has the '--target=' flag, which is needed to
@@ -139,6 +139,8 @@ $(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
        $(call if_changed,lz4_with_size)
 $(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
        $(call if_changed,zstd22_with_size)
+$(obj)/vmlinux.bin.none: $(vmlinux.bin.all-y) FORCE
+       $(call if_changed,none)

 suffix-$(CONFIG_KERNEL_GZIP)   := gz
 suffix-$(CONFIG_KERNEL_BZIP2)  := bz2
@@ -147,6 +149,7 @@ suffix-$(CONFIG_KERNEL_XZ)  := xz
 suffix-$(CONFIG_KERNEL_LZO)    := lzo
 suffix-$(CONFIG_KERNEL_LZ4)    := lz4
 suffix-$(CONFIG_KERNEL_ZSTD)   := zst
+suffix-$(CONFIG_KERNEL_UNCOMPRESSED)   := none

 quiet_cmd_mkpiggy = MKPIGGY $@
       cmd_mkpiggy = $(obj)/mkpiggy $< > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index cf690d8712f4..c23c0f525d93 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -181,6 +181,19 @@ void __puthex(unsigned long value)
        }
 }

+#ifdef CONFIG_KERNEL_UNCOMPRESSED
+#include <linux/decompress/mm.h>
+static int __decompress(unsigned char *buf, long len,
+                               long (*fill)(void*, unsigned long),
+                               long (*flush)(void*, unsigned long),
+                               unsigned char *outbuf, long olen,
+                               long *pos, void (*error)(char *x))
+{
+       memcpy(outbuf, buf, olen);
+       return 0;
+}
+#endif
+
 #ifdef CONFIG_X86_NEED_RELOCS
 static void handle_relocations(void *output, unsigned long output_len,
                               unsigned long virt_addr)
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 3fb6a99e78c4..c89d5466c617 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -438,6 +438,11 @@ quiet_cmd_lz4 = LZ4     $@
 quiet_cmd_lz4_with_size = LZ4     $@
       cmd_lz4_with_size = { cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout; \
                   $(size_append); } > $@
+# none
+quiet_cmd_none = NONE     $@
+      cmd_none = (cat $(filter-out FORCE,$^) && \
+      $(call size_append, $(filter-out FORCE,$^))) > $@ || \
+      (rm -f $@ ; false)

 # U-Boot mkimage
 # ---------------------------------------------------------------------------
--
2.31.1

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Fwd: [PATCH 4/4] x86: boot: avoid memory copy if kernel is uncompressed
  2022-07-25  8:38   ` Albert Huang
  (?)
@ 2022-07-25 12:55   ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-25 12:55 UTC (permalink / raw)
  To: linux-kernel

---------- Forwarded message ---------
From: Albert Huang <huangjie.albert@bytedance.com>
Date: Mon, Jul 25, 2022 at 16:40
Subject: [PATCH 4/4] x86: boot: avoid memory copy if kernel is uncompressed
To:
Cc: huangjie.albert <huangjie.albert@bytedance.com>, Thomas Gleixner
<tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov
<bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>,
<x86@kernel.org>, H. Peter Anvin <hpa@zytor.com>, Eric Biederman
<ebiederm@xmission.com>, Masahiro Yamada <masahiroy@kernel.org>,
Michal Marek <michal.lkml@markovi.net>, Nick Desaulniers
<ndesaulniers@google.com>, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com>, Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>, Michael Roth
<michael.roth@amd.com>, Nathan Chancellor <nathan@kernel.org>, Ard
Biesheuvel <ardb@kernel.org>, Mark Rutland <mark.rutland@arm.com>,
Sean Christopherson <seanjc@google.com>, Peter Zijlstra
<peterz@infradead.org>, Kees Cook <keescook@chromium.org>, Tony Luck
<tony.luck@intel.com>, <linux-kernel@vger.kernel.org>,
<kexec@lists.infradead.org>, <linux-kbuild@vger.kernel.org>


From: "huangjie.albert" <huangjie.albert@bytedance.com>

1. If the kernel is uncompressed, we do not need to relocate the
kernel image for decompression.

2. If KASLR is disabled, we do not need to do a memory copy
before parse_elf().

With this patch, two memory copies can be skipped, which
saves about 20 ms during boot.

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 arch/x86/boot/compressed/head_64.S |  8 ++++++--
 arch/x86/boot/compressed/misc.c    | 22 +++++++++++++++++-----
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index d33f060900d2..9e7770c7047b 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -398,10 +398,13 @@ SYM_CODE_START(startup_64)
 1:

        /* Target address to relocate to for decompression */
+#ifdef CONFIG_KERNEL_UNCOMPRESSED
+       movq %rbp, %rbx
+#else
        movl    BP_init_size(%rsi), %ebx
        subl    $ rva(_end), %ebx
        addq    %rbp, %rbx
-
+#endif
        /* Set up the stack */
        leaq    rva(boot_stack_end)(%rbx), %rsp

@@ -522,6 +525,7 @@ trampoline_return:
  * Copy the compressed kernel to the end of our buffer
  * where decompression in place becomes safe.
  */
+#ifndef CONFIG_KERNEL_UNCOMPRESSED
        pushq   %rsi
        leaq    (_bss-8)(%rip), %rsi
        leaq    rva(_bss-8)(%rbx), %rdi
@@ -531,7 +535,7 @@ trampoline_return:
        rep     movsq
        cld
        popq    %rsi
-
+#endif
        /*
         * The GDT may get overwritten either during the copy we just did or
         * during extract_kernel below. To avoid any issues, repoint the GDTR
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c23c0f525d93..d8445562d4e9 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -290,7 +290,7 @@ static inline void handle_relocations(void *output, unsigned long output_len,
 { }
 #endif

-static void parse_elf(void *output)
+static void parse_elf(void *output, void *input)
 {
 #ifdef CONFIG_X86_64
        Elf64_Ehdr ehdr;
@@ -302,7 +302,7 @@ static void parse_elf(void *output)
        void *dest;
        int i;

-       memcpy(&ehdr, output, sizeof(ehdr));
+       memcpy(&ehdr, input, sizeof(ehdr));
        if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
           ehdr.e_ident[EI_MAG1] != ELFMAG1 ||
           ehdr.e_ident[EI_MAG2] != ELFMAG2 ||
@@ -317,7 +317,7 @@ static void parse_elf(void *output)
        if (!phdrs)
                error("Failed to allocate space for phdrs");

-       memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
+       memcpy(phdrs, input + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);

        for (i = 0; i < ehdr.e_phnum; i++) {
                phdr = &phdrs[i];
@@ -334,7 +334,7 @@ static void parse_elf(void *output)
 #else
                        dest = (void *)(phdr->p_paddr);
 #endif
-                       memmove(dest, output + phdr->p_offset, phdr->p_filesz);
+                       memmove(dest, input + phdr->p_offset, phdr->p_filesz);
                        break;
                default: /* Ignore other PT_* */ break;
                }
@@ -467,9 +467,21 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 #endif

        debug_putstr("\nDecompressing Linux... ");
+
+#ifdef CONFIG_KERNEL_UNCOMPRESSED
+       if (cmdline_find_option_bool("nokaslr")) {
+               parse_elf(output, input_data);
+       } else {
+               __decompress(input_data, input_len, NULL, NULL, output, output_len,
+                               NULL, error);
+               parse_elf(output, output);
+       }
+#else
        __decompress(input_data, input_len, NULL, NULL, output, output_len,
                        NULL, error);
-       parse_elf(output);
+       parse_elf(output, output);
+#endif
+
        handle_relocations(output, output_len, virt_addr);
        debug_putstr("done.\nBooting the kernel.\n");

--
2.31.1

* Fwd: [External] Re: [PATCH 1/4] kexec: reuse crash kernel reserved memory for normal kexec
  2022-07-25 12:02     ` Jason A. Donenfeld
  (?)
@ 2022-07-25 12:56     ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-25 12:56 UTC (permalink / raw)
  To: linux-kernel

---------- Forwarded message ---------
From: Jason A. Donenfeld <Jason@zx2c4.com>
Date: Mon, Jul 25, 2022 at 20:02
Subject: [External] Re: [PATCH 1/4] kexec: reuse crash kernel reserved
memory for normal kexec
To: Albert Huang <huangjie.albert@bytedance.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar
<mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen
<dave.hansen@linux.intel.com>, <x86@kernel.org>, H. Peter Anvin
<hpa@zytor.com>, Eric Biederman <ebiederm@xmission.com>, Masahiro
Yamada <masahiroy@kernel.org>, Michal Marek <michal.lkml@markovi.net>,
Nick Desaulniers <ndesaulniers@google.com>, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com>, Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>, Michael Roth
<michael.roth@amd.com>, Nathan Chancellor <nathan@kernel.org>, Ard
Biesheuvel <ardb@kernel.org>, Joerg Roedel <jroedel@suse.de>, Mark
Rutland <mark.rutland@arm.com>, Peter Zijlstra <peterz@infradead.org>,
Sean Christopherson <seanjc@google.com>, Kees Cook
<keescook@chromium.org>, <linux-kernel@vger.kernel.org>,
<kexec@lists.infradead.org>, <linux-kbuild@vger.kernel.org>


Hi Albert,

On Mon, Jul 25, 2022 at 04:38:53PM +0800, Albert Huang wrote:
> The kexec userspace tool also needs to add parameter options(-r) that
> support the use of reserved memory (see another patch for kexec)
>
> [...]
>
> -     if (kexec_on_panic) {
> +     if (kexec_on_panic && kexec_on_reserved) {

Two small questions related to this:

- Why does kexec-tools need an option, or more specifically, why does
  userspace need to communicate about this at all? Can't the kernel just
  automatically use the available reserved memory in the case that's not
  already being used by the panic handler kernel? I'm curious about
  whether there are caveats that would make this occasionally
  undesirable, hence suggesting an option.

- I don't totally understand how this works, so I might be a bit off
  here, but is there any chance that this could be made to co-exist with
  kexec_on_panic? Can a larger region just be reserved, specifically for
  this, rather than piggy backing on the panic handler region?

> +static struct page *kimage_alloc_reserverd_control_pages(struct kimage *image,
> +     case KEXEC_TYPE_RESERVED_MEM:
> +             pages = kimage_alloc_reserverd_control_pages(image, order);

Nit:
  reserverd -> reserved

Jason

* Fwd: [PATCH 2/4] kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
  2022-07-25  8:38   ` Albert Huang
  (?)
  (?)
@ 2022-07-25 12:56   ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-25 12:56 UTC (permalink / raw)
  To: linux-kernel

---------- Forwarded message ---------
From: Albert Huang <huangjie.albert@bytedance.com>
Date: Mon, Jul 25, 2022 at 16:40
Subject: [PATCH 2/4] kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
To:
Cc: huangjie.albert <huangjie.albert@bytedance.com>, Thomas Gleixner
<tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov
<bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>,
<x86@kernel.org>, H. Peter Anvin <hpa@zytor.com>, Eric Biederman
<ebiederm@xmission.com>, Masahiro Yamada <masahiroy@kernel.org>,
Michal Marek <michal.lkml@markovi.net>, Nick Desaulniers
<ndesaulniers@google.com>, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com>, Brijesh Singh
<brijesh.singh@amd.com>, Michael Roth <michael.roth@amd.com>, Nathan
Chancellor <nathan@kernel.org>, Kuppuswamy Sathyanarayanan
<sathyanarayanan.kuppuswamy@linux.intel.com>, Ard Biesheuvel
<ardb@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Sean
Christopherson <seanjc@google.com>, Joerg Roedel <jroedel@suse.de>,
Mark Rutland <mark.rutland@arm.com>, Kees Cook
<keescook@chromium.org>, <linux-kernel@vger.kernel.org>,
<kexec@lists.infradead.org>, <linux-kbuild@vger.kernel.org>


From: "huangjie.albert" <huangjie.albert@bytedance.com>

verify_sha256_digest() may take 300+ ms in my test environment
(bzImage: 53M, initramfs: 28M).

We can add a config option to control whether this check is enabled.
If we can confirm that the data of the segments loaded by kexec will
not change, we can turn off the check and get a faster startup.

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 arch/x86/Kconfig               | 9 +++++++++
 arch/x86/purgatory/purgatory.c | 7 +++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 52a7f91527fe..adbd3a2bd60f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2052,6 +2052,15 @@ config KEXEC_BZIMAGE_VERIFY_SIG
        help
          Enable bzImage signature verification support.

+config KEXEC_PURGATORY_SKIP_SIG
+       bool "skip kexec purgatory signature verification"
+       depends on ARCH_HAS_KEXEC_PURGATORY
+       help
+         this options makes the kexec purgatory do  not signature verification
+         which would get hundreds of milliseconds saved during kexec boot. If we can
+         confirm that the data of each segment loaded by kexec will not change we may
+         enable this option
+
 config CRASH_DUMP
        bool "kernel crash dumps"
        depends on X86_64 || (X86_32 && HIGHMEM)
diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
index 7558139920f8..b3f15774d86d 100644
--- a/arch/x86/purgatory/purgatory.c
+++ b/arch/x86/purgatory/purgatory.c
@@ -20,6 +20,12 @@ u8 purgatory_sha256_digest[SHA256_DIGEST_SIZE] __section(".kexec-purgatory");

 struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX] __section(".kexec-purgatory");

+#ifdef CONFIG_KEXEC_PURGATORY_SKIP_SIG
+static int verify_sha256_digest(void)
+{
+       return 0;
+}
+#else
 static int verify_sha256_digest(void)
 {
        struct kexec_sha_region *ptr, *end;
@@ -39,6 +45,7 @@ static int verify_sha256_digest(void)

        return 0;
 }
+#endif

 void purgatory(void)
 {
--
2.31.1

* Re: [External] Re: [PATCH 1/4] kexec: reuse crash kernel reserved memory for normal kexec
  2022-07-25 12:02     ` Jason A. Donenfeld
@ 2022-07-25 13:30       ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-25 13:30 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Eric Biederman, Masahiro Yamada, Michal Marek,
	Nick Desaulniers, Kirill A. Shutemov, Kuppuswamy Sathyanarayanan,
	Michael Roth, Nathan Chancellor, Ard Biesheuvel, Joerg Roedel,
	Mark Rutland, Peter Zijlstra, Sean Christopherson, Kees Cook,
	linux-kernel, kexec, linux-kbuild

On Mon, Jul 25, 2022 at 20:02, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Albert,
>
> On Mon, Jul 25, 2022 at 04:38:53PM +0800, Albert Huang wrote:
> > The kexec userspace tool also needs to add parameter options(-r) that
> > support the use of reserved memory (see another patch for kexec)
> >
> > [...]
> >
> > -     if (kexec_on_panic) {
> > +     if (kexec_on_panic && kexec_on_reserved) {
>
> Two small questions related to this:
>
> - Why does kexec-tools need an option, or more specifically, why does
>   userspace need to communicate about this at all? Can't the kernel just
>   automatically use the available reserved memory in the case that's not
>   already being used by the panic handler kernel? I'm curious about
>   whether there are caveats that would make this occasionally
>   undesirable, hence suggesting an option.

Because the crash kernel also uses this region of memory, the
mechanism should not be used unless it is explicitly requested.
That is why I prefer to add an option.

>
> - I don't totally understand how this works, so I might be a bit off
>   here, but is there any chance that this could be made to co-exist with
>   kexec_on_panic? Can a larger region just be reserved, specifically for
>   this, rather than piggy backing on the panic handler region?
>
> > +static struct page *kimage_alloc_reserverd_control_pages(struct kimage *image,
> > +     case KEXEC_TYPE_RESERVED_MEM:
> > +             pages = kimage_alloc_reserverd_control_pages(image, order);
>
> Nit:
>   reserverd -> reserved

Thanks for that, I will correct it.

>
> Jason

* Re: [External] Re: [PATCH 2/4] kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
  2022-07-25 12:15   ` Jason A. Donenfeld
@ 2022-07-25 13:32       ` 黄杰
  0 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-25 13:32 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Eric Biederman, Masahiro Yamada, Michal Marek,
	Nick Desaulniers, Kirill A. Shutemov, Brijesh Singh,
	Michael Roth, Nathan Chancellor, Kuppuswamy Sathyanarayanan,
	Ard Biesheuvel, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

Maybe a boot parameter?

On Mon, Jul 25, 2022 at 20:15, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Albert,
>
> On Mon, Jul 25, 2022 at 04:38:54PM +0800, Albert Huang wrote:
> > +config KEXEC_PURGATORY_SKIP_SIG
> > +     bool "skip kexec purgatory signature verification"
> > +     depends on ARCH_HAS_KEXEC_PURGATORY
> > +     help
> > +       this options makes the kexec purgatory do  not signature verification
> > +       which would get hundreds of milliseconds saved during kexec boot. If we can
> > +       confirm that the data of each segment loaded by kexec will not change we may
> > +       enable this option
> > +
>
> Some grammar nits here, but actually, wouldn't it be better to make this
> depend on some other signature things instead? Like if the parent kernel
> actually did a big signature computation, then maybe the purgatory step
> is needed, but if it didn't bother, then maybe you can skip it. This
> way, you don't need a compile-time option that might change some aspect
> of signature verification people might otherwise be relying on.
>
> Jason

* Re: [PATCH 3/4] x86: Support the uncompressed kernel to speed up booting
  2022-07-25  8:38   ` Albert Huang
@ 2022-07-25 16:57     ` Eric W. Biederman
  -1 siblings, 0 replies; 32+ messages in thread
From: Eric W. Biederman @ 2022-07-25 16:57 UTC (permalink / raw)
  To: Albert Huang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Kuppuswamy Sathyanarayanan, Tony Luck,
	Michael Roth, Nathan Chancellor, Ard Biesheuvel, Mark Rutland,
	Joerg Roedel, Sean Christopherson, Peter Zijlstra, Kees Cook,
	linux-kernel, kexec, linux-kbuild

Albert Huang <huangjie.albert@bytedance.com> writes:

> From: "huangjie.albert" <huangjie.albert@bytedance.com>
>
> Although the compressed kernel can save the time of loading the
> kernel into the memory and save the disk space for storing the kernel,
> but in some time-sensitive scenarios, the time for decompressing the
> kernel is intolerable. Therefore, it is necessary to support uncompressed
> kernel images, so that the time of kernel decompression can be saved when
> the kernel is started.
>
> This part of the time on my machine is approximately:
> image type      image  size      times
> compressed(gzip) 8.5M            159ms
> uncompressed     53M             8.5ms

Why in the world are you using arch/x86/boot/compressed/... for an
uncompressed kernel?  Especially if you don't plan to process
relocations.

Even if it somehow makes sense, why have you not followed the pattern
used by the rest of the code and implemented a file that implements
a no-op __decompress routine?

Eric


> Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
> ---
>  arch/x86/Kconfig                  |  1 +
>  arch/x86/boot/compressed/Makefile |  5 ++++-
>  arch/x86/boot/compressed/misc.c   | 13 +++++++++++++
>  scripts/Makefile.lib              |  5 +++++
>  4 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index adbd3a2bd60f..231187624c68 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -221,6 +221,7 @@ config X86
>  	select HAVE_KERNEL_LZO
>  	select HAVE_KERNEL_XZ
>  	select HAVE_KERNEL_ZSTD
> +	select HAVE_KERNEL_UNCOMPRESSED
>  	select HAVE_KPROBES
>  	select HAVE_KPROBES_ON_FTRACE
>  	select HAVE_FUNCTION_ERROR_INJECTION
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 19e1905dcbf6..0c8417a2f792 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -26,7 +26,7 @@ OBJECT_FILES_NON_STANDARD	:= y
>  KCOV_INSTRUMENT		:= n
>  
>  targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
> -	vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst
> +	vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst vmlinux.bin.none
>  
>  # CLANG_FLAGS must come before any cc-disable-warning or cc-option calls in
>  # case of cross compiling, as it has the '--target=' flag, which is needed to
> @@ -139,6 +139,8 @@ $(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
>  	$(call if_changed,lz4_with_size)
>  $(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
>  	$(call if_changed,zstd22_with_size)
> +$(obj)/vmlinux.bin.none: $(vmlinux.bin.all-y) FORCE
> +	$(call if_changed,none)
>  
>  suffix-$(CONFIG_KERNEL_GZIP)	:= gz
>  suffix-$(CONFIG_KERNEL_BZIP2)	:= bz2
> @@ -147,6 +149,7 @@ suffix-$(CONFIG_KERNEL_XZ)	:= xz
>  suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
>  suffix-$(CONFIG_KERNEL_LZ4) 	:= lz4
>  suffix-$(CONFIG_KERNEL_ZSTD)	:= zst
> +suffix-$(CONFIG_KERNEL_UNCOMPRESSED)	:= none
>  
>  quiet_cmd_mkpiggy = MKPIGGY $@
>        cmd_mkpiggy = $(obj)/mkpiggy $< > $@
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index cf690d8712f4..c23c0f525d93 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -181,6 +181,19 @@ void __puthex(unsigned long value)
>  	}
>  }
>  
> +#ifdef CONFIG_KERNEL_UNCOMPRESSED
> +#include <linux/decompress/mm.h>
> +static int __decompress(unsigned char *buf, long len,
> +				long (*fill)(void*, unsigned long),
> +				long (*flush)(void*, unsigned long),
> +				unsigned char *outbuf, long olen,
> +				long *pos, void (*error)(char *x))
> +{
> +	memcpy(outbuf, buf, olen);
> +	return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_X86_NEED_RELOCS
>  static void handle_relocations(void *output, unsigned long output_len,
>  			       unsigned long virt_addr)
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 3fb6a99e78c4..c89d5466c617 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -438,6 +438,11 @@ quiet_cmd_lz4 = LZ4     $@
>  quiet_cmd_lz4_with_size = LZ4     $@
>        cmd_lz4_with_size = { cat $(real-prereqs) | $(LZ4) -l -c1 stdin stdout; \
>                    $(size_append); } > $@
> +# none
> +quiet_cmd_none = NONE     $@
> +      cmd_none = (cat $(filter-out FORCE,$^) && \
> +      $(call size_append, $(filter-out FORCE,$^))) > $@ || \
> +      (rm -f $@ ; false)
>  
>  # U-Boot mkimage
>  # ---------------------------------------------------------------------------

* Re: [PATCH 0/4] faster kexec reboot
  2022-07-25  8:38 ` Albert Huang
@ 2022-07-25 17:04   ` Eric W. Biederman
  -1 siblings, 0 replies; 32+ messages in thread
From: Eric W. Biederman @ 2022-07-25 17:04 UTC (permalink / raw)
  To: Albert Huang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Michael Roth, Kuppuswamy Sathyanarayanan,
	Nathan Chancellor, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

Albert Huang <huangjie.albert@bytedance.com> writes:

> From: "huangjie.albert" <huangjie.albert@bytedance.com>
>
> In many time-sensitive scenarios, we need a shorter time to restart 
> the kernel. However, in the current kexec fast restart code, there 
> are many places in the memory copy operation, verification operation 
> and decompression operation, which take more time than 500ms. Through 
> the following patch series. machine_kexec-->start_kernel only takes
> 15ms

Is this a tiny embedded device you are taking the timings of?

How are you handling driver shutdown and restart?  I would expect those
to be a larger piece of the puzzle than memory.

My desktop can do something like 128GiB/s.  Which would suggest that
copying 128MiB of kernel+initrd would take perhaps 10ms.  The SHA256
implementation may not be tuned so that could be part of the performance
issue.  The SHA256 hash has a reputation for having fast
implementations.  I chose SHA256 originally simply because it has more
bits so it makes the odds of detecting an error higher.


If all you care about is booting a kernel as fast as possible, it may
make sense to have a large reserved region of memory like we have for
the kexec on panic kernel.  If that really makes sense, I recommend
adding a second kernel command line option and reserving a second region
of memory.  That makes telling if there are any conflicts simple.


I am having a hard time seeing how anyone else would want these options.
Losing megabytes of memory simply because you might reboot using kexec
seems like the wrong side of a trade-off.

The CONFIG_KEXEC_PURGATORY_SKIP_SIG option is very misnamed.  It is not
signature verification that is happening; it is a hash verification.
There are no encrypted bits at play.  Instead there is a check to
ensure that the kernel has not been corrupted by in-flight DMA that some
driver forgot to shut down.

So you are building a version of kexec that if something goes wrong it
could very easily eat your data, or otherwise do some very bad things
that are absolutely non-trivial to debug.

That the decision to skip the sha256 hash that prevents corruption is
happening at compile time, instead of at run-time, will guarantee the
option is simply not available on any general purpose kernel
configuration.  Given how dangerous it is to skip the hash verification
it is probably not a bad thing overall, but it is most definitely
something that will make maintenance more difficult.


If done well I don't see why anyone would mind an uncompressed kernel
but I don't see what the advantage of what you are doing is over using
vmlinux in the build directory.  It isn't a bzImage but it is the
uncompressed kernel.

As a proof of concept I think what you are doing goes a way to showing
that things can be improved.  My overall sense is that improving things
the way you are proposing does not help the general case and simply adds
to the maintenance burden.

Eric

>
> How to measure time:
>
> c code:
> uint64_t current_cycles(void)
> {
>     uint32_t low, high;
>     asm volatile("rdtsc" : "=a"(low), "=d"(high));
>     return ((uint64_t)low) | ((uint64_t)high << 32);
> }
> assembly code:
>        pushq %rax
>        pushq %rdx
>        rdtsc
>        mov   %eax,%eax
>        shl   $0x20,%rdx
>        or    %rax,%rdx
>        movq  %rdx,0x840(%r14)
>        popq  %rdx
>        popq  %rax
> the timestamps may be stored in boot_params or the kexec control page,
> so we can read all of them after the kernel boots up.
>
> huangjie.albert (4):
>   kexec: reuse crash kernel reserved memory for normal kexec
>   kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
>   x86: Support the uncompressed kernel to speed up booting
>   x86: boot: avoid memory copy if kernel is uncompressed
>
>  arch/x86/Kconfig                   | 10 +++++++++
>  arch/x86/boot/compressed/Makefile  |  5 ++++-
>  arch/x86/boot/compressed/head_64.S |  8 +++++--
>  arch/x86/boot/compressed/misc.c    | 35 +++++++++++++++++++++++++-----
>  arch/x86/purgatory/purgatory.c     |  7 ++++++
>  include/linux/kexec.h              |  9 ++++----
>  include/uapi/linux/kexec.h         |  2 ++
>  kernel/kexec.c                     | 19 +++++++++++++++-
>  kernel/kexec_core.c                | 16 ++++++++------
>  kernel/kexec_file.c                | 20 +++++++++++++++--
>  scripts/Makefile.lib               |  5 +++++
>  11 files changed, 114 insertions(+), 22 deletions(-)
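
A note on the measurement method quoted above: the rdtsc values are raw
cycle counts, so they need the TSC frequency before they can be read as
wall-clock time. A minimal sketch, assuming an invariant TSC and a
frequency supplied by the caller in kHz (for example the value the kernel
reports at boot); cycles_to_us is a hypothetical helper, not part of the
patch series:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a TSC delta to microseconds, given the TSC frequency in kHz.
 * tsc_khz must be supplied by the caller; this sketch does not measure it.
 */
static uint64_t cycles_to_us(uint64_t start, uint64_t end, uint64_t tsc_khz)
{
    assert(tsc_khz > 0);
    /* cycles / kHz gives milliseconds, so scale by 1000 for microseconds. */
    return ((end - start) * 1000ULL) / tsc_khz;
}
```

For example, a delta of 37500000 cycles on a 2.5GHz TSC gives
cycles_to_us(0, 37500000, 2500000) == 15000, that is, 15ms.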

* Re: [External] Re: [PATCH 0/4] faster kexec reboot
  2022-07-25 17:04   ` Eric W. Biederman
@ 2022-07-26  5:53     ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-26  5:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Michael Roth, Kuppuswamy Sathyanarayanan,
	Nathan Chancellor, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

Hi Eric W. Biederman,

Thank you for your advice and opinion. I am very honored.

On Tue, Jul 26, 2022 at 01:04, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Albert Huang <huangjie.albert@bytedance.com> writes:
>
> > From: "huangjie.albert" <huangjie.albert@bytedance.com>
> >
> > In many time-sensitive scenarios, we need a shorter time to restart
> > the kernel. However, in the current kexec fast restart code, there
> > are many places in the memory copy operation, verification operation
> > and decompression operation, which take more than 500ms. With
> > the following patch series, machine_kexec-->start_kernel only takes
> > 15ms.
>
> Is this a tiny embedded device you are taking the timings of?
>
> How are you handling driver shutdown and restart?  I would expect those
> to be a larger piece of the puzzle than memory.

This part of the time optimization cannot be made universal, and
various devices need their own customization. But we have some
solutions for maintaining and recovering these devices,
especially for the scanning and initialization of PCI devices.

>
> My desktop can do something like 128GiB/s.  Which would suggest that
> copying 128MiB of kernel+initrd would take perhaps 10ms.  The SHA256
> implementation may not be tuned so that could be part of the performance
> issue.  The SHA256 hash has a reputation for having fast
> implementations.  I chose SHA256 originally simply because it has more
> bits so it makes the odds of detecting an error higher.
>

Yes, sha256 is a better choice. But if there is no memory copy between
kexec load and kexec -e, and this part of the memory is reserved, I
don't think this part of memory will be changed, especially in
virtual machine scenarios.

>
> If all you care about is booting a kernel as fast as possible it may
> make sense to have a large reserved region of memory like we have for
> the kexec on panic kernel.  If that really makes sense I recommend
> adding a second kernel command line option and reserving a second region
> of reserved memory.  That makes telling if there are any conflicts simple.
>

I initially implemented an additional parameter and region, but I
later concluded that it doesn't really make sense and would waste
extra memory.

>
> I am having a hard time seeing how anyone else would want these options.
> Losing megabytes of memory simply because you might reboot using kexec
> seems like the wrong side of a trade-off.

The series reuses the memory already reserved for the crash kernel,
so why would it increase memory consumption?

>
> The CONFIG_KEXEC_PURGATORY_SKIP_SIG option is very misnamed.  It is not
> signature verification that is happening; it is a hash verification.
> There are no encrypted bits at play.  Instead there is a check to
> ensure that the kernel has not been corrupted by in-flight DMA that some
> driver forgot to shut down.
>
Thanks for pointing that out.
But even if the data is detected to have been changed, there is
currently no way to recover it.
I don't have a good understanding of this part yet. Maybe it is for
security reasons?


> So you are building a version of kexec that if something goes wrong it
> could very easily eat your data, or otherwise do some very bad things
> that are absolutely non-trivial to debug.
>
> That the decision to skip the sha256 hash that prevents corruption is
> happening at compile time, instead of at run-time, will guarantee the
> option is simply not available on any general purpose kernel
> configuration.  Given how dangerous it is to skip the hash verification
> it is probably not a bad thing overall, but it is most definitely
> something that will make maintenance more difficult.
>

Maybe a parameter would be a better choice. What do you think?

>
> If done well I don't see why anyone would mind an uncompressed kernel
> but I don't see what the advantage of what you are doing is over using
> vmlinux in the build directory.  It isn't a bzImage but it is the
> uncompressed kernel.
>


> As a proof of concept I think what you are doing goes a way to showing
> that things can be improved.  My overall sense is that improving things
> the way you are proposing does not help the general case and simply adds
> to the maintenance burden.

I don't think so. The kernel startup time of some lightweight virtual
machines may be 100-200ms (start_kernel->init), but this
kexec->start_kernel step took more than 500ms.
This is still valuable, and the overall code change is also very small.

> Eric
>
> >
> > How to measure time:
> >
> > c code:
> > uint64_t current_cycles(void)
> > {
> >     uint32_t low, high;
> >     asm volatile("rdtsc" : "=a"(low), "=d"(high));
> >     return ((uint64_t)low) | ((uint64_t)high << 32);
> > }
> > assembly code:
> >        pushq %rax
> >        pushq %rdx
> >        rdtsc
> >        mov   %eax,%eax
> >        shl   $0x20,%rdx
> >        or    %rax,%rdx
> >        movq  %rdx,0x840(%r14)
> >        popq  %rdx
> >        popq  %rax
> > the timestamps may be stored in boot_params or the kexec control page,
> > so we can read all of them after the kernel boots up.
> >
> > huangjie.albert (4):
> >   kexec: reuse crash kernel reserved memory for normal kexec
> >   kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
> >   x86: Support the uncompressed kernel to speed up booting
> >   x86: boot: avoid memory copy if kernel is uncompressed
> >
> >  arch/x86/Kconfig                   | 10 +++++++++
> >  arch/x86/boot/compressed/Makefile  |  5 ++++-
> >  arch/x86/boot/compressed/head_64.S |  8 +++++--
> >  arch/x86/boot/compressed/misc.c    | 35 +++++++++++++++++++++++++-----
> >  arch/x86/purgatory/purgatory.c     |  7 ++++++
> >  include/linux/kexec.h              |  9 ++++----
> >  include/uapi/linux/kexec.h         |  2 ++
> >  kernel/kexec.c                     | 19 +++++++++++++++-
> >  kernel/kexec_core.c                | 16 ++++++++------
> >  kernel/kexec_file.c                | 20 +++++++++++++++--
> >  scripts/Makefile.lib               |  5 +++++
> >  11 files changed, 114 insertions(+), 22 deletions(-)

* Re: [External] Re: [PATCH 0/4] faster kexec reboot
  2022-07-26  5:53     ` 黄杰
@ 2022-07-28  1:55       ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-28  1:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Michael Roth, Kuppuswamy Sathyanarayanan,
	Nathan Chancellor, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

On Tue, Jul 26, 2022 at 13:53, 黄杰 <huangjie.albert@bytedance.com> wrote:
>
> Hi Eric W. Biederman,
>
> Thank you for your advice and opinion. I am very honored.
>
> On Tue, Jul 26, 2022 at 01:04, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > Albert Huang <huangjie.albert@bytedance.com> writes:
> >
> > > From: "huangjie.albert" <huangjie.albert@bytedance.com>
> > >
> > > In many time-sensitive scenarios, we need a shorter time to restart
> > > the kernel. However, in the current kexec fast restart code, there
> > > are many places in the memory copy operation, verification operation
> > > and decompression operation, which take more than 500ms. With
> > > the following patch series, machine_kexec-->start_kernel only takes
> > > 15ms.
> >
> > Is this a tiny embedded device you are taking the timings of?
> >
> > How are you handling driver shutdown and restart?  I would expect those
> > to be a larger piece of the puzzle than memory.
>
> This part of the time optimization cannot be made universal, and
> various devices need their own customization. But we have some
> solutions for maintaining and recovering these devices,
> especially for the scanning and initialization of PCI devices.
>
> >
> > My desktop can do something like 128GiB/s.  Which would suggest that
> > copying 128MiB of kernel+initrd would take perhaps 10ms.  The SHA256
> > implementation may not be tuned so that could be part of the performance
> > issue.  The SHA256 hash has a reputation for having fast
> > implementations.  I chose SHA256 originally simply because it has more
> > bits so it makes the odds of detecting an error higher.
> >
>
> Yes, sha256 is a better choice. But if there is no memory copy between
> kexec load and kexec -e, and this part of the memory is reserved, I
> don't think this part of memory will be changed, especially in
> virtual machine scenarios.
>

Hi Eric,

Do you know why this sha256 check is placed here? I feel it would be
better to do it in the system call behind kexec -e.
If the verification fails, the second kernel is not started and a
diagnostic message is printed, which seems better than doing the
verification when the second kernel is started.
That would be more user-friendly, and it can also reduce downtime.

BR
albert.

> >
> > If all you care about is booting a kernel as fast as possible it may
> > make sense to have a large reserved region of memory like we have for
> > the kexec on panic kernel.  If that really makes sense I recommend
> > adding a second kernel command line option and reserving a second region
> > of reserved memory.  That makes telling if there are any conflicts simple.
> >
>
> I initially implemented an additional parameter and region, but I
> later concluded that it doesn't really make sense and would waste
> extra memory.
>
> >
> > I am having a hard time seeing how anyone else would want these options.
> > Losing megabytes of memory simply because you might reboot using kexec
> > seems like the wrong side of a trade-off.
>
> The series reuses the memory already reserved for the crash kernel,
> so why would it increase memory consumption?
>
> >
> > The CONFIG_KEXEC_PURGATORY_SKIP_SIG option is very misnamed.  It is not
> > signature verification that is happening; it is a hash verification.
> > There are no encrypted bits at play.  Instead there is a check to
> > ensure that the kernel has not been corrupted by in-flight DMA that some
> > driver forgot to shut down.
> >
> Thanks for pointing that out.
> But even if the data is detected to have been changed, there is
> currently no way to recover it.
> I don't have a good understanding of this part yet. Maybe it is for
> security reasons?
>
>
> > So you are building a version of kexec that if something goes wrong it
> > could very easily eat your data, or otherwise do some very bad things
> > that are absolutely non-trivial to debug.
> >
> > That the decision to skip the sha256 hash that prevents corruption is
> > happening at compile time, instead of at run-time, will guarantee the
> > option is simply not available on any general purpose kernel
> > configuration.  Given how dangerous it is to skip the hash verification
> > it is probably not a bad thing overall, but it is most definitely
> > something that will make maintenance more difficult.
> >
>
> Maybe a parameter would be a better choice. What do you think?
>
> >
> > If done well I don't see why anyone would mind an uncompressed kernel
> > but I don't see what the advantage of what you are doing is over using
> > vmlinux in the build directory.  It isn't a bzImage but it is the
> > uncompressed kernel.
> >
>
>
> > As a proof of concept I think what you are doing goes a way to showing
> > that things can be improved.  My overall sense is that improving things
> > the way you are proposing does not help the general case and simply adds
> > to the maintenance burden.
>
> I don't think so. The kernel startup time of some lightweight virtual
> machines may be 100-200ms (start_kernel->init), but this
> kexec->start_kernel step took more than 500ms.
> This is still valuable, and the overall code change is also very small.
>
> > Eric
> >
> > >
> > > How to measure time:
> > >
> > > c code:
> > > uint64_t current_cycles(void)
> > > {
> > >     uint32_t low, high;
> > >     asm volatile("rdtsc" : "=a"(low), "=d"(high));
> > >     return ((uint64_t)low) | ((uint64_t)high << 32);
> > > }
> > > assembly code:
> > >        pushq %rax
> > >        pushq %rdx
> > >        rdtsc
> > >        mov   %eax,%eax
> > >        shl   $0x20,%rdx
> > >        or    %rax,%rdx
> > >        movq  %rdx,0x840(%r14)
> > >        popq  %rdx
> > >        popq  %rax
> > > the timestamps may be stored in boot_params or the kexec control page,
> > > so we can read all of them after the kernel boots up.
> > >
> > > huangjie.albert (4):
> > >   kexec: reuse crash kernel reserved memory for normal kexec
> > >   kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
> > >   x86: Support the uncompressed kernel to speed up booting
> > >   x86: boot: avoid memory copy if kernel is uncompressed
> > >
> > >  arch/x86/Kconfig                   | 10 +++++++++
> > >  arch/x86/boot/compressed/Makefile  |  5 ++++-
> > >  arch/x86/boot/compressed/head_64.S |  8 +++++--
> > >  arch/x86/boot/compressed/misc.c    | 35 +++++++++++++++++++++++++-----
> > >  arch/x86/purgatory/purgatory.c     |  7 ++++++
> > >  include/linux/kexec.h              |  9 ++++----
> > >  include/uapi/linux/kexec.h         |  2 ++
> > >  kernel/kexec.c                     | 19 +++++++++++++++-
> > >  kernel/kexec_core.c                | 16 ++++++++------
> > >  kernel/kexec_file.c                | 20 +++++++++++++++--
> > >  scripts/Makefile.lib               |  5 +++++
> > >  11 files changed, 114 insertions(+), 22 deletions(-)


* Re: [External] Re: [PATCH 0/4] faster kexec reboot
@ 2022-07-28  1:55       ` 黄杰
  0 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-28  1:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Masahiro Yamada, Michal Marek, Nick Desaulniers,
	Kirill A. Shutemov, Michael Roth, Kuppuswamy Sathyanarayanan,
	Nathan Chancellor, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

On Tue, Jul 26, 2022 at 13:53, 黄杰 <huangjie.albert@bytedance.com> wrote:
>
> Hi Eric W. Biederman,
>
> Thank you for your advice and opinion. I am very honored.
>
> On Tue, Jul 26, 2022 at 01:04, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> > Albert Huang <huangjie.albert@bytedance.com> writes:
> >
> > > From: "huangjie.albert" <huangjie.albert@bytedance.com>
> > >
> > > In many time-sensitive scenarios, we need a shorter time to restart
> > > the kernel. However, in the current kexec fast restart code, there
> > > are many places in the memory copy operation, verification operation
> > > and decompression operation, which take more than 500ms. With
> > > the following patch series, machine_kexec-->start_kernel only takes
> > > 15ms.
> >
> > Is this a tiny embedded device you are taking the timings of?
> >
> > How are you handling driver shutdown and restart?  I would expect those
> > to be a larger piece of the puzzle than memory.
>
> This part of the time optimization cannot be made universal, and
> various devices need their own customization. But we have some
> solutions for maintaining and recovering these devices,
> especially for the scanning and initialization of PCI devices.
>
> >
> > My desktop can do something like 128GiB/s.  Which would suggest that
> > copying 128MiB of kernel+initrd would take perhaps 10ms.  The SHA256
> > implementation may not be tuned so that could be part of the performance
> > issue.  The SHA256 hash has a reputation for having fast
> > implementations.  I chose SHA256 originally simply because it has more
> > bits so it makes the odds of detecting an error higher.
> >
>
> Yes, sha256 is a better choice. But if there is no memory copy between
> kexec load and kexec -e, and this part of the memory is reserved, I
> don't think this part of memory will be changed, especially in
> virtual machine scenarios.
>

Hi Eric,

Do you know why this sha256 check is placed here? I feel it would be
better to do it in the system call behind kexec -e.
If the verification fails, the second kernel is not started and a
diagnostic message is printed, which seems better than doing the
verification when the second kernel is started.
That would be more user-friendly, and it can also reduce downtime.

BR
albert.

> >
> > If all you care about is booting a kernel as fast as possible it may
> > make sense to have a large reserved region of memory like we have for
> > the kexec on panic kernel.  If that really makes sense I recommend
> > adding a second kernel command line option and reserving a second region
> > of reserved memory.  That makes telling if there are any conflicts simple.
> >
>
> I initially implemented an additional parameter and region, but I
> later concluded that it doesn't really make sense and would waste
> extra memory.
>
> >
> > I am having a hard time seeing how anyone else would want these options.
> > Losing megabytes of memory simply because you might reboot using kexec
> > seems like the wrong side of a trade-off.
>
> The series reuses the memory already reserved for the crash kernel,
> so why would it increase memory consumption?
>
> >
> > The CONFIG_KEXEC_PURGATORY_SKIP_SIG option is very misnamed.  It is not
> > signature verification that is happening; it is a hash verification.
> > There are no encrypted bits at play.  Instead there is a check to
> > ensure that the kernel has not been corrupted by in-flight DMA that some
> > driver forgot to shut down.
> >
> Thanks for pointing that out.
> But even if the data is detected to have been changed, there is
> currently no way to recover it.
> I don't have a good understanding of this part yet. Maybe it is for
> security reasons?
>
>
> > So you are building a version of kexec that if something goes wrong it
> > could very easily eat your data, or otherwise do some very bad things
> > that are absolutely non-trivial to debug.
> >
> > That the decision to skip the sha256 hash that prevents corruption is
> > happening at compile time, instead of at run-time, will guarantee the
> > option is simply not available on any general purpose kernel
> > configuration.  Given how dangerous it is to skip the hash verification
> > it is probably not a bad thing overall, but it is most definitely
> > something that will make maintenance more difficult.
> >
>
> Maybe a parameter would be a better choice. What do you think?
>
> >
> > If done well I don't see why anyone would mind an uncompressed kernel
> > but I don't see what the advantage of what you are doing is over using
> > vmlinux in the build directory.  It isn't a bzImage but it is the
> > uncompressed kernel.
> >
>
>
> > As a proof of concept I think what you are doing goes a way to showing
> > that things can be improved.  My overall sense is that improving things
> > the way you are proposing does not help the general case and simply adds
> > to the maintenance burden.
>
> I don't think so. The kernel startup time of some lightweight virtual
> machines may be 100-200ms (start_kernel->init), but this
> kexec->start_kernel step took more than 500ms.
> This is still valuable, and the overall code size is also very small.
>
> > Eric
> >
> > >
> > > How to measure time:
> > >
> > > c code:
> > > uint64_t current_cycles(void)
> > > {
> > >     uint32_t low, high;
> > >     asm volatile("rdtsc" : "=a"(low), "=d"(high));
> > >     return ((uint64_t)low) | ((uint64_t)high << 32);
> > > }
> > > assembly code:
> > >        pushq %rax
> > >        pushq %rdx
> > >        rdtsc
> > >        mov   %eax,%eax
> > >        shl   $0x20,%rdx
> > >        or    %rax,%rdx
> > >        movq  %rdx,0x840(%r14)
> > >        popq  %rdx
> > >        popq  %rax
> > > the timestamp may store in boot_params or kexec control page, so we can
> > > get the all timestamp after kernel boot up.
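
For reference, a small sketch of how two stored TSC stamps can be turned
into the millisecond figures quoted in this thread. It assumes an
invariant TSC and a known TSC frequency; cycles_to_us() and its tsc_khz
parameter are illustrative names, and the frequency has to be supplied by
whoever does the measurement.

```c
#include <stdint.h>

/*
 * Convert the distance between two TSC stamps to microseconds.
 * tsc_khz is the TSC frequency in kHz (assumed known and constant).
 * cycles / kHz gives milliseconds, so multiply by 1000 first to keep
 * microsecond resolution with integer math. The multiplication can
 * overflow for deltas above roughly 1.8e16 cycles, which is far beyond
 * a boot-time measurement.
 */
static uint64_t cycles_to_us(uint64_t start, uint64_t end, uint64_t tsc_khz)
{
    return (end - start) * 1000 / tsc_khz;
}
```

With a 3 GHz TSC (tsc_khz = 3000000), a delta of 1500000000 cycles
corresponds to 500000 us, that is 500ms.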
> > >
> > > huangjie.albert (4):
> > >   kexec: reuse crash kernel reserved memory for normal kexec
> > >   kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
> > >   x86: Support the uncompressed kernel to speed up booting
> > >   x86: boot: avoid memory copy if kernel is uncompressed
> > >
> > >  arch/x86/Kconfig                   | 10 +++++++++
> > >  arch/x86/boot/compressed/Makefile  |  5 ++++-
> > >  arch/x86/boot/compressed/head_64.S |  8 +++++--
> > >  arch/x86/boot/compressed/misc.c    | 35 +++++++++++++++++++++++++-----
> > >  arch/x86/purgatory/purgatory.c     |  7 ++++++
> > >  include/linux/kexec.h              |  9 ++++----
> > >  include/uapi/linux/kexec.h         |  2 ++
> > >  kernel/kexec.c                     | 19 +++++++++++++++-
> > >  kernel/kexec_core.c                | 16 ++++++++------
> > >  kernel/kexec_file.c                | 20 +++++++++++++++--
> > >  scripts/Makefile.lib               |  5 +++++
> > >  11 files changed, 114 insertions(+), 22 deletions(-)

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

* Re: [External] Re: [PATCH 2/4] kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG
  2022-07-25 13:32       ` 黄杰
@ 2022-07-28  1:57         ` 黄杰
  -1 siblings, 0 replies; 32+ messages in thread
From: 黄杰 @ 2022-07-28  1:57 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Eric Biederman, Masahiro Yamada, Michal Marek,
	Nick Desaulniers, Kirill A. Shutemov, Brijesh Singh,
	Michael Roth, Nathan Chancellor, Kuppuswamy Sathyanarayanan,
	Ard Biesheuvel, Peter Zijlstra, Sean Christopherson,
	Joerg Roedel, Mark Rutland, Kees Cook, linux-kernel, kexec,
	linux-kbuild

Does anyone know why this sha256 check is placed here, in purgatory? I
think it would be better to do it in the kexec -e system call. If the
verification fails, the second kernel is not started and an error
message can be printed at the same time. That seems friendlier than
doing the verification while the second kernel is being started, and it
can also reduce downtime.

黄杰 <huangjie.albert@bytedance.com> wrote on Mon, Jul 25, 2022 at 21:32:
>
> Maybe a boot parameter?
>
> Jason A. Donenfeld <Jason@zx2c4.com> wrote on Mon, Jul 25, 2022 at 20:15:
> >
> > Hi Albert,
> >
> > On Mon, Jul 25, 2022 at 04:38:54PM +0800, Albert Huang wrote:
> > > +config KEXEC_PURGATORY_SKIP_SIG
> > > +     bool "skip kexec purgatory signature verification"
> > > +     depends on ARCH_HAS_KEXEC_PURGATORY
> > > +     help
> > > +       this options makes the kexec purgatory do  not signature verification
> > > +       which would get hundreds of milliseconds saved during kexec boot. If we can
> > > +       confirm that the data of each segment loaded by kexec will not change we may
> > > +       enable this option
> > > +
> >
> > Some grammar nits here, but actually, wouldn't it be better to make this
> > depend on some other signature things instead? Like if the parent kernel
> > actually did a big signature computation, then maybe the purgatory step
> > is needed, but if it didn't bother, then maybe you can skip it. This
> > way, you don't need a compile-time option that might change some aspect
> > of signature verification people might otherwise be relying on.
> >
> > Jason

Thread overview: 32+ messages
2022-07-25  8:38 [PATCH 0/4] faster kexec reboot Albert Huang
2022-07-25  8:38 ` Albert Huang
2022-07-25  8:38 ` [PATCH 1/4] kexec: reuse crash kernel reserved memory for normal kexec Albert Huang
2022-07-25  8:38   ` Albert Huang
2022-07-25 12:02   ` Jason A. Donenfeld
2022-07-25 12:02     ` Jason A. Donenfeld
2022-07-25 12:56     ` Fwd: [External] " 黄杰
2022-07-25 13:30     ` 黄杰
2022-07-25 13:30       ` 黄杰
2022-07-25  8:38 ` [PATCH 2/4] kexec: add CONFING_KEXEC_PURGATORY_SKIP_SIG Albert Huang
2022-07-25  8:38   ` Albert Huang
2022-07-25 12:15   ` Jason A. Donenfeld
2022-07-25 13:32     ` [External] " 黄杰
2022-07-25 13:32       ` 黄杰
2022-07-28  1:57       ` 黄杰
2022-07-28  1:57         ` 黄杰
2022-07-25 12:56   ` Fwd: " 黄杰
2022-07-25  8:38 ` [PATCH 3/4] x86: Support the uncompressed kernel to speed up booting Albert Huang
2022-07-25  8:38   ` Albert Huang
2022-07-25 12:55   ` Fwd: " 黄杰
2022-07-25 16:57   ` Eric W. Biederman
2022-07-25 16:57     ` Eric W. Biederman
2022-07-25  8:38 ` [PATCH 4/4] x86: boot: avoid memory copy if kernel is uncompressed Albert Huang
2022-07-25  8:38   ` Albert Huang
2022-07-25 12:55   ` Fwd: " 黄杰
2022-07-25 12:54 ` Fwd: [PATCH 0/4] faster kexec reboot 黄杰
2022-07-25 17:04 ` Eric W. Biederman
2022-07-25 17:04   ` Eric W. Biederman
2022-07-26  5:53   ` [External] " 黄杰
2022-07-26  5:53     ` 黄杰
2022-07-28  1:55     ` 黄杰
2022-07-28  1:55       ` 黄杰