linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
@ 2019-11-07 14:32 shile.zhang
  2019-11-07 14:32 ` [RFC PATCH 1/4] scripts: Add sortorctable to sort ORC unwind tables shile.zhang
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: shile.zhang @ 2019-11-07 14:32 UTC (permalink / raw)
  To: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86
  Cc: H . Peter Anvin, linux-kernel, linux-kbuild, Shile Zhang

From: Shile Zhang <shile.zhang@linux.alibaba.com>

Hi,

I found the unwind_init taken long time (more than 90ms) in kernel
booting, mainly spent on sorting the two ORC unwind tables, orc_unwind
and orc_unwind_ip.

I also noticed that this issued has reported and discussed last year:
https://lkml.org/lkml/2018/10/8/342
But seems no final solution until now, I tried to sort the ORC tables at
build time, followed the helpful hints from Josh and Ingo in that thread.
And mainly referred the implementation of 'sortextable' tool:
https://lore.kernel.org/linux-mips/1334872799-14589-1-git-send-email-ddaney.cavm@gmail.com/

What I did:

- Add a Kconfig to control build-time sorting or runtime sorting;
- Referred 'sortextable', create a similar helper tool 'sortorctable',
  help to sort the ORC unwind tables at vmlinux link process.

One potential improvement is to sort the module ORC tables in future.

Thanks!

Shile Zhang (4):
  scripts: Add sortorctable to sort ORC unwind tables
  kbuild: Sort ORC unwind tables in vmlinux link process
  x86/unwind/orc: Skip sorting if BUILDTIME_ORCTABLE_SORT is configured
  x86/Kconfig: Add a Kconfig option to sort ORC tables at build time

 arch/x86/Kconfig.debug       |   9 ++
 arch/x86/kernel/unwind_orc.c |   2 +
 scripts/.gitignore           |   1 +
 scripts/Makefile             |   2 +
 scripts/link-vmlinux.sh      |  10 ++
 scripts/sortorctable.c       | 246 +++++++++++++++++++++++++++++++++++
 scripts/sortorctable.h       |  25 ++++
 7 files changed, 295 insertions(+)
 create mode 100644 scripts/sortorctable.c
 create mode 100644 scripts/sortorctable.h

-- 
2.24.0.rc2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC PATCH 1/4] scripts: Add sortorctable to sort ORC unwind tables
  2019-11-07 14:32 [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time shile.zhang
@ 2019-11-07 14:32 ` shile.zhang
  2019-11-07 14:32 ` [RFC PATCH 2/4] kbuild: Sort ORC unwind tables in vmlinux link process shile.zhang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: shile.zhang @ 2019-11-07 14:32 UTC (permalink / raw)
  To: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86
  Cc: H . Peter Anvin, linux-kernel, linux-kbuild, Shile Zhang

From: Shile Zhang <shile.zhang@linux.alibaba.com>

Sort orc_unwind and orc_unwind_ip tables at build time instead of runtime
in boot pharse can save more boot time.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 scripts/.gitignore     |   1 +
 scripts/Makefile       |   2 +
 scripts/sortorctable.c | 246 +++++++++++++++++++++++++++++++++++++++++
 scripts/sortorctable.h |  25 +++++
 4 files changed, 274 insertions(+)
 create mode 100644 scripts/sortorctable.c
 create mode 100644 scripts/sortorctable.h

diff --git a/scripts/.gitignore b/scripts/.gitignore
index 17f8cef88fa8..52976f32f974 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -12,3 +12,4 @@ asn1_compiler
 extract-cert
 sign-file
 insert-sys-cert
+sortorctable
diff --git a/scripts/Makefile b/scripts/Makefile
index 3e86b300f5a1..51b2d465f042 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -16,12 +16,14 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
 hostprogs-$(CONFIG_VT)           += conmakehash
 hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
 hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
+hostprogs-$(CONFIG_BUILDTIME_ORCTABLE_SORT) += sortorctable
 hostprogs-$(CONFIG_ASN1)	 += asn1_compiler
 hostprogs-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
 hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
 hostprogs-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
 
 HOSTCFLAGS_sortextable.o = -I$(srctree)/tools/include
+HOSTCFLAGS_sortorctable.o = -I$(srctree)/tools/include
 HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
 HOSTLDLIBS_sign-file = -lcrypto
 HOSTLDLIBS_extract-cert = -lcrypto
diff --git a/scripts/sortorctable.c b/scripts/sortorctable.c
new file mode 100644
index 000000000000..39be47d1d296
--- /dev/null
+++ b/scripts/sortorctable.c
@@ -0,0 +1,246 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * sortorctable: used to sort the ORC unwind tables
+ *
+ * Strategy: alter the vmlinux file in-place
+ *
+ * Some code taken out of lib/sort.c and arch/x86/kernel/unwind_orc.c.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/mman.h>
+#include <elf.h>
+
+#include "sortorctable.h"
+
+int *cur_orc_ip_table;
+struct orc_entry *cur_orc_table;
+
+/**
+ * sort - sort an array of elements
+ * @base: pointer to data to sort
+ * @num: number of elements
+ * @size: size of each element
+ * @cmp_func: pointer to comparison function
+ * @swap_func: pointer to swap function
+ *
+ * This function does a heapsort on the given array. You may provide a
+ * swap_func function optimized to your element type.
+ *
+ * Sorting time is O(n log n) both on average and worst-case. While
+ * qsort is about 20% faster on average, it suffers from exploitable
+ * O(n*n) worst-case behavior and extra memory requirements that make
+ * it less suitable for kernel use.
+ */
+static void sort(void *base, size_t num, size_t size,
+	  int (*cmp_func)(const void *, const void *),
+	  void (*swap_func)(void *, void *, int size))
+{
+	/* pre-scale counters for performance */
+	int i = (num/2 - 1) * size, n = num * size, c, r;
+
+	/* heapify */
+	for ( ; i >= 0; i -= size) {
+		for (r = i; r * 2 + size < n; r  = c) {
+			c = r * 2 + size;
+			if (c < n - size &&
+					cmp_func(base + c, base + c + size) < 0)
+				c += size;
+			if (cmp_func(base + r, base + c) >= 0)
+				break;
+			swap_func(base + r, base + c, size);
+		}
+	}
+
+	/* sort */
+	for (i = n - size; i > 0; i -= size) {
+		swap_func(base, base + i, size);
+		for (r = 0; r * 2 + size < i; r = c) {
+			c = r * 2 + size;
+			if (c < i - size &&
+					cmp_func(base + c, base + c + size) < 0)
+				c += size;
+			if (cmp_func(base + r, base + c) >= 0)
+				break;
+			swap_func(base + r, base + c, size);
+		}
+	}
+}
+
+static inline unsigned long orc_ip(const int *ip)
+{
+	return (unsigned long)ip + *ip;
+}
+
+static void orc_sort_swap(void *_a, void *_b, int size)
+{
+	struct orc_entry *orc_a, *orc_b;
+	struct orc_entry orc_tmp;
+	int *a = _a, *b = _b, tmp;
+	int delta = _b - _a;
+
+	/* Swap the .orc_unwind_ip entries: */
+	tmp = *a;
+	*a = *b + delta;
+	*b = tmp - delta;
+
+	/* Swap the corresponding .orc_unwind entries: */
+	orc_a = cur_orc_table + (a - cur_orc_ip_table);
+	orc_b = cur_orc_table + (b - cur_orc_ip_table);
+	orc_tmp = *orc_a;
+	*orc_a = *orc_b;
+	*orc_b = orc_tmp;
+}
+
+static int orc_sort_cmp(const void *_a, const void *_b)
+{
+	struct orc_entry *orc_a;
+	const int *a = _a, *b = _b;
+	unsigned long a_val = orc_ip(a);
+	unsigned long b_val = orc_ip(b);
+
+	if (a_val > b_val)
+		return 1;
+	if (a_val < b_val)
+		return -1;
+
+	/*
+	 * The "weak" section terminator entries need to always be on the left
+	 * to ensure the lookup code skips them in favor of real entries.
+	 * These terminator entries exist to handle any gaps created by
+	 * whitelisted .o files which didn't get objtool generation.
+	 */
+	orc_a = cur_orc_table + (a - cur_orc_ip_table);
+	return orc_a->sp_reg == ORC_REG_UNDEFINED && !orc_a->end ? -1 : 1;
+}
+
+/* ORC unwind only supports X86_64 */
+static int do_precheck(const char *fname, void *addr)
+{
+	Elf64_Ehdr * const ehdr = addr;
+
+	if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB) {
+		fprintf(stderr, "%s: unsupported ELF data encoding %d\n",
+			fname, ehdr->e_ident[EI_DATA]);
+		return 1;
+	}
+
+	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0
+	    || (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN)
+	    || ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
+		fprintf(stderr,
+			"%s: unrecognized ET_EXEC/ET_DYN file\n", fname);
+		return 1;
+	}
+
+	if (ehdr->e_machine != EM_X86_64) {
+		fprintf(stderr, "%s: unsupported e_machine %d\n",
+			fname, ehdr->e_machine);
+		return 1;
+	}
+
+	if (ehdr->e_ident[EI_CLASS] != ELFCLASS64
+	    || ehdr->e_ehsize != sizeof(Elf64_Ehdr)
+	    || ehdr->e_shentsize != sizeof(Elf64_Shdr)) {
+		fprintf(stderr, "%s: unrecognized ELF class %d\n",
+			fname, ehdr->e_ident[EI_CLASS]);
+		return 1;
+	}
+
+	return 0;
+}
+
+static int do_sort(const char *fname, void *addr)
+{
+	unsigned int orc_size, orc_ip_size, num_entries;
+	const Elf64_Shdr *s, *_orc = NULL, *_orc_ip = NULL;
+	Elf64_Ehdr * const ehdr = (Elf64_Ehdr *)addr;
+	Elf64_Shdr * const shdr = (Elf64_Shdr *)((void *)ehdr + ehdr->e_shoff);
+	char *secstrings = (void *)ehdr + shdr[ehdr->e_shstrndx].sh_offset;
+
+	for (s = shdr; s < shdr + ehdr->e_shnum; s++) {
+		if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name))
+			_orc_ip = s;
+		if (!strcmp(".orc_unwind", secstrings + s->sh_name))
+			_orc = s;
+	}
+
+	if (!_orc_ip || !_orc) {
+		fprintf(stderr,
+			"%s: cannot find ORC unwind tables\n", fname);
+		return 1;
+	}
+
+	orc_size	= _orc->sh_size;
+	orc_ip_size	= _orc_ip->sh_size;
+	num_entries	= orc_ip_size / sizeof(int);
+	cur_orc_table	= (struct orc_entry *)((void *)ehdr + _orc->sh_offset);
+	cur_orc_ip_table = (int *)((void *)ehdr + _orc_ip->sh_offset);
+
+	if (orc_ip_size % sizeof(int) != 0
+	    || orc_size % sizeof(struct orc_entry) != 0
+	    || num_entries != orc_size / sizeof(struct orc_entry)) {
+		fprintf(stderr,
+			"%s: wrong ORC unwind table entries number\n", fname);
+		return 1;
+	}
+
+	sort(cur_orc_ip_table, num_entries, sizeof(int), orc_sort_cmp,
+	     orc_sort_swap);
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret = 0;
+	int fd;
+	char *fname = NULL;
+	struct stat sb;
+	void *addr = NULL;
+
+	if (argc != 2) {
+		fprintf(stderr, "usage: sortorctable vmlinux\n");
+		return 0;
+	}
+
+	fname = argv[1];
+	fd = open(fname, O_RDWR);
+	if (fd < 0 || fstat(fd, &sb) < 0) {
+		perror(fname);
+		return errno;
+	}
+
+	if (!S_ISREG(sb.st_mode)) {
+		fprintf(stderr, "'%s': not a regular file\n", fname);
+		close(fd);
+		return 1;
+	}
+
+	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+	if (addr == MAP_FAILED) {
+		fprintf(stderr, "'%s': mmap failed\n", fname);
+		close(fd);
+		return 1;
+	}
+
+	ret = do_precheck(fname, addr);
+	if (ret)
+		goto out;
+
+	ret = do_sort(fname, addr);
+
+out:
+	munmap(addr, sb.st_size);
+	close(fd);
+
+	return ret;
+}
diff --git a/scripts/sortorctable.h b/scripts/sortorctable.h
new file mode 100644
index 000000000000..8a7d3bd8b01b
--- /dev/null
+++ b/scripts/sortorctable.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Shile Zhang <shile.zhang@linux.alibaba.com>
+ *
+ * This code was taken out of arch/x86/include/asm/orc_types.h written by:
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ */
+
+#ifndef _SORTORCTABLE_H_
+#define _SORTORCTABLE_H_
+
+#include <linux/types.h>
+
+#define ORC_REG_UNDEFINED	0
+
+struct orc_entry {
+	s16		sp_offset;
+	s16		bp_offset;
+	unsigned	sp_reg:4;
+	unsigned	bp_reg:4;
+	unsigned	type:2;
+	unsigned	end:1;
+} __attribute__((packed));
+
+#endif//_SORTORCTABLE_H_
-- 
2.24.0.rc2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 2/4] kbuild: Sort ORC unwind tables in vmlinux link process
  2019-11-07 14:32 [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time shile.zhang
  2019-11-07 14:32 ` [RFC PATCH 1/4] scripts: Add sortorctable to sort ORC unwind tables shile.zhang
@ 2019-11-07 14:32 ` shile.zhang
  2019-11-07 14:32 ` [RFC PATCH 3/4] x86/unwind/orc: Skip sorting if BUILDTIME_ORCTABLE_SORT is configured shile.zhang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: shile.zhang @ 2019-11-07 14:32 UTC (permalink / raw)
  To: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86
  Cc: H . Peter Anvin, linux-kernel, linux-kbuild, Shile Zhang

From: Shile Zhang <shile.zhang@linux.alibaba.com>

To sort the ORC unwind tables in vmlinux link process, controlled
by configure BUILDTIME_ORCTABLE_SORT.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 scripts/link-vmlinux.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 06495379fcd8..43fe8c151c8d 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -183,6 +183,11 @@ sortextable()
 	${objtree}/scripts/sortextable ${1}
 }
 
+sortorctable()
+{
+	${objtree}/scripts/sortorctable ${1}
+}
+
 # Delete output files in case of error
 cleanup()
 {
@@ -303,6 +308,11 @@ if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
 	sortextable vmlinux
 fi
 
+if [ -n "${CONFIG_BUILDTIME_ORCTABLE_SORT}" ]; then
+	info SORTORC vmlinux
+	sortorctable vmlinux
+fi
+
 info SYSMAP System.map
 mksysmap vmlinux System.map
 
-- 
2.24.0.rc2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 3/4] x86/unwind/orc: Skip sorting if BUILDTIME_ORCTABLE_SORT is configured
  2019-11-07 14:32 [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time shile.zhang
  2019-11-07 14:32 ` [RFC PATCH 1/4] scripts: Add sortorctable to sort ORC unwind tables shile.zhang
  2019-11-07 14:32 ` [RFC PATCH 2/4] kbuild: Sort ORC unwind tables in vmlinux link process shile.zhang
@ 2019-11-07 14:32 ` shile.zhang
  2019-11-07 14:32 ` [RFC PATCH 4/4] x86/Kconfig: Add a Kconfig option to sort ORC tables at build time shile.zhang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: shile.zhang @ 2019-11-07 14:32 UTC (permalink / raw)
  To: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86
  Cc: H . Peter Anvin, linux-kernel, linux-kbuild, Shile Zhang

From: Shile Zhang <shile.zhang@linux.alibaba.com>

The orc_unwind and orc_unwind_ip tables are sorted at build time if
BUILDTIME_ORCTABLE_SORT is configured, skip sort at boot time.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 arch/x86/kernel/unwind_orc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 332ae6530fa8..2c2fa9ef116e 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -273,9 +273,11 @@ void __init unwind_init(void)
 		return;
 	}
 
+#ifndef CONFIG_BUILDTIME_ORCTABLE_SORT
 	/* Sort the .orc_unwind and .orc_unwind_ip tables: */
 	sort(__start_orc_unwind_ip, num_entries, sizeof(int), orc_sort_cmp,
 	     orc_sort_swap);
+#endif
 
 	/* Initialize the fast lookup table: */
 	lookup_num_blocks = orc_lookup_end - orc_lookup;
-- 
2.24.0.rc2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 4/4] x86/Kconfig: Add a Kconfig option to sort ORC tables at build time
  2019-11-07 14:32 [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time shile.zhang
                   ` (2 preceding siblings ...)
  2019-11-07 14:32 ` [RFC PATCH 3/4] x86/unwind/orc: Skip sorting if BUILDTIME_ORCTABLE_SORT is configured shile.zhang
@ 2019-11-07 14:32 ` shile.zhang
  2019-11-07 15:22 ` [RFC PATCH 0/4] Speed booting by sorting ORC unwind " Peter Zijlstra
  2019-11-07 15:46 ` Josh Poimboeuf
  5 siblings, 0 replies; 12+ messages in thread
From: shile.zhang @ 2019-11-07 14:32 UTC (permalink / raw)
  To: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86
  Cc: H . Peter Anvin, linux-kernel, linux-kbuild, Shile Zhang

From: Shile Zhang <shile.zhang@linux.alibaba.com>

Add a new Kconfig BUILDTIME_ORCTABLE_SORT to control the ORC unwind
tables at build time. Select for ORC unwinder on x86_64 by default.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 arch/x86/Kconfig.debug | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index bf9cd83de777..320ff2af4837 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -335,6 +335,15 @@ config UNWINDER_GUESS
 
 endchoice
 
+config BUILDTIME_ORCTABLE_SORT
+	bool "Sort ORC unwind tables at build time"
+	depends on X86_64
+	depends on UNWINDER_ORC
+	default y
+	help
+	This option enables the build-time sorting for ORC unwind tables. It
+	can help to speed up kernel boot by skip the runtime sorting.
+
 config FRAME_POINTER
 	depends on !UNWINDER_ORC && !UNWINDER_GUESS
 	bool
-- 
2.24.0.rc2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
  2019-11-07 14:32 [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time shile.zhang
                   ` (3 preceding siblings ...)
  2019-11-07 14:32 ` [RFC PATCH 4/4] x86/Kconfig: Add a Kconfig option to sort ORC tables at build time shile.zhang
@ 2019-11-07 15:22 ` Peter Zijlstra
  2019-11-08  1:42   ` Shile Zhang
  2019-11-07 15:46 ` Josh Poimboeuf
  5 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2019-11-07 15:22 UTC (permalink / raw)
  To: shile.zhang
  Cc: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86, H . Peter Anvin,
	linux-kernel, linux-kbuild

On Thu, Nov 07, 2019 at 10:32:01PM +0800, shile.zhang@linux.alibaba.com wrote:
> From: Shile Zhang <shile.zhang@linux.alibaba.com>
> 
> Hi,
> 
> I found the unwind_init taken long time (more than 90ms) in kernel
> booting, mainly spent on sorting the two ORC unwind tables, orc_unwind
> and orc_unwind_ip.
> 
> I also noticed that this issued has reported and discussed last year:
> https://lkml.org/lkml/2018/10/8/342
> But seems no final solution until now, I tried to sort the ORC tables at
> build time, followed the helpful hints from Josh and Ingo in that thread.
> And mainly referred the implementation of 'sortextable' tool:
> https://lore.kernel.org/linux-mips/1334872799-14589-1-git-send-email-ddaney.cavm@gmail.com/
> 
> What I did:
> 
> - Add a Kconfig to control build-time sorting or runtime sorting;
> - Referred 'sortextable', create a similar helper tool 'sortorctable',
>   help to sort the ORC unwind tables at vmlinux link process.

What is the build-time cost for doing this? The link phase is already a
fairly big bottleneck for building a kernel.

Can sort{ex,orc}table() be ran concurrently? Do they want to be the same
(threaded) tool?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
  2019-11-07 14:32 [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time shile.zhang
                   ` (4 preceding siblings ...)
  2019-11-07 15:22 ` [RFC PATCH 0/4] Speed booting by sorting ORC unwind " Peter Zijlstra
@ 2019-11-07 15:46 ` Josh Poimboeuf
  2019-11-08  1:43   ` Shile Zhang
  5 siblings, 1 reply; 12+ messages in thread
From: Josh Poimboeuf @ 2019-11-07 15:46 UTC (permalink / raw)
  To: shile.zhang
  Cc: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-kernel,
	linux-kbuild

On Thu, Nov 07, 2019 at 10:32:01PM +0800, shile.zhang@linux.alibaba.com wrote:
> From: Shile Zhang <shile.zhang@linux.alibaba.com>
> 
> Hi,
> 
> I found the unwind_init taken long time (more than 90ms) in kernel
> booting, mainly spent on sorting the two ORC unwind tables, orc_unwind
> and orc_unwind_ip.
> 
> I also noticed that this issued has reported and discussed last year:
> https://lkml.org/lkml/2018/10/8/342
> But seems no final solution until now, I tried to sort the ORC tables at
> build time, followed the helpful hints from Josh and Ingo in that thread.
> And mainly referred the implementation of 'sortextable' tool:
> https://lore.kernel.org/linux-mips/1334872799-14589-1-git-send-email-ddaney.cavm@gmail.com/
> 
> What I did:
> 
> - Add a Kconfig to control build-time sorting or runtime sorting;
> - Referred 'sortextable', create a similar helper tool 'sortorctable',
>   help to sort the ORC unwind tables at vmlinux link process.
> 
> One potential improvement is to sort the module ORC tables in future.
> 
> Thanks!

Thanks a lot for working on this!

I'd say the new config option isn't needed.  The runtime ORC sorting
logic is unconditionally bad and the code should just be removed.  I saw
recently that it's one of the main offenders for boot time latency.

I also agree with Peter that we should try to reduce the link-time
penalty as much as possible.  But it's a necessary evil to a certain
extent.

-- 
Josh


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
  2019-11-07 15:22 ` [RFC PATCH 0/4] Speed booting by sorting ORC unwind " Peter Zijlstra
@ 2019-11-08  1:42   ` Shile Zhang
  2019-11-08  9:21     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Shile Zhang @ 2019-11-08  1:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86, H . Peter Anvin,
	linux-kernel, linux-kbuild



On 2019/11/7 23:22, Peter Zijlstra wrote:
> On Thu, Nov 07, 2019 at 10:32:01PM +0800, shile.zhang@linux.alibaba.com wrote:
>> From: Shile Zhang <shile.zhang@linux.alibaba.com>
>>
>> Hi,
>>
>> I found the unwind_init taken long time (more than 90ms) in kernel
>> booting, mainly spent on sorting the two ORC unwind tables, orc_unwind
>> and orc_unwind_ip.
>>
>> I also noticed that this issued has reported and discussed last year:
>> https://lkml.org/lkml/2018/10/8/342
>> But seems no final solution until now, I tried to sort the ORC tables at
>> build time, followed the helpful hints from Josh and Ingo in that thread.
>> And mainly referred the implementation of 'sortextable' tool:
>> https://lore.kernel.org/linux-mips/1334872799-14589-1-git-send-email-ddaney.cavm@gmail.com/
>>
>> What I did:
>>
>> - Add a Kconfig to control build-time sorting or runtime sorting;
>> - Referred 'sortextable', create a similar helper tool 'sortorctable',
>>    help to sort the ORC unwind tables at vmlinux link process.
> What is the build-time cost for doing this? The link phase is already a
> fairly big bottleneck for building a kernel.
Hi, Peter, Thanks for your kindly reply!
On my test env, the build-time sort spend about 100ms, which is similar 
to runtime sorting due to the same sorting code. Of course there are few 
compiling cost in sortorctable tool itself. I think the overall cost of 
this build-time sorting is not so much.

I agree with you that the link time of vmlinux shoud be optimized.
But IMHO, for one kernel product release, the kernel building is once 
for all. So put the sorting in build time can save boot time for 
customer, for each booting. I think this is significant for boot time 
sensitive products, such as embedded devices in IoT, or VM in Cloud.
> Can sort{ex,orc}table() be ran concurrently? Do they want to be the same
> (threaded) tool?
I think it is possible to do those sort work concurrently, likes 
deferred memory init which is big boot time speed up.
But I don't know if the exception table and ORC unwind tables can be 
deferred, due to those tables might be used in early boot time, for 
early exception handling and early debugging. I'm not familiar with that.

IMO, the init works in kernel boot time should do the necessary 
runtime-depends initialization, which cannot done out-of-bands. For 
exception, ORC unwind like tables, which does not depends on the runtime 
ENV. It can/should be ready in building time. IOW, this kind of "setup 
cost" should not put on customer.

Thanks again!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
  2019-11-07 15:46 ` Josh Poimboeuf
@ 2019-11-08  1:43   ` Shile Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Shile Zhang @ 2019-11-08  1:43 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H . Peter Anvin, linux-kernel,
	linux-kbuild



On 2019/11/7 23:46, Josh Poimboeuf wrote:
> On Thu, Nov 07, 2019 at 10:32:01PM +0800, shile.zhang@linux.alibaba.com wrote:
>> From: Shile Zhang <shile.zhang@linux.alibaba.com>
>>
>> Hi,
>>
>> I found the unwind_init taken long time (more than 90ms) in kernel
>> booting, mainly spent on sorting the two ORC unwind tables, orc_unwind
>> and orc_unwind_ip.
>>
>> I also noticed that this issued has reported and discussed last year:
>> https://lkml.org/lkml/2018/10/8/342
>> But seems no final solution until now, I tried to sort the ORC tables at
>> build time, followed the helpful hints from Josh and Ingo in that thread.
>> And mainly referred the implementation of 'sortextable' tool:
>> https://lore.kernel.org/linux-mips/1334872799-14589-1-git-send-email-ddaney.cavm@gmail.com/
>>
>> What I did:
>>
>> - Add a Kconfig to control build-time sorting or runtime sorting;
>> - Referred 'sortextable', create a similar helper tool 'sortorctable',
>>    help to sort the ORC unwind tables at vmlinux link process.
>>
>> One potential improvement is to sort the module ORC tables in future.
>>
>> Thanks!
> Thanks a lot for working on this!
>
> I'd say the new config option isn't needed.  The runtime ORC sorting
> logic is unconditionally bad and the code should just be removed.  I saw
> recently that it's one of the main offenders for boot time latency.
Hi, Josh, Thanks very much for your quickly response!

I'll refactor the code as your advice.
Yes, the run-time sorting cost is bigger for currently Cloud products, 
such as serverless products, which needs boot up ASAP.
>
> I also agree with Peter that we should try to reduce the link-time
> penalty as much as possible.  But it's a necessary evil to a certain
> extent.
>
agree!

Thanks again and looking forwards more advice from you!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
  2019-11-08  1:42   ` Shile Zhang
@ 2019-11-08  9:21     ` Peter Zijlstra
  2019-11-08  9:25       ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2019-11-08  9:21 UTC (permalink / raw)
  To: Shile Zhang
  Cc: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86, H . Peter Anvin,
	linux-kernel, linux-kbuild

On Fri, Nov 08, 2019 at 09:42:55AM +0800, Shile Zhang wrote:

> > Can sort{ex,orc}table() be ran concurrently? Do they want to be the same
> > (threaded) tool?

> I think it is possible to do those sort work concurrently, likes deferred
> memory init which is big boot time speed up.
> But I don't know if the exception table and ORC unwind tables can be
> deferred, due to those tables might be used in early boot time, for early
> exception handling and early debugging. I'm not familiar with that.

I meant at link time, run both sorts concurrently such that we only have
to wait for the longest, instead of the sum of them.

They're not changing the same part of the ELF file, so it should be
possible to have one tool have multiple threads, each sorting a
different table.

Aside from the .ex_table and ORC there's also .jump_table that wants
sorting (see jump_label_sort_entries()).

I agree that doing it at link time makes sense, I just hate to do all
this sorting in sequence and blowing up the link time. I don't build for
customers, I build for single use boot and linking _SUCKS_.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
  2019-11-08  9:21     ` Peter Zijlstra
@ 2019-11-08  9:25       ` Peter Zijlstra
  2019-11-11  2:44         ` Shile Zhang
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2019-11-08  9:25 UTC (permalink / raw)
  To: Shile Zhang
  Cc: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86, H . Peter Anvin,
	linux-kernel, linux-kbuild

On Fri, Nov 08, 2019 at 10:21:36AM +0100, Peter Zijlstra wrote:
> On Fri, Nov 08, 2019 at 09:42:55AM +0800, Shile Zhang wrote:
> 
> > > Can sort{ex,orc}table() be ran concurrently? Do they want to be the same
> > > (threaded) tool?
> 
> > I think it is possible to do those sort work concurrently, likes deferred
> > memory init which is big boot time speed up.
> > But I don't know if the exception table and ORC unwind tables can be
> > deferred, due to those tables might be used in early boot time, for early
> > exception handling and early debugging. I'm not familiar with that.
> 
> I meant at link time, run both sorts concurrently such that we only have
> to wait for the longest, instead of the sum of them.
> 
> They're not changing the same part of the ELF file, so it should be
> possible to have one tool have multiple threads, each sorting a
> different table.
> 
> Aside from the .ex_table and ORC there's also .jump_table that wants
> sorting (see jump_label_sort_entries()).

Oh, and I'll be adding .static_call_sites soon, see:

  https://lkml.kernel.org/r/20191007082708.013939311@infradead.org

(I should repost that)

That gives us 4 tables to sort which we can do concurrently in 4
threads.

> I agree that doing it at link time makes sense, I just hate to do all
> this sorting in sequence and blowing up the link time. I don't build for
> customers, I build for single use boot and linking _SUCKS_.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time
  2019-11-08  9:25       ` Peter Zijlstra
@ 2019-11-11  2:44         ` Shile Zhang
  0 siblings, 0 replies; 12+ messages in thread
From: Shile Zhang @ 2019-11-11  2:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Masahiro Yamada, Michal Marek, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Josh Poimboeuf, x86, H . Peter Anvin,
	linux-kernel, linux-kbuild



On 2019/11/8 17:25, Peter Zijlstra wrote:
> On Fri, Nov 08, 2019 at 10:21:36AM +0100, Peter Zijlstra wrote:
>> On Fri, Nov 08, 2019 at 09:42:55AM +0800, Shile Zhang wrote:
>>
>>>> Can sort{ex,orc}table() be ran concurrently? Do they want to be the same
>>>> (threaded) tool?
>>> I think it is possible to do those sort work concurrently, likes deferred
>>> memory init which is big boot time speed up.
>>> But I don't know if the exception table and ORC unwind tables can be
>>> deferred, due to those tables might be used in early boot time, for early
>>> exception handling and early debugging. I'm not familiar with that.
>> I meant at link time, run both sorts concurrently such that we only have
>> to wait for the longest, instead of the sum of them.
>>
>> They're not changing the same part of the ELF file, so it should be
>> possible to have one tool have multiple threads, each sorting a
>> different table.
>>
>> Aside from the .ex_table and ORC there's also .jump_table that wants
>> sorting (see jump_label_sort_entries()).
> Oh, and I'll be adding .static_call_sites soon, see:
>
>    https://lkml.kernel.org/r/20191007082708.013939311@infradead.org
>
> (I should repost that)
>
> That gives us 4 tables to sort which we can do concurrently in 4
> threads.

I got your point now.
I'll try to rework the sort tool to sort all tables concurrently in one 
tool with multiple-threads.
Thanks for your advice!

>> I agree that doing it at link time makes sense, I just hate to do all
>> this sorting in sequence and blowing up the link time. I don't build for
>> customers, I build for single use boot and linking _SUCKS_.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-11-11  2:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-07 14:32 [RFC PATCH 0/4] Speed booting by sorting ORC unwind tables at build time shile.zhang
2019-11-07 14:32 ` [RFC PATCH 1/4] scripts: Add sortorctable to sort ORC unwind tables shile.zhang
2019-11-07 14:32 ` [RFC PATCH 2/4] kbuild: Sort ORC unwind tables in vmlinux link process shile.zhang
2019-11-07 14:32 ` [RFC PATCH 3/4] x86/unwind/orc: Skip sorting if BUILDTIME_ORCTABLE_SORT is configured shile.zhang
2019-11-07 14:32 ` [RFC PATCH 4/4] x86/Kconfig: Add a Kconfig option to sort ORC tables at build time shile.zhang
2019-11-07 15:22 ` [RFC PATCH 0/4] Speed booting by sorting ORC unwind " Peter Zijlstra
2019-11-08  1:42   ` Shile Zhang
2019-11-08  9:21     ` Peter Zijlstra
2019-11-08  9:25       ` Peter Zijlstra
2019-11-11  2:44         ` Shile Zhang
2019-11-07 15:46 ` Josh Poimboeuf
2019-11-08  1:43   ` Shile Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).