* [RFC][PATCH 0/2] x86/jump labels: Add the 5 byte or 2 byte jumps
@ 2013-08-07 17:36 Steven Rostedt
  2013-08-07 17:36 ` [RFC][PATCH 1/2] jump labels: Add infrastructure to update jump labels at compile time Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Steven Rostedt @ 2013-08-07 17:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Jason Baron

As I said, I would post the patches that let the jmps used by jump labels
be turned into 2 bytes where possible. These are a bit controversial due
to the complexity of the update_jump_label code.

These patches are based off of tip's x86/jumplabel code.

But if someone cares to play with it, feel free. I'll push this up
to my repo under: tip/perf/jump-label-7 (internally this is my 7th version).

I'll post a patch that does the counting as well as a reply to this
post.

Enjoy,

-- Steve


  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/jump-label-7

Head SHA1: 2a7235814df0cccda60a366fb8b1e5502055ea4d


Steven Rostedt (2):
      jump labels: Add infrastructure to update jump labels at compile time
      x86/jump labels: Use either 5 byte or 2 byte jumps

----
 Makefile                          |    7 +
 arch/Kconfig                      |    6 +
 arch/x86/Kconfig                  |    1 +
 arch/x86/include/asm/jump_label.h |    7 +-
 arch/x86/kernel/jump_label.c      |   88 +++++++---
 scripts/Makefile                  |    1 +
 scripts/Makefile.build            |   15 +-
 scripts/update_jump_label.c       |  341 +++++++++++++++++++++++++++++++++++++
 scripts/update_jump_label.h       |  208 ++++++++++++++++++++++
 9 files changed, 649 insertions(+), 25 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC][PATCH 1/2] jump labels: Add infrastructure to update jump labels at compile time
  2013-08-07 17:36 [RFC][PATCH 0/2] x86/jump labels: Add the 5 byte or 2 byte jumps Steven Rostedt
@ 2013-08-07 17:36 ` Steven Rostedt
  2013-08-07 17:36 ` [RFC][PATCH 2/2] x86/jump labels: Use either 5 byte or 2 byte jumps Steven Rostedt
  2013-08-07 17:54 ` [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used Steven Rostedt
  2 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2013-08-07 17:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Jason Baron

[-- Attachment #1: 0001-jump-labels-Add-infrastructure-to-update-jump-labels.patch --]
[-- Type: text/plain, Size: 18535 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Add the infrastructure to allow architectures to modify the jump label
locations at compile time. This is mainly for x86, where the jmps may
be either 2 bytes or 5 bytes. Instead of wasting 5 bytes for all jump labels,
this code will let x86 put in a jmp instead of a 5 byte nop. Then the
assembler will make either a 2 byte or 5 byte jmp depending on where
the target is.

At compile time, this code will replace the jmps with either a 2 byte
or 5 byte nop depending on the size of the jmp that was added.

Cc: Jason Baron <jbaron@akamai.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 Makefile                    |    7 +
 arch/Kconfig                |    6 +
 scripts/Makefile            |    1 +
 scripts/Makefile.build      |   15 +-
 scripts/update_jump_label.c |  341 +++++++++++++++++++++++++++++++++++++++++++
 scripts/update_jump_label.h |  208 ++++++++++++++++++++++++++
 6 files changed, 576 insertions(+), 2 deletions(-)
 create mode 100644 scripts/update_jump_label.c
 create mode 100644 scripts/update_jump_label.h

diff --git a/Makefile b/Makefile
index 9262ba8..8ff1c91 100644
--- a/Makefile
+++ b/Makefile
@@ -638,6 +638,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
 endif
 endif
 
+ifdef CONFIG_JUMP_LABEL
+	ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
+		BUILD_UPDATE_JUMP_LABEL := y
+		export BUILD_UPDATE_JUMP_LABEL
+	endif
+endif
+
 # We trigger additional mismatches with less inlining
 ifdef CONFIG_DEBUG_SECTION_MISMATCH
 KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
diff --git a/arch/Kconfig b/arch/Kconfig
index 8d2ae24..dffcb00 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -283,6 +283,12 @@ config HAVE_PERF_USER_STACK_DUMP
 	  access to the user stack pointer which is not unified across
 	  architectures.
 
+config HAVE_BUILD_TIME_JUMP_LABEL
+       bool
+       help
+	If an arch uses scripts/update_jump_label to patch in jump nops
+	at build time, then it must enable this option.
+
 config HAVE_ARCH_JUMP_LABEL
 	bool
 
diff --git a/scripts/Makefile b/scripts/Makefile
index 01e7adb..fe548fe 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -20,6 +20,7 @@ hostprogs-$(CONFIG_ASN1)	 += asn1_compiler
 
 HOSTCFLAGS_sortextable.o = -I$(srctree)/tools/include
 HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
+hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
 
 always		:= $(hostprogs-y) $(hostprogs-m)
 
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index d5d859c..8084729 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -261,6 +261,15 @@ cmd_modversions =								\
 	fi;
 endif
 
+ifdef BUILD_UPDATE_JUMP_LABEL
+update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
+			$(srctree)/scripts/update_jump_label.h
+cmd_update_jump_label =						\
+	if [ $(@) != "scripts/mod/empty.o" ]; then		\
+		$(objtree)/scripts/update_jump_label "$(@)";	\
+	fi;
+endif
+
 ifdef CONFIG_FTRACE_MCOUNT_RECORD
 ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
@@ -297,6 +306,7 @@ define rule_cc_o_c
 	$(cmd_modversions)						  \
 	$(call echo-cmd,record_mcount)					  \
 	$(cmd_record_mcount)						  \
+	$(cmd_update_jump_label)					  \
 	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' >    \
 	                                              $(dot-target).tmp;  \
 	rm -f $(depfile);						  \
@@ -304,13 +314,14 @@ define rule_cc_o_c
 endef
 
 # Built-in and composite module parts
-$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 
 # Single-part modules are special since we need to mark them in $(MODVERDIR)
 
-$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
+		  $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
new file mode 100644
index 0000000..36d4b43
--- /dev/null
+++ b/scripts/update_jump_label.c
@@ -0,0 +1,341 @@
+/*
+ * update_jump_label.c: replace jmps with nops at compile time.
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *  Parsing of the elf file was influenced by recordmcount.c
+ *  originally written by and copyright to John F. Reiser <jreiser@BitWagon.com>.
+ */
+
+/*
+ * Note, this code is originally designed for x86, but may be used by
+ * other archs to do the nop updates at compile time instead of at boot time.
+ * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
+ * Inserting a 2 byte nop where possible helps with both CPU performance and
+ * icache strain.
+ */
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <getopt.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <unistd.h>
+
+static int fd_map;	/* File descriptor for file being modified. */
+static struct stat sb;	/* Remember .st_size, etc. */
+static int mmap_failed; /* Boolean flag. */
+
+static void die(const char *err, const char *fmt, ...)
+{
+	va_list ap;
+
+	if (err)
+		perror(err);
+
+	if (fmt) {
+		va_start(ap, fmt);
+		fprintf(stderr, "Fatal error:  ");
+		vfprintf(stderr, fmt, ap);
+		fprintf(stderr, "\n");
+		va_end(ap);
+	}
+
+	exit(1);
+}
+
+static void usage(char **argv)
+{
+	char *arg = argv[0];
+	char *p = arg + strlen(arg);
+
+	while (p >= arg && *p != '/')
+		p--;
+	p++;
+
+	printf("usage: %s file\n"
+	       "\n", p);
+	exit(-1);
+}
+
+/* w8rev, w8nat, ...: Handle endianness. */
+
+static uint64_t w8rev(uint64_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (7 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (6 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (5 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (4 * 8))
+	       | ((0xff & (x >> (4 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (5 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (6 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (7 * 8))) << (0 * 8));
+}
+
+static uint32_t w4rev(uint32_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (0 * 8));
+}
+
+static uint32_t w2rev(uint16_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (0 * 8));
+}
+
+static uint64_t w8nat(uint64_t const x)
+{
+	return x;
+}
+
+static uint32_t w4nat(uint32_t const x)
+{
+	return x;
+}
+
+static uint32_t w2nat(uint16_t const x)
+{
+	return x;
+}
+
+static uint64_t (*w8)(uint64_t);
+static uint32_t (*w)(uint32_t);
+static uint32_t (*w2)(uint16_t);
+
+/* ulseek, uread, ...:  Check return value for errors. */
+
+static off_t
+ulseek(int const fd, off_t const offset, int const whence)
+{
+	off_t const w = lseek(fd, offset, whence);
+	if (w == (off_t)-1)
+		die("lseek", NULL);
+
+	return w;
+}
+
+static size_t
+uread(int const fd, void *const buf, size_t const count)
+{
+	size_t const n = read(fd, buf, count);
+	if (n != count)
+		die("read", NULL);
+
+	return n;
+}
+
+static size_t
+uwrite(int const fd, void const *const buf, size_t const count)
+{
+	size_t const n = write(fd, buf, count);
+	if (n != count)
+		die("write", NULL);
+
+	return n;
+}
+
+static void *
+umalloc(size_t size)
+{
+	void *const addr = malloc(size);
+	if (addr == 0)
+		die("malloc", "malloc failed: %zu bytes\n", size);
+
+	return addr;
+}
+
+/*
+ * Get the whole file as a programming convenience in order to avoid
+ * malloc+lseek+read+free of many pieces.  If successful, then mmap
+ * avoids copying unused pieces; else just read the whole file.
+ * Open for both read and write; new info will be appended to the file.
+ * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
+ * do not propagate to the file until an explicit overwrite at the last.
+ * This preserves most aspects of consistency (all except .st_size)
+ * for simultaneous readers of the file while we are appending to it.
+ * However, multiple writers still are bad.  We choose not to use
+ * locking because it is expensive and the use case of kernel build
+ * makes multiple writers unlikely.
+ */
+static void *mmap_file(char const *fname)
+{
+	void *addr;
+
+	fd_map = open(fname, O_RDWR);
+	if (fd_map < 0 || fstat(fd_map, &sb) < 0)
+		die(fname, "failed to open file");
+
+	if (!S_ISREG(sb.st_mode))
+		die(NULL, "not a regular file: %s\n", fname);
+
+	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
+		    fd_map, 0);
+
+	mmap_failed = 0;
+	if (addr == MAP_FAILED) {
+		mmap_failed = 1;
+		addr = umalloc(sb.st_size);
+		uread(fd_map, addr, sb.st_size);
+	}
+	return addr;
+}
+
+static void munmap_file(void *addr)
+{
+	if (!mmap_failed)
+		munmap(addr, sb.st_size);
+	else
+		free(addr);
+	close(fd_map);
+}
+
+/* P6_NOP5_ATOMIC */
+static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+
+/* GENERIC_NOP5_ATOMIC */
+static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
+
+/* P6_NOP2 */
+static unsigned char ideal_nop2_x86[2] = { 0x66, 0x90 };
+
+static unsigned char *ideal_nop;
+
+static int (*make_nop)(void *map, size_t const offset);
+
+static int make_nop_x86(void *map, size_t const offset)
+{
+	unsigned char *op;
+	unsigned char *nop;
+	int size;
+
+	/* Determine which type of jmp this is: 2 byte or 5 byte. */
+	op = map + offset;
+	switch (*op) {
+	case 0xeb: /* 2 byte */
+		size = 2;
+		nop = ideal_nop2_x86;
+		break;
+	case 0xe9: /* 5 byte */
+		size = 5;
+		nop = ideal_nop;
+		break;
+	default:
+		die(NULL, "Bad jump label section (bad op %x)\n", *op);
+		__builtin_unreachable();
+	}
+
+	/* convert to nop */
+	ulseek(fd_map, offset, SEEK_SET);
+	uwrite(fd_map, nop, size);
+	return 0;
+}
+
+/* 32 bit and 64 bit are very similar */
+#include "update_jump_label.h"
+#define UPDATE_JUMP_LABEL_64
+#include "update_jump_label.h"
+
+static int do_file(const char *fname)
+{
+	Elf32_Ehdr *const ehdr = mmap_file(fname);
+	unsigned int reltype = 0;
+
+	w = w4nat;
+	w2 = w2nat;
+	w8 = w8nat;
+	switch (ehdr->e_ident[EI_DATA]) {
+		static unsigned int const endian = 1;
+	default:
+		die(NULL, "unrecognized ELF data encoding %d: %s\n",
+			ehdr->e_ident[EI_DATA], fname);
+		break;
+	case ELFDATA2LSB:
+		if (*(unsigned char const *)&endian != 1) {
+			/* main() is big endian, file.o is little endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	case ELFDATA2MSB:
+		if (*(unsigned char const *)&endian != 0) {
+			/* main() is little endian, file.o is big endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	}  /* end switch */
+
+	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
+	    w2(ehdr->e_type) != ET_REL ||
+	    ehdr->e_ident[EI_VERSION] != EV_CURRENT)
+		die(NULL, "unrecognized ET_REL file %s\n", fname);
+
+	switch (w2(ehdr->e_machine)) {
+	default:
+		die(NULL, "unrecognized e_machine %d %s\n",
+		    w2(ehdr->e_machine), fname);
+		break;
+	case EM_386:
+		reltype = R_386_32;
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_32;
+		break;
+	case EM_ARM:	 reltype = R_ARM_ABS32;
+			 break;
+	case EM_IA_64:	 reltype = R_IA64_IMM64; break;
+	case EM_MIPS:	 /* reltype: e_class    */ break;
+	case EM_PPC:	 reltype = R_PPC_ADDR32;   break;
+	case EM_PPC64:	 reltype = R_PPC64_ADDR64; break;
+	case EM_S390:    /* reltype: e_class    */ break;
+	case EM_SH:	 reltype = R_SH_DIR32;                 break;
+	case EM_SPARCV9: reltype = R_SPARC_64;     break;
+	case EM_X86_64:
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_64;
+		reltype = R_X86_64_64;
+		break;
+	}  /* end switch */
+
+	switch (ehdr->e_ident[EI_CLASS]) {
+	default:
+		die(NULL, "unrecognized ELF class %d %s\n",
+		    ehdr->e_ident[EI_CLASS], fname);
+		break;
+	case ELFCLASS32:
+		if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
+		||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		do_func32(ehdr, fname, reltype);
+		break;
+	case ELFCLASS64: {
+		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
+		if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
+		||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		do_func64(ghdr, fname, reltype);
+		break;
+	}
+	}  /* end switch */
+
+	munmap_file(ehdr);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc != 2)
+		usage(argv);
+
+	return do_file(argv[1]);
+}
+
diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
new file mode 100644
index 0000000..181bd5b
--- /dev/null
+++ b/scripts/update_jump_label.h
@@ -0,0 +1,208 @@
+/*
+ * update_jump_label.h
+ *
+ * This code is based on code from recordmcount.c written by and
+ * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>.  All rights reserved.
+ *
+ * The original code had the same algorithms for both 32bit
+ * and 64bit ELF files, but the code was duplicated to support
+ * the difference in structures that were used. This
+ * file creates a macro of everything that is different between
+ * the 64 and 32 bit code, such that both sets of functions
+ * can be created by including this header twice: once with
+ * UPDATE_JUMP_LABEL_64 undefined, and again with
+ * it defined.
+ *
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *
+ * Licensed under the GNU General Public License, version 2 (GPLv2).
+ */
+
+#undef EBITS
+#undef _w
+
+#ifdef UPDATE_JUMP_LABEL_64
+# define EBITS			64
+# define _w			w8
+#else
+# define EBITS			32
+# define _w			w
+#endif
+
+#define _FBITS(x, e)	x##e
+#define FBITS(x, e)	_FBITS(x, e)
+#define FUNC(x)		FBITS(x, EBITS)
+
+#undef Elf_Ehdr
+#undef Elf_Shdr
+#undef Elf_Rel
+#undef Elf_Rela
+#undef Elf_Sym
+#undef ELF_R_SYM
+#undef ELF_R_TYPE
+
+#define __ATTACH(x, y, z)	x##y##z
+#define ATTACH(x, y, z)		__ATTACH(x, y, z)
+
+#define Elf_Ehdr	ATTACH(Elf, EBITS, _Ehdr)
+#define Elf_Shdr	ATTACH(Elf, EBITS, _Shdr)
+#define Elf_Rel		ATTACH(Elf, EBITS, _Rel)
+#define Elf_Rela	ATTACH(Elf, EBITS, _Rela)
+#define Elf_Sym		ATTACH(Elf, EBITS, _Sym)
+#define uint_t		ATTACH(uint, EBITS, _t)
+#define ELF_R_SYM	ATTACH(ELF, EBITS, _R_SYM)
+#define ELF_R_TYPE	ATTACH(ELF, EBITS, _R_TYPE)
+
+#undef get_shdr
+#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
+
+#undef get_section_loc
+#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr))
+
+/* Find relocation section hdr for a given section */
+static const Elf_Shdr *
+FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != SHT_REL &&
+		    w(hdr->sh_type) != SHT_RELA)
+			continue;
+
+		/*
+		 * The relocation section's info field holds
+		 * the section index that it represents.
+		 */
+		if (shdr == &shdr0[w(hdr->sh_info)])
+			return hdr;
+	}
+	return NULL;
+}
+
+/* Find a section header based on name and type */
+static const Elf_Shdr *
+FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
+	const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	const char *hdrname;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != type)
+			continue;
+
+		/* If we are just looking for a section by type (ie. SYMTAB) */
+		if (!name)
+			return hdr;
+
+		hdrname = &shstrtab[w(hdr->sh_name)];
+		if (strcmp(hdrname, name) == 0)
+			return hdr;
+	}
+	return NULL;
+}
+
+static void
+FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
+		     unsigned shtype, const Elf_Rel *rel, void *data)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *targethdr;
+	const Elf_Rela *rela;
+	const Elf_Sym *syment;
+	uint_t offset = _w(rel->r_offset);
+	uint_t info = _w(rel->r_info);
+	uint_t sym = ELF_R_SYM(info);
+	uint_t type = ELF_R_TYPE(info);
+	uint_t addend;
+	uint_t targetloc;
+
+	if (shtype == SHT_RELA) {
+		rela = (const Elf_Rela *)rel;
+		addend = _w(rela->r_addend);
+	} else
+		addend = _w(*(int *)(data + offset));
+
+	syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
+	targethdr = &shdr0[w2(syment[sym].st_shndx)];
+	targetloc = _w(targethdr->sh_offset);
+
+	/* TODO, need a separate function for all archs */
+	if (type != R_386_32)
+		die(NULL, "Arch relocation type %d not supported", type);
+
+	targetloc += addend;
+
+	*(uint_t *)(data + offset) = targetloc;
+}
+
+/* Overall supervision for Elf32 ET_REL file. */
+static void
+FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
+{
+	const Elf_Shdr *jlshdr;
+	const Elf_Shdr *jlrhdr;
+	const Elf_Shdr *symhdr;
+	const Elf_Rel *rel;
+	unsigned size;
+	unsigned cnt;
+	unsigned i;
+	uint_t type;
+	void *jdata;
+	void *data;
+
+	jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
+	if (!jlshdr)
+		return;
+
+	jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
+	if (!jlrhdr)
+		return;
+
+	/*
+	 * Create and fill in the __jump_table section and use it to
+	 * find the offsets into the text that we want to update.
+	 * We create it so that we do not depend on the order of the
+	 * relocations, and use the table directly, as it is broken
+	 * up into sections.
+	 */
+	size = _w(jlshdr->sh_size);
+	data = umalloc(size);
+
+	jdata = (void *)get_section_loc(ehdr, jlshdr);
+	memcpy(data, jdata, size);
+
+	cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
+
+	rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
+
+	/* Is this a Rel or Rela? */
+	type = w(jlrhdr->sh_type);
+
+	symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
+
+	for (i = 0; i < cnt; i++) {
+		FUNC(section_update)(ehdr, symhdr, type, rel, data);
+		rel = (void *)rel + w(jlrhdr->sh_entsize);
+	}
+
+	/*
+	 * This is specific to x86. The jump_table is stored in three
+	 * long words. The first is the location of the jmp target we
+	 * must update.
+	 */
+	cnt = size / sizeof(uint_t);
+
+	for (i = 0; i < cnt; i += 3)
+		make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
+
+	free(data);
+}
-- 
1.7.10.4




* [RFC][PATCH 2/2] x86/jump labels: Use either 5 byte or 2 byte jumps
  2013-08-07 17:36 [RFC][PATCH 0/2] x86/jump labels: Add the 5 byte or 2 byte jumps Steven Rostedt
  2013-08-07 17:36 ` [RFC][PATCH 1/2] jump labels: Add infrastructure to update jump labels at compile time Steven Rostedt
@ 2013-08-07 17:36 ` Steven Rostedt
  2013-08-07 17:57   ` Steven Rostedt
  2013-08-07 17:54 ` [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used Steven Rostedt
  2 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2013-08-07 17:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Jason Baron

[-- Attachment #1: 0002-x86-jump-labels-Use-etiher-5-byte-or-2-byte-jumps.patch --]
[-- Type: text/plain, Size: 6709 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Have the jump labels add a "jmp" in the assembly instead
of a default nop. This will cause the assembler to put in
either a 2 byte or 5 byte jmp depending on where the target
label is.

Then at compile time, the update_jump_label code will replace
the jmps with either 2 or 5 byte nops.

On boot up, the code can be examined to see if the jump label
uses either a 2 or 5 byte nop, and replace it accordingly.

Allowing the jump labels to be 2 bytes speeds things up:
not only are 2 byte nops faster than 5 byte nops, but they
also reduce the cache footprint.

   text    data     bss     dec     hex filename
13403667 3666856 2998272 20068795 13239bb ../nobackup/mxtest/vmlinux-old
13398536 3666856 2998272 20063664 13225b0 ../nobackup/mxtest/vmlinux-new

Converting the current v3.2 trace points saved 5,131 bytes.
As more places use jump labels, this will have a bigger savings.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/Kconfig                  |    1 +
 arch/x86/include/asm/jump_label.h |    7 ++-
 arch/x86/kernel/jump_label.c      |   88 +++++++++++++++++++++++++++----------
 3 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..5dbbee2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -81,6 +81,7 @@ config X86
 	select HAVE_USER_RETURN_NOTIFIER
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select HAVE_ARCH_JUMP_LABEL
+	select HAVE_BUILD_TIME_JUMP_LABEL
 	select HAVE_TEXT_POKE_SMP
 	select HAVE_GENERIC_HARDIRQS
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 64507f3..615adb4 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -10,6 +10,11 @@
 
 #define JUMP_LABEL_NOP_SIZE 5
 
+/*
+ * The STATIC_KEY_INIT_NOP must match the nops used in
+ * scripts/update_jump_label.c. Otherwise the boot time checks
+ * will fail and trigger a BUG() on boot up.
+ */
 #ifdef CONFIG_X86_64
 # define STATIC_KEY_INIT_NOP P6_NOP5_ATOMIC
 #else
@@ -19,7 +24,7 @@
 static __always_inline bool arch_static_branch(struct static_key *key)
 {
 	asm goto("1:"
-		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
+		"jmp %l[l_yes]\n"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 912a528..1a943aa 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -16,12 +16,20 @@
 
 #ifdef HAVE_JUMP_LABEL
 
+/* These are the nops added at compile time */
+static const unsigned char nop_short[] = { P6_NOP2 };
+static const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
+
 union jump_code_union {
 	char code[JUMP_LABEL_NOP_SIZE];
 	struct {
 		char jump;
 		int offset;
-	} __attribute__((packed));
+	} __packed;
+	struct {
+		char jump_short;
+		char offset_short;
+	} __packed;
 };
 
 static void bug_at(unsigned char *ip, int line)
@@ -42,19 +50,29 @@ static void __jump_label_transform(struct jump_entry *entry,
 				   int init)
 {
 	union jump_code_union code;
+	unsigned char nop;
+	unsigned char op;
+	unsigned size;
+	void *ip = (void *)entry->code;
 	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 
-	if (type == JUMP_LABEL_ENABLE) {
-		/*
-		 * We are enabling this jump label. If it is not a nop
-		 * then something must have gone wrong.
-		 */
-		if (unlikely(memcmp((void *)entry->code, ideal_nop, 5) != 0))
-			bug_at((void *)entry->code, __LINE__);
+	/* Use probe_kernel_read()? */
+	op = *(unsigned char *)ip;
+	nop = ideal_nops[NOP_ATOMIC5][0];
 
-		code.jump = 0xe9;
-		code.offset = entry->target -
-				(entry->code + JUMP_LABEL_NOP_SIZE);
+	if (type == JUMP_LABEL_ENABLE) {
+		if (memcmp(ip, nop_short, 2) == 0) {
+			size = 2;
+			code.jump_short = 0xeb;
+			code.offset = entry->target - (entry->code + 2);
+			/* Check for overflow ? */
+		} else if ((!init && memcmp(ip, ideal_nop, 5) == 0) ||
+			   (init && memcmp(ip, default_nop, 5) == 0)) {
+			size = JUMP_LABEL_NOP_SIZE;
+			code.jump = 0xe9;
+			code.offset = entry->target - (entry->code + size);
+		} else
+			bug_at(ip, __LINE__);
 	} else {
 		/*
 		 * We are disabling this jump label. If it is not what
@@ -63,20 +81,48 @@ static void __jump_label_transform(struct jump_entry *entry,
 		 * are converting the default nop to the ideal nop.
 		 */
 		if (init) {
-			const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
-			if (unlikely(memcmp((void *)entry->code, default_nop, 5) != 0))
-				bug_at((void *)entry->code, __LINE__);
-		} else {
+			/* Ignore short nops, we do not change them */
+			if (memcmp(ip, nop_short, 2) == 0)
+				return;
+
+			/* We are initializing from the default nop */
+			if (unlikely(memcmp(ip, default_nop, 5) != 0))
+				bug_at(ip, __LINE__);
+
+			/* Set to the ideal nop */
+			size = JUMP_LABEL_NOP_SIZE;
+			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
+
+		} else if (op == 0xe9) {
+			/* Replace a 5 byte jmp */
+
+			/* Make sure this is what we expected it to be */
 			code.jump = 0xe9;
 			code.offset = entry->target -
 				(entry->code + JUMP_LABEL_NOP_SIZE);
-			if (unlikely(memcmp((void *)entry->code, &code, 5) != 0))
-				bug_at((void *)entry->code, __LINE__);
-		}
-		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
+
+			if (unlikely(memcmp(ip, &code, 5) != 0))
+				bug_at(ip, __LINE__);
+
+			size = JUMP_LABEL_NOP_SIZE;
+			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
+		} else if (op == 0xeb) {
+			/* Replace a 2 byte jmp */
+
+			/* Had better be a 2 byte jmp */
+			code.jump_short = 0xeb;
+			code.offset = entry->target - (entry->code + 2);
+			if (unlikely(memcmp(ip, &code, 2) != 0))
+				bug_at(ip, __LINE__);
+
+			size = 2;
+			memcpy(&code, nop_short, size);
+		} else
+			/* The code was not what we expected!  */
+			bug_at(ip, __LINE__);
 	}
 
-	(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+	(*poker)(ip, &code, size);
 }
 
 void arch_jump_label_transform(struct jump_entry *entry,
@@ -106,7 +152,6 @@ __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry,
 	 * If it is not, then we need to update the nop to the ideal nop.
 	 */
 	if (jlstate == JL_STATE_START) {
-		const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
 		const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 
 		if (memcmp(ideal_nop, default_nop, 5) != 0)
@@ -117,5 +162,4 @@ __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry,
 	if (jlstate == JL_STATE_UPDATE)
 		__jump_label_transform(entry, type, text_poke_early, 1);
 }
-
 #endif
-- 
1.7.10.4




* [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 17:36 [RFC][PATCH 0/2] x86/jump labels: Add the 5 byte or 2 byte jumps Steven Rostedt
  2013-08-07 17:36 ` [RFC][PATCH 1/2] jump labels: Add infrastructure to update jump labels at compile time Steven Rostedt
  2013-08-07 17:36 ` [RFC][PATCH 2/2] x86/jump labels: Use either 5 byte or 2 byte jumps Steven Rostedt
@ 2013-08-07 17:54 ` Steven Rostedt
  2013-08-07 19:22   ` Linus Torvalds
  2 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2013-08-07 17:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Jason Baron

On Wed, 2013-08-07 at 13:36 -0400, Steven Rostedt wrote:
> As I said, I would post the patches that let the jmps used by jump labels
> be turned into 2 bytes where possible. These are a bit controversial due
> to the complexity of the update_jump_label code.
> 
> These patches are based off of tip's x86/jumplabel code.
> 
> But if someone cares to play with it, feel free. I'll push this up
> to my repo under: tip/perf/jump-label-7 (internally this is my 7th version).
> 
> I'll post a patch that does the counting as well as a reply to this
> post.

Here's the patch that forces the update and counts the number of short
jumps vs long jumps. It also outputs the locations of the short jumps.

On another box, using a distro config, I had even better results:

[    2.352448] short jumps: 193
[    2.355407]  long jumps: 219

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Index: linux-trace.git/arch/x86/kernel/jump_label.c
===================================================================
--- linux-trace.git.orig/arch/x86/kernel/jump_label.c
+++ linux-trace.git/arch/x86/kernel/jump_label.c
@@ -44,6 +44,9 @@ static void bug_at(unsigned char *ip, in
 	BUG();
 }
 
+static int short_nops;
+static int long_nops;
+
 static void __jump_label_transform(struct jump_entry *entry,
 				   enum jump_label_type type,
 				   void *(*poker)(void *, const void *, size_t),
@@ -82,9 +85,14 @@ static void __jump_label_transform(struc
 		 */
 		if (init) {
 			/* Ignore short nops, we do not change them */
-			if (memcmp(ip, nop_short, 2) == 0)
+			if (memcmp(ip, nop_short, 2) == 0) {
+				short_nops++;
+				printk("short jump at: %pS %p\n",
+				       (void *)ip, (void *)ip);
 				return;
+			}
 
+			long_nops++;
 			/* We are initializing from the default nop */
 			if (unlikely(memcmp(ip, default_nop, 5) != 0))
 				bug_at(ip, __LINE__);
@@ -154,7 +162,7 @@ __init_or_module void arch_jump_label_tr
 	if (jlstate == JL_STATE_START) {
 		const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 
-		if (memcmp(ideal_nop, default_nop, 5) != 0)
+		if (1 || memcmp(ideal_nop, default_nop, 5) != 0)
 			jlstate = JL_STATE_UPDATE;
 		else
 			jlstate = JL_STATE_NO_UPDATE;
@@ -162,4 +170,12 @@ __init_or_module void arch_jump_label_tr
 	if (jlstate == JL_STATE_UPDATE)
 		__jump_label_transform(entry, type, text_poke_early, 1);
 }
+
+static __init int output_jumps(void)
+{
+	printk("short jumps: %d\n", short_nops);
+	printk(" long jumps: %d\n", long_nops);
+	return 0;
+}
+late_initcall(output_jumps);
 #endif



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH 2/2] x86/jump labels: Use etiher 5 byte or 2 byte jumps
  2013-08-07 17:36 ` [RFC][PATCH 2/2] x86/jump labels: Use etiher 5 byte or 2 byte jumps Steven Rostedt
@ 2013-08-07 17:57   ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2013-08-07 17:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: H. Peter Anvin, Linus Torvalds, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Jason Baron

On Wed, 2013-08-07 at 13:36 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0002-x86-jump-labels-Use-etiher-5-byte-or-2-byte-jumps.patch)
> From: Steven Rostedt <srostedt@redhat.com>
> 
> Have the jump labels add a "jmp" in the assembly instead
> of a default nop. This will cause the assembler to put in
> either a 2 byte or 5 byte jmp depending on where the target
> label is.
> 
> Then at compile time, the update_jump_label code will replace
> the jmps with either 2 or 5 byte nops.
> 
> On boot up, the code can be examined to see whether the jump label
> uses a 2 or 5 byte nop, and the nop can be replaced accordingly.
> 
> By allowing the jump labels to be 2 bytes, the nops get faster:
> not only are 2 byte nops faster than 5 byte nops, but the shorter
> encoding also saves on cache footprint.
> 
>    text    data     bss     dec     hex filename
> 13403667 3666856 2998272 20068795 13239bb ../nobackup/mxtest/vmlinux-old
> 13398536 3666856 2998272 20063664 13225b0 ../nobackup/mxtest/vmlinux-new
> 
> Converting the current v3.2 trace points saved 5,131 bytes.
> As more places use jump labels, this will have a bigger savings.
> 

FYI, I didn't bother changing the change logs, so these numbers may be
totally bogus now.

-- Steve




* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 17:54 ` [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used Steven Rostedt
@ 2013-08-07 19:22   ` Linus Torvalds
  2013-08-07 19:27     ` H. Peter Anvin
  2013-08-07 20:19     ` Jason Baron
  0 siblings, 2 replies; 15+ messages in thread
From: Linus Torvalds @ 2013-08-07 19:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linux Kernel Mailing List, H. Peter Anvin, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Jason Baron

On Wed, Aug 7, 2013 at 10:54 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On another box, using a distro config, I had even better results:
>
> [    2.352448] short jumps: 193
> [    2.355407]  long jumps: 219

.. well, another way of looking at this is to say that all of this
effort saves just 579 bytes.

Yes, maybe some of those bytes are in really hot paths, but the other
side of *that* coin is that the 2-vs-5 byte jump doesn't much matter
if it's already cached.

So I'd vote for not doing this. If we had some simple way to do the
short jumps, I think it would be lovely. Or if we had to parse the ELF
files and do instruction rewriting for various other reasons, and the
jump rewriting was just one small detail.

But using 576 new lines (the diffstat for your patch 1/2 that adds the
infrastructure to do the rewriting) in order to save just about
exactly that many bytes in the binary - the effort just doesn't work
out, imnsho.

              Linus


* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 19:22   ` Linus Torvalds
@ 2013-08-07 19:27     ` H. Peter Anvin
  2013-08-07 19:49       ` Linus Torvalds
  2013-08-07 20:19     ` Jason Baron
  1 sibling, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2013-08-07 19:27 UTC (permalink / raw)
  To: Linus Torvalds, Steven Rostedt
  Cc: Linux Kernel Mailing List, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Peter Zijlstra, Jason Baron

Well we do... both to extract relocations and to sort the exception table.  Perhaps we need to merge those kinds of postprocessing tools?

Linus Torvalds <torvalds@linux-foundation.org> wrote:
>On Wed, Aug 7, 2013 at 10:54 AM, Steven Rostedt <rostedt@goodmis.org>
>wrote:
>>
>> On another box, using a distro config, I had even better results:
>>
>> [    2.352448] short jumps: 193
>> [    2.355407]  long jumps: 219
>
>.. well, another way of looking at this is to say that all of this
>effort saves just 579 bytes.
>
>Yes, maybe some of those bytes are in really hot paths, but the other
>side of *that* coin is that the 2-vs-5 byte jump doesn't much matter
>if it's already cached.
>
>So I'd vote for not doing this. If we had some simple way to do the
>short jumps, I think it would be lovely. Or if we had to parse the ELF
>files and do instruction rewriting for various other reasons, and the
>jump rewriting was just one small detail.
>
>But using 576 new lines (the diffstat for your patch 1/2 that adds the
>infrastructure to do the rewriting) in order to save just about
>exactly that many bytes in the binary - the effort just doesn't work
>out, imnsho.
>
>              Linus

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 19:27     ` H. Peter Anvin
@ 2013-08-07 19:49       ` Linus Torvalds
  2013-08-07 20:30         ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2013-08-07 19:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, Linux Kernel Mailing List, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Jason Baron

On Wed, Aug 7, 2013 at 12:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> Well we do... both to extract relocations and to sort the exception table.  Perhaps we need to merge those kinds of postprocessing tools?

If we can do this generically and without adding 500 lines of
specialized code, my argument obviously goes away.

Exactly where the "unnecessarily complex" line is is obviously a
matter of taste.

                  Linus


* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 19:22   ` Linus Torvalds
  2013-08-07 19:27     ` H. Peter Anvin
@ 2013-08-07 20:19     ` Jason Baron
  2013-08-07 20:33       ` Steven Rostedt
  2013-08-07 20:47       ` Linus Torvalds
  1 sibling, 2 replies; 15+ messages in thread
From: Jason Baron @ 2013-08-07 20:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Linux Kernel Mailing List, H. Peter Anvin,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra

On 08/07/2013 03:22 PM, Linus Torvalds wrote:
> On Wed, Aug 7, 2013 at 10:54 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> On another box, using a distro config, I had even better results:
>>
>> [    2.352448] short jumps: 193
>> [    2.355407]  long jumps: 219
> .. well, another way of looking at this is to say that all of this
> effort saves just 579 bytes.
>
> Yes, maybe some of those bytes are in really hot paths, but the other
> side of *that* coin is that the 2-vs-5 byte jump doesn't much matter
> if it's already cached.
>
> So I'd vote for not doing this. If we had some simple way to do the
> short jumps, I think it would be lovely. Or if we had to parse the ELF
> files and do instruction rewriting for various other reasons, and the
> jump rewriting was just one small detail.
>
> But using 576 new lines (the diffstat for your patch 1/2 that adds the
> infrastructure to do the rewriting) in order to save just about
> exactly that many bytes in the binary - the effort just doesn't work
> out, imnsho.
>
>               Linus

The whole point of the thread started with wanting to move the default
'disabled' branch further out-of-line. We could get there with better
compiler support for the 'cold' label attribute. Thus, in theory the
whole 2-byte jmp is just an intermediate step. (Yeah, I know that
support doesn't seem to be happening anytime soon...)

Thanks,

-Jason




* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 19:49       ` Linus Torvalds
@ 2013-08-07 20:30         ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2013-08-07 20:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Linux Kernel Mailing List, Ingo Molnar,
	Andrew Morton, Thomas Gleixner, Peter Zijlstra, Jason Baron

On Wed, 2013-08-07 at 12:49 -0700, Linus Torvalds wrote:
> On Wed, Aug 7, 2013 at 12:27 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> > Well we do... both to extract relocations and to sort the exception table.  Perhaps we need to merge those kinds of postprocessing tools?
> 
> If we can do this generically and without adding 500 lines of
> specialized code, my argument obviously goes away.

And the code that does this work is actually in the kernel today,
although it is seldom used.

The recordmcount.[ch] code can do this modification. Its main purpose
was to create a table of mcount call locations that ftrace can use to
convert to nops on boot. As it only converts white-listed sections (it
doesn't convert things like .init or .exit, as there exists no way
today to generically know when they are removed), it only creates the
pointers to those sections that ftrace will touch.

Sections that do not get converted can still contain calls to mcount
that are never changed to nops. In most places this is not the case,
because things like __init have "notrace" defined to keep those
functions from being profiled. But if another section is added that
ftrace does not know about, it will keep its calls to mcount and cause
a bit of a slowdown. Note, mcount itself looks like:

mcount:
	retq

To handle these missed cases, recordmcount.c was modified to convert
the calls to mcount into nops. There's a flag you can set to have it
warn about the locations it converts:

make RECORDMCOUNT_WARN=1

It will give a warning at the locations that it converted, which should
then be white-listed or have notrace added to it.

This is the code I based update_jump_label on. With a bit of work, all
these ELF parsers could probably be consolidated, and this addition may
not be that hard or complex to add.

-- Steve




* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 20:19     ` Jason Baron
@ 2013-08-07 20:33       ` Steven Rostedt
  2013-08-07 20:47       ` Linus Torvalds
  1 sibling, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2013-08-07 20:33 UTC (permalink / raw)
  To: Jason Baron
  Cc: Linus Torvalds, Linux Kernel Mailing List, H. Peter Anvin,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra

On Wed, 2013-08-07 at 16:19 -0400, Jason Baron wrote:

> The whole point of the thread started with wanting to move the default
> 'disabled' branch further out-of-line. We could get there with better
> compiler support for the 'cold' label attribute. Thus, in theory the
> whole 2-byte jmp is just an intermediate step. (Yeah, I know that
> support doesn't seem to be happening anytime soon...)
> 

Actually, ideally, we would move the bulk of the tracing code out of
line, but keep the jump to the tracing code, and the nop that replaces
it, inline:


	[ hot path]
	jmp trace / nop
1:

	[...]


trace:
	jmp trace_main_code
	jmp 1b


Then that jmp trace can still be a 2 byte op.

-- Steve




* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 20:19     ` Jason Baron
  2013-08-07 20:33       ` Steven Rostedt
@ 2013-08-07 20:47       ` Linus Torvalds
  2013-08-07 21:37         ` Jason Baron
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2013-08-07 20:47 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, Linux Kernel Mailing List, H. Peter Anvin,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra

On Wed, Aug 7, 2013 at 1:19 PM, Jason Baron <jbaron@akamai.com> wrote:
>
> The whole point of the thread started with wanting to move the default
> 'disabled' branch further out-of-line.

Yeah, but I always disagreed with that.

Putting the unusual code out-of-line (as in "at the end of the
function") is a good idea, but putting it *far* out of line (as in "in
a different section") likely makes little sense.

Now, the tracing code is admittedly specialized enough that we could
have some "really cold" attribute and move it to that kind of "even
further away" model, but most of the other uses of the static keys are
not necessarily of the kind where the non-default case is completely
or utterly unlikely - they want to use the static keys not because
some codepath is basically never taken, but because the code-path is
so critical that loading and testing a value from memory is considered
to be excessive for when the feature is turned off (ie scheduler
statistics etc).

So the code may not even be all that cold - some people may well run
with statistics enabled all the time - it's just that the non-enabled
case really *really* doesn't want to have the overhead of even
bothering to test for this event.

So I really think that a short jump would be a good thing, it's just
that I don't think it's *so* important that it's worth having several
hundreds of lines of code to support it.

                 Linus


* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 20:47       ` Linus Torvalds
@ 2013-08-07 21:37         ` Jason Baron
  2013-08-07 21:56           ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Baron @ 2013-08-07 21:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, Linux Kernel Mailing List, H. Peter Anvin,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra

On 08/07/2013 04:47 PM, Linus Torvalds wrote:
> On Wed, Aug 7, 2013 at 1:19 PM, Jason Baron <jbaron@akamai.com> wrote:
>> The whole point of the thread started with wanting to move the default
>> 'disabled' branch further out-of-line.
> Yeah, but I always disagreed with that.
>
> Putting the unusual code out-of-line (as in "at the end of the
> function") is a good idea, but putting it *far* out of line (as in "in
> a different section") likely makes little sense.
>
> Now, the tracing code is admittedly specialized enough that we could
> have some "really cold" attribute and move it to that kind of "even
> further away" model, but most of the other uses of the static keys are
> not necessarily of the kind where the non-default case is completely
> or utterly unlikely - they want to use the static keys not because
> some codepath is basically never taken, but because the code-path is
> so critical that loading and testing a value from memory is considered
> to be excessive for when the feature is turned off (ie scheduler
> statistics etc).
>
> So the code may not even be all that cold - some people may well run
> with statistics enabled all the time - it's just that the non-enabled
> case really *really* doesn't want to have the overhead of even
> bothering to test for this event.
>
>

ok - I can see 2 variants here as you mentioned:

1) 'Unbiased' - we want to treat both branches equally but don't want
the load/test/jmp sequence. For things like the scheduler stats.

2) 'Biased' - where the unlikely path is moved completely out-of-line.
And we have a strong 'bias' to optimize the default path.

If we can get the completely out-of-line thing working, we could make
this distinction.

Thanks,

-Jason




* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 21:37         ` Jason Baron
@ 2013-08-07 21:56           ` Linus Torvalds
  2013-08-10  0:14             ` Andy Lutomirski
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2013-08-07 21:56 UTC (permalink / raw)
  To: Jason Baron
  Cc: Steven Rostedt, Linux Kernel Mailing List, H. Peter Anvin,
	Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra

On Wed, Aug 7, 2013 at 2:37 PM, Jason Baron <jbaron@akamai.com> wrote:
>
> ok - I can see 2 variants here as you mentioned:
>
> 1) 'Unbiased' - we want to treat both branches equally but don't want
> the load/test/jmp sequence. For things like the scheduler stats.
>
> 2) 'Biased' - where the unlikely path is moved completely out-of-line.
> And we have a strong 'bias' to optimize the default path.

Actually, personally I'd much rather see *three* cases, and for the
static keys usually only two are relevant:

 - truly unbiased: a normal conditional that we don't really know
which one is the likely or unlikely path

   This is likely *not* the normal case for a static key, since almost
always we have a lightweight case, and a heavy one. Most of the static
keys are about things like statistics, which are inherently more
expensive than not having statistics, so they aren't really
"unbiased". There's a bias introduced by one side being more costly
than the other. You want the light (non-statistics) case to be a
fall-through, while the heavy case is going to be much more expensive
and update global variables etc.

 - "slightly biased". The normal kind of "unlikely()", where you want
to have one side of the branch be the fall-through and common case
that doesn't interrupt the I$ prefetch etc. This is the likely case
for most static key use, I think.

 - "strongly biased": a "very unlikely" kind of thing, which is either
a major exception case (ie an assertion on BUG_ON() or other basically
fatal thing), or is something like the tracing code that is uncommon
and where we want basically minimal impact when it is not enabled, but
when it's on, the impact is so big that the conditional branch no
longer even matters.

The first one really doesn't care which way a branch goes. The second
one prefers a fall-through case for the common case. And the third one
is basically a "get this code away from me".

Both of the biased cases *might* also want things like "save register
state in the unlikely path so that the *likely* path doesn't have to".
Think things like "it's a leaf function, and the likely path doesn't
need any temporaries, but the unlikely path ends up doing function
calls and needs a stack frame". If the compiler can make likely path
avoid the stack frame generation and be straight-line, that would be
really nice.

                Linus


* Re: [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used
  2013-08-07 21:56           ` Linus Torvalds
@ 2013-08-10  0:14             ` Andy Lutomirski
  0 siblings, 0 replies; 15+ messages in thread
From: Andy Lutomirski @ 2013-08-10  0:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason Baron, Steven Rostedt, Linux Kernel Mailing List,
	H. Peter Anvin, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra

On 08/07/2013 02:56 PM, Linus Torvalds wrote:
> 
> Both of the biased cases *might* also want things like "save register
> state in the unlikely path so that the *likely* path doesn't have to".
> Think things like "it's a leaf function, and the likely path doesn't
> need any temporaries, but the unlikely path ends up doing function
> calls and needs a stack frame". If the compiler can make likely path
> avoid the stack frame generation and be straight-line, that would be
> really nice.

This inspired me to see what happens when you call an
__attribute__((noreturn)) function.  The results are sad:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

I think the explanations for why that bug is WONTFIX are bogus.  And
unless gcc fixes that, trying to get efficient code that intentionally
jumps and never returns seems hard (at least without sticking the
call/jump into inline assembly).

--Andy


end of thread, other threads:[~2013-08-10  0:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-07 17:36 [RFC][PATCH 0/2] x86/jump labels: Add the 5 byte or 2 byte jumps Steven Rostedt
2013-08-07 17:36 ` [RFC][PATCH 1/2] jump labels: Add infrastructure to update jump labels at compile time Steven Rostedt
2013-08-07 17:36 ` [RFC][PATCH 2/2] x86/jump labels: Use etiher 5 byte or 2 byte jumps Steven Rostedt
2013-08-07 17:57   ` Steven Rostedt
2013-08-07 17:54 ` [RFC][PATCH 3/2] x86/jump labels: Count and display the short jumps used Steven Rostedt
2013-08-07 19:22   ` Linus Torvalds
2013-08-07 19:27     ` H. Peter Anvin
2013-08-07 19:49       ` Linus Torvalds
2013-08-07 20:30         ` Steven Rostedt
2013-08-07 20:19     ` Jason Baron
2013-08-07 20:33       ` Steven Rostedt
2013-08-07 20:47       ` Linus Torvalds
2013-08-07 21:37         ` Jason Baron
2013-08-07 21:56           ` Linus Torvalds
2013-08-10  0:14             ` Andy Lutomirski
