[-- Attachment #1: Type: text/plain, Size: 2209 bytes --]

Ingo,

I've done some final cleanups of the 2 or 5 byte nop code and I also
added a few generic cleanups of jump label itself.

The first 2 patches use the best nops as defined by Peter when we do not
know what the machine is running.

The third patch adds paranoid checks, to make sure that what we are
modifying is truly what we expect. Otherwise we bug and crash the kernel
(as recommended by Peter).

The last two patches are the cleaned-up versions of the 2 or 5 byte nops.

Note, there is code in the infrastructure that can be shared among other
users in scripts and elsewhere. I'd rather get this code in first and then
fiddle with seeing who does what and then integrate all the code together,
as that may not be such an easy task, and I don't want to postpone this
code due to it.

Also, I feel that we can let this code sit for a bit on LKML and let
people comment. If everyone gives their acks, then we can pull this code
in. I already tested it, but if there are acks, I can rebase just to
update the acked-bys.

Please pull (if everyone's OK with it) the latest tip/perf/jump-label tree,
which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/jump-label

Head SHA1: 7837dddecaa7e7f31b732a0c302a4bdd255a8321


Steven Rostedt (5):
      x86/jump-label: Use best default nops for inital jump label calls
      x86/jump-label: Do not bother updating nops if they are correct
      x86/jump-label: Add safety checks to jump label conversions
      jump labels: Add infrastructure to update jump labels at compile time
      jump labels/x86: Use etiher 5 byte or 2 byte jumps

----
 Makefile                          |    7 +
 arch/Kconfig                      |    6 +
 arch/x86/Kconfig                  |    1 +
 arch/x86/include/asm/jump_label.h |    9 +-
 arch/x86/kernel/jump_label.c      |  110 +++++++++++-
 scripts/Makefile                  |    1 +
 scripts/Makefile.build            |   15 ++-
 scripts/update_jump_label.c       |  335 +++++++++++++++++++++++++++++++++++++
 scripts/update_jump_label.h       |  208 +++++++++++++++++++++++
 9 files changed, 678 insertions(+), 14 deletions(-)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

[-- Attachment #1: Type: text/plain, Size: 1576 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

As specified by H. Peter Anvin, the best nops for x86, when the running
computer is not known, are:

 32bit: 0x3e, 0x8d, 0x74, 0x26, 0x00
 64bit: 0x0f, 0x1f, 0x44, 0x00, 0x00 also known as P6_NOP5

Currently the default nop that is used by jump label is:

  0xe9 0x00 0x00 0x00 0x00

Which is really a 5 byte jump to the next position.

It's better to use a real nop than a jmp.

Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/include/asm/jump_label.h |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index a32b18c..cefcfd3 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -3,18 +3,23 @@
 
 #ifdef __KERNEL__
 
+#include <linux/stringify.h>
 #include <linux/types.h>
 #include <asm/nops.h>
 #include <asm/asm.h>
 
 #define JUMP_LABEL_NOP_SIZE 5
 
-#define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
+#ifdef CONFIG_X86_64
+# define JUMP_LABEL_INIT_NOP P6_NOP5
+#else
+# define JUMP_LABEL_INIT_NOP 0x3e, 0x8d, 0x74, 0x26, 0x00
+#endif
 
 static __always_inline bool arch_static_branch(struct jump_label_key *key)
 {
 	asm goto("1:"
-		JUMP_LABEL_INITIAL_NOP
+		".byte " __stringify(JUMP_LABEL_INIT_NOP) "\n\t"
 		".pushsection __jump_table, \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
-- 
1.7.8.3

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

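To make the byte sequences in the patch description concrete, here is a small standalone sketch in user-space C. It is not kernel code, and the array and function names are invented for illustration; only the byte values come from the description above. It shows why the old default is really a jump: 0xe9 is the opcode of a near jmp with a 32-bit displacement, and a displacement of zero simply lands on the next instruction.

```c
#include <assert.h>
#include <stddef.h>

#define NOP5_SIZE 5

/* Byte values taken from the patch description; the names are illustrative. */
const unsigned char nop5_x86_32[NOP5_SIZE] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
const unsigned char nop5_x86_64[NOP5_SIZE] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
const unsigned char old_default[NOP5_SIZE] = { 0xe9, 0x00, 0x00, 0x00, 0x00 };

/* Return 1 if the sequence starts with the "jmp rel32" opcode, else 0. */
int is_jmp_rel32(const unsigned char *insn)
{
	assert(insn != NULL);
	return insn[0] == 0xe9;
}

/* Pick the compile-time default 5 byte nop for the given word size. */
const unsigned char *default_nop5(int is_64bit)
{
	return is_64bit ? nop5_x86_64 : nop5_x86_32;
}
```

With these definitions, is_jmp_rel32(old_default) is true while neither of the two new defaults starts with a jmp opcode.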
[-- Attachment #1: Type: text/plain, Size: 1621 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

On boot up, the jump label init function scans all the jump label locations
and converts them to the best nop for the machine. If the nop is already
the ideal nop, do not bother with changing it.

Cc: Jason Baron <jbaron@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/jump_label.c |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index ea9d5f2f..0f4c6ba 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -53,7 +53,25 @@ void arch_jump_label_transform(struct jump_entry *entry,
 void arch_jump_label_transform_static(struct jump_entry *entry,
 				      enum jump_label_type type)
 {
-	__jump_label_transform(entry, type, text_poke_early);
+	static int once;
+	static int update;
+
+	/*
+	 * This function is called at boot up and when modules are
+	 * first loaded. Check if the default nop, the one that is
+	 * inserted at compile time, is the ideal nop. If it is, then
+	 * we do not need to update the nop, and we can leave it as is.
+	 * If it is not, then we need to update the nop to the ideal nop.
+	 */
+	if (!once) {
+		unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
+		const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
+		once++;
+		if (memcmp(ideal_nop, default_nop, 5) != 0)
+			update = 1;
+	}
+	if (update)
+		__jump_label_transform(entry, type, text_poke_early);
 }
 
 #endif
-- 
1.7.8.3

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

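The compare-once idea in this patch can be tried outside the kernel. The sketch below is a user-space approximation under stated assumptions: struct nop_state and nop_needs_update() are hypothetical names, and the state lives in a caller-supplied struct rather than in two function-local statics so that it can be exercised more than once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOP5_SIZE 5

/* Hypothetical holder for what the patch keeps in two static ints. */
struct nop_state {
	int once;	/* the comparison has already been done */
	int update;	/* default nop differs from the ideal nop */
};

/*
 * Return 1 if call sites need rewriting, 0 if they can be left alone.
 * The memcmp() runs only on the first call for a given state; later
 * calls reuse the cached answer.
 */
int nop_needs_update(struct nop_state *st,
		     const unsigned char *ideal_nop,
		     const unsigned char *default_nop)
{
	assert(st != NULL);
	if (!st->once) {
		st->once = 1;
		if (memcmp(ideal_nop, default_nop, NOP5_SIZE) != 0)
			st->update = 1;
	}
	return st->update;
}
```

Because the result is cached, every later call site costs one branch instead of a 5 byte compare, which is the saving the patch is after.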
[-- Attachment #1: Type: text/plain, Size: 2703 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

As with all modifying of kernel text, we need to be very paranoid.
When converting the jump label locations to and from nops to jumps
a check has been added to make sure what we are replacing is what
we expect, otherwise we bug.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/jump_label.c |   32 ++++++++++++++++++++++++++++----
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 0f4c6ba..44f2528 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -26,16 +26,40 @@ union jump_code_union {
 
 static void __jump_label_transform(struct jump_entry *entry,
 				   enum jump_label_type type,
-				   void *(*poker)(void *, const void *, size_t))
+				   void *(*poker)(void *, const void *, size_t),
+				   int init)
 {
 	union jump_code_union code;
+	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 
 	if (type == JUMP_LABEL_ENABLE) {
+		/*
+		 * We are enabling this jump label. If it is not a nop
+		 * then something must have gone wrong.
+		 */
+		BUG_ON(memcmp((void *)entry->code, ideal_nop, 5) != 0);
+
 		code.jump = 0xe9;
 		code.offset = entry->target -
 				(entry->code + JUMP_LABEL_NOP_SIZE);
-	} else
+	} else {
+		/*
+		 * We are disabling this jump label. If it is not what
+		 * we think it is, then something must have gone wrong.
+		 * If this is the first initialization call, then we
+		 * are converting the default nop to the ideal nop.
+		 */
+		if (init) {
+			unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
+			BUG_ON(memcmp((void *)entry->code, default_nop, 5) != 0);
+		} else {
+			code.jump = 0xe9;
+			code.offset = entry->target -
+				(entry->code + JUMP_LABEL_NOP_SIZE);
+			BUG_ON(memcmp((void *)entry->code, &code, 5) != 0);
+		}
 		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
+	}
 
 	(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
 }
@@ -45,7 +69,7 @@ void arch_jump_label_transform(struct jump_entry *entry,
 {
 	get_online_cpus();
 	mutex_lock(&text_mutex);
-	__jump_label_transform(entry, type, text_poke_smp);
+	__jump_label_transform(entry, type, text_poke_smp, 0);
 	mutex_unlock(&text_mutex);
 	put_online_cpus();
 }
@@ -71,7 +95,7 @@ void arch_jump_label_transform_static(struct jump_entry *entry,
 			update = 1;
 	}
 	if (update)
-		__jump_label_transform(entry, type, text_poke_early);
+		__jump_label_transform(entry, type, text_poke_early, 1);
 }
 
 #endif
-- 
1.7.8.3

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

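The pattern this patch adds (compare the bytes at the site against what is expected, and only then overwrite them) can be sketched in plain user-space C. The helper name patch_checked() is hypothetical, and the sketch returns an error code where the kernel patch uses BUG_ON(); it also uses memcpy() where the kernel goes through its text poking helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Overwrite len bytes at site with replacement, but only if the site
 * currently holds expected. Returns 0 on success, -1 on a mismatch.
 * The site is left untouched when the check fails.
 */
int patch_checked(unsigned char *site, const unsigned char *expected,
		  const unsigned char *replacement, size_t len)
{
	assert(site != NULL && expected != NULL && replacement != NULL);
	if (memcmp(site, expected, len) != 0)
		return -1;	/* the kernel patch calls BUG_ON() here */
	memcpy(site, replacement, len);
	return 0;
}
```

Applying the same nop-to-jmp conversion twice fails the second time, because the site no longer holds the nop. That is the kind of inconsistency the paranoid checks are meant to catch.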
[-- Attachment #1: Type: text/plain, Size: 19215 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Add the infrastructure to allow architectures to modify the jump label
locations at compile time. This is mainly for x86, where the jmps may
be either 2 bytes or 5 bytes. Instead of wasting 5 bytes for all jump labels,
this code will let x86 put in a jmp instead of a 5 byte nop. Then the
assembler will make either a 2 byte or 5 byte jmp depending on where
the target is.

At compile time, this code will replace the jmps with either a 2 byte
or 5 byte nop depending on the size of the jmp that was added.

Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 Makefile                    |    7 +
 arch/Kconfig                |    6 +
 scripts/Makefile            |    1 +
 scripts/Makefile.build      |   15 ++-
 scripts/update_jump_label.c |  335 +++++++++++++++++++++++++++++++++++++++++++
 scripts/update_jump_label.h |  208 +++++++++++++++++++++++++++
 6 files changed, 570 insertions(+), 2 deletions(-)
 create mode 100644 scripts/update_jump_label.c
 create mode 100644 scripts/update_jump_label.h

diff --git a/Makefile b/Makefile
index adddd11..2b3fdf3 100644
--- a/Makefile
+++ b/Makefile
@@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
   endif
 endif
 
+ifdef CONFIG_JUMP_LABEL
+  ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
+    BUILD_UPDATE_JUMP_LABEL := y
+    export BUILD_UPDATE_JUMP_LABEL
+  endif
+endif
+
 # We trigger additional mismatches with less inlining
 ifdef CONFIG_DEBUG_SECTION_MISMATCH
 KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..8fa6934 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI
 	  subsystem. Also has support for calculating CPU cycle events
 	  to determine how many clock cycles in a given period.
 
+config HAVE_BUILD_TIME_JUMP_LABEL
+	bool
+	help
+	  If an arch uses scripts/update_jump_label to patch in jump nops
+	  at build time, then it must enable this option.
+
 config HAVE_ARCH_JUMP_LABEL
 	bool
 
diff --git a/scripts/Makefile b/scripts/Makefile
index df7678f..738b65c 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO) += pnmtologo
 hostprogs-$(CONFIG_VT) += conmakehash
 hostprogs-$(CONFIG_IKCONFIG) += bin2c
 hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
+hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
 
 always := $(hostprogs-y) $(hostprogs-m)
 
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index d2b366c..8a84b80 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -258,6 +258,15 @@ cmd_modversions = \
 	fi;
 endif
 
+ifdef BUILD_UPDATE_JUMP_LABEL
+update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
+			    $(srctree)/scripts/update_jump_label.h
+cmd_update_jump_label = \
+	if [ $(@) != "scripts/mod/empty.o" ]; then \
+		$(objtree)/scripts/update_jump_label "$(@)"; \
+	fi;
+endif
+
 ifdef CONFIG_FTRACE_MCOUNT_RECORD
 ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
@@ -294,6 +303,7 @@ define rule_cc_o_c
 	$(cmd_modversions) \
 	$(call echo-cmd,record_mcount) \
 	$(cmd_record_mcount) \
+	$(cmd_update_jump_label) \
 	scripts/basic/fixdep $(depfile) $@ '$(call make-cmd,cc_o_c)' > \
 	                                              $(dot-target).tmp; \
 	rm -f $(depfile); \
@@ -301,13 +311,14 @@ define rule_cc_o_c
 endef
 
 # Built-in and composite module parts
-$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 
 # Single-part modules are special since we need to mark them in $(MODVERDIR)
 
-$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
+		  $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
new file mode 100644
index 0000000..d114b74
--- /dev/null
+++ b/scripts/update_jump_label.c
@@ -0,0 +1,335 @@
+/*
+ * update_jump_label.c: replace jmps with nops at compile time.
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *  Parsing of the elf file was influenced by recordmcount.c
+ *  originally written by and copyright to John F. Reiser <jreiser@BitWagon.com>.
+ */
+
+/*
+ * Note, this code is originally designed for x86, but may be used by
+ * other archs to do the nop updates at compile time instead of at boot time.
+ * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
+ * Inserting a 2 byte where possible helps with both CPU performance and
+ * icache strain.
+ */
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <getopt.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <unistd.h>
+
+static int fd_map;	/* File descriptor for file being modified. */
+static struct stat sb;	/* Remember .st_size, etc. */
+static int mmap_failed; /* Boolean flag. */
+
+static void die(const char *err, const char *fmt, ...)
+{
+	va_list ap;
+
+	if (err)
+		perror(err);
+
+	if (fmt) {
+		va_start(ap, fmt);
+		fprintf(stderr, "Fatal error: ");
+		vfprintf(stderr, fmt, ap);
+		fprintf(stderr, "\n");
+		va_end(ap);
+	}
+
+	exit(1);
+}
+
+static void usage(char **argv)
+{
+	char *arg = argv[0];
+	char *p = arg + strlen(arg);
+
+	while (p >= arg && *p != '/')
+		p--;
+	p++;
+
+	printf("usage: %s file\n"
+	       "\n", p);
+	exit(-1);
+}
+
+/* w8rev, w8nat, ...: Handle endianness. */
+
+static uint64_t w8rev(uint64_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (7 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (6 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (5 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (4 * 8))
+	       | ((0xff & (x >> (4 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (5 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (6 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (7 * 8))) << (0 * 8));
+}
+
+static uint32_t w4rev(uint32_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (0 * 8));
+}
+
+static uint32_t w2rev(uint16_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (0 * 8));
+}
+
+static uint64_t w8nat(uint64_t const x)
+{
+	return x;
+}
+
+static uint32_t w4nat(uint32_t const x)
+{
+	return x;
+}
+
+static uint32_t w2nat(uint16_t const x)
+{
+	return x;
+}
+
+static uint64_t (*w8)(uint64_t);
+static uint32_t (*w)(uint32_t);
+static uint32_t (*w2)(uint16_t);
+
+/* ulseek, uread, ...: Check return value for errors. */
+
+static off_t
+ulseek(int const fd, off_t const offset, int const whence)
+{
+	off_t const w = lseek(fd, offset, whence);
+	if (w == (off_t)-1)
+		die("lseek", NULL);
+
+	return w;
+}
+
+static size_t
+uread(int const fd, void *const buf, size_t const count)
+{
+	size_t const n = read(fd, buf, count);
+	if (n != count)
+		die("read", NULL);
+
+	return n;
+}
+
+static size_t
+uwrite(int const fd, void const *const buf, size_t const count)
+{
+	size_t const n = write(fd, buf, count);
+	if (n != count)
+		die("write", NULL);
+
+	return n;
+}
+
+static void *
+umalloc(size_t size)
+{
+	void *const addr = malloc(size);
+	if (addr == 0)
+		die("malloc", "malloc failed: %zu bytes\n", size);
+
+	return addr;
+}
+
+/*
+ * Get the whole file as a programming convenience in order to avoid
+ * malloc+lseek+read+free of many pieces. If successful, then mmap
+ * avoids copying unused pieces; else just read the whole file.
+ * Open for both read and write; new info will be appended to the file.
+ * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
+ * do not propagate to the file until an explicit overwrite at the last.
+ * This preserves most aspects of consistency (all except .st_size)
+ * for simultaneous readers of the file while we are appending to it.
+ * However, multiple writers still are bad. We choose not to use
+ * locking because it is expensive and the use case of kernel build
+ * makes multiple writers unlikely.
+ */
+static void *mmap_file(char const *fname)
+{
+	void *addr;
+
+	fd_map = open(fname, O_RDWR);
+	if (fd_map < 0 || fstat(fd_map, &sb) < 0)
+		die(fname, "failed to open file");
+
+	if (!S_ISREG(sb.st_mode))
+		die(NULL, "not a regular file: %s\n", fname);
+
+	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
+		    fd_map, 0);
+
+	mmap_failed = 0;
+	if (addr == MAP_FAILED) {
+		mmap_failed = 1;
+		addr = umalloc(sb.st_size);
+		uread(fd_map, addr, sb.st_size);
+	}
+	return addr;
+}
+
+static void munmap_file(void *addr)
+{
+	if (!mmap_failed)
+		munmap(addr, sb.st_size);
+	else
+		free(addr);
+	close(fd_map);
+}
+
+static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
+static unsigned char ideal_nop2_x86[2] = { 0x66, 0x90 };
+static unsigned char *ideal_nop;
+
+static int (*make_nop)(void *map, size_t const offset);
+
+static int make_nop_x86(void *map, size_t const offset)
+{
+	unsigned char *op;
+	unsigned char *nop;
+	int size;
+
+	/* Determine which type of jmp this is 2 byte or 5. */
+	op = map + offset;
+	switch (*op) {
+	case 0xeb: /* 2 byte */
+		size = 2;
+		nop = ideal_nop2_x86;
+		break;
+	case 0xe9: /* 5 byte */
+		size = 5;
+		nop = ideal_nop;
+		break;
+	default:
+		die(NULL, "Bad jump label section (bad op %x)\n", *op);
+		__builtin_unreachable();
+	}
+
+	/* convert to nop */
+	ulseek(fd_map, offset, SEEK_SET);
+	uwrite(fd_map, nop, size);
+	return 0;
+}
+
+/* 32 bit and 64 bit are very similar */
+#include "update_jump_label.h"
+#define UPDATE_JUMP_LABEL_64
+#include "update_jump_label.h"
+
+static int do_file(const char *fname)
+{
+	Elf32_Ehdr *const ehdr = mmap_file(fname);
+	unsigned int reltype = 0;
+
+	w = w4nat;
+	w2 = w2nat;
+	w8 = w8nat;
+	switch (ehdr->e_ident[EI_DATA]) {
+		static unsigned int const endian = 1;
+	default:
+		die(NULL, "unrecognized ELF data encoding %d: %s\n",
+			ehdr->e_ident[EI_DATA], fname);
+		break;
+	case ELFDATA2LSB:
+		if (*(unsigned char const *)&endian != 1) {
+			/* main() is big endian, file.o is little endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	case ELFDATA2MSB:
+		if (*(unsigned char const *)&endian != 0) {
+			/* main() is little endian, file.o is big endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	} /* end switch */
+
+	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
+	    w2(ehdr->e_type) != ET_REL ||
+	    ehdr->e_ident[EI_VERSION] != EV_CURRENT)
+		die(NULL, "unrecognized ET_REL file %s\n", fname);
+
+	switch (w2(ehdr->e_machine)) {
+	default:
+		die(NULL, "unrecognized e_machine %d %s\n",
+		    w2(ehdr->e_machine), fname);
+		break;
+	case EM_386:
+		reltype = R_386_32;
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_32;
+		break;
+	case EM_ARM:	 reltype = R_ARM_ABS32;
+			 break;
+	case EM_IA_64:	 reltype = R_IA64_IMM64; break;
+	case EM_MIPS:	 /* reltype: e_class */ break;
+	case EM_PPC:	 reltype = R_PPC_ADDR32; break;
+	case EM_PPC64:	 reltype = R_PPC64_ADDR64; break;
+	case EM_S390:	 /* reltype: e_class */ break;
+	case EM_SH:	 reltype = R_SH_DIR32; break;
+	case EM_SPARCV9: reltype = R_SPARC_64; break;
+	case EM_X86_64:
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_64;
+		reltype = R_X86_64_64;
+		break;
+	} /* end switch */
+
+	switch (ehdr->e_ident[EI_CLASS]) {
+	default:
+		die(NULL, "unrecognized ELF class %d %s\n",
+		    ehdr->e_ident[EI_CLASS], fname);
+		break;
+	case ELFCLASS32:
+		if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
+		||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		do_func32(ehdr, fname, reltype);
+		break;
+	case ELFCLASS64: {
+		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
+		if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
+		||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		do_func64(ghdr, fname, reltype);
+		break;
+	}
+	} /* end switch */
+
+	munmap_file(ehdr);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc != 2)
+		usage(argv);
+
+	return do_file(argv[1]);
+}
+
diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
new file mode 100644
index 0000000..181bd5b
--- /dev/null
+++ b/scripts/update_jump_label.h
@@ -0,0 +1,208 @@
+/*
+ * update_jump_label.h
+ *
+ * This code was based off of code from recordmcount.c written by
+ * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>. All rights reserved.
+ *
+ * The original code had the same algorithms for both 32bit
+ * and 64bit ELF files, but the code was duplicated to support
+ * the difference in structures that were used. This
+ * file creates a macro of everything that is different between
+ * the 64 and 32 bit code, such that by including this header
+ * twice we can create both sets of functions by including this
+ * header once with UPDATE_JUMP_LABEL_64 undefined, and again with
+ * it defined.
+ *
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *
+ * Licensed under the GNU General Public License, version 2 (GPLv2).
+ */
+
+#undef EBITS
+#undef _w
+
+#ifdef UPDATE_JUMP_LABEL_64
+# define EBITS 64
+# define _w w8
+#else
+# define EBITS 32
+# define _w w
+#endif
+
+#define _FBITS(x, e) x##e
+#define FBITS(x, e) _FBITS(x, e)
+#define FUNC(x) FBITS(x, EBITS)
+
+#undef Elf_Ehdr
+#undef Elf_Shdr
+#undef Elf_Rel
+#undef Elf_Rela
+#undef Elf_Sym
+#undef ELF_R_SYM
+#undef ELF_R_TYPE
+
+#define __ATTACH(x, y, z) x##y##z
+#define ATTACH(x, y, z) __ATTACH(x, y, z)
+
+#define Elf_Ehdr ATTACH(Elf, EBITS, _Ehdr)
+#define Elf_Shdr ATTACH(Elf, EBITS, _Shdr)
+#define Elf_Rel ATTACH(Elf, EBITS, _Rel)
+#define Elf_Rela ATTACH(Elf, EBITS, _Rela)
+#define Elf_Sym ATTACH(Elf, EBITS, _Sym)
+#define uint_t ATTACH(uint, EBITS, _t)
+#define ELF_R_SYM ATTACH(ELF, EBITS, _R_SYM)
+#define ELF_R_TYPE ATTACH(ELF, EBITS, _R_TYPE)
+
+#undef get_shdr
+#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
+
+#undef get_section_loc
+#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr))
+
+/* Find relocation section hdr for a given section */
+static const Elf_Shdr *
+FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != SHT_REL &&
+		    w(hdr->sh_type) != SHT_RELA)
+			continue;
+
+		/*
+		 * The relocation section's info field holds
+		 * the section index that it represents.
+		 */
+		if (shdr == &shdr0[w(hdr->sh_info)])
+			return hdr;
+	}
+	return NULL;
+}
+
+/* Find a section headr based on name and type */
+static const Elf_Shdr *
+FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
+	const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	const char *hdrname;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != type)
+			continue;
+
+		/* If we are just looking for a section by type (ie. SYMTAB) */
+		if (!name)
+			return hdr;
+
+		hdrname = &shstrtab[w(hdr->sh_name)];
+		if (strcmp(hdrname, name) == 0)
+			return hdr;
+	}
+	return NULL;
+}
+
+static void
+FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
+		     unsigned shtype, const Elf_Rel *rel, void *data)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *targethdr;
+	const Elf_Rela *rela;
+	const Elf_Sym *syment;
+	uint_t offset = _w(rel->r_offset);
+	uint_t info = _w(rel->r_info);
+	uint_t sym = ELF_R_SYM(info);
+	uint_t type = ELF_R_TYPE(info);
+	uint_t addend;
+	uint_t targetloc;
+
+	if (shtype == SHT_RELA) {
+		rela = (const Elf_Rela *)rel;
+		addend = _w(rela->r_addend);
+	} else
+		addend = _w(*(int *)(data + offset));
+
+	syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
+	targethdr = &shdr0[w2(syment[sym].st_shndx)];
+	targetloc = _w(targethdr->sh_offset);
+
+	/* TODO, need a separate function for all archs */
+	if (type != R_386_32)
+		die(NULL, "Arch relocation type %d not supported", type);
+
+	targetloc += addend;
+
+	*(uint_t *)(data + offset) = targetloc;
+}
+
+/* Overall supervision for Elf32 ET_REL file. */
+static void
+FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
+{
+	const Elf_Shdr *jlshdr;
+	const Elf_Shdr *jlrhdr;
+	const Elf_Shdr *symhdr;
+	const Elf_Rel *rel;
+	unsigned size;
+	unsigned cnt;
+	unsigned i;
+	uint_t type;
+	void *jdata;
+	void *data;
+
+	jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
+	if (!jlshdr)
+		return;
+
+	jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
+	if (!jlrhdr)
+		return;
+
+	/*
+	 * Create and fill in the __jump_table section and use it to
+	 * find the offsets into the text that we want to update.
+	 * We create it so that we do not depend on the order of the
+	 * relocations, and use the table directly, as it is broken
+	 * up into sections.
+	 */
+	size = _w(jlshdr->sh_size);
+	data = umalloc(size);
+
+	jdata = (void *)get_section_loc(ehdr, jlshdr);
+	memcpy(data, jdata, size);
+
+	cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
+
+	rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
+
+	/* Is this as Rel or Rela? */
+	type = w(jlrhdr->sh_type);
+
+	symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
+
+	for (i = 0; i < cnt; i++) {
+		FUNC(section_update)(ehdr, symhdr, type, rel, data);
+		rel = (void *)rel + w(jlrhdr->sh_entsize);
+	}
+
+	/*
+	 * This is specific to x86. The jump_table is stored in three
+	 * long words. The first is the location of the jmp target we
+	 * must update.
+	 */
+	cnt = size / sizeof(uint_t);
+
+	for (i = 0; i < cnt; i += 3)
+		make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
+
+	free(data);
+}
-- 
1.7.8.3

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

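The core decision in make_nop_x86() above is to look at the opcode the assembler emitted and substitute a nop of the same length. A minimal in-memory sketch of that decision follows. The function name nop_out_jmp() is hypothetical, and unlike the posted tool, which seeks and writes into the object file and dies on a bad opcode, this version edits a byte buffer and returns an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 2 byte nop value as used by the patch (ideal_nop2_x86). */
static const unsigned char nop2[2] = { 0x66, 0x90 };

/*
 * Replace the jmp at text[offset] with a nop of the same length.
 * nop5 is the 5 byte nop to use when the jmp is a 5 byte one.
 * Returns the number of bytes rewritten, or -1 if the opcode at
 * that offset is neither jmp rel8 (0xeb) nor jmp rel32 (0xe9).
 * The caller must ensure the buffer is long enough.
 */
int nop_out_jmp(unsigned char *text, size_t offset, const unsigned char *nop5)
{
	assert(text != NULL && nop5 != NULL);
	switch (text[offset]) {
	case 0xeb:	/* jmp rel8, 2 bytes */
		memcpy(text + offset, nop2, 2);
		return 2;
	case 0xe9:	/* jmp rel32, 5 bytes */
		memcpy(text + offset, nop5, 5);
		return 5;
	default:
		return -1;
	}
}
```

Because the replacement always has the same length as the instruction it replaces, no other offsets in the object file move, so the relocations and the __jump_table entries stay valid.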
[-- Attachment #1: Type: text/plain, Size: 6324 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Have the jump labels add a "jmp" in the assembly instead of a default
nop. This will cause the assembler to put in either a 2 byte or 5 byte
jmp depending on where the target label is. Then at compile time, the
update_jump_label code will replace the jmps with either 2 or 5 byte
nops.

On boot up, the code can be examined to see if the jump label uses
either a 2 or 5 byte nop and replace it.

Allowing the jump labels to be 2 bytes speeds up the nops: not only are
2 byte nops faster than 5 byte nops, but they also save on cache
footprint.

    text    data     bss     dec     hex filename
13403667 3666856 2998272 20068795 13239bb ../nobackup/mxtest/vmlinux-old
13398536 3666856 2998272 20063664 13225b0 ../nobackup/mxtest/vmlinux-new

Converting the current v3.2 trace points saved 5,131 bytes.
As more places use jump labels, the savings will grow.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/Kconfig                  |    1 +
 arch/x86/include/asm/jump_label.h |    2 +-
 arch/x86/kernel/jump_label.c      |   86 ++++++++++++++++++++++++++++--------
 3 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index efb4294..b5004c1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -61,6 +61,7 @@ config X86
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_ARCH_JUMP_LABEL
+	select HAVE_BUILD_TIME_JUMP_LABEL
 	select HAVE_TEXT_POKE_SMP
 	select HAVE_GENERIC_HARDIRQS
 	select HAVE_SPARSE_IRQ
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index cefcfd3..bde5323 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -19,7 +19,7 @@
 static __always_inline bool arch_static_branch(struct jump_label_key *key)
 {
 	asm goto("1:"
-		".byte " __stringify(JUMP_LABEL_INIT_NOP) "\n\t"
+		"jmp %l[l_yes]\n"
 		".pushsection __jump_table, \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 44f2528..8abd2a3 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -16,12 +16,21 @@
 
 #ifdef HAVE_JUMP_LABEL
 
+static unsigned char nop_short[] = { P6_NOP2 };
+
+/* These are the nops added at compile time */
+static unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
+
 union jump_code_union {
 	char code[JUMP_LABEL_NOP_SIZE];
 	struct {
 		char jump;
 		int offset;
-	} __attribute__((packed));
+	} __packed;
+	struct {
+		char jump_short;
+		char offset_short;
+	} __packed;
 };
 
 static void __jump_label_transform(struct jump_entry *entry,
@@ -30,18 +39,33 @@ static void __jump_label_transform(struct jump_entry *entry,
 				   int init)
 {
 	union jump_code_union code;
+	unsigned char nop;
+	unsigned char op;
+	unsigned size;
+	void *ip = (void *)entry->code;
 	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 
-	if (type == JUMP_LABEL_ENABLE) {
-		/*
-		 * We are enabling this jump label. If it is not a nop
-		 * then something must have gone wrong.
-		 */
-		BUG_ON(memcmp((void *)entry->code, ideal_nop, 5) != 0);
+	/* Use probe_kernel_read()? */
+	op = *(unsigned char *)ip;
+	nop = ideal_nops[NOP_ATOMIC5][0];
 
-		code.jump = 0xe9;
-		code.offset = entry->target -
-				(entry->code + JUMP_LABEL_NOP_SIZE);
+	if (type == JUMP_LABEL_ENABLE) {
+		if (memcmp(ip, nop_short, 2) == 0) {
+			size = 2;
+			code.jump_short = 0xeb;
+			code.offset = entry->target - (entry->code + 2);
+			/* Check for overflow ? */
+		} else if (memcmp(ip, ideal_nop, 5) == 0) {
+			size = JUMP_LABEL_NOP_SIZE;
+			code.jump = 0xe9;
+			code.offset = entry->target - (entry->code + size);
+		} else
+			/*
+			 * The location is not a nop that we were expecting,
+			 * something went wrong. Crash the box, as something could be
+			 * corrupting the kernel.
+			 */
+			BUG();
 	} else {
 		/*
 		 * We are disabling this jump label. If it is not what
@@ -50,18 +74,44 @@ static void __jump_label_transform(struct jump_entry *entry,
 		 * are converting the default nop to the ideal nop.
 		 */
 		if (init) {
-			unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
-			BUG_ON(memcmp((void *)entry->code, default_nop, 5) != 0);
-		} else {
+			/* Ignore short nops, we do not change them */
+			if (memcmp(ip, nop_short, 2) == 0)
+				return;
+
+			/* We are initializing from the default nop */
+			BUG_ON(memcmp(ip, default_nop, 5) != 0);
+
+			/* Set to the ideal nop */
+			size = JUMP_LABEL_NOP_SIZE;
+			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
+
+		} else if (op == 0xe9) {
+			/* Replace a 5 byte jmp */
+
+			/* Make sure this is what we expected it to be */
 			code.jump = 0xe9;
 			code.offset = entry->target -
 				(entry->code + JUMP_LABEL_NOP_SIZE);
-			BUG_ON(memcmp((void *)entry->code, &code, 5) != 0);
-		}
-		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
+			BUG_ON(memcmp(ip, &code, 5) != 0);
+
+			size = JUMP_LABEL_NOP_SIZE;
+			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
+		} else if (op == 0xeb) {
+			/* Replace a 2 byte jmp */
+
+			/* Had better be a 2 byte jmp */
+			code.jump_short = 0xeb;
+			code.offset = entry->target - (entry->code + 2);
+			BUG_ON(memcmp(ip, &code, 2) != 0);
+
+			size = 2;
+			memcpy(&code, nop_short, size);
+		} else
+			/* The code was not what we expected! */
+			BUG();
 	}
 
-	(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+	(*poker)(ip, &code, size);
 }
 
 void arch_jump_label_transform(struct jump_entry *entry,
@@ -88,7 +138,6 @@ void arch_jump_label_transform_static(struct jump_entry *entry,
 	 * If it is not, then we need to update the nop to the ideal nop.
 	 */
 	if (!once) {
-		unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
 		const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
 		once++;
 		if (memcmp(ideal_nop, default_nop, 5) != 0)
@@ -97,5 +146,4 @@ void arch_jump_label_transform_static(struct jump_entry *entry,
 	if (update)
 		__jump_label_transform(entry, type, text_poke_early, 1);
 }
-
 #endif
-- 
1.7.8.3

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

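The patch leaves a "Check for overflow ?" note next to the 2 byte case. As a rough user-space illustration of what such a check could look like (an assumption on my part, not what the posted patch does), the sketch below encodes either jmp form and rejects displacements that do not fit. The function name encode_jmp() and its signature are hypothetical; the opcodes and the "target minus end of instruction" displacement rule follow the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode a jmp located at address code that goes to target.
 *   size 2: 0xeb followed by a signed 8-bit displacement
 *   size 5: 0xe9 followed by a signed 32-bit displacement (little endian)
 * The displacement is relative to the end of the instruction.
 * Returns 0 on success, -1 if size is unsupported or the displacement
 * does not fit in the chosen form.
 */
int encode_jmp(unsigned char *buf, size_t size, int64_t code, int64_t target)
{
	int64_t rel = target - (code + (int64_t)size);
	uint32_t u;

	assert(buf != NULL);
	if (size == 2) {
		if (rel < INT8_MIN || rel > INT8_MAX)
			return -1;	/* would overflow rel8 */
		buf[0] = 0xeb;
		buf[1] = (unsigned char)(rel & 0xff);
		return 0;
	}
	if (size == 5) {
		if (rel < INT32_MIN || rel > INT32_MAX)
			return -1;	/* would overflow rel32 */
		u = (uint32_t)(rel & 0xffffffff);
		buf[0] = 0xe9;
		buf[1] = (unsigned char)(u & 0xff);
		buf[2] = (unsigned char)((u >> 8) & 0xff);
		buf[3] = (unsigned char)((u >> 16) & 0xff);
		buf[4] = (unsigned char)((u >> 24) & 0xff);
		return 0;
	}
	return -1;
}
```

Since the assembler only emits the 2 byte form when the target is in range, a failing range check at run time would point to a corrupted table entry rather than an ordinary condition.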
On 01/27/2012 12:14 PM, Steven Rostedt wrote:
>
> #define JUMP_LABEL_NOP_SIZE 5
>
> -#define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
> +#ifdef CONFIG_X86_64
> +# define JUMP_LABEL_INIT_NOP P6_NOP5
> +#else
> +# define JUMP_LABEL_INIT_NOP 0x3e, 0x8d, 0x74, 0x26, 0x00
> +#endif
>
Why not just use ASM_NOP5_ATOMIC since you have configuration
available here?
-hpa
On Fri, 27 Jan 2012 15:14:44 -0500 Steven Rostedt <rostedt@goodmis.org> wrote:
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -53,7 +53,25 @@ void arch_jump_label_transform(struct jump_entry *entry,
>  void arch_jump_label_transform_static(struct jump_entry *entry,
>  				      enum jump_label_type type)
>  {
> -	__jump_label_transform(entry, type, text_poke_early);
> +	static int once;
> +	static int update;
> +
> +	/*
> +	 * This function is called at boot up and when modules are
> +	 * first loaded. Check if the default nop, the one that is
> +	 * inserted at compile time, is the ideal nop. If it is, then
> +	 * we do not need to update the nop, and we can leave it as is.
> +	 * If it is not, then we need to update the nop to the ideal nop.
> +	 */
> +	if (!once) {
> +		unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
const. It generates less code. With my compiler version, at least.
That was a bit dumb of it, given that memcmp() is declared to take
const args.
> +		const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
> +		once++;
> +		if (memcmp(ideal_nop, default_nop, 5) != 0)
> +			update = 1;
> +	}
> +	if (update)
> +		__jump_label_transform(entry, type, text_poke_early);
>  }
On Fri, 2012-01-27 at 12:34 -0800, Andrew Morton wrote:
> > + if (!once) {
> > + unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
>
> const. It generates less code. With my compiler version, at least.
> That was a bit dumb of it, given that memcmp() is declared to take
> const args.
OK will update,
Thanks!
-- Steve
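For reference, the compare-once pattern under discussion, with the const change applied, might look like the stand-alone sketch below. It is not the patch itself: nop_needs_update() is a hypothetical name, the ideal nop is passed in as a parameter instead of coming from ideal_nops[NOP_ATOMIC5], and the default nop bytes are simply the 32-bit values quoted in this thread.

```c
#include <assert.h>
#include <string.h>

/* Compile-time default nop; bytes taken from the quoted 32-bit define. */
static const unsigned char default_nop[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };

/*
 * Hypothetical helper: decide on the first call only whether the
 * default nop differs from the ideal nop, and remember the answer
 * for every later call.
 */
static int nop_needs_update(const unsigned char *ideal_nop)
{
	static int once;
	static int update;

	if (!once) {
		once = 1;
		if (memcmp(ideal_nop, default_nop, sizeof(default_nop)) != 0)
			update = 1;
	}
	return update;
}
```

Because the result is cached in a function-local static, whatever is passed on later calls has no effect on the answer. That is fine for the boot-time case, where the ideal nop does not change after it has been selected.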
[ Added Michal ]
On Fri, 2012-01-27 at 12:27 -0800, H. Peter Anvin wrote:
> On 01/27/2012 12:14 PM, Steven Rostedt wrote:
> >
> > #define JUMP_LABEL_NOP_SIZE 5
> >
> > -#define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
> > +#ifdef CONFIG_X86_64
> > +# define JUMP_LABEL_INIT_NOP P6_NOP5
> > +#else
> > +# define JUMP_LABEL_INIT_NOP 0x3e, 0x8d, 0x74, 0x26, 0x00
> > +#endif
> >
>
> Why not just use ASM_NOP5_ATOMIC since you have configuration
> available here?
This eventually needs to match the code in the update_jump_label.c,
otherwise the checks will be incorrect on boot up, and we'll BUG the
kernel.
I need to find a way to get the config options of the kernel into the
compiling of the update_jump_label.c. Then I could use the nops.h file
with the set configs.
Michal, can we get host tools to compile with the configs set by
the .config file?
update_jump_label.c is built just like recordmcount.c is. It would be
great if it could be compiled with the same config.h defines as the
kernel is. That way I can make it (and even recordmcount.c) customize to
the type of kernel that is being built.
Then again, it would also need to be part of the dependencies that are
done with configs.
If not, at the very least, I'll add a comment (in the last patch) that
explains why we use our own "JUMP_LABEL_INIT_NOP" instead of
ASM_NOP5_ATOMIC.
-- Steve
On 01/27/2012 12:49 PM, Steven Rostedt wrote:
>
> If not, at the very least, I'll add a comment (in the last patch) that
> explains why we use our own "JUMP_LABEL_INIT_NOP" instead of
> ASM_NOP5_ATOMIC.
>
Then at least use GENERIC_NOP5_ATOMIC instead of open-coding it.
-hpa
On Fri, 2012-01-27 at 12:50 -0800, H. Peter Anvin wrote:
> On 01/27/2012 12:49 PM, Steven Rostedt wrote:
> >
> > If not, at the very least, I'll add a comment (in the last patch) that
> > explains why we use our own "JUMP_LABEL_INIT_NOP" instead of
> > ASM_NOP5_ATOMIC.
> >
>
> Then at least use GENERIC_NOP5_ATOMIC instead of open-coding it.
OK, I can do that. Hehe, I looked for the x86_32 version, but it was so
obfuscated that I didn't find it. Thanks for pointing it out.
-- Steve
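To show why the named macro and the open-coded bytes are interchangeable, here is a small sketch. The three macro definitions are reproduced from memory of arch/x86/include/asm/nops.h and should be treated as an assumption to check against the tree being built; only the open-coded byte list comes from the patch quoted in this thread.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror asm/nops.h; verify against the actual header. */
#define NOP_DS_PREFIX		0x3e
#define GENERIC_NOP4		0x8d, 0x74, 0x26, 0x00
#define GENERIC_NOP5_ATOMIC	NOP_DS_PREFIX, GENERIC_NOP4

/* The macro expands to a plain byte list usable as an initializer. */
static const unsigned char generic_nop5_atomic[] = { GENERIC_NOP5_ATOMIC };

/* The open-coded 32-bit default from the quoted patch. */
static const unsigned char open_coded[] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
```

If those definitions hold, both arrays contain the same five bytes, so switching the define to GENERIC_NOP5_ATOMIC would not change what the boot-time memcmp() sees.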
On Fri, 2012-01-27 at 15:14 -0500, Steven Rostedt wrote:
> +static unsigned char nop_short[] = { P6_NOP2 };
> +
> +/* These are the nops added at compile time */
> +static unsigned char default_nop[] = { JUMP_LABEL_INIT_NOP };
> +
>
Andrew,
I also made the above two into static const. Just so you know ;-)
-- Steve
On 27.1.2012 21:49, Steven Rostedt wrote:
> I need to find a way to get the config options of the kernel into the
> compiling of the update_jump_label.c. Then I could use the nops.h file
> with the set configs.
You can do a
#include "../include/generated/autoconf.h"
and you will get the CONFIG_ definitions.
> Michal, can we get host tools to compile with the configs set by
> the .config file?
Yeah, we should simply compile all the host tools with autoconf.h and
kconfig.h included. The idea behind not doing this probably was that a
single scripts/ directory should work with multiple kernel
configurations, but this is obviously not the case.
Michal
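A host tool following this suggestion could pick its nop from the kernel configuration roughly as sketched below. This is an illustration under assumptions, not code from update_jump_label.c: the __has_include guard is added only so the fragment still compiles outside a configured kernel tree, the INIT_NOP and init_nop names are hypothetical, and the 64-bit byte list is an assumed expansion of P6_NOP5.

```c
#include <assert.h>

/*
 * Pull in the CONFIG_ definitions when the generated header exists.
 * The guard is an addition for stand-alone builds; inside the kernel
 * tree a plain #include would do.
 */
#if defined(__has_include)
# if __has_include("../include/generated/autoconf.h")
#  include "../include/generated/autoconf.h"
# endif
#endif

#ifdef CONFIG_X86_64
/* Assumed expansion of P6_NOP5 */
# define INIT_NOP 0x0f, 0x1f, 0x44, 0x00, 0x00
#else
/* 32-bit default as quoted in this thread */
# define INIT_NOP 0x3e, 0x8d, 0x74, 0x26, 0x00
#endif

/* Must match what the kernel compares against at boot. */
static const unsigned char init_nop[] = { INIT_NOP };
```

With the tool and the kernel selecting on the same CONFIG_ symbols, the bytes written at build time and the bytes checked at boot come from one decision, which is what avoids the BUG described earlier in the thread. As noted there, the tool would also need to be rebuilt when the configuration changes.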