linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3
@ 2007-12-06  2:07 Mathieu Desnoyers
  2007-12-06  2:07 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:07 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel

Hi,

Here are the redux version of immediate values. It currently supports x86_32,
x86_64 and powerpc. The other architectures use a generic fallback (a simple
variable).

It depends on the text edit lock patches. It diminishes the impact of dormant
markers by providing a fast branch over a function call with a "load immediate"
instruction. It minimizes the d-cache hit.

This redux version it not reentrant wrt NMIs and MCEs. One must be cautious not
to use it in code paths reached by such execution contexts.

It could be interesting to queue this for 2.6.25.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 1/7] Immediate Values - Architecture Independent Code
  2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
@ 2007-12-06  2:07 ` Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:07 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: Mathieu Desnoyers, Rusty Russell

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 18558 bytes --]

Immediate values are used as read mostly variables that are rarely updated. They
use code patching to modify the values inscribed in the instruction stream. It
provides a way to save precious cache lines that would otherwise have to be used
by these variables.

There is a generic _imv_read() version, which uses standard global
variables, and optimized per architecture imv_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionnality is disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__imv" to place the pointers to the enable
value. Immediate values activation functions sits in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performances on heavy memory I/O workloads.

An interesting result shows the potential this infrastructure has by
showing the slowdown a simple system call such as getppid() suffers when it is
used under heavy user-space cache trashing:

Random walk L1 and L2 trashing surrounding a getppid() call:
(note: in this test, do_syscal_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of 10 times just to get the kernel variables from
memory. Another test on the same architecture (Intel P4) measured the memory
latency to be 559 cycles. Therefore, each cache line removed from the hot path
would improve the syscall time of 3.5% in these conditions.

Changelog:

- section __imv is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.
- Does not update tainted module's immediate values.
- remove imv_*_t types, add DECLARE_IMV() and DEFINE_IMV().
  - imv_read(&var) becomes imv_read(var) because of this.
- Adding a new EXPORT_IMV_SYMBOL(_GPL).
- remove imv_if(). Should use if (unlikely(imv_read(var))) instead.
  - Wait until we have gcc support before we add the imv_if macro, since
    its form may have to change.
- Dont't declare the __imv section in vmlinux.lds.h, just put the content
  in the rodata section.
- Simplify interface : remove imv_set_early, keep track of kernel boot
  status internally.
- Remove the ALIGN(8) before the __imv section. It is packed now.
- Uses an IPI busy-loop on each CPU with interrupts disabled as a simple,
  architecture agnostic, update mechanism.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
 include/asm-generic/vmlinux.lds.h |    3 
 include/linux/immediate.h         |   94 +++++++++++++++++++
 include/linux/module.h            |   16 +++
 init/main.c                       |    8 +
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  187 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   50 +++++++++-
 7 files changed, 358 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-11-28 09:32:04.000000000 -0500
@@ -0,0 +1,94 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+struct __imv {
+	unsigned long var;	/* Pointer to the identifier variable of the
+				 * immediate value
+				 */
+	unsigned long imv;	/*
+				 * Pointer to the memory location of the
+				 * immediate value within the instruction.
+				 */
+	unsigned char size;	/* Type size. */
+} __attribute__ ((packed));
+
+#include <asm/immediate.h>
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)						\
+	do {								\
+		name##__imv = (i);					\
+		core_imv_update();					\
+		module_imv_update();					\
+	} while (0)
+
+/*
+ * Internal update functions.
+ */
+extern void core_imv_update(void);
+extern void imv_update_range(const struct __imv *begin,
+	const struct __imv *end);
+
+#else
+
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ */
+#define imv_read(name)			_imv_read(name)
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)		(name##__imv = (i))
+
+static inline void core_imv_update(void) { }
+static inline void module_imv_update(void) { }
+
+#endif
+
+#define DECLARE_IMV(type, name) extern __typeof__(type) name##__imv
+#define DEFINE_IMV(type, name)  __typeof__(type) name##__imv
+
+#define EXPORT_IMV_SYMBOL(name) EXPORT_SYMBOL(name##__imv)
+#define EXPORT_IMV_SYMBOL_GPL(name) EXPORT_SYMBOL_GPL(name##__imv)
+
+/**
+ * _imv_read - Read immediate value with standard memory load.
+ * @name: immediate value name
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _imv_read(name)		(name##__imv)
+
+#endif
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-11-28 09:31:51.000000000 -0500
+++ linux-2.6-lttng/include/linux/module.h	2007-11-28 09:32:04.000000000 -0500
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <linux/marker.h>
 #include <asm/local.h>
 
@@ -355,6 +356,10 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+#ifdef CONFIG_IMMEDIATE
+	const struct __imv *immediate;
+	unsigned int num_immediate;
+#endif
 #ifdef CONFIG_MARKERS
 	struct marker *markers;
 	unsigned int num_markers;
@@ -464,6 +469,9 @@ extern void print_modules(void);
 
 extern void module_update_markers(void);
 
+extern void _module_imv_update(void);
+extern void module_imv_update(void);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -568,6 +576,14 @@ static inline void module_update_markers
 {
 }
 
+static inline void _module_imv_update(void)
+{
+}
+
+static inline void module_imv_update(void)
+{
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-11-28 09:31:51.000000000 -0500
+++ linux-2.6-lttng/kernel/module.c	2007-11-28 09:32:04.000000000 -0500
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1675,6 +1676,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex;
 	unsigned int markersindex;
 	unsigned int markersstringsindex;
 	struct module *mod;
@@ -1773,6 +1775,7 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__imv");
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1924,6 +1927,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -1991,11 +1999,16 @@ static struct module *load_module(void _
 
 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
 
+	if (!mod->taints) {
 #ifdef CONFIG_MARKERS
-	if (!mod->taints)
 		marker_update_probe_range(mod->markers,
 			mod->markers + mod->num_markers);
 #endif
+#ifdef CONFIG_IMMEDIATE
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+#endif
+	}
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
@@ -2601,3 +2614,38 @@ void module_update_markers(void)
 	mutex_unlock(&module_mutex);
 }
 #endif
+
+#ifdef CONFIG_IMMEDIATE
+/**
+ * _module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Module_mutex must be held be the caller.
+ */
+void _module_imv_update(void)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+}
+EXPORT_SYMBOL_GPL(_module_imv_update);
+
+/**
+ * module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Takes module_mutex.
+ */
+void module_imv_update(void)
+{
+	mutex_lock(&module_mutex);
+	_module_imv_update();
+	mutex_unlock(&module_mutex);
+}
+EXPORT_SYMBOL_GPL(module_imv_update);
+#endif
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-11-28 09:32:04.000000000 -0500
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+#include <linux/cpu.h>
+
+#include <asm/cacheflush.h>
+
+/*
+ * Kernel ready to execute the SMP update that may depend on trap and ipi.
+ */
+static int imv_early_boot_complete;
+
+extern const struct __imv __start___imv[];
+extern const struct __imv __stop___imv[];
+
+/*
+ * imv_mutex nests inside module_mutex. imv_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(imv_mutex);
+
+static atomic_t wait_sync;
+
+struct ipi_loop_data {
+	long value;
+	const struct __imv *imv;
+} loop_data;
+
+static void ipi_busy_loop(void *arg)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > loop_data.value);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > 0);
+	/*
+	 * Issuing a synchronizing instruction must be done on each CPU before
+	 * reenabling interrupts after modifying an instruction. Required by
+	 * Intel's errata.
+	 */
+	sync_core();
+	flush_icache_range(loop_data.imv->imv,
+		loop_data.imv->imv + loop_data.imv->size);
+	local_irq_restore(flags);
+}
+
+/**
+ * apply_imv_update - update one immediate value
+ * @imv: pointer of type const struct __imv to update
+ *
+ * Update one immediate value. Must be called with imv_mutex held.
+ * It makes sure all CPUs are not executing the modified code by having them
+ * busy looping with interrupts disabled.
+ * It does _not_ protect against NMI and MCE (could be a problem with Intel's
+ * errata if we use immediate values in their code path).
+ */
+static int apply_imv_update(const struct __imv *imv)
+{
+	unsigned long flags;
+	long online_cpus;
+
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (imv->size) {
+	case 1:	if (*(uint8_t *)imv->imv
+				== *(uint8_t *)imv->var)
+			return 0;
+		break;
+	case 2:	if (*(uint16_t *)imv->imv
+				== *(uint16_t *)imv->var)
+			return 0;
+		break;
+	case 4:	if (*(uint32_t *)imv->imv
+				== *(uint32_t *)imv->var)
+			return 0;
+		break;
+	case 8:	if (*(uint64_t *)imv->imv
+				== *(uint64_t *)imv->var)
+			return 0;
+		break;
+	default:return -EINVAL;
+	}
+
+	if (imv_early_boot_complete) {
+		kernel_text_lock();
+		lock_cpu_hotplug();
+		online_cpus = num_online_cpus();
+		atomic_set(&wait_sync, 2 * online_cpus);
+		loop_data.value = online_cpus;
+		loop_data.imv = imv;
+		smp_call_function(ipi_busy_loop, NULL, 1, 0);
+		local_irq_save(flags);
+		atomic_dec(&wait_sync);
+		do {
+			/* Make sure the wait_sync gets re-read */
+			smp_mb();
+		} while (atomic_read(&wait_sync) > online_cpus);
+		text_poke((void *)imv->imv, (void *)imv->var,
+				imv->size);
+		/*
+		 * Make sure the modified instruction is seen by all CPUs before
+		 * we continue (visible to other CPUs and local interrupts).
+		 */
+		wmb();
+		atomic_dec(&wait_sync);
+		flush_icache_range(imv->imv,
+				imv->imv + imv->size);
+		local_irq_restore(flags);
+		unlock_cpu_hotplug();
+		kernel_text_unlock();
+	} else
+		text_poke_early((void *)imv->imv, (void *)imv->var,
+				imv->size);
+	return 0;
+}
+
+/**
+ * imv_update_range - Update immediate values in a range
+ * @begin: pointer to the beginning of the range
+ * @end: pointer to the end of the range
+ *
+ * Updates a range of immediates.
+ */
+void imv_update_range(const struct __imv *begin,
+		const struct __imv *end)
+{
+	const struct __imv *iter;
+	int ret;
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&imv_mutex);
+		ret = apply_imv_update(iter);
+		if (imv_early_boot_complete && ret)
+			printk(KERN_WARNING
+				"Invalid immediate value. "
+				"Variable at %p, "
+				"instruction at %p, size %hu\n",
+				(void *)iter->imv,
+				(void *)iter->var, iter->size);
+		mutex_unlock(&imv_mutex);
+	}
+}
+EXPORT_SYMBOL_GPL(imv_update_range);
+
+/**
+ * imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void core_imv_update(void)
+{
+	/* Core kernel imvs */
+	imv_update_range(__start___imv, __stop___imv);
+}
+EXPORT_SYMBOL_GPL(core_imv_update);
+
+void __init imv_init_complete(void)
+{
+	imv_early_boot_complete = 1;
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2007-11-28 09:27:34.000000000 -0500
+++ linux-2.6-lttng/init/main.c	2007-11-28 09:32:04.000000000 -0500
@@ -57,6 +57,7 @@
 #include <linux/device.h>
 #include <linux/kthread.h>
 #include <linux/sched.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -101,6 +102,11 @@ static inline void mark_rodata_ro(void) 
 #ifdef CONFIG_TC
 extern void tc_init(void);
 #endif
+#ifdef CONFIG_IMMEDIATE
+extern void imv_init_complete(void);
+#else
+static inline void imv_init_complete(void) { }
+#endif
 
 enum system_states system_state;
 EXPORT_SYMBOL(system_state);
@@ -518,6 +524,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	cgroup_init_early();
+	core_imv_update();
 
 	local_irq_disable();
 	early_boot_irqs_off();
@@ -639,6 +646,7 @@ asmlinkage void __init start_kernel(void
 	cpuset_init();
 	taskstats_init_early();
 	delayacct_init();
+	imv_init_complete();
 
 	check_bugs();
 
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-11-28 09:27:34.000000000 -0500
+++ linux-2.6-lttng/kernel/Makefile	2007-11-28 09:32:04.000000000 -0500
@@ -56,6 +56,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 obj-$(CONFIG_MARKERS) += marker.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-11-28 09:27:34.000000000 -0500
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-11-28 09:32:04.000000000 -0500
@@ -25,6 +25,9 @@
 		*(.rodata) *(.rodata.*)					\
 		*(__vermagic)		/* Kernel version magic */	\
 		*(__markers_strings)	/* Markers: strings */		\
+		VMLINUX_SYMBOL(__start___imv) = .;			\
+		*(__imv)		/* Immediate values: pointers */ \
+		VMLINUX_SYMBOL(__stop___imv) = .;			\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED
  2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
  2007-12-06  2:07 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2007-12-06  2:08 ` Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 3/7] x86: add <asm/asm.h> Mathieu Desnoyers
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Rusty Russell, Adrian Bunk, Andi Kleen,
	Alexey Dobriyan, Christoph Hellwig

[-- Attachment #1: immediate-values-kconfig-embedded.patch --]
[-- Type: text/plain, Size: 2663 bytes --]

Immediate values provide a way to use dynamic code patching to update variables
sitting within the instruction stream. It saves caches lines normally used by
static read mostly variables. Enable it by default, but let users disable it
through the EMBEDDED menu with the "Disable immediate values" submenu entry.

Note: Since I think that I really should let embedded systems developers using
RO memory the option to disable the immediate values, I choose to leave this
menu option there, in the EMBEDDED menu. Also, the "CONFIG_IMMEDIATE" makes
sense because we want to compile out all the immediate code when we decide not
to use optimized immediate values at all (it removes otherwise unused code).

Changelog:
- Change ARCH_SUPPORTS_IMMEDIATE for ARCH_HAS_IMMEDIATE

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Adrian Bunk <bunk@stusta.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexey Dobriyan <adobriyan@gmail.com>
CC: Christoph Hellwig <hch@infradead.org>
---
 init/Kconfig |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig	2007-12-05 20:53:19.000000000 -0500
+++ linux-2.6-lttng/init/Kconfig	2007-12-05 20:53:35.000000000 -0500
@@ -435,6 +435,20 @@ config CC_OPTIMIZE_FOR_SIZE
 config SYSCTL
 	bool
 
+config IMMEDIATE
+	default y if !DISABLE_IMMEDIATE
+	depends on HAVE_IMMEDIATE
+	bool
+	help
+	  Immediate values are used as read-mostly variables that are rarely
+	  updated. They use code patching to modify the values inscribed in the
+	  instruction stream. It provides a way to save precious cache lines
+	  that would otherwise have to be used by these variables. They can be
+	  disabled through the EMBEDDED menu.
+
+config HAVE_IMMEDIATE
+	def_bool n
+
 menuconfig EMBEDDED
 	bool "Configure standard kernel features (for small systems)"
 	help
@@ -670,6 +684,16 @@ config MARKERS
 
 source "arch/Kconfig"
 
+config DISABLE_IMMEDIATE
+	default y if EMBEDDED
+	bool "Disable immediate values" if EMBEDDED
+	depends on HAVE_IMMEDIATE
+	help
+	  Disable code patching based immediate values for embedded systems. It
+	  consumes slightly more memory and requires to modify the instruction
+	  stream each time a variable is updated. Should really be disabled for
+	  embedded systems with read-only text.
+
 endmenu		# General setup
 
 config RT_MUTEXES

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 3/7] x86: add <asm/asm.h>
  2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
  2007-12-06  2:07 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
@ 2007-12-06  2:08 ` Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 4/7] Immediate Values - x86 Optimization Mathieu Desnoyers
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: H. Peter Anvin

[-- Attachment #1: add-x86-asm-asm-h.patch --]
[-- Type: text/plain, Size: 786 bytes --]

x86: add <asm/asm.h>

Create <asm/asm.h>, with common definitions suitable for assembly
unification.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---

diff --git a/include/asm-x86/asm.h b/include/asm-x86/asm.h
new file mode 100644
index 0000000..b5006eb
--- /dev/null
+++ b/include/asm-x86/asm.h
@@ -0,0 +1,18 @@
+#ifndef _ASM_X86_ASM_H
+#define _ASM_X86_ASM_H
+
+#ifdef CONFIG_X86_32
+/* 32 bits */
+
+# define _ASM_PTR	" .long "
+# define _ASM_ALIGN	" .balign 4 "
+
+#else
+/* 64 bits */
+
+# define _ASM_PTR	" .quad "
+# define _ASM_ALIGN	" .balign 8 "
+
+#endif /* CONFIG_X86_32 */
+
+#endif /* _ASM_X86_ASM_H */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [patch 4/7] Immediate Values - x86 Optimization
  2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2007-12-06  2:08 ` [patch 3/7] x86: add <asm/asm.h> Mathieu Desnoyers
@ 2007-12-06  2:08 ` Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 5/7] Add text_poke and sync_core to powerpc Mathieu Desnoyers
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Andi Kleen, H. Peter Anvin, Chuck Ebbert,
	Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner,
	Ingo Molnar, Rusty Russell

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: immediate-values-x86-optimization.patch --]
[-- Type: text/plain, Size: 5288 bytes --]

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
  non atomic writes to a code region only touched by us (nobody can execute it
  since we are protected by the imv_mutex).
- Put imv_set and _imv_set in the architecture independent header.
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.

Ok, so the most flexible solution that I see, that should fit for both
i386 and x86_64 would be :
1 byte  : "=Q" : Any register accessible as rh: a, b, c, and d.
2, 4 bytes : "=R" : Legacy register—the eight integer registers available
                 on all i386 processors (a, b, c, d, si, di, bp, sp). 8
bytes : (only for x86_64)
          "=r" : A register operand is allowed provided that it is in a
                 general register.
That should make sure x86_64 won't try to use REX prefixed opcodes for
1, 2 and 4 bytes values.

- Create the instruction in a discarded section to calculate its size. This is
  how we can align the beginning of the instruction on an address that will
  permit atomic modificatino of the immediate value without knowing the size of
  the opcode used by the compiler.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
  immediate structure.
- Change the immediate.c update code to support variable length opcodes.

- Vastly simplified, using a busy looping IPI with interrupts disabled.
  Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <ak@muc.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Chuck Ebbert <cebbert@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
 arch/x86/Kconfig            |    1 
 include/asm-x86/immediate.h |   77 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86/immediate.h	2007-11-21 11:04:33.000000000 -0500
@@ -0,0 +1,77 @@
+#ifndef _ASM_X86_IMMEDIATE_H
+#define _ASM_X86_IMMEDIATE_H
+
+/*
+ * Immediate values. x86 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ * If size is bigger than the architecture long size, fall back on a memory
+ * read.
+ *
+ * Make sure to populate the initial static 64 bits opcode with a value
+ * what will generate an instruction with 8 bytes immediate value (not the REX.W
+ * prefixed one that loads a sign extended 32 bits immediate value in a r64
+ * register).
+ */
+#define imv_read(name)							\
+	({								\
+		__typeof__(name##__imv) value;				\
+		BUILD_BUG_ON(sizeof(value) > 8);			\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0,%0\n\t"				\
+				"3:\n\t"				\
+				: "=q" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		case 2:							\
+		case 4:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0,%0\n\t"				\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		case 8:							\
+			if (sizeof(long) < 8) {				\
+				value = name##__imv;			\
+				break;					\
+			}						\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+				_ASM_PTR "%c1, (3f)-%c2\n\t"		\
+				".byte %c2\n\t"				\
+				".previous\n\t"				\
+				"mov $0xFEFEFEFE01010101,%0\n\t" 	\
+				"3:\n\t"				\
+				: "=r" (value)				\
+				: "i" (&name##__imv),			\
+				  "i" (sizeof(value)));			\
+			break;						\
+		};							\
+		value;							\
+	})
+
+#endif /* _ASM_X86_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/x86/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86/Kconfig	2007-11-21 11:04:06.000000000 -0500
+++ linux-2.6-lttng/arch/x86/Kconfig	2007-11-21 11:04:33.000000000 -0500
@@ -21,6 +21,7 @@ config X86
 	default y
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select HAVE_IMMEDIATE
 
 config GENERIC_TIME
 	bool

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 5/7] Add text_poke and sync_core to powerpc
  2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2007-12-06  2:08 ` [patch 4/7] Immediate Values - x86 Optimization Mathieu Desnoyers
@ 2007-12-06  2:08 ` Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 6/7] Immediate Values - Powerpc Optimization Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 7/7] Immediate Values - Documentation Mathieu Desnoyers
  6 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Rusty Russell, Christoph Hellwig, Paul Mackerras

[-- Attachment #1: add-text-poke-to-powerpc.patch --]
[-- Type: text/plain, Size: 1355 bytes --]

- Needed on architectures where we must surround live instruction modification
  with "WP flag disable".
- Turns into a memcpy on powerpc since there is no WP flag activated for
  instruction pages (yet..).
- Add empty sync_core to powerpc so it can be used in architecture independent
  code.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Christoph Hellwig <hch@infradead.org>
CC: Paul Mackerras <paulus@samba.org>
---
 include/asm-powerpc/cacheflush.h |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/asm-powerpc/cacheflush.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/cacheflush.h	2007-11-19 12:05:50.000000000 -0500
+++ linux-2.6-lttng/include/asm-powerpc/cacheflush.h	2007-11-19 13:27:36.000000000 -0500
@@ -63,7 +63,9 @@ extern void flush_dcache_phys_range(unsi
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
 	memcpy(dst, src, len)
 
-
+#define text_poke	memcpy
+#define text_poke_early	text_poke
+#define sync_core()
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
 /* internal debugging function */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 6/7] Immediate Values - Powerpc Optimization
  2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2007-12-06  2:08 ` [patch 5/7] Add text_poke and sync_core to powerpc Mathieu Desnoyers
@ 2007-12-06  2:08 ` Mathieu Desnoyers
  2007-12-06  2:08 ` [patch 7/7] Immediate Values - Documentation Mathieu Desnoyers
  6 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel
  Cc: Mathieu Desnoyers, Rusty Russell, Christoph Hellwig, Paul Mackerras

[-- Attachment #1: immediate-values-powerpc-optimization.patch --]
[-- Type: text/plain, Size: 3003 bytes --]

PowerPC optimization of the immediate values which uses a li instruction,
patched with an immediate value.

Changelog:
- Put imv_set and _imv_set in the architecture independent header.
- Pack the __imv section. Use smallest types required for size (char).
- Remove architecture specific update code : now handled by architecture
  agnostic code.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Christoph Hellwig <hch@infradead.org>
CC: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/Kconfig            |    1 
 include/asm-powerpc/immediate.h |   55 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h	2007-11-19 12:26:16.000000000 -0500
@@ -0,0 +1,55 @@
+#ifndef _ASM_POWERPC_IMMEDIATE_H
+#define _ASM_POWERPC_IMMEDIATE_H
+
+/*
+ * Immediate values. PowerPC architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm-compat.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ */
+#define imv_read(name)							\
+	({								\
+		__typeof__(name##__imv) value;				\
+		BUILD_BUG_ON(sizeof(value) > 8);			\
+		switch (sizeof(value)) {				\
+		case 1:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+					PPC_LONG "%c1, ((1f)-1)\n\t"	\
+					".byte 1\n\t"			\
+					".previous\n\t"			\
+					"li %0,0\n\t"			\
+					"1:\n\t"			\
+				: "=r" (value)				\
+				: "i" (&name##__imv));			\
+			break;						\
+		case 2:							\
+			asm(".section __imv,\"a\",@progbits\n\t"	\
+					PPC_LONG "%c1, ((1f)-2)\n\t"	\
+					".byte 2\n\t"			\
+					".previous\n\t"			\
+					"li %0,0\n\t"			\
+					"1:\n\t"			\
+				: "=r" (value)				\
+				: "i" (&name##__imv));			\
+			break;						\
+		case 4:							\
+		case 8:	value = name##__imv;				\
+			break;						\
+		};							\
+		value;							\
+	})
+
+#endif /* _ASM_POWERPC_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/powerpc/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/powerpc/Kconfig	2007-11-19 12:25:21.000000000 -0500
+++ linux-2.6-lttng/arch/powerpc/Kconfig	2007-11-19 12:26:01.000000000 -0500
@@ -81,6 +81,7 @@ config PPC
 	default y
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select HAVE_IMMEDIATE
 
 config EARLY_PRINTK
 	bool

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 7/7] Immediate Values - Documentation
  2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2007-12-06  2:08 ` [patch 6/7] Immediate Values - Powerpc Optimization Mathieu Desnoyers
@ 2007-12-06  2:08 ` Mathieu Desnoyers
  6 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-12-06  2:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: Mathieu Desnoyers, Rusty Russell

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: immediate-values-documentation.patch --]
[-- Type: text/plain, Size: 8867 bytes --]

Changelog:
- Remove imv_set_early (removed from API).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
 Documentation/immediate.txt |  221 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 221 insertions(+)

Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/immediate.txt	2007-11-03 20:28:58.000000000 -0400
@@ -0,0 +1,221 @@
+		        Using the Immediate Values
+
+			    Mathieu Desnoyers
+
+
+This document introduces Immediate Values and their use.
+
+
+* Purpose of immediate values
+
+An immediate value is used to compile into the kernel variables that sit within
+the instruction stream. They are meant to be rarely updated but read often.
+Using immediate values for these variables will save cache lines.
+
+This infrastructure is specialized in supporting dynamic patching of the values
+in the instruction stream when multiple CPUs are running without disturbing the
+normal system behavior.
+
+Compiling code meant to be rarely enabled at runtime can be done using
+if (unlikely(imv_read(var))) as condition surrounding the code. The
+smallest data type required for the test (an 8 bits char) is preferred, since
+some architectures, such as powerpc, only allow up to 16 bits immediate values.
+
+
+* Usage
+
+In order to use the "immediate" macros, you should include linux/immediate.h.
+
+#include <linux/immediate.h>
+
+DEFINE_IMV(char, this_immediate);
+EXPORT_IMV_SYMBOL(this_immediate);
+
+
+And use, in the body of a function:
+
+Use imv_set(this_immediate) to set the immediate value.
+
+Use imv_read(this_immediate) to read the immediate value.
+
+The immediate mechanism supports inserting multiple instances of the same
+immediate. Immediate values can be put in inline functions, inlined static
+functions, and unrolled loops.
+
+If you have to read the immediate values from a function declared as __init or
+__exit, you should explicitly use _imv_read(), which will fall back on a
+global variable read. Failing to do so will leave a reference to the __init
+section after it is freed (it would generate a modpost warning).
+
+You can choose to set an initial static value to the immediate by using, for
+instance:
+
+DEFINE_IMV(long, myptr) = 10;
+
+
+* Optimization for a given architecture
+
+One can implement optimized immediate values for a given architecture by
+replacing asm-$ARCH/immediate.h.
+
+
+* Performance improvement
+
+
+  * Memory hit for a data-based branch
+
+Here are the results on a 3GHz Pentium 4:
+
+number of tests: 100
+number of branches per test: 100000
+memory hit cycles per iteration (mean): 636.611
+L1 cache hit cycles per iteration (mean): 89.6413
+instruction stream based test, cycles per iteration (mean): 85.3438
+Just getting the pointer from a modulo on a pseudo-random value, doing
+  nothing with it, cycles per iteration (mean): 77.5044
+
+So:
+Base case:                      77.50 cycles
+instruction stream based test:  +7.8394 cycles
+L1 cache hit based test:        +12.1369 cycles
+Memory load based test:         +559.1066 cycles
+
+So let's say we have a ping flood coming at
+(14014 packets transmitted, 14014 received, 0% packet loss, time 1826ms)
+7674 packets per second. If we put 2 markers for irq entry/exit, it
+brings us to 15348 markers sites executed per second.
+
+(15348 exec/s) * (559 cycles/exec) / (3G cycles/s) = 0.0029
+We therefore have a 0.29% slowdown just on this case.
+
+Compared to this, the instruction stream based test will cause a
+slowdown of:
+
+(15348 exec/s) * (7.84 cycles/exec) / (3G cycles/s) = 0.00004
+For a 0.004% slowdown.
+
+If we plan to use this for memory allocation, spinlock, and all sorts of
+very high event rate tracing, we can assume it will execute 10 to 100
+times more sites per second, which brings us to 0.4% slowdown with the
+instruction stream based test compared to 29% slowdown with the memory
+load based test on a system with high memory pressure.
+
+
+
+  * Markers impact under heavy memory load
+
+Running a kernel with my LTTng instrumentation set, in a test that
+generates memory pressure (from userspace) by trashing L1 and L2 caches
+between calls to getppid() (note: syscall_trace is active and calls
+a marker upon syscall entry and syscall exit; markers are disarmed).
+This test is done in user-space, so there are some delays due to IRQs
+coming and to the scheduler. (UP 2.6.22-rc6-mm1 kernel, task with -20
+nice level)
+
+My first set of results: Linear cache trashing, turned out not to be
+very interesting, because it seems like the linearity of the memset on a
+full array is somehow detected and it does not "really" trash the
+caches.
+
+Now the most interesting result: Random walk L1 and L2 trashing
+surrounding a getppid() call.
+
+- Markers compiled out (but syscall_trace execution forced)
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 108.033 cycles
+getppid: 1681.4 cycles
+With memory pressure
+Reading timestamps takes 102.938 cycles
+getppid: 15691.6 cycles
+
+
+- With the immediate values based markers:
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 108.006 cycles
+getppid: 1681.84 cycles
+With memory pressure
+Reading timestamps takes 100.291 cycles
+getppid: 11793 cycles
+
+
+- With global variables based markers:
+number of tests: 10000
+No memory pressure
+Reading timestamps takes 107.999 cycles
+getppid: 1669.06 cycles
+With memory pressure
+Reading timestamps takes 102.839 cycles
+getppid: 12535 cycles
+
+The result is quite interesting in that the kernel is slower without
+markers than with markers. I explain it by the fact that the data
+accessed is not laid out in the same manner in the cache lines when the
+markers are compiled in or out. It seems that it aligns the function's
+data better to compile-in the markers in this case.
+
+But since the interesting comparison is between the immediate values and
+global variables based markers, and because they share the same memory
+layout, except for the movl being replaced by a movz, we see that the
+global variable based markers (2 markers) adds 742 cycles to each system
+call (syscall entry and exit are traced and memory locations for both
+global variables lie on the same cache line).
+
+
+- Test redone with less iterations, but with error estimates
+
+10 runs of 100 iterations each: Tests done on a 3GHz P4. Here I run getppid with
+syscall trace inactive, comparing the case with memory pressure and without
+memory pressure. (sorry, my system is not setup to execute syscall_trace this
+time, but it will make the point anyway).
+
+No memory pressure
+Reading timestamps:     150.92 cycles,     std dev.    1.01 cycles
+getppid:               1462.09 cycles,     std dev.   18.87 cycles
+
+With memory pressure
+Reading timestamps:     578.22 cycles,     std dev.  269.51 cycles
+getppid:              17113.33 cycles,     std dev. 1655.92 cycles
+
+
+Now for memory read timing: (10 runs, branches per test: 100000)
+Memory read based branch:
+                       644.09 cycles,      std dev.   11.39 cycles
+L1 cache hit based branch:
+                        88.16 cycles,      std dev.    1.35 cycles
+
+
+So, now that we have the raw results, let's calculate:
+
+Memory read:
+644.09±11.39 - 88.16±1.35 = 555.93±11.46 cycles
+
+Getppid without memory pressure:
+1462.09±18.87 - 150.92±1.01 = 1311.17±18.90 cycles
+
+Getppid with memory pressure:
+17113.33±1655.92 - 578.22±269.51 = 16535.11±1677.71 cycles
+
+Therefore, if we add 2 markers not based on immediate values to the getppid
+code, which would add 2 memory reads, we would add
+2 * 555.93±12.74 = 1111.86±25.48 cycles
+
+Therefore,
+
+1111.86±25.48 / 16535.11±1677.71 = 0.0672
+ relative error: sqrt(((25.48/1111.86)^2)+((1677.71/16535.11)^2))
+                     = 0.1040
+ absolute error: 0.1040 * 0.0672 = 0.0070
+
+Therefore: 0.0672±0.0070 * 100% = 6.72±0.70 %
+
+We can therefore affirm that adding 2 markers to getppid, on a system with high
+memory pressure, would have a performance hit of at least 6.0% on the system
+call time, all within the uncertainty limits of these tests. The same applies to
+other kernel code paths. The smaller those code paths are, the highest the
+impact ratio will be.
+
+Therefore, not only is it interesting to use the immediate values to dynamically
+activate dormant code such as the markers, but I think it should also be
+considered as a replacement for many of the "read-mostly" static variables.

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-27 19:05     ` Mathieu Desnoyers
@ 2008-02-28 16:50       ` Jason Baron
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Baron @ 2008-02-28 16:50 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell

On Wed, Feb 27, 2008 at 02:05:19PM -0500, Mathieu Desnoyers wrote:
> > +struct stop_machine_data {
> > +	int (*fn)(void *);
> > +	void *data;
> > +	struct completion done;
> > +	int run_all;
> > +} smdata;
> > +
> 
> Why do we now have to declare this static ? Can we pass it as a pointer
> to stopmachine instead ?
> 

Since the other cpus need to access 'fn', i made smdata a global.
stop_machine_run() is system-wide, so we only need one of these...

thanks,

-Jason


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 22:52   ` Jason Baron
  2008-02-26 23:12     ` Mathieu Desnoyers
@ 2008-02-27 19:05     ` Mathieu Desnoyers
  2008-02-28 16:50       ` Jason Baron
  1 sibling, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2008-02-27 19:05 UTC (permalink / raw)
  To: Jason Baron; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell

* Jason Baron (jbaron@redhat.com) wrote:
> On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> > Changelog:
> > 
> > - section __imv is already SHF_ALLOC
> > - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
> >   the if (immediateindex) is unnecessary here.
> > - Remove module_mutex usage: depend on functions implemented in module.c for
> >   that.
> 
> hi,
> 
> In testing this patch, i've run across a deadlock...apply_imv_update() can get
> called again before, ipi_busy_loop() has had a chance to finsh, and set 
> wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> deadlock below using nmi_watchdog=1 in item 1). 
> 
> After hitting this deadlock i modified apply_imv_update() to wait for 
> ipi_busy_loop(), to finish, however this resulted in a 3 way deadlock. Process
> A held a read lock on tasklist_lock, then process B called apply_imv_update().
> Process A received the IPI and begins executing ipi_busy_loop(). Then process
> C takes a write lock irq on the task list lock, before receiving the IPI. Thus,
> process A holds up process C, and C can't get an IPI b/c interrupts are
> disabled. i believe this is an inherent problem in using smp_call_function, in
> that one can't synchronize the processes on each other...you can reproduce
> these issues using the test module below item 2)
> 
> In order to address these issues, i've modified stop_machine_run() to take an
> new third parameter, RUN_ALL, to allow stop_machine_run() to run a function
> on all cpus, item 3). I then modified kernel/immediate.c to use this new 
> infrastructure item 4). the code in immediate.c simplifies quite a bit. This
> new code has been robust through all testing thus far.
> 

Ok, I see why you did it that way. Comments follow inline.

> thanks,
> 
> -Jason
> 
> 1)
[...]
 
> 2)
> 
[...]
> 
> 3)
> 
> 
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index 5bfc553..1ab1c5b 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -8,6 +8,9 @@
>  #include <asm/system.h>
>  
>  #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> +
> +#define RUN_ALL ~0U
> +
>  /**
>   * stop_machine_run: freeze the machine on all CPUs and run this function
>   * @fn: the function to run
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 51b5ee5..e6ee46f 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -23,9 +23,17 @@ enum stopmachine_state {
>  	STOPMACHINE_WAIT,
>  	STOPMACHINE_PREPARE,
>  	STOPMACHINE_DISABLE_IRQ,
> +	STOPMACHINE_RUN,
>  	STOPMACHINE_EXIT,
>  };
>  
> +struct stop_machine_data {
> +	int (*fn)(void *);
> +	void *data;
> +	struct completion done;
> +	int run_all;
> +} smdata;
> +

Why do we now have to declare this static ? Can we pass it as a pointer
to stopmachine instead ?

>  static enum stopmachine_state stopmachine_state;
>  static unsigned int stopmachine_num_threads;
>  static atomic_t stopmachine_thread_ack;
> @@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
>  {
>  	int irqs_disabled = 0;
>  	int prepared = 0;
> +	int ran = 0;
>  
>  	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
>  
> @@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
>  			prepared = 1;
>  			smp_mb(); /* Must read state first. */
>  			atomic_inc(&stopmachine_thread_ack);
> +		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
> +			smdata.fn(smdata.data);
> +			ran = 1;
> +			smp_mb(); /* Must read state first. */
> +			atomic_inc(&stopmachine_thread_ack);
>  		}
>  		/* Yield in first stage: migration threads need to
>  		 * help our sisters onto their CPUs. */
> @@ -136,12 +150,10 @@ static void restart_machine(void)
>  	preempt_enable_no_resched();
>  }
>  
> -struct stop_machine_data
> +static void run_other_cpus(void)
>  {
> -	int (*fn)(void *);
> -	void *data;
> -	struct completion done;
> -};
> +	stopmachine_set_state(STOPMACHINE_RUN);

Hrm, the semantic of STOPMACHINE_RUN is a bit weird :
- The CPU where the do_stop thread runs will first execute (alone) the
  callback.
- Then, all the other CPUs will execute the callback concurrently.

Given that you use a "started" boolean in the callback, which is ok as
long as there is no concurrent modification (correct given the current
semantic, but fragile), I would tend to document that the first time the
callback is called, it is ran alone, without concurrency, and then all
the other callbacks are ran concurrently.


> +}
>  
>  static int do_stop(void *_smdata)
>  {
> @@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
>  	ret = stop_machine();
>  	if (ret == 0) {
>  		ret = smdata->fn(smdata->data);
> +		if (smdata->run_all)
> +			run_other_cpus();
>  		restart_machine();
>  	}
>  
> @@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
>  struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
>  				       unsigned int cpu)
>  {
> -	struct stop_machine_data smdata;
>  	struct task_struct *p;
>  
> +	down(&stopmachine_mutex);
>  	smdata.fn = fn;
>  	smdata.data = data;
> +	smdata.run_all = (cpu == RUN_ALL) ? 1 : 0;
>  	init_completion(&smdata.done);
> -
> -	down(&stopmachine_mutex);
> -
> +	smp_wmb();
>  	/* If they don't care which CPU fn runs on, bind to any online one. */
> -	if (cpu == NR_CPUS)
> +	if (cpu == NR_CPUS || cpu == RUN_ALL)
>  		cpu = raw_smp_processor_id();
>  
>  	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 
> 
> 4)
> 
> 
> diff --git a/kernel/immediate.c b/kernel/immediate.c
> index 4c36a89..39ec13e 100644
> --- a/kernel/immediate.c
> +++ b/kernel/immediate.c
> @@ -20,6 +20,7 @@
>  #include <linux/immediate.h>
>  #include <linux/memory.h>
>  #include <linux/cpu.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -30,6 +31,23 @@ static int imv_early_boot_complete;
>  
>  extern const struct __imv __start___imv[];
>  extern const struct __imv __stop___imv[];
> +int started;
> +
> +int stop_machine_imv_update(void *imv_ptr)
> +{
> +	struct __imv *imv = imv_ptr;
> +
> +	if (!started) {
> +		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> +		started = 1;
> +		smp_wmb();

missing mb() comment here.

> +	} else
> +		sync_core();
> +

Note : we really want the sync_core()s to be executed after the
text_poke. This is ok given the implicit RUN_ALL semantic, but I guess
it should be documented in stop_machine that the first callback is
executed alone before all the others.

Thanks!

Mathieu

> +	flush_icache_range(imv->imv, imv->imv + imv->size);
> +
> +	return 0;
> +}
>  
>  /*
>   * imv_mutex nests inside module_mutex. imv_mutex protects builtin
> @@ -37,38 +55,6 @@ extern const struct __imv __stop___imv[];
>   */
>  static DEFINE_MUTEX(imv_mutex);
>  
> -static atomic_t wait_sync;
> -
> -struct ipi_loop_data {
> -	long value;
> -	const struct __imv *imv;
> -} loop_data;
> -
> -static void ipi_busy_loop(void *arg)
> -{
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > loop_data.value);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > 0);
> -	/*
> -	 * Issuing a synchronizing instruction must be done on each CPU before
> -	 * reenabling interrupts after modifying an instruction. Required by
> -	 * Intel's errata.
> -	 */
> -	sync_core();
> -	flush_icache_range(loop_data.imv->imv,
> -		loop_data.imv->imv + loop_data.imv->size);
> -	local_irq_restore(flags);
> -}
>  
>  /**
>   * apply_imv_update - update one immediate value
> @@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
>   */
>  static int apply_imv_update(const struct __imv *imv)
>  {
> -	unsigned long flags;
> -	long online_cpus;
> -
>  	/*
>  	 * If the variable and the instruction have the same value, there is
>  	 * nothing to do.
> @@ -111,30 +94,8 @@ static int apply_imv_update(const struct __imv *imv)
>  
>  	if (imv_early_boot_complete) {
>  		kernel_text_lock();
> -		get_online_cpus();
> -		online_cpus = num_online_cpus();
> -		atomic_set(&wait_sync, 2 * online_cpus);
> -		loop_data.value = online_cpus;
> -		loop_data.imv = imv;
> -		smp_call_function(ipi_busy_loop, NULL, 1, 0);
> -		local_irq_save(flags);
> -		atomic_dec(&wait_sync);
> -		do {
> -			/* Make sure the wait_sync gets re-read */
> -			smp_mb();
> -		} while (atomic_read(&wait_sync) > online_cpus);
> -		text_poke((void *)imv->imv, (void *)imv->var,
> -				imv->size);
> -		/*
> -		 * Make sure the modified instruction is seen by all CPUs before
> -		 * we continue (visible to other CPUs and local interrupts).
> -		 */
> -		wmb();
> -		atomic_dec(&wait_sync);
> -		flush_icache_range(imv->imv,
> -				imv->imv + imv->size);
> -		local_irq_restore(flags);
> -		put_online_cpus();
> +		started = 0;
> +		stop_machine_run(stop_machine_imv_update, (void *)imv, RUN_ALL);
>  		kernel_text_unlock();
>  	} else
>  		text_poke_early((void *)imv->imv, (void *)imv->var,

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 23:12     ` Mathieu Desnoyers
  2008-02-26 23:34       ` Mathieu Desnoyers
@ 2008-02-27 17:01       ` Jason Baron
  1 sibling, 0 replies; 25+ messages in thread
From: Jason Baron @ 2008-02-27 17:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

On Tue, Feb 26, 2008 at 06:12:48PM -0500, Mathieu Desnoyers wrote:
> Hrm, does your stop machine implementation disable interrupts while the
> CPUs busy loop ?
> 

yes. this is how stop_machine is implemented...and I believe is consistent with
the way in which your algorithm disables irqs. The logic to me, is we don't
want any cpus to see the kernel text in an inconsistent state via an irq.

-Jason


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 23:34       ` Mathieu Desnoyers
@ 2008-02-27 16:44         ` Jason Baron
  0 siblings, 0 replies; 25+ messages in thread
From: Jason Baron @ 2008-02-27 16:44 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

On Tue, Feb 26, 2008 at 06:34:45PM -0500, Mathieu Desnoyers wrote:
> > > In testing this patch, i've run across a deadlock...apply_imv_update() can get
> > > called again before, ipi_busy_loop() has had a chance to finsh, and set 
> > > wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> > > indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> > > deadlock below using nmi_watchdog=1 in item 1). 
> > > 
> > 
> > Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
> > test module in LTTng a few days ago. His fix implied to add another
> > barrier upon which the smp_call_function() caller must wait for the ipis
> > to finish. Since this immediate value code does the same I did in my
> > ltt-test-tsc code, the same fix will likely apply.
> > 
> > I'll cook something.
> > 
> 
> This should work. Untested for now. Can you give it a try ?
> 

this patch results in the subsequent 3 way deadlock that I described in the
previous mail. smp_call_function() can not be used with a function that 
attempts to rendezvous cpus in the manner being done here. The patch I posted 
in the previous mail addresses these limitations on smp_call_functions(). trace
of the lockup using this patch is shown below. 

thanks,

-Jason


NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 11233, comm: make Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff811248d9>]  [<ffffffff811248d9>] __write_lock_failed+0x9/0x20
RSP: 0018:ffff81006e025e30  EFLAGS: 00000087
RAX: ffff81006ece8b58 RBX: 0000000000000000 RCX: ffff81006e1f0ec8
RDX: 0000000000000011 RSI: fe60000000000000 RDI: ffffffff813d4000
RBP: ffff81006e025e38 R08: ffff81006e1f0e90 R09: 00000000ffffffff
R10: ffff810072c45e98 R11: 0000000000004111 R12: 00000000fffffff4
R13: ffff81006ece8930 R14: ffff81006818b260 R15: 0000000000000000
FS:  00002b14168b66f0(0000) GS:ffff81007f801980(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000106f000 CR3: 00000000680d2000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process make (pid: 11233, threadinfo ffff81006e024000, task ffff81006818b260)
Stack:  ffffffff8127536a ffff81006e025ed8 ffffffff81034260 ffff81006e1f0e80
 0000000000000000 0000000000000000 ffff81006e025f58 00007fff94222fe0
 0000000000004111 ffff81006ece8930 0000000000000000 ffff81006ec6f040
Call Trace:
 [<ffffffff8127536a>] ? _write_lock_irq+0x13/0x15
 [<ffffffff81034260>] copy_process+0xf7c/0x1477
 [<ffffffff81034870>] do_fork+0x75/0x20a
 [<ffffffff8100c0e9>] ? tracesys+0xdc/0xe1
 [<ffffffff8100a3c6>] sys_vfork+0x20/0x22
 [<ffffffff8100c287>] ptregscall_common+0x67/0xb0

Code: e9 07 48 89 11 31 c0 c3 48 83 e9 07 eb 00 48 c7 c0 f2 ff ff ff c3 90 90 90 90 90 90 90 90
---[ end trace 6a2bbda3f47e95fd ]---
NMI Watchdog detected LOCKUP on CPU 3
CPU 3
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 3258, comm: toggle-writer Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff8101c717>]  [<ffffffff8101c717>] __smp_call_function_mask+0x9f/0xc1
RSP: 0018:ffff81006ec51da8  EFLAGS: 00000297
RAX: 00000000000008fc RBX: 0000000000000007 RCX: 000000007688fa00
RDX: 00000000000008fc RSI: 00000000000000fc RDI: 0000000000000007
RBP: ffff81006ec51e08 R08: ffffffff8840504f R09: 00007fa6423566f0
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8106e7c3
FS:  00007fa6423566f0(0000) GS:ffff81007f801c80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031fb603080 CR3: 000000006ec49000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process toggle-writer (pid: 3258, threadinfo ffff81006ec50000, task ffff81007899a000)
Stack:  ffffffff8106e7c3 0000000000000000 0000000300000002 ffffffff00000000
 0000000000000007 0000000000000292 ffff81007899a000 0000000000000000
 0000000000000000 ffffffff8106e7c3 000000000000000f 0000000000000005
Call Trace:
 [<ffffffff8106e7c3>] ? ipi_busy_loop+0x0/0x55
 [<ffffffff8106e7c3>] ? ipi_busy_loop+0x0/0x55
 [<ffffffff8101c783>] smp_call_function_mask+0x4a/0x59
 [<ffffffff8101c7ab>] smp_call_function+0x19/0x1b
 [<ffffffff8106e702>] imv_update_range+0xd6/0x17e
 [<ffffffff8105494e>] _module_imv_update+0x35/0x54
 [<ffffffff81054982>] module_imv_update+0x15/0x23
 [<ffffffff884050c4>] :toggle_tester:proc_toggle_write+0x4c/0x64
 [<ffffffff810d64c8>] proc_reg_write+0x7b/0x96
 [<ffffffff81099f96>] vfs_write+0xae/0x157
 [<ffffffff8109a563>] sys_write+0x47/0x70
 [<ffffffff8100c0e9>] tracesys+0xdc/0xe1

Code: f8 48 3b 5d c0 48 8b 05 28 1a 41 00 75 0a bf fc 00 00 00 ff 50 38 eb 0f be fc 00 00 00 48
---[ end trace 6a2bbda3f47e95fd ]---
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff8106e7dc>]  [<ffffffff8106e7dc>] ipi_busy_loop+0x19/0x55
RSP: 0018:ffff81007fb6ff70  EFLAGS: 00000002
RAX: 0000000000000008 RBX: 0000000000000000 RCX: ffff81007fb5f260
RDX: 000000000000000a RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fb6ff78 R08: ffff81007fb6ff78 R09: 0000000000000002
R10: ffff810001018b80 R11: ffff810072dddc18 R12: ffffffff8106e7c3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001018a80
FS:  0000000000000000(0000) GS:ffff81007f801680(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac9afa0 CR3: 0000000072cfb000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fb6a000, task ffff81007fb5f260)
Stack:  0000000000000000 ffff81007fb6ffa8 ffffffff8101ca4e ffff81007fb6ffa8
 ffffffff8100b0e2 0000000000000001 0000000000000040 ffff81007fb6be60
 ffffffff8100cac6 ffff81007fb6be60 <EOI>  ffff81007fb6bee8 ffff810072dddc18
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81270a55>] ? start_secondary+0x3ba/0x3c6

Code: 81 48 c7 c7 02 fd 36 81 48 89 e5 e8 6b fe ff ff c9 c3 55 48 89 e5 53 9c 5e fa f0 ff 0d 72
---[ end trace 6a2bbda3f47e95fd ]---
NMI Watchdog detected LOCKUP on CPU 0
Kernel panic - not syncing: Aiee, killing interrupt handler!
CPU 0
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 5, comm: watchdog/0 Not tainted 2.6.24-git12markers #2
RIP: 0010:[<ffffffff8106e7e9>]  [<ffffffff8106e7e9>] ipi_busy_loop+0x26/0x55
RSP: 0018:ffffffff81498f70  EFLAGS: 00000002
RAX: 0000000000000008 RBX: 0000000000000000 RCX: ffffffff81394620
RDX: 000000000000000a RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffffffff81498f78 R08: ffffffff81498f78 R09: ffff81000100cbe0
R10: ffff81007fb59e20 R11: 0000000000000001 R12: ffffffff8106e7c3
R13: 0000000000000000 R14: 00000000000000f0 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff813d2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006bb374 CR3: 000000006ec88000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process watchdog/0 (pid: 5, threadinfo ffff81007fb58000, task ffff81007fb54930)
Stack:  0000000000000000 ffffffff81498fa8 ffffffff8101ca4e ffffffff884059c8
 ffff81007e4d3260 ffff81007e4d3260 00000000000003b2 ffff81007fb59e60
 ffffffff8100cac6 ffff81007fb59e60 <EOI>  ffff81007fb59f20 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff81069685>] ? watchdog+0xd5/0x1c8
 [<ffffffff810695b0>] ? watchdog+0x0/0x1c8
 [<ffffffff81047fd3>] ? kthread+0x49/0x76
 [<ffffffff8100cd88>] ? child_rip+0xa/0x12
 [<ffffffff81047f8a>] ? kthread+0x0/0x76
 [<ffffffff8100cd7e>] ? child_rip+0x0/0x12


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 23:12     ` Mathieu Desnoyers
@ 2008-02-26 23:34       ` Mathieu Desnoyers
  2008-02-27 16:44         ` Jason Baron
  2008-02-27 17:01       ` Jason Baron
  1 sibling, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2008-02-26 23:34 UTC (permalink / raw)
  To: Jason Baron; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

* Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca) wrote:
> * Jason Baron (jbaron@redhat.com) wrote:
> > On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> > > Changelog:
> > > 
> > > - section __imv is already SHF_ALLOC
> > > - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
> > >   the if (immediateindex) is unnecessary here.
> > > - Remove module_mutex usage: depend on functions implemented in module.c for
> > >   that.
> > 
> > hi,
> > 
> > In testing this patch, i've run across a deadlock...apply_imv_update() can get
> > called again before, ipi_busy_loop() has had a chance to finsh, and set 
> > wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> > indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> > deadlock below using nmi_watchdog=1 in item 1). 
> > 
> 
> Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
> test module in LTTng a few days ago. His fix implied to add another
> barrier upon which the smp_call_function() caller must wait for the ipis
> to finish. Since this immediate value code does the same I did in my
> ltt-test-tsc code, the same fix will likely apply.
> 
> I'll cook something.
> 

This should work. Untested for now. Can you give it a try ?

Fix Immediate Deadlock

Jason Baron <jbaron@redhat.com> :

In testing this patch, i've run across a deadlock...apply_imv_update() can get
called again before, ipi_busy_loop() has had a chance to finsh, and set
wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
indefinitely and the subsequent apply_imv_update() hangs. I've shown this
deadlock below using nmi_watchdog=1 in item 1).

Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> :

Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
test module in LTTng a few days ago. His fix implied to add another
barrier upon which the smp_call_function() caller must wait for the ipis
to finish. Since this immediate value code does the same I did in my
ltt-test-tsc code, the same fix will likely apply.

Thanks to Jason Baron for finding this bug and proposing an initial
implementation.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Jason Baron <jbaron@redhat.com>
---
 kernel/immediate.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- linux-2.6-lttng.orig/kernel/immediate.c	2008-02-26 18:16:10.000000000 -0500
+++ linux-2.6-lttng/kernel/immediate.c	2008-02-26 18:29:32.000000000 -0500
@@ -53,12 +53,12 @@ static void ipi_busy_loop(void *arg)
 	do {
 		/* Make sure the wait_sync gets re-read */
 		smp_mb();
-	} while (atomic_read(&wait_sync) > loop_data.value);
+	} while (atomic_read(&wait_sync) > 2 * loop_data.value);
 	atomic_dec(&wait_sync);
 	do {
 		/* Make sure the wait_sync gets re-read */
 		smp_mb();
-	} while (atomic_read(&wait_sync) > 0);
+	} while (atomic_read(&wait_sync) > loop_data.value);
 	/*
 	 * Issuing a synchronizing instruction must be done on each CPU before
 	 * reenabling interrupts after modifying an instruction. Required by
@@ -67,6 +67,7 @@ static void ipi_busy_loop(void *arg)
 	sync_core();
 	flush_icache_range(loop_data.imv->imv,
 		loop_data.imv->imv + loop_data.imv->size);
+	atomic_dec(&wait_sync);
 	local_irq_restore(flags);
 }
 
@@ -113,7 +114,7 @@ static int apply_imv_update(const struct
 		kernel_text_lock();
 		get_online_cpus();
 		online_cpus = num_online_cpus();
-		atomic_set(&wait_sync, 2 * online_cpus);
+		atomic_set(&wait_sync, 3 * online_cpus);
 		loop_data.value = online_cpus;
 		loop_data.imv = imv;
 		smp_call_function(ipi_busy_loop, NULL, 1, 0);
@@ -122,7 +123,7 @@ static int apply_imv_update(const struct
 		do {
 			/* Make sure the wait_sync gets re-read */
 			smp_mb();
-		} while (atomic_read(&wait_sync) > online_cpus);
+		} while (atomic_read(&wait_sync) > 2 * online_cpus);
 		text_poke((void *)imv->imv, (void *)imv->var,
 				imv->size);
 		/*
@@ -133,6 +134,12 @@ static int apply_imv_update(const struct
 		atomic_dec(&wait_sync);
 		flush_icache_range(imv->imv,
 				imv->imv + imv->size);
+		/*
+		 * Wait until all other CPUs are done so that we don't overwrite
+		 * loop_data or wait_sync prematurely.
+		 */
+		while (unlikely(atomic_read(&wait_sync) > 1))
+			cpu_relax();
 		local_irq_restore(flags);
 		put_online_cpus();
 		kernel_text_unlock();

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-26 22:52   ` Jason Baron
@ 2008-02-26 23:12     ` Mathieu Desnoyers
  2008-02-26 23:34       ` Mathieu Desnoyers
  2008-02-27 17:01       ` Jason Baron
  2008-02-27 19:05     ` Mathieu Desnoyers
  1 sibling, 2 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2008-02-26 23:12 UTC (permalink / raw)
  To: Jason Baron; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell, Jan Kiszka

* Jason Baron (jbaron@redhat.com) wrote:
> On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> > Changelog:
> > 
> > - section __imv is already SHF_ALLOC
> > - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
> >   the if (immediateindex) is unnecessary here.
> > - Remove module_mutex usage: depend on functions implemented in module.c for
> >   that.
> 
> hi,
> 
> In testing this patch, i've run across a deadlock...apply_imv_update() can get
> called again before, ipi_busy_loop() has had a chance to finsh, and set 
> wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
> indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
> deadlock below using nmi_watchdog=1 in item 1). 
> 

Hrm, yes, Jan pointed out the exact same problem in my ltt-test-tsc TSC
test module in LTTng a few days ago. His fix implied to add another
barrier upon which the smp_call_function() caller must wait for the ipis
to finish. Since this immediate value code does the same I did in my
ltt-test-tsc code, the same fix will likely apply.

I'll cook something.

Hrm, does your stop machine implementation disable interrupts while the
CPUs busy loop ?

> After hitting this deadlock i modified apply_imv_update() to wait for 
> ipi_busy_loop(), to finish, however this resulted in a 3 way deadlock. Process
> A held a read lock on tasklist_lock, then process B called apply_imv_update().
> Process A received the IPI and begins executing ipi_busy_loop(). Then process
> C takes a write lock irq on the task list lock, before receiving the IPI. Thus,
> process A holds up process C, and C can't get an IPI b/c interrupts are
> disabled. i believe this is an inherent problem in using smp_call_function, in
> that one can't synchronize the processes on each other...you can reproduce
> these issues using the test module below item 2)
> 
> In order to address these issues, i've modified stop_machine_run() to take an
> new third parameter, RUN_ALL, to allow stop_machine_run() to run a function
> on all cpus, item 3). I then modified kernel/immediate.c to use this new 
> infrastructure item 4). the code in immediate.c simplifies quite a bit. This
> new code has been robust through all testing thus far.
> 
> thanks,
> 
> -Jason
> 
> 1)
> 
>  NMI Watchdog detected LOCKUP on CPU 3
> CPU 3
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
> RSP: 0018:ffff81007fbdff70  EFLAGS: 00000002
> RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fbd2930
> RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
> RBP: ffff81007fbdff78 R08: ffff81007fbdff78 R09: ffff81007dd91e80
> R10: ffff810001030b80 R11: 0000000000000246 R12: ffffffff8106e7b3
> R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001030a80
> FS:  0000000000000000(0000) GS:ffff81007f801c80(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000000025e8d68 CR3: 000000007549c000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff81007fbd6000, task ffff81007fbd2930)
> Stack:  0000000000000000 ffff81007fbdffa8 ffffffff8101ca4e ffff81007fbdffa8
>  ffffffff8100b0e2 0000000000000003 0000000000000040 ffff81007fbd7e60
>  ffffffff8100cac6 ffff81007fbd7e60 <EOI>  ffff81007fbd7ee8 0000000000000246
> Call Trace:
>  <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
>  <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
>  [<ffffffff8100a723>] ? enter_idle+0x22/0x24
>  [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
>  [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6
> 
> 
> ---[ end trace 738433437d960ebc ]---
> NMI Watchdog detected LOCKUP on CPU 2
> CPU 2
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 9704, comm: toggle-writer Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8101c717>]  [<ffffffff8101c717>] __smp_call_function_mask+0x9f/0xc1
> RSP: 0018:ffff81003ac35da8  EFLAGS: 00000297
> RAX: 00000000000008fc RBX: 000000000000000b RCX: 00000000000008fc
> RDX: 00000000000008fc RSI: 00000000000000fc RDI: 000000000000000b
> RBP: ffff81003ac35e08 R08: ffffffff8840504f R09: 00007f01f07696f0
> R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
> R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8106e7b3
> FS:  00007f01f07696f0(0000) GS:ffff81007f801980(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f01f0796000 CR3: 000000006bcdf000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process toggle-writer (pid: 9704, threadinfo ffff81003ac34000, task ffff81003ad14930)
> Stack:  ffffffff8106e7b3 0000000000000000 0000000000000002 00003fff00000000
>  000000000000000b ffffffff81075c03 00000010001280d2 0000000000000000
>  0000000000000000 ffffffff8106e7b3 000000000000000f 0000000000000005
> Call Trace:
>  [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
>  [<ffffffff81075c03>] ? __alloc_pages+0x8e/0x337
>  [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
>  [<ffffffff8101c783>] smp_call_function_mask+0x4a/0x59
>  [<ffffffff8101c7ab>] smp_call_function+0x19/0x1b
>  [<ffffffff8106e703>] imv_update_range+0xd7/0x16e
>  [<ffffffff8105494e>] _module_imv_update+0x35/0x54
>  [<ffffffff81054982>] module_imv_update+0x15/0x23
>  [<ffffffff884050c4>] :toggle_tester:proc_toggle_write+0x4c/0x64
>  [<ffffffff810d64a4>] proc_reg_write+0x7b/0x96
>  [<ffffffff81099f72>] vfs_write+0xae/0x157
>  [<ffffffff8109a53f>] sys_write+0x47/0x70
>  [<ffffffff8100c0e9>] tracesys+0xdc/0xe1
> 
> Code: f8 48 3b 5d c0 48 8b 05 28 1a 41 00 75 0a bf fc 00 00 00 ff 50 38 eb 0f be fc 00 00 00 48
> ---[ end trace 738433437d960ebc ]---
> NMI Watchdog detected LOCKUP on CPU 1
> CPU 1
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
> RSP: 0018:ffff81007fb6ff70  EFLAGS: 00000002
> RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fb5f260
> RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
> RBP: ffff81007fb6ff78 R08: ffff81007fb6ff78 R09: ffff810001016700
> R10: ffff810001018b80 R11: 0000000000000001 R12: ffffffff8106e7b3
> R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001018a80
> FS:  0000000000000000(0000) GS:ffff81007f801680(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000031fac9ac20 CR3: 00000000778e6000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff81007fb6a000, task ffff81007fb5f260)
> Stack:  0000000000000000 ffff81007fb6ffa8 ffffffff8101ca4e ffff81007fb6ffa8
>  ffffffff8100b0e2 0000000000000001 0000000000000040 ffff81007fb6be60
>  ffffffff8100cac6 ffff81007fb6be60 <EOI>  ffff81007fb6bee8 0000000000000001
> Call Trace:
>  <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
>  <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
>  [<ffffffff8100a723>] ? enter_idle+0x22/0x24
>  [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
>  [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6
> 
> Code: 81 48 c7 c7 02 fd 36 81 48 89 e5 e8 7b fe ff ff c9 c3 55 48 89 e5 53 9c 5e fa f0 ff 0d 82
> ---[ end trace 738433437d960ebc ]---
> NMI Watchdog detected LOCKUP on CPU 0
> Kernel panic - not syncing: Aiee, killing interrupt handler!
> CPU 0
> Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
> RIP: 0010:[<ffffffff8106e7e6>]  [<ffffffff8106e7e6>] ipi_busy_loop+0x33/0x41
> RSP: 0018:ffffffff81498f70  EFLAGS: 00000006
> RAX: 0000000000000004 RBX: 0000000000000000 RCX: ffffffff81394620
> RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
> RBP: ffffffff81498f78 R08: ffffffff81498f78 R09: ffff810062cc1e98
> R10: ffff81000100cb80 R11: ffff810062cc1d58 R12: ffffffff8106e7b3
> R13: 0000000000000000 R14: ffffffff81466640 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffffffff813d2000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00000031fac6f060 CR3: 0000000000201000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffffffff81434000, task ffffffff81394620)
> Stack:  0000000000000000 ffffffff81498fa8 ffffffff8101ca4e ffffffff8100b0e2
>  ffffffff8100b0e2 0000000000000000 ffffffffffffffff ffffffff81435ec0
>  ffffffff8100cac6 ffffffff81435ec0 <EOI>  ffffffff81435f48 ffff810062cc1d58
> Call Trace:
>  <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
>  [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
>  <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
>  [<ffffffff8100a723>] ? enter_idle+0x22/0x24
>  [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
>  [<ffffffff81266ee6>] ? rest_init+0x5a/0x5c
> 
> Code: f0 ff 0d 82 6b 49 00 0f ae f0 48 63 05 78 6b 49 00 48 3b 05 5d 6b 49 00 7f ed f0 ff 0d 68
> ---[ end trace 738433437d960ebc ]---
> 
> 
> 2)
> 
> 
> --- /dev/null
> +++ b/samples/markers/toggle-tester.c
> @@ -0,0 +1,90 @@
> +/* probe-example.c
> + *
> + * toggle module to test immedidate infrastructure.
> + *
> + * (C) Copyright 2007 Jason Baron <jbaron@redhat.com>
> + *
> + * This file is released under the GPLv2.
> + * See the file COPYING for more details.
> + */
> +
> +#include <linux/sched.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/marker.h>
> +#include <asm/atomic.h>
> +#include <linux/seq_file.h>
> +#include <linux/fs.h>
> +#include <linux/proc_fs.h>
> +
> +static DECLARE_MUTEX(toggle_mutex);
> +DEFINE_IMV(char, toggle_value) = 0;
> +
> +static ssize_t proc_toggle_write(struct file *file, const char __user *buf,
> +                                size_t length, loff_t *ppos)
> +{
> +	int i;
> +
> +	down(&toggle_mutex);
> +	i = imv_read(toggle_value);
> +	imv_set(toggle_value, !i);
> +	up(&toggle_mutex);
> +
> +	return 0;
> +}
> +
> +static int proc_toggle_show(struct seq_file *s, void *p)
> +{
> +	int i;
> +
> +	down(&toggle_mutex);        
> +        i = imv_read(toggle_value);
> +	up(&toggle_mutex);
> +
> +	seq_printf(s, "%u\n", i);
> +
> +        return 0;
> +}
> +
> +
> +static int proc_toggle_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, proc_toggle_show, NULL);
> +}
> +
> +
> +static const struct file_operations toggle_operations = {
> +	.open           = proc_toggle_open,
> +	.read           = seq_read,
> +	.write          = proc_toggle_write,
> +	.llseek         = seq_lseek,
> +	.release        = single_release,
> +};
> +	
> +
> +struct proc_dir_entry *pde;
> +
> +static int __init toggle_init(void)
> +{
> +	
> +	pde = create_proc_entry("toggle", 0, NULL);
> +	if (!pde)
> +		return -ENOMEM;
> +	pde->proc_fops = &toggle_operations;
> +
> +
> +	return 0;
> +}
> +
> +static void __exit toggle_fini(void)
> +{
> +	remove_proc_entry("toggle", &proc_root);
> +	
> +}
> +
> +module_init(toggle_init);
> +module_exit(toggle_fini);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Jason Baron");
> +MODULE_DESCRIPTION("testing immediate implementation");
> 
> 
> 3)
> 
> 
> diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
> index 5bfc553..1ab1c5b 100644
> --- a/include/linux/stop_machine.h
> +++ b/include/linux/stop_machine.h
> @@ -8,6 +8,9 @@
>  #include <asm/system.h>
>  
>  #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
> +
> +#define RUN_ALL ~0U
> +
>  /**
>   * stop_machine_run: freeze the machine on all CPUs and run this function
>   * @fn: the function to run
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 51b5ee5..e6ee46f 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -23,9 +23,17 @@ enum stopmachine_state {
>  	STOPMACHINE_WAIT,
>  	STOPMACHINE_PREPARE,
>  	STOPMACHINE_DISABLE_IRQ,
> +	STOPMACHINE_RUN,
>  	STOPMACHINE_EXIT,
>  };
>  
> +struct stop_machine_data {
> +	int (*fn)(void *);
> +	void *data;
> +	struct completion done;
> +	int run_all;
> +} smdata;
> +
>  static enum stopmachine_state stopmachine_state;
>  static unsigned int stopmachine_num_threads;
>  static atomic_t stopmachine_thread_ack;
> @@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
>  {
>  	int irqs_disabled = 0;
>  	int prepared = 0;
> +	int ran = 0;
>  
>  	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
>  
> @@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
>  			prepared = 1;
>  			smp_mb(); /* Must read state first. */
>  			atomic_inc(&stopmachine_thread_ack);
> +		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
> +			smdata.fn(smdata.data);
> +			ran = 1;
> +			smp_mb(); /* Must read state first. */
> +			atomic_inc(&stopmachine_thread_ack);
>  		}
>  		/* Yield in first stage: migration threads need to
>  		 * help our sisters onto their CPUs. */
> @@ -136,12 +150,10 @@ static void restart_machine(void)
>  	preempt_enable_no_resched();
>  }
>  
> -struct stop_machine_data
> +static void run_other_cpus(void)
>  {
> -	int (*fn)(void *);
> -	void *data;
> -	struct completion done;
> -};
> +	stopmachine_set_state(STOPMACHINE_RUN);
> +}
>  
>  static int do_stop(void *_smdata)
>  {
> @@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
>  	ret = stop_machine();
>  	if (ret == 0) {
>  		ret = smdata->fn(smdata->data);
> +		if (smdata->run_all)
> +			run_other_cpus();
>  		restart_machine();
>  	}
>  
> @@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
>  struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
>  				       unsigned int cpu)
>  {
> -	struct stop_machine_data smdata;
>  	struct task_struct *p;
>  
> +	down(&stopmachine_mutex);
>  	smdata.fn = fn;
>  	smdata.data = data;
> +	smdata.run_all = (cpu == RUN_ALL) ? 1 : 0;
>  	init_completion(&smdata.done);
> -
> -	down(&stopmachine_mutex);
> -
> +	smp_wmb();
>  	/* If they don't care which CPU fn runs on, bind to any online one. */
> -	if (cpu == NR_CPUS)
> +	if (cpu == NR_CPUS || cpu == RUN_ALL)
>  		cpu = raw_smp_processor_id();
>  
>  	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 
> 
> 4)
> 
> 
> diff --git a/kernel/immediate.c b/kernel/immediate.c
> index 4c36a89..39ec13e 100644
> --- a/kernel/immediate.c
> +++ b/kernel/immediate.c
> @@ -20,6 +20,7 @@
>  #include <linux/immediate.h>
>  #include <linux/memory.h>
>  #include <linux/cpu.h>
> +#include <linux/stop_machine.h>
>  
>  #include <asm/cacheflush.h>
>  
> @@ -30,6 +31,23 @@ static int imv_early_boot_complete;
>  
>  extern const struct __imv __start___imv[];
>  extern const struct __imv __stop___imv[];
> +int started;
> +
> +int stop_machine_imv_update(void *imv_ptr)
> +{
> +	struct __imv *imv = imv_ptr;
> +
> +	if (!started) {
> +		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> +		started = 1;
> +		smp_wmb();
> +	} else
> +		sync_core();
> +
> +	flush_icache_range(imv->imv, imv->imv + imv->size);
> +
> +	return 0;
> +}
>  
>  /*
>   * imv_mutex nests inside module_mutex. imv_mutex protects builtin
> @@ -37,38 +55,6 @@ extern const struct __imv __stop___imv[];
>   */
>  static DEFINE_MUTEX(imv_mutex);
>  
> -static atomic_t wait_sync;
> -
> -struct ipi_loop_data {
> -	long value;
> -	const struct __imv *imv;
> -} loop_data;
> -
> -static void ipi_busy_loop(void *arg)
> -{
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > loop_data.value);
> -	atomic_dec(&wait_sync);
> -	do {
> -		/* Make sure the wait_sync gets re-read */
> -		smp_mb();
> -	} while (atomic_read(&wait_sync) > 0);
> -	/*
> -	 * Issuing a synchronizing instruction must be done on each CPU before
> -	 * reenabling interrupts after modifying an instruction. Required by
> -	 * Intel's errata.
> -	 */
> -	sync_core();
> -	flush_icache_range(loop_data.imv->imv,
> -		loop_data.imv->imv + loop_data.imv->size);
> -	local_irq_restore(flags);
> -}
>  
>  /**
>   * apply_imv_update - update one immediate value
> @@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
>   */
>  static int apply_imv_update(const struct __imv *imv)
>  {
> -	unsigned long flags;
> -	long online_cpus;
> -
>  	/*
>  	 * If the variable and the instruction have the same value, there is
>  	 * nothing to do.
> @@ -111,30 +94,8 @@ static int apply_imv_update(const struct __imv *imv)
>  
>  	if (imv_early_boot_complete) {
>  		kernel_text_lock();
> -		get_online_cpus();
> -		online_cpus = num_online_cpus();
> -		atomic_set(&wait_sync, 2 * online_cpus);
> -		loop_data.value = online_cpus;
> -		loop_data.imv = imv;
> -		smp_call_function(ipi_busy_loop, NULL, 1, 0);
> -		local_irq_save(flags);
> -		atomic_dec(&wait_sync);
> -		do {
> -			/* Make sure the wait_sync gets re-read */
> -			smp_mb();
> -		} while (atomic_read(&wait_sync) > online_cpus);
> -		text_poke((void *)imv->imv, (void *)imv->var,
> -				imv->size);
> -		/*
> -		 * Make sure the modified instruction is seen by all CPUs before
> -		 * we continue (visible to other CPUs and local interrupts).
> -		 */
> -		wmb();
> -		atomic_dec(&wait_sync);
> -		flush_icache_range(imv->imv,
> -				imv->imv + imv->size);
> -		local_irq_restore(flags);
> -		put_online_cpus();
> +		started = 0;
> +		stop_machine_run(stop_machine_imv_update, (void *)imv, RUN_ALL);
>  		kernel_text_unlock();
>  	} else
>  		text_poke_early((void *)imv->imv, (void *)imv->var,

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-02 21:08 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2008-02-26 22:52   ` Jason Baron
  2008-02-26 23:12     ` Mathieu Desnoyers
  2008-02-27 19:05     ` Mathieu Desnoyers
  0 siblings, 2 replies; 25+ messages in thread
From: Jason Baron @ 2008-02-26 22:52 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, Ingo Molnar, linux-kernel, Rusty Russell

On Sat, Feb 02, 2008 at 04:08:29PM -0500, Mathieu Desnoyers wrote:
> Changelog:
> 
> - section __imv is already SHF_ALLOC
> - Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
>   the if (immediateindex) is unnecessary here.
> - Remove module_mutex usage: depend on functions implemented in module.c for
>   that.

hi,

In testing this patch, i've run across a deadlock...apply_imv_update() can get
called again before, ipi_busy_loop() has had a chance to finsh, and set 
wait_sync back to its initial value. This causes ipi_busy_loop() to get stuck
indefinitely and the subsequent apply_imv_update() hangs. I've shown this 
deadlock below using nmi_watchdog=1 in item 1). 

After hitting this deadlock i modified apply_imv_update() to wait for 
ipi_busy_loop(), to finish, however this resulted in a 3 way deadlock. Process
A held a read lock on tasklist_lock, then process B called apply_imv_update().
Process A received the IPI and begins executing ipi_busy_loop(). Then process
C takes a write lock irq on the task list lock, before receiving the IPI. Thus,
process A holds up process C, and C can't get an IPI b/c interrupts are
disabled. i believe this is an inherent problem in using smp_call_function, in
that one can't synchronize the processes on each other...you can reproduce
these issues using the test module below item 2)

In order to address these issues, i've modified stop_machine_run() to take an
new third parameter, RUN_ALL, to allow stop_machine_run() to run a function
on all cpus, item 3). I then modified kernel/immediate.c to use this new 
infrastructure item 4). the code in immediate.c simplifies quite a bit. This
new code has been robust through all testing thus far.

thanks,

-Jason

1)

 NMI Watchdog detected LOCKUP on CPU 3
CPU 3
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
RSP: 0018:ffff81007fbdff70  EFLAGS: 00000002
RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fbd2930
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fbdff78 R08: ffff81007fbdff78 R09: ffff81007dd91e80
R10: ffff810001030b80 R11: 0000000000000246 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001030a80
FS:  0000000000000000(0000) GS:ffff81007f801c80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000025e8d68 CR3: 000000007549c000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fbd6000, task ffff81007fbd2930)
Stack:  0000000000000000 ffff81007fbdffa8 ffffffff8101ca4e ffff81007fbdffa8
 ffffffff8100b0e2 0000000000000003 0000000000000040 ffff81007fbd7e60
 ffffffff8100cac6 ffff81007fbd7e60 <EOI>  ffff81007fbd7ee8 0000000000000246
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6


---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 2
CPU 2
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 9704, comm: toggle-writer Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8101c717>]  [<ffffffff8101c717>] __smp_call_function_mask+0x9f/0xc1
RSP: 0018:ffff81003ac35da8  EFLAGS: 00000297
RAX: 00000000000008fc RBX: 000000000000000b RCX: 00000000000008fc
RDX: 00000000000008fc RSI: 00000000000000fc RDI: 000000000000000b
RBP: ffff81003ac35e08 R08: ffffffff8840504f R09: 00007f01f07696f0
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff8106e7b3
FS:  00007f01f07696f0(0000) GS:ffff81007f801980(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f01f0796000 CR3: 000000006bcdf000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process toggle-writer (pid: 9704, threadinfo ffff81003ac34000, task ffff81003ad14930)
Stack:  ffffffff8106e7b3 0000000000000000 0000000000000002 00003fff00000000
 000000000000000b ffffffff81075c03 00000010001280d2 0000000000000000
 0000000000000000 ffffffff8106e7b3 000000000000000f 0000000000000005
Call Trace:
 [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
 [<ffffffff81075c03>] ? __alloc_pages+0x8e/0x337
 [<ffffffff8106e7b3>] ? ipi_busy_loop+0x0/0x41
 [<ffffffff8101c783>] smp_call_function_mask+0x4a/0x59
 [<ffffffff8101c7ab>] smp_call_function+0x19/0x1b
 [<ffffffff8106e703>] imv_update_range+0xd7/0x16e
 [<ffffffff8105494e>] _module_imv_update+0x35/0x54
 [<ffffffff81054982>] module_imv_update+0x15/0x23
 [<ffffffff884050c4>] :toggle_tester:proc_toggle_write+0x4c/0x64
 [<ffffffff810d64a4>] proc_reg_write+0x7b/0x96
 [<ffffffff81099f72>] vfs_write+0xae/0x157
 [<ffffffff8109a53f>] sys_write+0x47/0x70
 [<ffffffff8100c0e9>] tracesys+0xdc/0xe1

Code: f8 48 3b 5d c0 48 8b 05 28 1a 41 00 75 0a bf fc 00 00 00 ff 50 38 eb 0f be fc 00 00 00 48
---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7cc>]  [<ffffffff8106e7cc>] ipi_busy_loop+0x19/0x41
RSP: 0018:ffff81007fb6ff70  EFLAGS: 00000002
RAX: 0000000000000006 RBX: 0000000000000000 RCX: ffff81007fb5f260
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffff81007fb6ff78 R08: ffff81007fb6ff78 R09: ffff810001016700
R10: ffff810001018b80 R11: 0000000000000001 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81491a80 R15: ffff810001018a80
FS:  0000000000000000(0000) GS:ffff81007f801680(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac9ac20 CR3: 00000000778e6000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff81007fb6a000, task ffff81007fb5f260)
Stack:  0000000000000000 ffff81007fb6ffa8 ffffffff8101ca4e ffff81007fb6ffa8
 ffffffff8100b0e2 0000000000000001 0000000000000040 ffff81007fb6be60
 ffffffff8100cac6 ffff81007fb6be60 <EOI>  ffff81007fb6bee8 0000000000000001
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81270a35>] ? start_secondary+0x3ba/0x3c6

Code: 81 48 c7 c7 02 fd 36 81 48 89 e5 e8 7b fe ff ff c9 c3 55 48 89 e5 53 9c 5e fa f0 ff 0d 82
---[ end trace 738433437d960ebc ]---
NMI Watchdog detected LOCKUP on CPU 0
Kernel panic - not syncing: Aiee, killing interrupt handler!
CPU 0
Modules linked in: toggle_tester ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf]Pid: 0, comm: swapper Not tainted 2.6.24-git12markers #1
RIP: 0010:[<ffffffff8106e7e6>]  [<ffffffff8106e7e6>] ipi_busy_loop+0x33/0x41
RSP: 0018:ffffffff81498f70  EFLAGS: 00000006
RAX: 0000000000000004 RBX: 0000000000000000 RCX: ffffffff81394620
RDX: ffffffff81491b80 RSI: 0000000000000046 RDI: 0000000000000000
RBP: ffffffff81498f78 R08: ffffffff81498f78 R09: ffff810062cc1e98
R10: ffff81000100cb80 R11: ffff810062cc1d58 R12: ffffffff8106e7b3
R13: 0000000000000000 R14: ffffffff81466640 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff813d2000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000031fac6f060 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81434000, task ffffffff81394620)
Stack:  0000000000000000 ffffffff81498fa8 ffffffff8101ca4e ffffffff8100b0e2
 ffffffff8100b0e2 0000000000000000 ffffffffffffffff ffffffff81435ec0
 ffffffff8100cac6 ffffffff81435ec0 <EOI>  ffffffff81435f48 ffff810062cc1d58
Call Trace:
 <IRQ>  [<ffffffff8101ca4e>] smp_call_function_interrupt+0x48/0x71
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100b0e2>] ? mwait_idle+0x0/0x4a
 [<ffffffff8100cac6>] call_function_interrupt+0x66/0x70
 <EOI>  [<ffffffff8100b127>] ? mwait_idle+0x45/0x4a
 [<ffffffff8100a723>] ? enter_idle+0x22/0x24
 [<ffffffff8100b06f>] ? cpu_idle+0x97/0xc1
 [<ffffffff81266ee6>] ? rest_init+0x5a/0x5c

Code: f0 ff 0d 82 6b 49 00 0f ae f0 48 63 05 78 6b 49 00 48 3b 05 5d 6b 49 00 7f ed f0 ff 0d 68
---[ end trace 738433437d960ebc ]---


2)


--- /dev/null
+++ b/samples/markers/toggle-tester.c
@@ -0,0 +1,90 @@
+/* probe-example.c
+ *
+ * toggle module to test immedidate infrastructure.
+ *
+ * (C) Copyright 2007 Jason Baron <jbaron@redhat.com>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/marker.h>
+#include <asm/atomic.h>
+#include <linux/seq_file.h>
+#include <linux/fs.h>
+#include <linux/proc_fs.h>
+
+static DECLARE_MUTEX(toggle_mutex);
+DEFINE_IMV(char, toggle_value) = 0;
+
+static ssize_t proc_toggle_write(struct file *file, const char __user *buf,
+                                size_t length, loff_t *ppos)
+{
+	int i;
+
+	down(&toggle_mutex);
+	i = imv_read(toggle_value);
+	imv_set(toggle_value, !i);
+	up(&toggle_mutex);
+
+	return 0;
+}
+
+static int proc_toggle_show(struct seq_file *s, void *p)
+{
+	int i;
+
+	down(&toggle_mutex);        
+        i = imv_read(toggle_value);
+	up(&toggle_mutex);
+
+	seq_printf(s, "%u\n", i);
+
+        return 0;
+}
+
+
+static int proc_toggle_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_toggle_show, NULL);
+}
+
+
+static const struct file_operations toggle_operations = {
+	.open           = proc_toggle_open,
+	.read           = seq_read,
+	.write          = proc_toggle_write,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+	
+
+struct proc_dir_entry *pde;
+
+static int __init toggle_init(void)
+{
+	
+	pde = create_proc_entry("toggle", 0, NULL);
+	if (!pde)
+		return -ENOMEM;
+	pde->proc_fops = &toggle_operations;
+
+
+	return 0;
+}
+
+static void __exit toggle_fini(void)
+{
+	remove_proc_entry("toggle", &proc_root);
+	
+}
+
+module_init(toggle_init);
+module_exit(toggle_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jason Baron");
+MODULE_DESCRIPTION("testing immediate implementation");


3)


diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 5bfc553..1ab1c5b 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -8,6 +8,9 @@
 #include <asm/system.h>
 
 #if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)
+
+#define RUN_ALL ~0U
+
 /**
  * stop_machine_run: freeze the machine on all CPUs and run this function
  * @fn: the function to run
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 51b5ee5..e6ee46f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -23,9 +23,17 @@ enum stopmachine_state {
 	STOPMACHINE_WAIT,
 	STOPMACHINE_PREPARE,
 	STOPMACHINE_DISABLE_IRQ,
+	STOPMACHINE_RUN,
 	STOPMACHINE_EXIT,
 };
 
+struct stop_machine_data {
+	int (*fn)(void *);
+	void *data;
+	struct completion done;
+	int run_all;
+} smdata;
+
 static enum stopmachine_state stopmachine_state;
 static unsigned int stopmachine_num_threads;
 static atomic_t stopmachine_thread_ack;
@@ -35,6 +43,7 @@ static int stopmachine(void *cpu)
 {
 	int irqs_disabled = 0;
 	int prepared = 0;
+	int ran = 0;
 
 	set_cpus_allowed(current, cpumask_of_cpu((int)(long)cpu));
 
@@ -59,6 +68,11 @@ static int stopmachine(void *cpu)
 			prepared = 1;
 			smp_mb(); /* Must read state first. */
 			atomic_inc(&stopmachine_thread_ack);
+		} else if (stopmachine_state == STOPMACHINE_RUN && !ran) {
+			smdata.fn(smdata.data);
+			ran = 1;
+			smp_mb(); /* Must read state first. */
+			atomic_inc(&stopmachine_thread_ack);
 		}
 		/* Yield in first stage: migration threads need to
 		 * help our sisters onto their CPUs. */
@@ -136,12 +150,10 @@ static void restart_machine(void)
 	preempt_enable_no_resched();
 }
 
-struct stop_machine_data
+static void run_other_cpus(void)
 {
-	int (*fn)(void *);
-	void *data;
-	struct completion done;
-};
+	stopmachine_set_state(STOPMACHINE_RUN);
+}
 
 static int do_stop(void *_smdata)
 {
@@ -151,6 +163,8 @@ static int do_stop(void *_smdata)
 	ret = stop_machine();
 	if (ret == 0) {
 		ret = smdata->fn(smdata->data);
+		if (smdata->run_all)
+			run_other_cpus();
 		restart_machine();
 	}
 
@@ -170,17 +184,16 @@ static int do_stop(void *_smdata)
 struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
 				       unsigned int cpu)
 {
-	struct stop_machine_data smdata;
 	struct task_struct *p;
 
+	down(&stopmachine_mutex);
 	smdata.fn = fn;
 	smdata.data = data;
+	smdata.run_all = (cpu == RUN_ALL) ? 1 : 0;
 	init_completion(&smdata.done);
-
-	down(&stopmachine_mutex);
-
+	smp_wmb();
 	/* If they don't care which CPU fn runs on, bind to any online one. */
-	if (cpu == NR_CPUS)
+	if (cpu == NR_CPUS || cpu == RUN_ALL)
 		cpu = raw_smp_processor_id();
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");


4)


diff --git a/kernel/immediate.c b/kernel/immediate.c
index 4c36a89..39ec13e 100644
--- a/kernel/immediate.c
+++ b/kernel/immediate.c
@@ -20,6 +20,7 @@
 #include <linux/immediate.h>
 #include <linux/memory.h>
 #include <linux/cpu.h>
+#include <linux/stop_machine.h>
 
 #include <asm/cacheflush.h>
 
@@ -30,6 +31,23 @@ static int imv_early_boot_complete;
 
 extern const struct __imv __start___imv[];
 extern const struct __imv __stop___imv[];
+int started;
+
+int stop_machine_imv_update(void *imv_ptr)
+{
+	struct __imv *imv = imv_ptr;
+
+	if (!started) {
+		text_poke((void *)imv->imv, (void *)imv->var, imv->size);
+		started = 1;
+		smp_wmb();
+	} else
+		sync_core();
+
+	flush_icache_range(imv->imv, imv->imv + imv->size);
+
+	return 0;
+}
 
 /*
  * imv_mutex nests inside module_mutex. imv_mutex protects builtin
@@ -37,38 +55,6 @@ extern const struct __imv __stop___imv[];
  */
 static DEFINE_MUTEX(imv_mutex);
 
-static atomic_t wait_sync;
-
-struct ipi_loop_data {
-	long value;
-	const struct __imv *imv;
-} loop_data;
-
-static void ipi_busy_loop(void *arg)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > loop_data.value);
-	atomic_dec(&wait_sync);
-	do {
-		/* Make sure the wait_sync gets re-read */
-		smp_mb();
-	} while (atomic_read(&wait_sync) > 0);
-	/*
-	 * Issuing a synchronizing instruction must be done on each CPU before
-	 * reenabling interrupts after modifying an instruction. Required by
-	 * Intel's errata.
-	 */
-	sync_core();
-	flush_icache_range(loop_data.imv->imv,
-		loop_data.imv->imv + loop_data.imv->size);
-	local_irq_restore(flags);
-}
 
 /**
  * apply_imv_update - update one immediate value
@@ -82,9 +68,6 @@ static void ipi_busy_loop(void *arg)
  */
 static int apply_imv_update(const struct __imv *imv)
 {
-	unsigned long flags;
-	long online_cpus;
-
 	/*
 	 * If the variable and the instruction have the same value, there is
 	 * nothing to do.
@@ -111,30 +94,8 @@ static int apply_imv_update(const struct __imv *imv)
 
 	if (imv_early_boot_complete) {
 		kernel_text_lock();
-		get_online_cpus();
-		online_cpus = num_online_cpus();
-		atomic_set(&wait_sync, 2 * online_cpus);
-		loop_data.value = online_cpus;
-		loop_data.imv = imv;
-		smp_call_function(ipi_busy_loop, NULL, 1, 0);
-		local_irq_save(flags);
-		atomic_dec(&wait_sync);
-		do {
-			/* Make sure the wait_sync gets re-read */
-			smp_mb();
-		} while (atomic_read(&wait_sync) > online_cpus);
-		text_poke((void *)imv->imv, (void *)imv->var,
-				imv->size);
-		/*
-		 * Make sure the modified instruction is seen by all CPUs before
-		 * we continue (visible to other CPUs and local interrupts).
-		 */
-		wmb();
-		atomic_dec(&wait_sync);
-		flush_icache_range(imv->imv,
-				imv->imv + imv->size);
-		local_irq_restore(flags);
-		put_online_cpus();
+		started = 0;
+		stop_machine_run(stop_machine_imv_update, (void *)imv, RUN_ALL);
 		kernel_text_unlock();
 	} else
 		text_poke_early((void *)imv->imv, (void *)imv->var,

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [patch 1/7] Immediate Values - Architecture Independent Code
  2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
@ 2008-02-02 21:08 ` Mathieu Desnoyers
  2008-02-26 22:52   ` Jason Baron
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2008-02-02 21:08 UTC (permalink / raw)
  To: akpm, Ingo Molnar, linux-kernel; +Cc: Mathieu Desnoyers, Rusty Russell

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 18592 bytes --]

Immediate values are used as read mostly variables that are rarely updated. They
use code patching to modify the values inscribed in the instruction stream. It
provides a way to save precious cache lines that would otherwise have to be used
by these variables.

There is a generic _imv_read() version, which uses standard global
variables, and optimized per architecture imv_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionnality is disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__imv" to place the pointers to the enable
value. Immediate values activation functions sits in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performances on heavy memory I/O workloads.

An interesting result shows the potential this infrastructure has by
showing the slowdown a simple system call such as getppid() suffers when it is
used under heavy user-space cache trashing:

Random walk L1 and L2 trashing surrounding a getppid() call:
(note: in this test, do_syscal_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of 10 times just to get the kernel variables from
memory. Another test on the same architecture (Intel P4) measured the memory
latency to be 559 cycles. Therefore, each cache line removed from the hot path
would improve the syscall time of 3.5% in these conditions.

Changelog:

- section __imv is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.
- Does not update tainted module's immediate values.
- remove imv_*_t types, add DECLARE_IMV() and DEFINE_IMV().
  - imv_read(&var) becomes imv_read(var) because of this.
- Adding a new EXPORT_IMV_SYMBOL(_GPL).
- remove imv_if(). Should use if (unlikely(imv_read(var))) instead.
  - Wait until we have gcc support before we add the imv_if macro, since
    its form may have to change.
- Dont't declare the __imv section in vmlinux.lds.h, just put the content
  in the rodata section.
- Simplify interface : remove imv_set_early, keep track of kernel boot
  status internally.
- Remove the ALIGN(8) before the __imv section. It is packed now.
- Uses an IPI busy-loop on each CPU with interrupts disabled as a simple,
  architecture agnostic, update mechanism.
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Rusty Russell <rusty@rustcorp.com.au>
---
 include/asm-generic/vmlinux.lds.h |    3 
 include/linux/immediate.h         |   94 +++++++++++++++++++
 include/linux/module.h            |   16 +++
 init/main.c                       |    8 +
 kernel/Makefile                   |    1 
 kernel/immediate.c                |  187 ++++++++++++++++++++++++++++++++++++++
 kernel/module.c                   |   50 +++++++++-
 7 files changed, 358 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2008-02-02 15:54:04.000000000 -0500
@@ -0,0 +1,94 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+struct __imv {
+	unsigned long var;	/* Pointer to the identifier variable of the
+				 * immediate value
+				 */
+	unsigned long imv;	/*
+				 * Pointer to the memory location of the
+				 * immediate value within the instruction.
+				 */
+	unsigned char size;	/* Type size. */
+} __attribute__ ((packed));
+
+#include <asm/immediate.h>
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)						\
+	do {								\
+		name##__imv = (i);					\
+		core_imv_update();					\
+		module_imv_update();					\
+	} while (0)
+
+/*
+ * Internal update functions.
+ */
+extern void core_imv_update(void);
+extern void imv_update_range(const struct __imv *begin,
+	const struct __imv *end);
+
+#else
+
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ */
+#define imv_read(name)			_imv_read(name)
+
+/**
+ * imv_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define imv_set(name, i)		(name##__imv = (i))
+
+static inline void core_imv_update(void) { }
+static inline void module_imv_update(void) { }
+
+#endif
+
+#define DECLARE_IMV(type, name) extern __typeof__(type) name##__imv
+#define DEFINE_IMV(type, name)  __typeof__(type) name##__imv
+
+#define EXPORT_IMV_SYMBOL(name) EXPORT_SYMBOL(name##__imv)
+#define EXPORT_IMV_SYMBOL_GPL(name) EXPORT_SYMBOL_GPL(name##__imv)
+
+/**
+ * _imv_read - Read immediate value with standard memory load.
+ * @name: immediate value name
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _imv_read(name)		(name##__imv)
+
+#endif
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2008-02-02 15:54:04.000000000 -0500
+++ linux-2.6-lttng/include/linux/module.h	2008-02-02 15:54:04.000000000 -0500
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <linux/marker.h>
 #include <asm/local.h>
 
@@ -355,6 +356,10 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+#ifdef CONFIG_IMMEDIATE
+	const struct __imv *immediate;
+	unsigned int num_immediate;
+#endif
 #ifdef CONFIG_MARKERS
 	struct marker *markers;
 	unsigned int num_markers;
@@ -467,6 +472,9 @@ extern void print_modules(void);
 
 extern void module_update_markers(void);
 
+extern void _module_imv_update(void);
+extern void module_imv_update(void);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -572,6 +580,14 @@ static inline void module_update_markers
 {
 }
 
+static inline void _module_imv_update(void)
+{
+}
+
+static inline void module_imv_update(void)
+{
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2008-02-02 15:54:04.000000000 -0500
+++ linux-2.6-lttng/kernel/module.c	2008-02-02 15:54:04.000000000 -0500
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1716,6 +1717,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex;
 	unsigned int markersindex;
 	unsigned int markersstringsindex;
 	struct module *mod;
@@ -1814,6 +1816,7 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__imv");
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1965,6 +1968,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2032,11 +2040,16 @@ static struct module *load_module(void _
 
 	add_kallsyms(mod, sechdrs, symindex, strindex, secstrings);
 
+	if (!(mod->taints & TAINT_FORCED_MODULE)) {
 #ifdef CONFIG_MARKERS
-	if (!(mod->taints & TAINT_FORCED_MODULE))
 		marker_update_probe_range(mod->markers,
 			mod->markers + mod->num_markers);
 #endif
+#ifdef CONFIG_IMMEDIATE
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+#endif
+	}
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
@@ -2573,3 +2586,38 @@ void module_update_markers(void)
 	mutex_unlock(&module_mutex);
 }
 #endif
+
+#ifdef CONFIG_IMMEDIATE
+/**
+ * _module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Module_mutex must be held be the caller.
+ */
+void _module_imv_update(void)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		imv_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+}
+EXPORT_SYMBOL_GPL(_module_imv_update);
+
+/**
+ * module_imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Takes module_mutex.
+ */
+void module_imv_update(void)
+{
+	mutex_lock(&module_mutex);
+	_module_imv_update();
+	mutex_unlock(&module_mutex);
+}
+EXPORT_SYMBOL_GPL(module_imv_update);
+#endif
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2008-02-02 15:57:23.000000000 -0500
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+#include <linux/cpu.h>
+
+#include <asm/cacheflush.h>
+
+/*
+ * Kernel ready to execute the SMP update that may depend on trap and ipi.
+ */
+static int imv_early_boot_complete;
+
+extern const struct __imv __start___imv[];
+extern const struct __imv __stop___imv[];
+
+/*
+ * imv_mutex nests inside module_mutex. imv_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(imv_mutex);
+
+static atomic_t wait_sync;
+
+struct ipi_loop_data {
+	long value;
+	const struct __imv *imv;
+} loop_data;
+
+static void ipi_busy_loop(void *arg)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > loop_data.value);
+	atomic_dec(&wait_sync);
+	do {
+		/* Make sure the wait_sync gets re-read */
+		smp_mb();
+	} while (atomic_read(&wait_sync) > 0);
+	/*
+	 * Issuing a synchronizing instruction must be done on each CPU before
+	 * reenabling interrupts after modifying an instruction. Required by
+	 * Intel's errata.
+	 */
+	sync_core();
+	flush_icache_range(loop_data.imv->imv,
+		loop_data.imv->imv + loop_data.imv->size);
+	local_irq_restore(flags);
+}
+
+/**
+ * apply_imv_update - update one immediate value
+ * @imv: pointer of type const struct __imv to update
+ *
+ * Update one immediate value. Must be called with imv_mutex held.
+ * It makes sure all CPUs are not executing the modified code by having them
+ * busy looping with interrupts disabled.
+ * It does _not_ protect against NMI and MCE (could be a problem with Intel's
+ * errata if we use immediate values in their code path).
+ */
+static int apply_imv_update(const struct __imv *imv)
+{
+	unsigned long flags;
+	long online_cpus;
+
+	/*
+	 * If the variable and the instruction have the same value, there is
+	 * nothing to do.
+	 */
+	switch (imv->size) {
+	case 1:	if (*(uint8_t *)imv->imv
+				== *(uint8_t *)imv->var)
+			return 0;
+		break;
+	case 2:	if (*(uint16_t *)imv->imv
+				== *(uint16_t *)imv->var)
+			return 0;
+		break;
+	case 4:	if (*(uint32_t *)imv->imv
+				== *(uint32_t *)imv->var)
+			return 0;
+		break;
+	case 8:	if (*(uint64_t *)imv->imv
+				== *(uint64_t *)imv->var)
+			return 0;
+		break;
+	default:return -EINVAL;
+	}
+
+	if (imv_early_boot_complete) {
+		kernel_text_lock();
+		get_online_cpus();
+		online_cpus = num_online_cpus();
+		atomic_set(&wait_sync, 2 * online_cpus);
+		loop_data.value = online_cpus;
+		loop_data.imv = imv;
+		smp_call_function(ipi_busy_loop, NULL, 1, 0);
+		local_irq_save(flags);
+		atomic_dec(&wait_sync);
+		do {
+			/* Make sure the wait_sync gets re-read */
+			smp_mb();
+		} while (atomic_read(&wait_sync) > online_cpus);
+		text_poke((void *)imv->imv, (void *)imv->var,
+				imv->size);
+		/*
+		 * Make sure the modified instruction is seen by all CPUs before
+		 * we continue (visible to other CPUs and local interrupts).
+		 */
+		wmb();
+		atomic_dec(&wait_sync);
+		flush_icache_range(imv->imv,
+				imv->imv + imv->size);
+		local_irq_restore(flags);
+		put_online_cpus();
+		kernel_text_unlock();
+	} else
+		text_poke_early((void *)imv->imv, (void *)imv->var,
+				imv->size);
+	return 0;
+}
+
+/**
+ * imv_update_range - Update immediate values in a range
+ * @begin: pointer to the beginning of the range
+ * @end: pointer to the end of the range
+ *
+ * Updates a range of immediates.
+ */
+void imv_update_range(const struct __imv *begin,
+		const struct __imv *end)
+{
+	const struct __imv *iter;
+	int ret;
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&imv_mutex);
+		ret = apply_imv_update(iter);
+		if (imv_early_boot_complete && ret)
+			printk(KERN_WARNING
+				"Invalid immediate value. "
+				"Variable at %p, "
+				"instruction at %p, size %hu\n",
+				(void *)iter->imv,
+				(void *)iter->var, iter->size);
+		mutex_unlock(&imv_mutex);
+	}
+}
+EXPORT_SYMBOL_GPL(imv_update_range);
+
+/**
+ * imv_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void core_imv_update(void)
+{
+	/* Core kernel imvs */
+	imv_update_range(__start___imv, __stop___imv);
+}
+EXPORT_SYMBOL_GPL(core_imv_update);
+
+void __init imv_init_complete(void)
+{
+	imv_early_boot_complete = 1;
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2008-02-02 15:33:24.000000000 -0500
+++ linux-2.6-lttng/init/main.c	2008-02-02 15:54:04.000000000 -0500
@@ -57,6 +57,7 @@
 #include <linux/device.h>
 #include <linux/kthread.h>
 #include <linux/sched.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -101,6 +102,11 @@ static inline void mark_rodata_ro(void) 
 #ifdef CONFIG_TC
 extern void tc_init(void);
 #endif
+#ifdef CONFIG_IMMEDIATE
+extern void imv_init_complete(void);
+#else
+static inline void imv_init_complete(void) { }
+#endif
 
 enum system_states system_state;
 EXPORT_SYMBOL(system_state);
@@ -522,6 +528,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	cgroup_init_early();
+	core_imv_update();
 
 	local_irq_disable();
 	early_boot_irqs_off();
@@ -645,6 +652,7 @@ asmlinkage void __init start_kernel(void
 	cpuset_init();
 	taskstats_init_early();
 	delayacct_init();
+	imv_init_complete();
 
 	check_bugs();
 
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2008-02-02 15:33:24.000000000 -0500
+++ linux-2.6-lttng/kernel/Makefile	2008-02-02 15:54:04.000000000 -0500
@@ -63,6 +63,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 obj-$(CONFIG_MARKERS) += marker.o
 obj-$(CONFIG_LATENCYTOP) += latencytop.o
 
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2008-02-02 15:33:24.000000000 -0500
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2008-02-02 15:54:04.000000000 -0500
@@ -61,6 +61,9 @@
 		*(.rodata) *(.rodata.*)					\
 		*(__vermagic)		/* Kernel version magic */	\
 		*(__markers_strings)	/* Markers: strings */		\
+		VMLINUX_SYMBOL(__start___imv) = .;			\
+		*(__imv)		/* Immediate values: pointers */ \
+		VMLINUX_SYMBOL(__stop___imv) = .;			\
 	}								\
 									\
 	.rodata1          : AT(ADDR(.rodata1) - LOAD_OFFSET) {		\

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-19  8:57           ` Denys Vlasenko
@ 2007-09-19 11:27             ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-09-19 11:27 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: akpm, linux-kernel

* Denys Vlasenko (vda.linux@googlemail.com) wrote:
> On Tuesday 18 September 2007 21:47, Mathieu Desnoyers wrote:
> > * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > > On Tuesday 18 September 2007 18:59, Mathieu Desnoyers wrote:
> > > > * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > > > > On Monday 17 September 2007 19:42, Mathieu Desnoyers wrote:
> > > > > > Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
> > > > > > ===================================================================
> > > > > > --- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
> > > > > > +++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
> > > > > > @@ -122,6 +122,13 @@
> > > > > >  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
> > > > > >  	}								\
> > > > > >  									\
> > > > > > +	/* Immediate values: pointers */				\
> > > > > > +	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
> > > > > > +		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > > > > +		*(__immediate)						\
> > > > > > +		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > > > > +	}								\
> > > > > > +									\
> > > > > 
> > > > > Why do you need an output section for that? IOW: will this work too?
> > > > > 
> > > > > .data : ... { 
> > > > > ...
> > > > > 
> > > > > 		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > > > 		*(__immediate)						\
> > > > > 		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > > > ...
> > > > > }
> > > > > 
> > > > 
> > > > This last one could cause alignment problems. We either have to use the
> > > > proper ALIGN() before the section, or let AT(ADDR(__immediate) -
> > > > LOAD_OFFSET) take care of it. I prefer the latter.
> > > 
> > > This adds yet another output section in vmlinux, and there is
> > > no tools which need that. We already have 30+ sections there while we need ~20.
> > > 
> > > I am trying to fix the mess. Please don't add to it.
> > > 
> > > Re alignment: (1) do you really realy REALLY need it? Last I checked,
> > > i386 was handling unaligned accesses just fine; and
> > > (2) this works:
> > > 
> > > 		. = ALIGN(4)
> > >  		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > >  		*(__immediate)						\
> > >  		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > 
> > > 
> > 
> > Alignment: I need the __start___immediate and __stop___immediate values
> > to be at the same alignment as the *(__immediate) content, or else we
> > end up thinking that padding is data.
> > 
> > . = ALIGN(4) works fine as long as the structure within the section is
> > not bigger or equal to 32 bytes: gcc has the habit to align 32 bytes
> > structure on 32 bytes multiples. The safest way I found to do it is to
> > declare the section as I do: it will cause no breakage if anybody append
> > data to the structure.
> 
> If your structure will be padded by gcc, then this:
> 
> +#define immediate_read(name)                                           \
> +       ({                                                              \
> +               __typeof__(name##__immediate) value;                    \
> +               switch (sizeof(value)) {                                \
> +               case 1:                                                 \
> +                       asm (   ".section __immediate, \"a\", @progbits;\n\t" \
> +                                       ".long %1, (0f)+1, 1;\n\t"      \
> +                                       ".previous;\n\t"                \
> +                                       "0:\n\t"                        \
> +                                       "mov %2,%0;\n\t"                \
> +                               : "=r" (value)                          \
> +                               : "m" (name##__immediate),              \
> +                                 "i" (0));                             \
> +                       break;                                          \
> 
> will produce wrongly-sized "struct __immediate" (truncated one),
> since gcc has no idea that you are building struct __immediate there,
> and here:
> 
> +void immediate_update_range(const struct __immediate *begin,
> +               const struct __immediate *end)
> +{
> +       const struct __immediate *iter;
> +       int ret;
> +
> +       for (iter = begin; iter < end; iter++) {
> +               mutex_lock(&immediate_mutex);
> +               kernel_text_lock();
> +               ret = arch_immediate_update(iter);
> +               kernel_text_unlock();
> +               if (ret)
> +                       printk(KERN_WARNING "Invalid immediate value. "
> +                                           "Variable at %p, "
> +                                           "instruction at %p, size %lu\n",
> +                                           (void*)iter->immediate,
> +                                           (void*)iter->var, iter->size);
> +               mutex_unlock(&immediate_mutex);
> +       }
> +}
> 
> iter++ will go off rails.

You are right. It's ok here since we are actually smaller than 32 bytes,
but I should force the structure alignment so that if the structure
grows, the assembly declaration follows. I'll go for the gcc attribute
and then we can remove the section declaration.

Mathieu

> --
> vda

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-18 20:47         ` Mathieu Desnoyers
  2007-09-19  8:45           ` Denys Vlasenko
@ 2007-09-19  8:57           ` Denys Vlasenko
  2007-09-19 11:27             ` Mathieu Desnoyers
  1 sibling, 1 reply; 25+ messages in thread
From: Denys Vlasenko @ 2007-09-19  8:57 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On Tuesday 18 September 2007 21:47, Mathieu Desnoyers wrote:
> * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > On Tuesday 18 September 2007 18:59, Mathieu Desnoyers wrote:
> > > * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > > > On Monday 17 September 2007 19:42, Mathieu Desnoyers wrote:
> > > > > Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
> > > > > ===================================================================
> > > > > --- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
> > > > > +++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
> > > > > @@ -122,6 +122,13 @@
> > > > >  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
> > > > >  	}								\
> > > > >  									\
> > > > > +	/* Immediate values: pointers */				\
> > > > > +	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
> > > > > +		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > > > +		*(__immediate)						\
> > > > > +		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > > > +	}								\
> > > > > +									\
> > > > 
> > > > Why do you need an output section for that? IOW: will this work too?
> > > > 
> > > > .data : ... { 
> > > > ...
> > > > 
> > > > 		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > > 		*(__immediate)						\
> > > > 		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > > ...
> > > > }
> > > > 
> > > 
> > > This last one could cause alignment problems. We either have to use the
> > > proper ALIGN() before the section, or let AT(ADDR(__immediate) -
> > > LOAD_OFFSET) take care of it. I prefer the latter.
> > 
> > This adds yet another output section in vmlinux, and there is
> > no tools which need that. We already have 30+ sections there while we need ~20.
> > 
> > I am trying to fix the mess. Please don't add to it.
> > 
> > Re alignment: (1) do you really realy REALLY need it? Last I checked,
> > i386 was handling unaligned accesses just fine; and
> > (2) this works:
> > 
> > 		. = ALIGN(4)
> >  		VMLINUX_SYMBOL(__start___immediate) = .;		\
> >  		*(__immediate)						\
> >  		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > 
> > 
> 
> Alignment: I need the __start___immediate and __stop___immediate values
> to be at the same alignment as the *(__immediate) content, or else we
> end up thinking that padding is data.
> 
> . = ALIGN(4) works fine as long as the structure within the section is
> not bigger or equal to 32 bytes: gcc has the habit to align 32 bytes
> structure on 32 bytes multiples. The safest way I found to do it is to
> declare the section as I do: it will cause no breakage if anybody append
> data to the structure.

If your structure will be padded by gcc, then this:

+#define immediate_read(name)                                           \
+       ({                                                              \
+               __typeof__(name##__immediate) value;                    \
+               switch (sizeof(value)) {                                \
+               case 1:                                                 \
+                       asm (   ".section __immediate, \"a\", @progbits;\n\t" \
+                                       ".long %1, (0f)+1, 1;\n\t"      \
+                                       ".previous;\n\t"                \
+                                       "0:\n\t"                        \
+                                       "mov %2,%0;\n\t"                \
+                               : "=r" (value)                          \
+                               : "m" (name##__immediate),              \
+                                 "i" (0));                             \
+                       break;                                          \

will produce wrongly-sized "struct __immediate" (truncated one),
since gcc has no idea that you are building struct __immediate there,
and here:

+void immediate_update_range(const struct __immediate *begin,
+               const struct __immediate *end)
+{
+       const struct __immediate *iter;
+       int ret;
+
+       for (iter = begin; iter < end; iter++) {
+               mutex_lock(&immediate_mutex);
+               kernel_text_lock();
+               ret = arch_immediate_update(iter);
+               kernel_text_unlock();
+               if (ret)
+                       printk(KERN_WARNING "Invalid immediate value. "
+                                           "Variable at %p, "
+                                           "instruction at %p, size %lu\n",
+                                           (void*)iter->immediate,
+                                           (void*)iter->var, iter->size);
+               mutex_unlock(&immediate_mutex);
+       }
+}

iter++ will go off rails.
--
vda

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-18 20:47         ` Mathieu Desnoyers
@ 2007-09-19  8:45           ` Denys Vlasenko
  2007-09-19  8:57           ` Denys Vlasenko
  1 sibling, 0 replies; 25+ messages in thread
From: Denys Vlasenko @ 2007-09-19  8:45 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On Tuesday 18 September 2007 21:47, Mathieu Desnoyers wrote:
> * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > On Tuesday 18 September 2007 18:59, Mathieu Desnoyers wrote:
> > > * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > > > On Monday 17 September 2007 19:42, Mathieu Desnoyers wrote:
> > > > > Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
> > > > > ===================================================================
> > > > > --- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
> > > > > +++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
> > > > > @@ -122,6 +122,13 @@
> > > > >  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
> > > > >  	}								\
> > > > >  									\
> > > > > +	/* Immediate values: pointers */				\
> > > > > +	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
> > > > > +		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > > > +		*(__immediate)						\
> > > > > +		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > > > +	}								\
> > > > > +									\
> > > > 
> > > > Why do you need an output section for that? IOW: will this work too?
> > > > 
> > > > .data : ... { 
> > > > ...
> > > > 
> > > > 		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > > 		*(__immediate)						\
> > > > 		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > > ...
> > > > }
> > > > 
> > > 
> > > This last one could cause alignment problems. We either have to use the
> > > proper ALIGN() before the section, or let AT(ADDR(__immediate) -
> > > LOAD_OFFSET) take care of it. I prefer the latter.
> > 
> > This adds yet another output section in vmlinux, and there is
> > no tools which need that. We already have 30+ sections there while we need ~20.
> > 
> > I am trying to fix the mess. Please don't add to it.
> > 
> > Re alignment: (1) do you really realy REALLY need it? Last I checked,
> > i386 was handling unaligned accesses just fine; and
> > (2) this works:
> > 
> > 		. = ALIGN(4)
> >  		VMLINUX_SYMBOL(__start___immediate) = .;		\
> >  		*(__immediate)						\
> >  		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > 
> > 
> 
> Alignment: I need the __start___immediate and __stop___immediate values
> to be at the same alignment as the *(__immediate) content, or else we
> end up thinking that padding is data.
> 
> . = ALIGN(4) works fine as long as the structure within the section is
> not bigger or equal to 32 bytes: gcc has the habit to align 32 bytes
> structure on 32 bytes multiples. The safest way I found to do it is to

Yes, I'm painfully aware of that. gcc is too damn happy to align stuff.

> declare the section as I do: it will cause no breakage if anybody append
> data to the structure.

You can actively fight gcc's sadistic alignment tendencies instead:

struct __immediate {
       long var;               /* Identifier variable of the immediate value */
       long immediate;         /*
                                * Pointer to the memory location that holds
                                * the immediate value within the load immediate
                                * instruction.
                                */
       long size;              /* Type size. */
} __attribute__ ((aligned(sizeof(long))));  <================= HERE

Kernel is already using this technique a lot. Try

grep -r '^\} *__attribute__ *.*aligned' .
--
vda

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-18 21:07 [patch 0/7] Immediate Values for 2.6.23-rc6-mm1 Mathieu Desnoyers
@ 2007-09-18 21:07 ` Mathieu Desnoyers
  0 siblings, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-09-18 21:07 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 17446 bytes --]

Immediate values are used as read mostly variables that are rarely updated. They
use code patching to modify the values inscribed in the instruction stream. It
provides a way to save precious cache lines that would otherwise have to be used
by these variables.

There is a generic _immediate_read() version, which uses standard global
variables, and optimized per architecture immediate_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionnality is disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__immediate" to place the pointers to the enable
value. Immediate values activation functions sits in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performances on heavy memory I/O workloads.

An interesting result shows the potential this infrastructure has by
showing the slowdown a simple system call such as getppid() suffers when it is
used under heavy user-space cache trashing:

Random walk L1 and L2 trashing surrounding a getppid() call:
(note: in this test, do_syscal_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of 10 times just to get the kernel variables from
memory. Another test on the same architecture (Intel P4) measured the memory
latency to be 559 cycles. Therefore, each cache line removed from the hot path
would improve the syscall time of 3.5% in these conditions.

Changelog:

- section __immediate is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.
- Does not update tainted module's immediate values.
- remove immediate_*_t types, add DECLARE_IMMEDIATE() and DEFINE_IMMEDIATE().
  - immediate_read(&var) becomes immediate_read(var) because of this.
- Adding a new EXPORT_IMMEDIATE_SYMBOL(_GPL).
- remove immediate_if(). Should use if (unlikely(immediate_read(var))) instead.
  - Wait until we have gcc support before we add the immediate_if macro, since
    its form may have to change.
- Document immediate_set_early(). This design is chosen over immediate_init()
  because:
    - We can mark the function __init so it is freed after boot.
    - Module code patching can also be done by the "normal" function. This is
      preferred, even if it can be slightly slower, because update performance
      does not matter.
    - immediate_init() could confuse users because we can actually "initialize"
      an immediate value to a certain value, which will be used as initial
      value. e.g.:
        static DEFINE_IMMEDIATE(int, var) = 10;

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-generic/vmlinux.lds.h |    7 ++
 include/linux/immediate.h         |  133 ++++++++++++++++++++++++++++++++++++++
 include/linux/module.h            |   17 ++++
 init/main.c                       |    2 
 kernel/Makefile                   |    1 
 kernel/immediate.c                |   92 ++++++++++++++++++++++++++
 kernel/module.c                   |   49 ++++++++++++++
 7 files changed, 301 insertions(+)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-09-17 13:38:32.000000000 -0400
@@ -0,0 +1,133 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+#include <asm/immediate.h>
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(name, i)						\
+	do {								\
+		name##__immediate = (i);				\
+		core_immediate_update();				\
+		module_immediate_update();				\
+	} while (0)
+
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Must be called with module_mutex held.
+ */
+#define _immediate_set(name, i)						\
+	do {								\
+		name##__immediate = (i);				\
+		core_immediate_update();				\
+		_module_immediate_update();				\
+	} while (0)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Should be used for updates at early boot, when only
+ * one CPU is active and interrupts are disabled.
+ */
+#define immediate_set_early(name, i)					\
+	do {								\
+		name##__immediate = (i);				\
+		immediate_update_early();				\
+	} while (0)
+
+/*
+ * Internal update functions.
+ */
+extern void core_immediate_update(void);
+extern void immediate_update_range(const struct __immediate *begin,
+	const struct __immediate *end);
+extern void immediate_update_early(void);
+
+#else
+
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+/**
+ * immediate_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ */
+#define immediate_read(name)		_immediate_read(name)
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(name, i)		(name##__immediate = (i))
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Must be called with module_mutex held.
+ */
+#define _immediate_set(name, i)		immediate_set(name, i)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Should be used for early boot updates.
+ */
+#define immediate_set_early(name, i)	immediate_set(name, i)
+
+/*
+ * Internal update functions.
+ */
+static inline void immediate_update_early(void)
+{ }
+#endif
+
+#define DECLARE_IMMEDIATE(type, name) extern __typeof__(type) name##__immediate
+#define DEFINE_IMMEDIATE(type, name)  __typeof__(type) name##__immediate
+
+#define EXPORT_IMMEDIATE_SYMBOL(name) EXPORT_SYMBOL(name##__immediate)
+#define EXPORT_IMMEDIATE_SYMBOL_GPL(name) EXPORT_SYMBOL_GPL(name##__immediate)
+
+/**
+ * _immediate_read - Read immediate value with standard memory load.
+ * @name: immediate value name
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _immediate_read(name)		(name##__immediate)
+
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
@@ -122,6 +122,13 @@
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
+	/* Immediate values: pointers */				\
+	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___immediate) = .;		\
+		*(__immediate)						\
+		VMLINUX_SYMBOL(__stop___immediate) = .;			\
+	}								\
+									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-09-17 13:35:50.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -370,6 +371,11 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+#ifdef CONFIG_IMMEDIATE
+	const struct __immediate *immediate;
+	unsigned int num_immediate;
+#endif
 };
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
@@ -473,6 +479,9 @@ int unregister_module_notifier(struct no
 
 extern void print_modules(void);
 
+extern void _module_immediate_update(void);
+extern void module_immediate_update(void);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -572,6 +581,14 @@ static inline void print_modules(void)
 {
 }
 
+static inline void _module_immediate_update(void)
+{
+}
+
+static inline void module_immediate_update(void)
+{
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-09-17 13:35:51.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1717,6 +1718,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1813,6 +1815,7 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1963,6 +1966,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2028,6 +2036,12 @@ static struct module *load_module(void _
 		 goto nomodsectinfo;
 #endif
 
+#ifdef CONFIG_IMMEDIATE
+	if (!mod->taints)
+		immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+#endif
+
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
@@ -2629,3 +2643,38 @@ EXPORT_SYMBOL(module_remove_driver);
 void struct_module(struct module *mod) { return; }
 EXPORT_SYMBOL(struct_module);
 #endif
+
+#ifdef CONFIG_IMMEDIATE
+/**
+ * _module_immediate_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Module_mutex must be held be the caller.
+ */
+void _module_immediate_update(void)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+}
+EXPORT_SYMBOL_GPL(_module_immediate_update);
+
+/**
+ * module_immediate_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Takes module_mutex.
+ */
+void module_immediate_update(void)
+{
+	mutex_lock(&module_mutex);
+	_module_immediate_update();
+	mutex_unlock(&module_mutex);
+}
+EXPORT_SYMBOL_GPL(module_immediate_update);
+#endif
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-09-17 13:25:22.000000000 -0400
@@ -0,0 +1,92 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * immediate_mutex nests inside module_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(immediate_mutex);
+
+/**
+ * immediate_update_range - Update immediate values in a range
+ * @begin: pointer to the beginning of the range
+ * @end: pointer to the end of the range
+ *
+ * Sets a range of immediates to a enabled state : set the enable bit.
+ */
+void immediate_update_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int ret;
+
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&immediate_mutex);
+		kernel_text_lock();
+		ret = arch_immediate_update(iter);
+		kernel_text_unlock();
+		if (ret)
+			printk(KERN_WARNING "Invalid immediate value. "
+					    "Variable at %p, "
+					    "instruction at %p, size %lu\n",
+					    (void*)iter->immediate,
+					    (void*)iter->var, iter->size);
+		mutex_unlock(&immediate_mutex);
+	}
+}
+EXPORT_SYMBOL_GPL(immediate_update_range);
+
+/**
+ * immediate_update - update all immediate values in the kernel
+ * @lock: should a module_mutex be taken ?
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void core_immediate_update(void)
+{
+	/* Core kernel immediates */
+	immediate_update_range(__start___immediate, __stop___immediate);
+}
+EXPORT_SYMBOL_GPL(core_immediate_update);
+
+static void __init immediate_update_early_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+
+	for (iter = begin; iter < end; iter++)
+		arch_immediate_update_early(iter);
+}
+
+/**
+ * immediate_update_early - Update immediate values at boot time
+ *
+ * Update the immediate values to the state of the variables they refer to. It
+ * is done before SMP is active, at the very beginning of start_kernel().
+ */
+void __init immediate_update_early(void)
+{
+	immediate_update_early_range(__start___immediate, __stop___immediate);
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/init/main.c	2007-09-17 13:25:22.000000000 -0400
@@ -56,6 +56,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -525,6 +526,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	container_init_early();
+	immediate_update_early();
 
 	local_irq_disable();
 	early_boot_irqs_off();
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2007-09-17 13:35:50.000000000 -0400
@@ -61,6 +61,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-18 20:01       ` Denys Vlasenko
@ 2007-09-18 20:47         ` Mathieu Desnoyers
  2007-09-19  8:45           ` Denys Vlasenko
  2007-09-19  8:57           ` Denys Vlasenko
  0 siblings, 2 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-09-18 20:47 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: akpm, linux-kernel

* Denys Vlasenko (vda.linux@googlemail.com) wrote:
> On Tuesday 18 September 2007 18:59, Mathieu Desnoyers wrote:
> > * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > > On Monday 17 September 2007 19:42, Mathieu Desnoyers wrote:
> > > > Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
> > > > ===================================================================
> > > > --- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
> > > > +++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
> > > > @@ -122,6 +122,13 @@
> > > >  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
> > > >  	}								\
> > > >  									\
> > > > +	/* Immediate values: pointers */				\
> > > > +	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
> > > > +		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > > +		*(__immediate)						\
> > > > +		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > > +	}								\
> > > > +									\
> > > 
> > > Why do you need an output section for that? IOW: will this work too?
> > > 
> > > .data : ... { 
> > > ...
> > > 
> > > 		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > 		*(__immediate)						\
> > > 		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > ...
> > > }
> > > 
> > 
> > This last one could cause alignment problems. We either have to use the
> > proper ALIGN() before the section, or let AT(ADDR(__immediate) -
> > LOAD_OFFSET) take care of it. I prefer the latter.
> 
> This adds yet another output section in vmlinux, and there is
> no tools which need that. We already have 30+ sections there while we need ~20.
> 
> I am trying to fix the mess. Please don't add to it.
> 
> Re alignment: (1) do you really realy REALLY need it? Last I checked,
> i386 was handling unaligned accesses just fine; and
> (2) this works:
> 
> 		. = ALIGN(4)
>  		VMLINUX_SYMBOL(__start___immediate) = .;		\
>  		*(__immediate)						\
>  		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> 
> 

Alignment: I need the __start___immediate and __stop___immediate values
to be at the same alignment as the *(__immediate) content, or else we
end up thinking that padding is data.

. = ALIGN(4) works fine as long as the structure within the section is
not bigger or equal to 32 bytes: gcc has the habit to align 32 bytes
structure on 32 bytes multiples. The safest way I found to do it is to
declare the section as I do: it will cause no breakage if anybody append
data to the structure.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-18 17:59     ` Mathieu Desnoyers
@ 2007-09-18 20:01       ` Denys Vlasenko
  2007-09-18 20:47         ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Vlasenko @ 2007-09-18 20:01 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On Tuesday 18 September 2007 18:59, Mathieu Desnoyers wrote:
> * Denys Vlasenko (vda.linux@googlemail.com) wrote:
> > On Monday 17 September 2007 19:42, Mathieu Desnoyers wrote:
> > > Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
> > > ===================================================================
> > > --- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
> > > +++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
> > > @@ -122,6 +122,13 @@
> > >  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
> > >  	}								\
> > >  									\
> > > +	/* Immediate values: pointers */				\
> > > +	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
> > > +		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > > +		*(__immediate)						\
> > > +		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > > +	}								\
> > > +									\
> > 
> > Why do you need an output section for that? IOW: will this work too?
> > 
> > .data : ... { 
> > ...
> > 
> > 		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > 		*(__immediate)						\
> > 		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > ...
> > }
> > 
> 
> This last one could cause alignment problems. We either have to use the
> proper ALIGN() before the section, or let AT(ADDR(__immediate) -
> LOAD_OFFSET) take care of it. I prefer the latter.

This adds yet another output section in vmlinux, and there is
no tools which need that. We already have 30+ sections there while we need ~20.

I am trying to fix the mess. Please don't add to it.

Re alignment: (1) do you really realy REALLY need it? Last I checked,
i386 was handling unaligned accesses just fine; and
(2) this works:

		. = ALIGN(4)
 		VMLINUX_SYMBOL(__start___immediate) = .;		\
 		*(__immediate)						\
 		VMLINUX_SYMBOL(__stop___immediate) = .;			\


--
vda

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-18 17:47   ` Denys Vlasenko
@ 2007-09-18 17:59     ` Mathieu Desnoyers
  2007-09-18 20:01       ` Denys Vlasenko
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-09-18 17:59 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: akpm, linux-kernel

* Denys Vlasenko (vda.linux@googlemail.com) wrote:
> On Monday 17 September 2007 19:42, Mathieu Desnoyers wrote:
> > Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
> > ===================================================================
> > --- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
> > +++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
> > @@ -122,6 +122,13 @@
> >  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
> >  	}								\
> >  									\
> > +	/* Immediate values: pointers */				\
> > +	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
> > +		VMLINUX_SYMBOL(__start___immediate) = .;		\
> > +		*(__immediate)						\
> > +		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> > +	}								\
> > +									\
> 
> Why do you need an output section for that? IOW: will this work too?
> 
> .data : ... { 
> ...
> 
> 		VMLINUX_SYMBOL(__start___immediate) = .;		\
> 		*(__immediate)						\
> 		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> ...
> }
> 

This last one could cause alignment problems. We either have to use the
proper ALIGN() before the section, or let AT(ADDR(__immediate) -
LOAD_OFFSET) take care of it. I prefer the latter.

> 
> > Index: linux-2.6-lttng/kernel/module.c
> > ===================================================================
> > --- linux-2.6-lttng.orig/kernel/module.c	2007-09-17 13:25:06.000000000 -0400
> > +++ linux-2.6-lttng/kernel/module.c	2007-09-17 13:35:51.000000000 -0400
> > @@ -33,6 +33,7 @@
> >  #include <linux/cpu.h>
> >  #include <linux/moduleparam.h>
> >  #include <linux/errno.h>
> > +#include <linux/immediate.h>
> >  #include <linux/err.h>
> >  #include <linux/vermagic.h>
> >  #include <linux/notifier.h>
> > @@ -1717,6 +1718,7 @@ static struct module *load_module(void _
> >  	unsigned int unusedcrcindex;
> >  	unsigned int unusedgplindex;
> >  	unsigned int unusedgplcrcindex;
> > +	unsigned int immediateindex;
> >  	struct module *mod;
> >  	long err = 0;
> >  	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
> > @@ -1813,6 +1815,7 @@ static struct module *load_module(void _
> >  #ifdef ARCH_UNWIND_SECTION_NAME
> >  	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
> >  #endif
> > +	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
> 
> 
> Do you need to frame immediateindex by #ifdef CONFIG_IMMEDIATE / #endif?

I could, but I have been told to leave the immediateindex there even though
immediate values are configured out. I guess visual hideousness is the
main argument there.

Mathieu

> --
> vda

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-17 18:42 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
@ 2007-09-18 17:47   ` Denys Vlasenko
  2007-09-18 17:59     ` Mathieu Desnoyers
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Vlasenko @ 2007-09-18 17:47 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: akpm, linux-kernel

On Monday 17 September 2007 19:42, Mathieu Desnoyers wrote:
> Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
> ===================================================================
> --- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
> +++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
> @@ -122,6 +122,13 @@
>  		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
>  	}								\
>  									\
> +	/* Immediate values: pointers */				\
> +	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
> +		VMLINUX_SYMBOL(__start___immediate) = .;		\
> +		*(__immediate)						\
> +		VMLINUX_SYMBOL(__stop___immediate) = .;			\
> +	}								\
> +									\

Why do you need an output section for that? IOW: will this work too?

.data : ... { 
...

		VMLINUX_SYMBOL(__start___immediate) = .;		\
		*(__immediate)						\
		VMLINUX_SYMBOL(__stop___immediate) = .;			\
...
}


> Index: linux-2.6-lttng/kernel/module.c
> ===================================================================
> --- linux-2.6-lttng.orig/kernel/module.c	2007-09-17 13:25:06.000000000 -0400
> +++ linux-2.6-lttng/kernel/module.c	2007-09-17 13:35:51.000000000 -0400
> @@ -33,6 +33,7 @@
>  #include <linux/cpu.h>
>  #include <linux/moduleparam.h>
>  #include <linux/errno.h>
> +#include <linux/immediate.h>
>  #include <linux/err.h>
>  #include <linux/vermagic.h>
>  #include <linux/notifier.h>
> @@ -1717,6 +1718,7 @@ static struct module *load_module(void _
>  	unsigned int unusedcrcindex;
>  	unsigned int unusedgplindex;
>  	unsigned int unusedgplcrcindex;
> +	unsigned int immediateindex;
>  	struct module *mod;
>  	long err = 0;
>  	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
> @@ -1813,6 +1815,7 @@ static struct module *load_module(void _
>  #ifdef ARCH_UNWIND_SECTION_NAME
>  	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
>  #endif
> +	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");


Do you need to frame immediateindex by #ifdef CONFIG_IMMEDIATE / #endif?
--
vda

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [patch 1/7] Immediate Values - Architecture Independent Code
  2007-09-17 18:42 [patch 0/7] Immediate Values Mathieu Desnoyers
@ 2007-09-17 18:42 ` Mathieu Desnoyers
  2007-09-18 17:47   ` Denys Vlasenko
  0 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2007-09-17 18:42 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Mathieu Desnoyers

[-- Attachment #1: immediate-values-architecture-independent-code.patch --]
[-- Type: text/plain, Size: 17446 bytes --]

Immediate values are used as read mostly variables that are rarely updated. They
use code patching to modify the values inscribed in the instruction stream. It
provides a way to save precious cache lines that would otherwise have to be used
by these variables.

There is a generic _immediate_read() version, which uses standard global
variables, and optimized per architecture immediate_read() implementations,
which use a load immediate to remove a data cache hit. When the immediate values
functionnality is disabled in the kernel, it falls back to global variables.

It adds a new rodata section "__immediate" to place the pointers to the enable
value. Immediate values activation functions sits in kernel/immediate.c.

Immediate values refer to the memory address of a previously declared integer.
This integer holds the information about the state of the immediate values
associated, and must be accessed through the API found in linux/immediate.h.

At module load time, each immediate value is checked to see if it must be
enabled. It would be the case if the variable they refer to is exported from
another module and already enabled.

In the early stages of start_kernel(), the immediate values are updated to
reflect the state of the variable they refer to.

* Why should this be merged *

It improves performances on heavy memory I/O workloads.

An interesting result shows the potential this infrastructure has by
showing the slowdown a simple system call such as getppid() suffers when it is
used under heavy user-space cache trashing:

Random walk L1 and L2 trashing surrounding a getppid() call:
(note: in this test, do_syscal_trace was taken at each system call, see
Documentation/immediate.txt in these patches for details)
- No memory pressure :   getppid() takes  1573 cycles
- With memory pressure : getppid() takes 15589 cycles

We therefore have a slowdown of 10 times just to get the kernel variables from
memory. Another test on the same architecture (Intel P4) measured the memory
latency to be 559 cycles. Therefore, each cache line removed from the hot path
would improve the syscall time of 3.5% in these conditions.

Changelog:

- section __immediate is already SHF_ALLOC
- Because of the wonders of ELF, section 0 has sh_addr and sh_size 0.  So
  the if (immediateindex) is unnecessary here.
- Remove module_mutex usage: depend on functions implemented in module.c for
  that.
- Does not update tainted module's immediate values.
- remove immediate_*_t types, add DECLARE_IMMEDIATE() and DEFINE_IMMEDIATE().
  - immediate_read(&var) becomes immediate_read(var) because of this.
- Adding a new EXPORT_IMMEDIATE_SYMBOL(_GPL).
- remove immediate_if(). Should use if (unlikely(immediate_read(var))) instead.
  - Wait until we have gcc support before we add the immediate_if macro, since
    its form may have to change.
- Document immediate_set_early(). This design is chosen over immediate_init()
  because:
    - We can mark the function __init so it is freed after boot.
    - Module code patching can also be done by the "normal" function. This is
      preferred, even if it can be slightly slower, because update performance
      does not matter.
    - immediate_init() could confuse users because we can actually "initialize"
      an immediate value to a certain value, which will be used as initial
      value. e.g.:
        static DEFINE_IMMEDIATE(int, var) = 10;

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 include/asm-generic/vmlinux.lds.h |    7 ++
 include/linux/immediate.h         |  133 ++++++++++++++++++++++++++++++++++++++
 include/linux/module.h            |   17 ++++
 init/main.c                       |    2 
 kernel/Makefile                   |    1 
 kernel/immediate.c                |   92 ++++++++++++++++++++++++++
 kernel/module.c                   |   49 ++++++++++++++
 7 files changed, 301 insertions(+)

Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/immediate.h	2007-09-17 13:38:32.000000000 -0400
@@ -0,0 +1,133 @@
+#ifndef _LINUX_IMMEDIATE_H
+#define _LINUX_IMMEDIATE_H
+
+/*
+ * Immediate values, can be updated at runtime and save cache lines.
+ *
+ * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifdef CONFIG_IMMEDIATE
+
+#include <asm/immediate.h>
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(name, i)						\
+	do {								\
+		name##__immediate = (i);				\
+		core_immediate_update();				\
+		module_immediate_update();				\
+	} while (0)
+
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Must be called with module_mutex held.
+ */
+#define _immediate_set(name, i)						\
+	do {								\
+		name##__immediate = (i);				\
+		core_immediate_update();				\
+		_module_immediate_update();				\
+	} while (0)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Should be used for updates at early boot, when only
+ * one CPU is active and interrupts are disabled.
+ */
+#define immediate_set_early(name, i)					\
+	do {								\
+		name##__immediate = (i);				\
+		immediate_update_early();				\
+	} while (0)
+
+/*
+ * Internal update functions.
+ */
+extern void core_immediate_update(void);
+extern void immediate_update_range(const struct __immediate *begin,
+	const struct __immediate *end);
+extern void immediate_update_early(void);
+
+#else
+
+/*
+ * Generic immediate values: a simple, standard, memory load.
+ */
+
+/**
+ * immediate_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ */
+#define immediate_read(name)		_immediate_read(name)
+
+/**
+ * immediate_set - set immediate variable (with locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name, taking the module_mutex if required by
+ * the architecture.
+ */
+#define immediate_set(name, i)		(name##__immediate = (i))
+
+/**
+ * _immediate_set - set immediate variable (without locking)
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Must be called with module_mutex held.
+ */
+#define _immediate_set(name, i)		immediate_set(name, i)
+
+/**
+ * immediate_set_early - set immediate variable at early boot
+ * @name: immediate value name
+ * @i: required value
+ *
+ * Sets the value of @name. Should be used for early boot updates.
+ */
+#define immediate_set_early(name, i)	immediate_set(name, i)
+
+/*
+ * Internal update functions.
+ */
+static inline void immediate_update_early(void)
+{ }
+#endif
+
+#define DECLARE_IMMEDIATE(type, name) extern __typeof__(type) name##__immediate
+#define DEFINE_IMMEDIATE(type, name)  __typeof__(type) name##__immediate
+
+#define EXPORT_IMMEDIATE_SYMBOL(name) EXPORT_SYMBOL(name##__immediate)
+#define EXPORT_IMMEDIATE_SYMBOL_GPL(name) EXPORT_SYMBOL_GPL(name##__immediate)
+
+/**
+ * _immediate_read - Read immediate value with standard memory load.
+ * @name: immediate value name
+ *
+ * Force a data read of the immediate value instead of the immediate value
+ * based mechanism. Useful for __init and __exit section data read.
+ */
+#define _immediate_read(name)		(name##__immediate)
+
+#endif
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h	2007-09-17 13:35:50.000000000 -0400
@@ -122,6 +122,13 @@
 		VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .;	\
 	}								\
 									\
+	/* Immediate values: pointers */				\
+	__immediate : AT(ADDR(__immediate) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start___immediate) = .;		\
+		*(__immediate)						\
+		VMLINUX_SYMBOL(__stop___immediate) = .;			\
+	}								\
+									\
 	/* Kernel symbol table: strings */				\
         __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) {	\
 		*(__ksymtab_strings)					\
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h	2007-09-17 13:35:50.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/stringify.h>
 #include <linux/kobject.h>
 #include <linux/moduleparam.h>
+#include <linux/immediate.h>
 #include <asm/local.h>
 
 #include <asm/module.h>
@@ -370,6 +371,11 @@ struct module
 	/* The command line arguments (may be mangled).  People like
 	   keeping pointers to this stuff */
 	char *args;
+
+#ifdef CONFIG_IMMEDIATE
+	const struct __immediate *immediate;
+	unsigned int num_immediate;
+#endif
 };
 #ifndef MODULE_ARCH_INIT
 #define MODULE_ARCH_INIT {}
@@ -473,6 +479,9 @@ int unregister_module_notifier(struct no
 
 extern void print_modules(void);
 
+extern void _module_immediate_update(void);
+extern void module_immediate_update(void);
+
 #else /* !CONFIG_MODULES... */
 #define EXPORT_SYMBOL(sym)
 #define EXPORT_SYMBOL_GPL(sym)
@@ -572,6 +581,14 @@ static inline void print_modules(void)
 {
 }
 
+static inline void _module_immediate_update(void)
+{
+}
+
+static inline void module_immediate_update(void)
+{
+}
+
 #endif /* CONFIG_MODULES */
 
 struct device_driver;
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c	2007-09-17 13:35:51.000000000 -0400
@@ -33,6 +33,7 @@
 #include <linux/cpu.h>
 #include <linux/moduleparam.h>
 #include <linux/errno.h>
+#include <linux/immediate.h>
 #include <linux/err.h>
 #include <linux/vermagic.h>
 #include <linux/notifier.h>
@@ -1717,6 +1718,7 @@ static struct module *load_module(void _
 	unsigned int unusedcrcindex;
 	unsigned int unusedgplindex;
 	unsigned int unusedgplcrcindex;
+	unsigned int immediateindex;
 	struct module *mod;
 	long err = 0;
 	void *percpu = NULL, *ptr = NULL; /* Stops spurious gcc warning */
@@ -1813,6 +1815,7 @@ static struct module *load_module(void _
 #ifdef ARCH_UNWIND_SECTION_NAME
 	unwindex = find_sec(hdr, sechdrs, secstrings, ARCH_UNWIND_SECTION_NAME);
 #endif
+	immediateindex = find_sec(hdr, sechdrs, secstrings, "__immediate");
 
 	/* Don't keep modinfo section */
 	sechdrs[infoindex].sh_flags &= ~(unsigned long)SHF_ALLOC;
@@ -1963,6 +1966,11 @@ static struct module *load_module(void _
 	mod->gpl_future_syms = (void *)sechdrs[gplfutureindex].sh_addr;
 	if (gplfuturecrcindex)
 		mod->gpl_future_crcs = (void *)sechdrs[gplfuturecrcindex].sh_addr;
+#ifdef CONFIG_IMMEDIATE
+	mod->immediate = (void *)sechdrs[immediateindex].sh_addr;
+	mod->num_immediate =
+		sechdrs[immediateindex].sh_size / sizeof(*mod->immediate);
+#endif
 
 	mod->unused_syms = (void *)sechdrs[unusedindex].sh_addr;
 	if (unusedcrcindex)
@@ -2028,6 +2036,12 @@ static struct module *load_module(void _
 		 goto nomodsectinfo;
 #endif
 
+#ifdef CONFIG_IMMEDIATE
+	if (!mod->taints)
+		immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+#endif
+
 	err = module_finalize(hdr, sechdrs, mod);
 	if (err < 0)
 		goto cleanup;
@@ -2629,3 +2643,38 @@ EXPORT_SYMBOL(module_remove_driver);
 void struct_module(struct module *mod) { return; }
 EXPORT_SYMBOL(struct_module);
 #endif
+
+#ifdef CONFIG_IMMEDIATE
+/**
+ * _module_immediate_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Module_mutex must be held be the caller.
+ */
+void _module_immediate_update(void)
+{
+	struct module *mod;
+
+	list_for_each_entry(mod, &modules, list) {
+		if (mod->taints)
+			continue;
+		immediate_update_range(mod->immediate,
+			mod->immediate + mod->num_immediate);
+	}
+}
+EXPORT_SYMBOL_GPL(_module_immediate_update);
+
+/**
+ * module_immediate_update - update all immediate values in the kernel
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ * Takes module_mutex.
+ */
+void module_immediate_update(void)
+{
+	mutex_lock(&module_mutex);
+	_module_immediate_update();
+	mutex_unlock(&module_mutex);
+}
+EXPORT_SYMBOL_GPL(module_immediate_update);
+#endif
Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/kernel/immediate.c	2007-09-17 13:25:22.000000000 -0400
@@ -0,0 +1,92 @@
+/*
+ * Copyright (C) 2007 Mathieu Desnoyers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ */
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/immediate.h>
+#include <linux/memory.h>
+
+extern const struct __immediate __start___immediate[];
+extern const struct __immediate __stop___immediate[];
+
+/*
+ * immediate_mutex nests inside module_mutex. immediate_mutex protects builtin
+ * immediates and module immediates.
+ */
+static DEFINE_MUTEX(immediate_mutex);
+
+/**
+ * immediate_update_range - Update immediate values in a range
+ * @begin: pointer to the beginning of the range
+ * @end: pointer to the end of the range
+ *
+ * Sets a range of immediates to a enabled state : set the enable bit.
+ */
+void immediate_update_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+	int ret;
+
+	for (iter = begin; iter < end; iter++) {
+		mutex_lock(&immediate_mutex);
+		kernel_text_lock();
+		ret = arch_immediate_update(iter);
+		kernel_text_unlock();
+		if (ret)
+			printk(KERN_WARNING "Invalid immediate value. "
+					    "Variable at %p, "
+					    "instruction at %p, size %lu\n",
+					    (void*)iter->immediate,
+					    (void*)iter->var, iter->size);
+		mutex_unlock(&immediate_mutex);
+	}
+}
+EXPORT_SYMBOL_GPL(immediate_update_range);
+
+/**
+ * immediate_update - update all immediate values in the kernel
+ * @lock: should a module_mutex be taken ?
+ *
+ * Iterate on the kernel core and modules to update the immediate values.
+ */
+void core_immediate_update(void)
+{
+	/* Core kernel immediates */
+	immediate_update_range(__start___immediate, __stop___immediate);
+}
+EXPORT_SYMBOL_GPL(core_immediate_update);
+
+static void __init immediate_update_early_range(const struct __immediate *begin,
+		const struct __immediate *end)
+{
+	const struct __immediate *iter;
+
+	for (iter = begin; iter < end; iter++)
+		arch_immediate_update_early(iter);
+}
+
+/**
+ * immediate_update_early - Update immediate values at boot time
+ *
+ * Update the immediate values to the state of the variables they refer to. It
+ * is done before SMP is active, at the very beginning of start_kernel().
+ */
+void __init immediate_update_early(void)
+{
+	immediate_update_early_range(__start___immediate, __stop___immediate);
+}
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/init/main.c	2007-09-17 13:25:22.000000000 -0400
@@ -56,6 +56,7 @@
 #include <linux/pid_namespace.h>
 #include <linux/device.h>
 #include <linux/kthread.h>
+#include <linux/immediate.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -525,6 +526,7 @@ asmlinkage void __init start_kernel(void
 	unwind_init();
 	lockdep_init();
 	container_init_early();
+	immediate_update_early();
 
 	local_irq_disable();
 	early_boot_irqs_off();
Index: linux-2.6-lttng/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/kernel/Makefile	2007-09-17 13:25:06.000000000 -0400
+++ linux-2.6-lttng/kernel/Makefile	2007-09-17 13:35:50.000000000 -0400
@@ -61,6 +61,7 @@ obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_RESOURCE_COUNTERS) += res_counter.o
+obj-$(CONFIG_IMMEDIATE) += immediate.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2008-02-28 16:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-06  2:07 [patch 0/7] Immediate Values (redux) for 2.6.24-rc4-git3 Mathieu Desnoyers
2007-12-06  2:07 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-12-06  2:08 ` [patch 2/7] Immediate Values - Kconfig menu in EMBEDDED Mathieu Desnoyers
2007-12-06  2:08 ` [patch 3/7] x86: add <asm/asm.h> Mathieu Desnoyers
2007-12-06  2:08 ` [patch 4/7] Immediate Values - x86 Optimization Mathieu Desnoyers
2007-12-06  2:08 ` [patch 5/7] Add text_poke and sync_core to powerpc Mathieu Desnoyers
2007-12-06  2:08 ` [patch 6/7] Immediate Values - Powerpc Optimization Mathieu Desnoyers
2007-12-06  2:08 ` [patch 7/7] Immediate Values - Documentation Mathieu Desnoyers
  -- strict thread matches above, loose matches on Subject: below --
2008-02-02 21:08 [patch 0/7] Immediate Values Mathieu Desnoyers
2008-02-02 21:08 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2008-02-26 22:52   ` Jason Baron
2008-02-26 23:12     ` Mathieu Desnoyers
2008-02-26 23:34       ` Mathieu Desnoyers
2008-02-27 16:44         ` Jason Baron
2008-02-27 17:01       ` Jason Baron
2008-02-27 19:05     ` Mathieu Desnoyers
2008-02-28 16:50       ` Jason Baron
2007-09-18 21:07 [patch 0/7] Immediate Values for 2.6.23-rc6-mm1 Mathieu Desnoyers
2007-09-18 21:07 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-09-17 18:42 [patch 0/7] Immediate Values Mathieu Desnoyers
2007-09-17 18:42 ` [patch 1/7] Immediate Values - Architecture Independent Code Mathieu Desnoyers
2007-09-18 17:47   ` Denys Vlasenko
2007-09-18 17:59     ` Mathieu Desnoyers
2007-09-18 20:01       ` Denys Vlasenko
2007-09-18 20:47         ` Mathieu Desnoyers
2007-09-19  8:45           ` Denys Vlasenko
2007-09-19  8:57           ` Denys Vlasenko
2007-09-19 11:27             ` Mathieu Desnoyers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).